Compare commits

99 Commits

Author SHA1 Message Date
hsiegeln
c5b6f2bbad fix(dirty-state): exclude live-pushed fields from deploy diff
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m13s
CI / docker (push) Successful in 1m2s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
SonarQube / sonarqube (push) Successful in 4m17s
Live-pushed config fields (taps, tapVersion, tracedProcessors,
routeRecording) apply via SSE CONFIG_UPDATE — they take effect on
running agents without a redeploy and are fetched on agent restart
from application_config. They must not contribute to the
"pending deploy" diff against the last-successful-deployment snapshot.

Before this fix, applying a tap from the process diagram correctly
rolled out in real time but then marked the app "Pending Deploy (1)"
because DirtyStateCalculator compared every agentConfig field. This
also contradicted the UI rule (ui.md) that the live tabs "never mark
dirty".

Adds taps, tapVersion, tracedProcessors, routeRecording to
AGENT_CONFIG_IGNORED_KEYS. Updates the nested-path test to use a
staged field (sensitiveKeys) and adds a new test asserting that
divergent live-push fields keep dirty=false.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:42:07 +02:00
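The exclusion described in this commit can be sketched as follows. This is a minimal illustration, not the project's DirtyStateCalculator: the class name `DirtyStateSketch` and both method names are hypothetical, and only the four ignored key names come from the commit message.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the real calculator; only the key names are taken from the commit.
final class DirtyStateSketch {

    // Live-pushed fields that must not contribute to the pending-deploy diff.
    static final Set<String> AGENT_CONFIG_IGNORED_KEYS =
            Set.of("taps", "tapVersion", "tracedProcessors", "routeRecording");

    // Returns the keys whose values differ between staged and deployed config,
    // skipping every ignored key.
    static Set<String> diffKeys(Map<String, Object> staged, Map<String, Object> deployed) {
        Set<String> allKeys = new TreeSet<>(staged.keySet());
        allKeys.addAll(deployed.keySet());
        Set<String> changed = new TreeSet<>();
        for (String key : allKeys) {
            if (AGENT_CONFIG_IGNORED_KEYS.contains(key)) {
                continue; // applied live via SSE, never part of a deploy
            }
            if (!Objects.equals(staged.get(key), deployed.get(key))) {
                changed.add(key);
            }
        }
        return changed;
    }

    static boolean isDirty(Map<String, Object> staged, Map<String, Object> deployed) {
        return !diffKeys(staged, deployed).isEmpty();
    }
}
```

With this shape, a divergent `tapVersion` leaves `isDirty` false, while a change to a staged field such as `sensitiveKeys` still marks the app dirty.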
83c3ac3ef3 Merge pull request 'feat(ui): show deployment status + rich pending-deploy tooltip on app header' (#151) from feature/deployment-status-badge into main
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 2m20s
CI / docker (push) Successful in 23s
CI / deploy (push) Successful in 43s
CI / deploy-feature (push) Has been skipped
Reviewed-on: #151
2026-04-24 13:50:00 +02:00
7dd7317cb8 Merge branch 'main' into feature/deployment-status-badge
Some checks failed
CI / cleanup-branch (pull_request) Has been skipped
CI / build (pull_request) Successful in 2m7s
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 2m6s
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (pull_request) Has been skipped
CI / docker (push) Successful in 1m48s
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Failing after 2m19s
2026-04-24 13:49:51 +02:00
2654271494 Merge pull request 'feature/cmdk-attribute-filter' (#150) from feature/cmdk-attribute-filter into main
Some checks failed
CI / docker (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / cleanup-branch (push) Has been cancelled
CI / build (push) Has been cancelled
Reviewed-on: #150
2026-04-24 13:49:24 +02:00
hsiegeln
888f589934 feat(ui): show deployment status + rich pending-deploy tooltip on app header
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m24s
CI / docker (push) Successful in 1m12s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / cleanup-branch (pull_request) Has been skipped
CI / build (pull_request) Successful in 2m6s
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (pull_request) Has been skipped
Add a StatusDot + colored Badge next to the app name in the deployment
page header, showing the latest deployment's status (RUNNING / STARTING
/ FAILED / STOPPED / DEGRADED / STOPPING). The existing "Pending
deploy" badge now carries a tooltip explaining *why*: either a list of
local unsaved edits, or a per-field diff against the last successful
deploy's snapshot (field, staged vs deployed values). When server-side
differences exist, the badge shows the count.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 13:47:04 +02:00
hsiegeln
9aad2f3871 docs(rules): document AttributeFilter + SearchController attr param
All checks were successful
CI / cleanup-branch (pull_request) Has been skipped
CI / build (pull_request) Successful in 1m50s
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (pull_request) Has been skipped
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 11:22:27 +02:00
hsiegeln
cbaac2bfa5 feat(cmdk): Enter on 'key: value' query submits as attribute facet 2026-04-24 11:21:12 +02:00
hsiegeln
7529a9ce99 feat(cmdk): synthetic facet result when query matches key: value 2026-04-24 11:18:13 +02:00
hsiegeln
09309de982 fix(cmdk): attribute clicks filter the exchange list via ?attr= instead of opening one exchange 2026-04-24 11:13:28 +02:00
hsiegeln
56c41814fc fix(ui): gate AUTO badge on attributeFilters too 2026-04-24 11:11:26 +02:00
hsiegeln
68704e15b4 feat(ui): exchange list reads ?attr= URL params and renders filter chips
(carries forward pre-existing attribute-badge color-by-key tweak)
2026-04-24 11:05:50 +02:00
hsiegeln
510206c752 feat(ui): add attribute-filter URL and facet parsing helpers 2026-04-24 10:58:35 +02:00
hsiegeln
58e9695b4c chore(ui): regenerate openapi types with AttributeFilter 2026-04-24 10:39:45 +02:00
hsiegeln
f27a0044f1 refactor(search): align ResponseStatusException imports + add wildcard HTTP test 2026-04-24 10:30:42 +02:00
hsiegeln
5c9323cfed feat(search): accept attr= multi-value query param on /executions GET
Add a repeatable attr query parameter to the GET /executions endpoint that
parses key-only (exists check) and key:value (exact or wildcard-via-*)
filters. Invalid keys are mapped to HTTP 400 via ResponseStatusException.
The POST /executions/search path already honoured attributeFilters from
the request body via the Jackson canonical ctor; an IT now proves it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 10:23:52 +02:00
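The parsing rules for the attr parameter might look roughly like this. It is a sketch under stated assumptions: the record name `AttrFilterSketch`, the key regex, and the translation of `*` into a SQL LIKE `%` are all hypothetical, since the log does not show the real AttributeFilter record or its regex.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; field names and key syntax are assumptions, not the project's API.
record AttrFilterSketch(String key, String value) {

    // Assumed key syntax; the actual regex may differ.
    private static final Pattern KEY = Pattern.compile("[A-Za-z0-9._-]+");

    AttrFilterSketch {
        if (key == null || !KEY.matcher(key).matches()) {
            // A controller would translate this into an HTTP 400 response.
            throw new IllegalArgumentException("invalid attribute key: " + key);
        }
    }

    // Parses "key" (exists check) or "key:value" (exact or wildcard match).
    static AttrFilterSketch parse(String raw) {
        int idx = raw.indexOf(':');
        if (idx < 0) {
            return new AttrFilterSketch(raw.trim(), null);
        }
        return new AttrFilterSketch(raw.substring(0, idx).trim(), raw.substring(idx + 1).trim());
    }

    boolean isKeyOnly() {
        return value == null;
    }

    boolean isWildcard() {
        return value != null && value.contains("*");
    }

    // Turns '*' into LIKE's '%' and escapes LIKE metacharacters in the literal parts.
    String toLikePattern() {
        if (value == null) {
            throw new IllegalStateException("key-only filter has no value pattern");
        }
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '*' -> sb.append('%');
                case '%', '_', '\\' -> sb.append('\\').append(c);
                default -> sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

The pattern would then be bound as a query parameter rather than concatenated into the WHERE clause.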
hsiegeln
2dcbd5a772 feat(search): push AttributeFilter list into ClickHouse WHERE clause
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 10:13:30 +02:00
hsiegeln
f9b5f235cc feat(search): extend SearchRequest with attributeFilters (legacy ctor preserved) 2026-04-24 09:59:05 +02:00
hsiegeln
0b419db9f1 feat(search): add AttributeFilter record with key regex + wildcard pattern translation 2026-04-24 09:51:28 +02:00
hsiegeln
5f6f9e523d chore(gitnexus): sync indexed symbol count
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 09:20:25 +02:00
hsiegeln
35319dc666 refactor(ui): server metrics page uses global time range
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m31s
CI / docker (push) Successful in 1m10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 44s
Drop the page-local DS Select window picker. Drive from() / to() off
useGlobalFilters().timeRange so the dashboard tracks the same TopBar range
as Exchanges / Dashboard / Runtime. Bucket size auto-scales via
stepSecondsFor(windowSeconds) (10 s for ≤30 min → 1 h for >48 h). Query
hooks now take ServerMetricsRange = { from: Date; to: Date } instead of a
windowSeconds number, so they support arbitrary absolute or rolling ranges
the TopBar may supply (not just "now − N"). Toolbar collapses to just the
server-instance badges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 09:19:20 +02:00
hsiegeln
3c2409ed6e docs(server-metrics): document the built-in admin dashboard
SERVER-CAPABILITIES.md now lists the two consumption paths (UI + REST API)
side-by-side with visibility rules; the dashboard-builder doc leads with a
"Built-in admin dashboard" section and a 2026-04-24 changelog entry so
first-time readers know they don't have to build anything before seeing
server health.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 09:05:22 +02:00
hsiegeln
ca401363ec chore(gitnexus): sync indexed symbol count
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m21s
CI / docker (push) Successful in 1m16s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 45s
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 09:01:48 +02:00
hsiegeln
b5ee9e1d1f feat(ui): server metrics admin dashboard
Adds /admin/server-metrics page mirroring the Database/ClickHouse visibility
rules: sidebar entry gated on capabilities.infrastructureEndpoints, backend
controller now has @ConditionalOnProperty(infrastructureendpoints) and
class-level @PreAuthorize('hasRole(ADMIN)'). Dashboard panels are driven
from docs/server-self-metrics.md via the generic
/api/v1/admin/server-metrics/{catalog,instances,query} API — Server Health,
JVM, HTTP & DB pools, and conditionally Alerting + Deployments when their
metrics appear in the catalog. ThemedChart / Line / Area from the design
system; hooks in ui/src/api/queries/admin/serverMetrics.ts. Not yet
browser-verified against a running dev server — backend IT covers the API
end-to-end (8 tests), UI typecheck + production bundle both clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 09:00:14 +02:00
hsiegeln
75a41929c4 chore(gitnexus): sync indexed symbol count
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m34s
CI / docker (push) Successful in 1m4s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
SonarQube / sonarqube (push) Successful in 4m54s
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 23:42:26 +02:00
hsiegeln
d58c8cde2e feat(server): REST API over server_metrics for SaaS dashboards
Adds /api/v1/admin/server-metrics/{catalog,instances,query} so SaaS control
planes can build the server-health dashboard without direct ClickHouse
access. One generic /query endpoint covers every panel in the
server-self-metrics doc: aggregation (avg/sum/max/min/latest), group-by-tag,
filter-by-tag, counter-delta mode with per-server_instance_id rotation
handling, and a derived 'mean' statistic for timers. Regex-validated
identifiers, parameterised literals, 31-day range cap, 500-series response
cap. ADMIN-only via the existing /api/v1/admin/** RBAC gate. Docs updated:
all 17 suggested panels now expressed as single-endpoint queries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 23:41:02 +02:00
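Two of the guards listed in this commit, regex-validated identifiers and the 31-day range cap, could be sketched like this. The class name, method names, and the identifier regex are assumptions made for illustration; only the 31-day limit comes from the commit message.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.regex.Pattern;

// Hypothetical guard class; the real endpoint's validation code is not shown in the log.
final class MetricsQueryGuardSketch {

    // Assumed identifier syntax for metric and tag names.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_.]*");

    // Range cap stated in the commit message.
    static final Duration MAX_RANGE = Duration.ofDays(31);

    // Identifiers cannot be bound as SQL parameters, so they are validated instead.
    static String requireIdentifier(String name) {
        if (name == null || !IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("invalid identifier: " + name);
        }
        return name;
    }

    // Rejects empty, inverted, or overly long query windows.
    static void requireRange(Instant from, Instant to) {
        if (!from.isBefore(to)) {
            throw new IllegalArgumentException("from must be before to");
        }
        if (Duration.between(from, to).compareTo(MAX_RANGE) > 0) {
            throw new IllegalArgumentException("range exceeds 31 days");
        }
    }
}
```

Literal values such as tag filters would still go through parameterised queries; the regex only covers names that end up in the SQL text itself.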
hsiegeln
64608a7677 chore(gitnexus): sync indexed symbol count
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m18s
CI / docker (push) Successful in 1m4s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 42s
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 23:22:20 +02:00
hsiegeln
48ce75bf38 feat(server): persist server self-metrics into ClickHouse
Snapshot the full Micrometer registry (cameleer business metrics, alerting
metrics, and Spring Boot Actuator defaults) every 60s into a new
server_metrics table so server health survives restarts without an external
Prometheus. Includes a dashboard-builder reference for the SaaS team.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 23:20:45 +02:00
hsiegeln
0bbe5d6623 chore(gitnexus): sync indexed symbol count
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:18:49 +02:00
hsiegeln
e1ac896a6e chore(gitnexus): refresh indexed symbol count
Second analyze pass after pushing showed a slightly different symbol
count. Counts-only bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:17:45 +02:00
hsiegeln
58009d7c23 chore(gitnexus): refresh indexed symbol/relationship counts
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m14s
CI / docker (push) Successful in 1m4s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 42s
Auto-bumped by `npx gitnexus analyze --embeddings` after the diagram
refactor landed. No content changes — counts only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:15:08 +02:00
hsiegeln
b799d55835 fix(ui): sidebar catalog counts follow global time range
useCatalog now accepts optional from/to query params and LayoutShell
threads the TopBar time range through, so the per-app exchange counts
shown in the sidebar align with the Exchanges tab window. Previously
the sidebar relied on the backend's 24h default — 73.5k in the sidebar
coexisted with 0 hits in a 1h Exchanges search, confusing users.

Other useCatalog callers stay on the default (no time range), matching
their existing behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:15:01 +02:00
hsiegeln
166568edea fix(ui): preserve environment selection across logout
handleLogout explicitly cleared the env from localStorage, forcing the
env switcher modal to re-open on every login. Drop that clear so the
last selected env is restored from localStorage on the next session —
the expected behavior for a personal-preference store.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:14:30 +02:00
hsiegeln
f049a0a6a0 docs(rules): capture new DiagramStore method and registry-free lookup
- app-classes: DiagramRenderController by-route endpoint no longer
  depends on the agent registry; points at findLatestContentHashForAppRoute
  and cross-refs the exchange viewer's content-hash path.
- core-classes: document the new DiagramStore method and note why the
  agent-scoped findContentHashForRoute stays for the ingest path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:11:45 +02:00
hsiegeln
f8e382c217 test(diagrams): add removed-route + point-in-time coverage
Store-level: assert findLatestContentHashForAppRoute picks the newest
hash across publishing instances (proves the lookup survives agent
removal), isolates by (app, env), and returns empty for blank inputs.

Controller-level: assert the env-scoped /routes/{routeId}/diagram
endpoint resolves without a registry prerequisite, 404s for unknown
routes, and that an execution's stored diagramContentHash stays pinned
to the point-in-time version after a newer diagram is stored — the
"latest" endpoint flips to v2, the by-hash render remains byte-stable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:11:06 +02:00
hsiegeln
c7e5c7fa2d refactor(diagrams): retire findContentHashForRouteByAgents
All production callers migrated to findLatestContentHashForAppRoute in
the preceding commits. The agent-scoped lookup adds no coverage beyond
the latest-per-(app,env,route) resolver, so the dead API is removed
along with its test coverage and unused imports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:02:47 +02:00
hsiegeln
0995ab35c4 fix(catalog): preserve fromEndpointUri for removed routes
Both catalog controllers resolved the from-endpoint URI via
findContentHashForRouteByAgents, which filtered by the currently-live
agent instance_ids. Routes removed between app versions therefore lost
their fromUri even though the diagram row still exists.

Route through findLatestContentHashForAppRoute so resolution depends
only on (app, env, route) — stays populated for historical routes.
CatalogController now resolves the per-row env slug up-front so the
fromUri lookup works even for cross-env queries against managed apps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:01:19 +02:00
hsiegeln
480a53c80c fix(diagrams): by-route lookup no longer requires live agents
The env-scoped /routes/{routeId}/diagram endpoint filtered diagrams by
the currently-live agent instance_ids. Routes removed between app
versions have no live publisher, so the lookup returned 404 even though
the historical diagram row still exists in route_diagrams. Sidebar
entries for removed routes showed "no diagram" as a result.

Switch to findLatestContentHashForAppRoute which resolves directly off
(applicationId, environment, routeId) + created_at DESC, independent of
the agent registry. The controller no longer depends on
AgentRegistryService.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 18:59:43 +02:00
hsiegeln
d3ce5e861b feat(diagrams): add findLatestContentHashForAppRoute with app-route cache
Agent-scoped lookups miss diagrams from routes whose publishing agents
have been redeployed or removed. The new method resolves by
(applicationId, environment, routeId) + created_at DESC, independent of
the agent registry. An in-memory cache mirrors the existing hashCache
pattern, warm-loaded at startup via argMax.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 18:58:49 +02:00
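The latest-per-(app, env, route) cache described here might be shaped like the sketch below. Only the method name findLatestContentHashForAppRoute and the lookup key come from the log; the class name, the nested records, and `onDiagramStored` are hypothetical, and the ClickHouse warm-load step is left out.

```java
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory cache; the real DiagramStore implementation is not shown in the log.
final class AppRouteHashCacheSketch {

    record Key(String applicationId, String environment, String routeId) {}

    record Entry(String contentHash, Instant createdAt) {}

    private final Map<Key, Entry> latest = new ConcurrentHashMap<>();

    // Called whenever a diagram is stored; keeps only the newest hash per key,
    // regardless of which agent instance published it.
    void onDiagramStored(String app, String env, String route, String hash, Instant createdAt) {
        latest.merge(new Key(app, env, route), new Entry(hash, createdAt),
                (current, incoming) -> incoming.createdAt().isAfter(current.createdAt()) ? incoming : current);
    }

    // Resolves without consulting any agent registry; blank inputs yield empty.
    Optional<String> findLatestContentHashForAppRoute(String app, String env, String route) {
        if (isBlank(app) || isBlank(env) || isBlank(route)) {
            return Optional.empty();
        }
        return Optional.ofNullable(latest.get(new Key(app, env, route))).map(Entry::contentHash);
    }

    private static boolean isBlank(String s) {
        return s == null || s.isBlank();
    }
}
```

Because the key carries no agent instance id, a route whose publishing agents are gone still resolves to its last stored diagram.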
hsiegeln
e5c8fff0f9 docs(HOWTO): document CAMELEER_SERVER_RUNTIME_CERTRESOLVER env var
Added the new Traefik TLS cert resolver setting to the runtime env var
table. Blank default matches how ACME-less dev/local installs want the
`tls.certresolver` label omitted entirely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 18:22:27 +02:00
hsiegeln
21db92ff00 fix(traefik): make TLS cert resolver configurable, omit when unset
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m15s
CI / docker (push) Successful in 1m3s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 42s
Previously `TraefikLabelBuilder` hardcoded `tls.certresolver=default` on
every router. That assumes a resolver literally named `default` exists
in the Traefik static config — true for ACME-backed installs, false for
dev/local installs that use a file-based TLS store. Traefik logs
"Router uses a nonexistent certificate resolver" for the bogus resolver
on every managed app, and any future attempt to define a differently-
named real resolver would silently skip these routers.

Server-wide setting via `CAMELEER_SERVER_RUNTIME_CERTRESOLVER` (empty by
default) flows through `ConfigMerger.GlobalRuntimeDefaults.certResolver`
into `ResolvedContainerConfig.certResolver`. When blank the
`tls.certresolver` label is omitted entirely; `tls=true` is still
emitted so Traefik serves the default TLS-store cert. When set, the
label is emitted with the configured resolver name.

Not per-app/per-env configurable: there is one Traefik per server
instance and one resolver config; app-level override would only let
users break their own routers.

TDD: TraefikLabelBuilderTest gains 3 cases (resolver set, null, blank).
Full unit suite 211/0/0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 18:18:47 +02:00
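The blank-versus-set behaviour can be illustrated with a small sketch. The class and method names are hypothetical and the router name is passed in for illustration; the label keys follow Traefik's documented `traefik.http.routers.<name>.tls` convention, which is assumed to be what TraefikLabelBuilder emits.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not the project's TraefikLabelBuilder.
final class TlsLabelSketch {

    // Always emits tls=true; emits tls.certresolver only when a resolver is configured.
    static Map<String, String> tlsLabels(String routerName, String certResolver) {
        Map<String, String> labels = new LinkedHashMap<>();
        String prefix = "traefik.http.routers." + routerName;
        labels.put(prefix + ".tls", "true");
        if (certResolver != null && !certResolver.isBlank()) {
            labels.put(prefix + ".tls.certresolver", certResolver.trim());
        }
        return labels;
    }
}
```

Calling `tlsLabels("orders", "")` yields only the `tls` label, so Traefik falls back to the default TLS-store certificate instead of logging a missing resolver.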
hsiegeln
165c9f10e3 feat(deploy): externalRouting toggle to keep apps off Traefik
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m26s
CI / docker (push) Successful in 1m5s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
Adds a boolean `externalRouting` flag (default `true`) on
ResolvedContainerConfig. When `false`, TraefikLabelBuilder emits only
the identity labels (`managed-by`, `cameleer.*`) and skips every
`traefik.*` label, so the container is not published by Traefik.
Sibling containers on `cameleer-traefik` / `cameleer-env-{tenant}-{env}`
can still reach it via Docker DNS on whatever port the app listens on.

TDD: new TraefikLabelBuilderTest covers enabled (default labels present),
disabled (zero traefik.* labels), and disabled (identity labels retained)
cases. Full module unit suite: 208/0/0.

Plumbed through ConfigMerger read, DeploymentExecutor snapshot, UI form
state, Resources tab toggle, POST payload, and snapshot-to-form mapping.
Rule files updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 18:03:48 +02:00
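A rough sketch of the toggle follows, assuming a simplified label set. The class name, the `cameleer.app` key, and the label values are hypothetical; `traefik.enable` and `traefik.http.services.<name>.loadbalancer.server.port` are standard Traefik Docker labels, used here only as representative `traefik.*` entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the real builder emits more labels than shown here.
final class RoutingLabelSketch {

    static Map<String, String> labels(String appSlug, int appPort, boolean externalRouting) {
        Map<String, String> labels = new LinkedHashMap<>();
        // Identity labels are always present (key and value choices are illustrative).
        labels.put("managed-by", "cameleer");
        labels.put("cameleer.app", appSlug);
        if (!externalRouting) {
            return labels; // no traefik.* labels, so Traefik does not publish the container
        }
        labels.put("traefik.enable", "true");
        labels.put("traefik.http.services." + appSlug + ".loadbalancer.server.port",
                String.valueOf(appPort));
        return labels;
    }
}
```

With `externalRouting` set to `false`, the container keeps its identity labels and stays reachable from sibling containers over Docker DNS, but carries nothing for Traefik to act on.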
hsiegeln
ade1733418 ui(deploy): remove Exposed Ports field from Resources tab
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m25s
CI / docker (push) Successful in 1m4s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
The field was cosmetic — `containerConfig.exposedPorts` only fed Docker's
`Config.ExposedPorts` metadata via `withExposedPorts(...)`. It never
published a host port and Traefik routing uses `appPort` from the label
builder, not this list. Users reading the label "Exposed Ports" reasonably
expected it to expose their port externally; removing it until real
multi-port Traefik routing lands (tracked in #149).

Backend DTOs (`ContainerRequest.exposedPorts`, `ConfigMerger.intList
("exposedPorts")`) are left in place so existing containerConfig JSONB
rows continue to deserialize. New writes from the UI will no longer
include the field.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 17:51:46 +02:00
hsiegeln
0cf64b2928 fix(audit): exclude env-scoped executions/search from safety-net log
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m24s
CI / docker (push) Successful in 1m1s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
The exclusion list still named the legacy flat `/api/v1/search/executions`
URL, which no longer exists — the endpoint moved to env-scoped
`/api/v1/environments/{envSlug}/executions/search`. Exact-match Set
lookup never matched, so every UI search POST produced an audit row.

Switch to AntPathMatcher over a pattern list so the dynamic envSlug is
handled correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 17:35:44 +02:00
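The failure mode and the fix can be reproduced in a few lines. To stay dependency-free, this sketch swaps Spring's AntPathMatcher for a hand-rolled conversion of `{variable}` segments into a regular expression; the class and method names are hypothetical, and the two URLs are the ones quoted in the commit message.

```java
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch using java.util.regex as a stand-in for AntPathMatcher.
final class AuditExclusionSketch {

    // Old approach: exact match against a URL that no longer exists.
    static final Set<String> LEGACY_EXACT = Set.of("/api/v1/search/executions");

    // New approach: patterns that tolerate the dynamic environment slug.
    static final List<Pattern> EXCLUDED = List.of(
            templateToRegex("/api/v1/environments/{envSlug}/executions/search"));

    // Converts each "{name}" path segment into "one non-empty segment"; other segments are literal.
    static Pattern templateToRegex(String template) {
        String[] segments = template.split("/", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) {
                regex.append('/');
            }
            String segment = segments[i];
            boolean isVariable = segment.startsWith("{") && segment.endsWith("}");
            regex.append(isVariable ? "[^/]+" : Pattern.quote(segment));
        }
        return Pattern.compile(regex.toString());
    }

    static boolean isExcluded(String path) {
        return EXCLUDED.stream().anyMatch(p -> p.matcher(path).matches());
    }
}
```

An exact-match set can never contain every possible slug, which is why the lookup silently stopped matching once the endpoint became env-scoped.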
hsiegeln
0fc9c8cb4c docs(rules): checkpoints live inside Identity grid; HistoryDisclosure retired
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m21s
CI / docker (push) Successful in 1m6s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 17:15:05 +02:00
hsiegeln
fe4a6dbf24 ui(deploy): remove redundant HistoryDisclosure from Deployment tab 2026-04-23 17:13:45 +02:00
hsiegeln
9cfe3985d0 refactor(ui): route CheckpointsTable via IdentitySection.checkpointsSlot 2026-04-23 17:12:12 +02:00
hsiegeln
18da187960 refactor(ui): checkpoints in-grid styles + drop retired row-list/history CSS 2026-04-23 17:10:42 +02:00
hsiegeln
9c1bd24f16 test(ui): CheckpointsTable covers fragment layout + locale sub-line 2026-04-23 17:08:57 +02:00
hsiegeln
177673ba62 feat(ui): CheckpointsTable emits grid fragment + locale sub-line 2026-04-23 17:03:31 +02:00
hsiegeln
77f5c82dfe feat(ui): IdentitySection accepts checkpointsSlot rendered inside configGrid 2026-04-23 17:01:52 +02:00
hsiegeln
663a6624a7 docs(plan): checkpoints grid row + locale time + remove History (7 TDD tasks)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 16:54:42 +02:00
hsiegeln
cc3cd610b2 docs(spec): checkpoints into identity grid + locale time + remove History
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 16:51:08 +02:00
hsiegeln
b6239bdb6b docs(rules): reflect deployment page polish (upload-in-button, sort/refresh, collapsible checkpoints, DS Select, tab reorder)
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m21s
CI / docker (push) Successful in 1m8s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 16:16:52 +02:00
hsiegeln
0ae27ad9ed ui(drawer): reorder tabs Config first, default to Config 2026-04-23 16:15:29 +02:00
hsiegeln
e00848dc65 refactor(ui): drawer replica filter uses DS Select 2026-04-23 16:13:54 +02:00
hsiegeln
f31975e0ef feat(ui): checkpoints table collapsible, default collapsed 2026-04-23 16:09:28 +02:00
hsiegeln
2c0cf7dc9c fix(ui): StartupLogPanel — defensive scrollTo + disable buttons while fetching 2026-04-23 16:05:35 +02:00
hsiegeln
fb7b15f539 feat(ui): startup logs — sort toggle + refresh button + desc default 2026-04-23 16:00:44 +02:00
hsiegeln
1d7009d69c feat(ui): useStartupLogs accepts sort parameter (default desc) 2026-04-23 15:58:02 +02:00
hsiegeln
99a91a57be feat(ui): wire JAR upload progress into the primary action button 2026-04-23 15:54:23 +02:00
hsiegeln
427988bcc8 feat(ui): PrimaryActionButton gains uploading mode + progress overlay 2026-04-23 15:49:27 +02:00
hsiegeln
a208f2eec7 feat(ui): useUploadJar uses XHR and exposes onProgress 2026-04-23 15:44:50 +02:00
hsiegeln
13f218d522 docs(plan): deployment page polish (9 TDD tasks)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:42:06 +02:00
hsiegeln
900fba5af6 docs(spec): deployment page polish (upload-in-button, sort/refresh, collapsible checkpoints, DS Select, tab reorder)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:36:57 +02:00
hsiegeln
b3d1dd377d ui(deploy): hide CheckpointsTable when no past deployments exist
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 14:34:09 +02:00
hsiegeln
e36c82c4db test(deploy): scope schema ITs to current_schema + clear deployments FK in teardown
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m59s
CI / docker (push) Successful in 1m5s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Surfaced by enabling testcontainers.reuse in Task 0: when the same Postgres
container is reused across `mvn verify` runs, Flyway migrates both `public`
and `tenant_default` schemas (the app.yml default URL uses
?currentSchema=tenant_default; AbstractPostgresIT overrides to public).
Schema-introspection assertions saw duplicate rows/indexes/enums.

Plus: OutboundConnectionAdminControllerIT's AfterEach couldn't delete its
test users because sibling deployment ITs (Task 4) left deployments.created_by
references — FK blocks the DELETE. Clear referencing deployments first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 14:06:56 +02:00
hsiegeln
d192f6b57c docs(rules): deployment audit + checkpoints table + SideDrawer + log instanceIds
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 13:51:22 +02:00
hsiegeln
fe1681e6e8 ui(audit): surface DEPLOYMENT category in admin filter dropdown 2026-04-23 13:49:31 +02:00
hsiegeln
571f85cd0f feat(ui): wire CheckpointsTable + Drawer into IdentitySection (delete old Checkpoints)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 13:46:31 +02:00
hsiegeln
25d2a3014a refactor(ui): DiffView CSS module + drop duplicate snapshot type 2026-04-23 13:43:15 +02:00
hsiegeln
1a97e2146e feat(ui): ConfigPanel snapshot+diff modes; extract snapshotToForm helper
- Extract inline handleRestore mapping into snapshotToForm(snapshot, defaults) helper
- Export defaultForm from useDeploymentPageState for use in ConfigPanel
- Replace ConfigPanel stub with real read-only snapshot renderer + Snapshot/Diff toggle
- Add fieldDiff deep-equal field-walk helper with nested object + array support
- Forward optional currentForm prop through CheckpointDetailDrawer to ConfigPanel
- 13 new tests across diff.test.ts, snapshotToForm.test.ts, ConfigPanel.test.tsx (all pass)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 13:38:22 +02:00
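The deep-equal field walk behind the Diff mode can be outlined as below. The actual helper lives in the UI code and is presumably TypeScript; this is a Java rendering for illustration only, and the class name, the `Change` record, and the dotted and indexed path format are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

// Hypothetical Java rendering of a fieldDiff-style helper; map keys are assumed to be strings.
final class FieldDiffSketch {

    record Change(String path, Object before, Object after) {}

    static List<Change> fieldDiff(Object before, Object after) {
        List<Change> out = new ArrayList<>();
        walk("", before, after, out);
        return out;
    }

    private static void walk(String path, Object a, Object b, List<Change> out) {
        if (a instanceof Map<?, ?> ma && b instanceof Map<?, ?> mb) {
            // Nested objects: visit the union of keys in a stable order.
            TreeSet<String> keys = new TreeSet<>();
            ma.keySet().forEach(k -> keys.add(String.valueOf(k)));
            mb.keySet().forEach(k -> keys.add(String.valueOf(k)));
            for (String k : keys) {
                walk(path.isEmpty() ? k : path + "." + k, ma.get(k), mb.get(k), out);
            }
        } else if (a instanceof List<?> la && b instanceof List<?> lb) {
            // Arrays: compare index by index; a missing element counts as null.
            int n = Math.max(la.size(), lb.size());
            for (int i = 0; i < n; i++) {
                walk(path + "[" + i + "]",
                        i < la.size() ? la.get(i) : null,
                        i < lb.size() ? lb.get(i) : null, out);
            }
        } else if (!Objects.equals(a, b)) {
            out.add(new Change(path, a, b));
        }
    }
}
```

Each `Change` carries the path plus both values, which is the information a per-field snapshot-versus-current view needs.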
hsiegeln
d1150e5dd8 refactor(ui): drawer CSS module + narrow LogsPanel memo deps
Extract 14 inline style blocks from CheckpointDetailDrawer index.tsx and
LogsPanel.tsx into a shared CSS module using DS CSS variables throughout.
Narrow the LogsPanel useMemo dep array from the full deployment object to
deployment.id + deployment.replicaStates to prevent spurious query
invalidation on every TanStack Query poll.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 13:30:48 +02:00
hsiegeln
b0995d84bc feat(ui): CheckpointDetailDrawer container + LogsPanel
Adds the CheckpointDetailDrawer with Logs/Config tabs. LogsPanel scopes
logs to a deployment's replicas via instanceIds derived from replicaStates
+ generation suffix. Stub ConfigPanel placeholder for Task 11.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 13:25:55 +02:00
hsiegeln
9756a20223 fix(ui): dim archived checkpoint rows + safer outcome class lookup + cleaner cap 2026-04-23 13:19:06 +02:00
hsiegeln
1b4b522233 feat(ui): CheckpointsTable component (replaces row list)
Full-width table with Version / JAR / Deployed-by / Deployed / Strategy /
Outcome columns, pagination cap (jarRetentionCount, default 10), pruned-JAR
archived state, empty state, and row-click onSelect handler. 8/8 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 13:15:30 +02:00
hsiegeln
48217e0034 test(deploy): contract test — ConfigTabs disabled gates all inputs 2026-04-23 13:10:17 +02:00
hsiegeln
c3ecff9d45 feat(ui): add SideDrawer component (project-local)
Right-sliding panel with portal, ESC + backdrop close, sticky header/footer,
three width sizes (md/lg/xl), transparent click-blocking backdrop, and DS token colors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 13:05:36 +02:00
hsiegeln
07099357af chore(api): regenerate UI types — Deployment.createdBy + logs instanceIds
- Fetched fresh openapi.json from local backend (Tasks 3-5 changes)
- Regenerated schema.d.ts via openapi-typescript
- Added createdBy: string | null to Deployment interface in apps.ts
- Added instanceIds?: string[] to UseInfiniteApplicationLogsArgs with sort/serialize/queryKey/URLSearchParams wiring

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 13:00:16 +02:00
hsiegeln
ed0e616109 refactor(logs): drop dead null guards on instanceIds filter (record normalizes) 2026-04-23 12:52:18 +02:00
hsiegeln
382e1801a7 feat(logs): add instanceIds multi-value filter to /logs endpoint
Adds List<String> instanceIds to LogSearchRequest (null-normalized to
List.of() in compact ctor) and generates an IN clause in both
ClickHouseLogStore.search() and countLogs(), mirroring the existing
sources pattern. LogQueryController parses ?instanceIds= as a
comma-split list. All existing LogSearchRequest call sites updated.
New ClickHouseLogStoreInstanceIdsIT covers: multi-value filter, empty
filter (all rows), null filter (all rows), single-value filter, and
coexistence with the singular instanceId field.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 12:41:09 +02:00
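The null-normalizing compact constructor and the generated IN clause might look like the following sketch. The record is trimmed to two fields, and the names `LogSearchRequestSketch`, `LogWhereClauseSketch`, and `whereClause` are hypothetical; the `instance_id` column name is assumed from the design spec further down this log.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, trimmed-down version of the request record.
record LogSearchRequestSketch(String instanceId, List<String> instanceIds) {

    // Compact constructor: a null list becomes an empty, immutable list.
    LogSearchRequestSketch {
        instanceIds = instanceIds == null ? List.of() : List.copyOf(instanceIds);
    }
}

// Hypothetical query-builder fragment.
final class LogWhereClauseSketch {

    // Builds the WHERE fragment and appends bind values to params in placeholder order.
    static String whereClause(LogSearchRequestSketch req, List<Object> params) {
        List<String> conditions = new ArrayList<>();
        if (req.instanceId() != null) {
            conditions.add("instance_id = ?");
            params.add(req.instanceId());
        }
        if (!req.instanceIds().isEmpty()) {
            String placeholders = String.join(", ", Collections.nCopies(req.instanceIds().size(), "?"));
            conditions.add("instance_id IN (" + placeholders + ")");
            params.addAll(req.instanceIds());
        }
        return conditions.isEmpty() ? "" : " WHERE " + String.join(" AND ", conditions);
    }
}
```

Because the record normalizes null to an empty list, downstream code only needs the `isEmpty()` check, which is what makes separate null guards redundant.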
hsiegeln
2312a7304d fix(deploy): widen promote FAILURE audit detail + clean up test envs 2026-04-23 12:29:46 +02:00
hsiegeln
47d5611462 feat(audit): audit deploy/stop/promote with DEPLOYMENT category
Wires AuditService and AppVersionRepository into DeploymentController.
Replaces null createdBy placeholder with currentUserId() on createDeployment/promote.
Adds audit log entries (SUCCESS + FAILURE) for deploy_app, stop_deployment,
and promote_deployment actions. Fixes FK violations in affected ITs by
seeding the test-operator and alice users into the users table before deploy calls.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 12:24:27 +02:00
hsiegeln
9043dc00b0 test(deploy): clean up seeded users + document null createdBy placeholder
Fix Issue 1: Add @AfterEach cleanup for alice/bob users in PostgresDeploymentRepositoryCreatedByIT to prevent test leakage (FK order: deployments -> app_versions -> apps, then users).

Fix Issue 2: Add comment at first create(..., null) call site in PostgresDeploymentRepositoryIT documenting the null placeholder for pre-V4 rows where createdBy is nullable.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-23 12:10:21 +02:00
hsiegeln
a141e99a07 feat(deploy): cascade createdBy through Deployment record + service + repo
Appends String createdBy to the Deployment record (after createdAt), updates
both with-er methods to pass it through, threads the parameter through
DeploymentRepository.create, DeploymentService.createDeployment/promote, and
PostgresDeploymentRepository (INSERT + SELECT_COLS + mapRow). DeploymentController
passes null as placeholder (Task 4 will resolve from SecurityContextHolder).
Covers with PostgresDeploymentRepositoryCreatedByIT verifying round-trip via
both createDeployment and promote.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 12:04:15 +02:00
hsiegeln
15d00f039c feat(audit): add DEPLOYMENT audit category 2026-04-23 11:51:28 +02:00
hsiegeln
064c302073 docs(plan): V2 → V4 migration filename (V2/V3 already taken) 2026-04-23 11:49:12 +02:00
hsiegeln
35748ea7a1 feat(deploy): V4 migration — add created_by to deployments 2026-04-23 11:44:05 +02:00
hsiegeln
e558494f8d plan(deploy): checkpoints table redesign + audit gap
15 tasks across 5 phases (backend foundation → SideDrawer →
ConfigTabs readOnly → CheckpointsTable + DetailDrawer → polish).
TDD throughout with per-task commits. Backend phase ships
independently to close the audit gap as quickly as possible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 11:39:11 +02:00
hsiegeln
1f0ab002d6 spec(deploy): checkpoints table redesign + deployment audit gap
Replaces the cramped Checkpoints disclosure with a real DataTable + a
side drawer (Logs / Config with snapshot/diff modes) and closes the
audit-log gap discovered in DeploymentController (deploy/stop/promote
currently make zero auditService.log calls).

Cap visible checkpoints at Environment.jarRetentionCount — beyond that,
JARs are pruned and rows aren't restorable. Logs scoped per-deployment
via instance_id IN (...) computed from replicaStates (no time window
needed). Compare folded into Config as a view-mode toggle. Two-phase
rollout (backend ships first to close the audit gap immediately).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 11:31:50 +02:00
hsiegeln
242ef1f0af perf(build): faster Maven + UI + CI pipelines
- Maven: enable useIncrementalCompilation; Surefire forkCount=1C +
  reuseForks=true so unit-test JVMs are reused per CPU core instead of
  spawning per class (205 tests pass under the new strategy).
- Testcontainers: opt-in reuse via .withReuse(true) on Postgres +
  ClickHouse base; per-developer enable via ~/.testcontainers.properties.
- UI: drop redundant `tsc --noEmit` from `npm run build` (Vite already
  type-checks); split into a dedicated `npm run typecheck` script.
- CI: cache ~/.npm and ui/node_modules/.vite alongside Maven; npm ci with
  --prefer-offline --no-audit --fund=false; paths-ignore for docs-only,
  .planning/ and .claude/ changes so doc-only pushes skip the pipeline.
- Docs: CLAUDE.md + .claude/rules/cicd.md updated with the new build
  knobs and the Testcontainers reuse opt-in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 10:48:34 +02:00
hsiegeln
c6aef5ab35 fix(deploy): Checkpoints — preserve STOPPED history, fix filter + placement
- Backend: rename deleteTerminalByAppAndEnvironment → deleteFailedByAppAndEnvironment.
  STOPPED rows were being wiped on every redeploy, so Checkpoints was always empty.
  Now only FAILED rows are pruned; STOPPED deployments are retained as restorable
  checkpoints (they still carry deployed_config_snapshot from their RUNNING window).
- UI filter: any deployment with a snapshot is a checkpoint (was RUNNING|DEGRADED only,
  which excluded the main case — the previous blue/green deployment now in STOPPED).
- UI placement: Checkpoints disclosure now renders inside IdentitySection, matching
  the design spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 10:26:46 +02:00
hsiegeln
007597715a docs(rules): deployment strategies + generation suffix
Refresh the three rules files to match the new executor behavior:

- docker-orchestration.md: rewrite DeploymentExecutor Details with
  container naming scheme ({...}-{replica}-{generation}), strategy
  dispatch (blue-green vs rolling), and the new DEGRADED semantics
  (post-deploy only). Update TraefikLabelBuilder + ContainerLogForwarder
  bullets for the generation suffix + new cameleer.generation label.
- app-classes.md: DeploymentExecutor + TraefikLabelBuilder bullets
  mirror the same.
- core-classes.md: add DeploymentStrategy enum; note DEGRADED is now
  post-deploy-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 10:02:51 +02:00
hsiegeln
b6e54db6ec ui(deploy): strategy hint on Resources tab + indicator on StatusCard
Resources tab: add a hint under the Deploy Strategy dropdown that
explains the blue-green vs rolling trade-off (resource peak, failure
semantics), switching text based on the current selection.

StatusCard: show the active deployment's strategy inline in the info
grid so users can tell at a glance which path was taken for a given
deployment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 10:00:44 +02:00
hsiegeln
e9f523f2b8 test(deploy): blue-green + rolling strategy ITs
Four ITs covering strategy behavior:
- BlueGreenStrategyIT#blueGreen_allHealthy_stopsOldAfterNew:
  old is stopped only after all new replicas are healthy.
- BlueGreenStrategyIT#blueGreen_partialHealthy_preservesOldAndMarksFailed:
  strict all-healthy — one starting replica aborts the deploy and
  leaves the previous deployment RUNNING untouched.
- RollingStrategyIT#rolling_allHealthy_replacesOneByOne:
  InOrder on stopContainer confirms old-0 stops before old-1 (the
  interleaving that distinguishes rolling from blue-green).
- RollingStrategyIT#rolling_failsMidRollout_preservesRemainingOld:
  mid-rollout health failure stops only the in-flight new containers
  and the already-replaced old-0; old-1 stays untouched.

Shortens healthchecktimeout to 2s via @TestPropertySource so failure
paths complete in ~25s instead of ~60s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 10:00:00 +02:00
hsiegeln
653f983a08 deploy: rolling strategy (per-replica replacement)
Replace the Phase 3 stub with a working rolling implementation.

Flow:
- Capture previous deployment's per-index container ids up front.
- For i = 0..replicas-1:
  - Start new[i] (gen-suffixed name, coexists with old[i]).
  - Wait for new[i] healthy (new waitForOneHealthy helper).
  - On success: stop old[i] if present, continue.
  - On failure: stop in-flight new[0..i], leave un-replaced old[i+1..N]
    running, mark FAILED. Already-replaced old replicas are not
    restored — rolling is not reversible; user redeploys to recover.
- After the loop: sweep any leftover old replicas (when replica count
  shrank) and mark the old deployment STOPPED.

Resource peak: replicas + 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 09:53:52 +02:00
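The rolling flow above can be sketched as a small loop. This is a minimal illustration, not the `DeploymentExecutor` code: the `ContainerOps` interface and the `roll` method are hypothetical, and status bookkeeping (FAILED, STOPPED) is reduced to a boolean return value.

```java
import java.util.ArrayList;
import java.util.List;

final class RollingSketch {

    // Hypothetical runtime abstraction (not the project's RuntimeOrchestrator).
    interface ContainerOps {
        String start(int replicaIndex);          // returns the new container id
        boolean waitHealthy(String containerId); // true if healthy before timeout
        void stop(String containerId);
    }

    // Returns true when the rollout completed, false when it would be marked FAILED.
    static boolean roll(ContainerOps ops, List<String> oldIds, int replicas) {
        List<String> started = new ArrayList<>();
        for (int i = 0; i < replicas; i++) {
            String fresh = ops.start(i);
            started.add(fresh);
            if (!ops.waitHealthy(fresh)) {
                // Stop the in-flight new containers; old replicas that were not
                // yet replaced keep running, already-replaced ones are not restored.
                started.forEach(ops::stop);
                return false;
            }
            if (i < oldIds.size()) {
                ops.stop(oldIds.get(i)); // replace old[i] only after new[i] is healthy
            }
        }
        // Sweep leftover old replicas when the replica count shrank.
        for (int i = replicas; i < oldIds.size(); i++) {
            ops.stop(oldIds.get(i));
        }
        return true;
    }
}
```

Because old[i] is stopped only after new[i] reports healthy, at most one extra container runs at any time, which matches the stated resource peak of replicas + 1.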
hsiegeln
459cdfe427 deploy: blue-green strategy (start → health-all → stop old)
Phase 3 of deployment-strategies plan. Refactor executeAsync to
dispatch on DeploymentStrategy.fromWire(config.deploymentStrategy()).

Blue-green (default):
- Start all N new replicas (gen-suffixed names coexist with old).
- Wait for ALL healthy (strict — partial-healthy = FAILED, preserves
  previous deployment untouched).
- Only then find + stop the previous deployment.
- Final status is always RUNNING; DEGRADED is now reserved for
  post-deploy replica crashes (set by DockerEventMonitor).

Rolling: stub — throws UnsupportedOperationException for now, gets
its real implementation in Phase 4.

Refactor details:
- Extract DeployCtx record to carry 13 per-deploy values around.
- Extract startReplica(ctx, i, stateOut) — shared by both strategy paths.
- Extract persistSnapshotAndMarkRunning(ctx, primaryCid) — shared finalizer.
- Rename waitForAnyHealthy → waitForAllHealthy (the name was misleading;
  the method already waited for all, just returned partial on timeout).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 09:51:24 +02:00
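For comparison, the blue-green ordering (start all, wait for all, then stop old) might be sketched as follows. `ContainerOps` and `deploy` are hypothetical names, and stopping the new replicas on a partial-health failure is an assumption of this sketch; the commit only states that the previous deployment is left untouched.

```java
import java.util.ArrayList;
import java.util.List;

final class BlueGreenSketch {

    // Hypothetical runtime abstraction (not the project's RuntimeOrchestrator).
    interface ContainerOps {
        String start(int replicaIndex);
        boolean waitHealthy(String containerId);
        void stop(String containerId);
    }

    // Returns true for RUNNING, false for FAILED.
    static boolean deploy(ContainerOps ops, List<String> oldIds, int replicas) {
        List<String> fresh = new ArrayList<>();
        for (int i = 0; i < replicas; i++) {
            fresh.add(ops.start(i)); // new replicas coexist with the old ones
        }
        boolean allHealthy = true;
        for (String id : fresh) {
            allHealthy &= ops.waitHealthy(id); // strict: every replica must be healthy
        }
        if (!allHealthy) {
            fresh.forEach(ops::stop); // assumption: clean up the failed generation
            return false;             // the previous deployment is never touched
        }
        oldIds.forEach(ops::stop);    // only now stop the previous deployment
        return true;
    }
}
```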
hsiegeln
652346dcd4 deploy: gen-suffixed container names + cameleer.generation label
Append an 8-char generation id (first 8 chars of deployment UUID) to:
- container name: {tenant}-{env}-{app}-{replica}-{gen}
- CAMELEER_AGENT_INSTANCEID (so old+new agents are distinct in the registry)
- Traefik cameleer.instance-id label

And emit a new standalone cameleer.generation label so dashboards
(Prometheus/Grafana) can pin deploy boundaries without regex on
instance-id.

Strategy branching comes next — this commit is foundation only; the
interim destroy-then-start flow still runs regardless of strategy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 09:45:44 +02:00
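A minimal sketch of the naming scheme, assuming the generation id is simply the first eight characters of the UUID's canonical string form. The class and method names are hypothetical; the name layout follows the commit message.

```java
import java.util.UUID;

final class GenerationNamingSketch {

    // First 8 chars of the deployment UUID; in canonical form these are hex digits.
    static String generationOf(UUID deploymentId) {
        return deploymentId.toString().substring(0, 8);
    }

    // {tenant}-{env}-{app}-{replica}-{gen}
    static String containerName(String tenant, String env, String app,
                                int replica, UUID deploymentId) {
        return String.join("-", tenant, env, app,
                String.valueOf(replica), generationOf(deploymentId));
    }
}
```

Two deployments of the same app therefore produce distinct container names for the same replica index, which is what lets old and new replicas coexist.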
hsiegeln
5304c8ee01 core(deploy): DeploymentStrategy enum with safe wire conversion
Typed enum (BLUE_GREEN, ROLLING) with fromWire/toWire kebab-case
translation. fromWire falls back to BLUE_GREEN for unknown or null
input so the executor dispatch site never null-checks and no
misconfigured container-config can throw at runtime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 09:42:35 +02:00
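The safe wire conversion could be sketched as below. `DeploymentStrategySketch` is an illustrative stand-in for the real enum, and the trim and case-insensitive matching are assumptions of this sketch; the BLUE_GREEN fallback for unknown or null input is taken from the commit message.

```java
import java.util.Locale;

enum DeploymentStrategySketch {
    BLUE_GREEN, ROLLING;

    // Kebab-case wire form: "blue-green" / "rolling".
    String toWire() {
        return name().toLowerCase(Locale.ROOT).replace('_', '-');
    }

    // Never throws and never returns null: unknown or null input falls back to BLUE_GREEN.
    static DeploymentStrategySketch fromWire(String wire) {
        if (wire != null) {
            String normalized = wire.trim().toLowerCase(Locale.ROOT); // assumption: lenient matching
            for (DeploymentStrategySketch s : values()) {
                if (s.toWire().equals(normalized)) {
                    return s;
                }
            }
        }
        return BLUE_GREEN;
    }
}
```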
hsiegeln
2c82f29aef docs(plans): deployment strategies (blue-green + rolling) plan
7-phase plan to replace the interim destroy-then-start flow (f8dccaae)
with a strategy-aware executor. Adds gen-suffixed container names so
old + new replicas can coexist, plus a cameleer.generation label for
Prometheus/Grafana deploy-boundary annotations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 09:41:43 +02:00
143 changed files with 13242 additions and 683 deletions


@@ -54,17 +54,17 @@ Env-scoped read-path controllers (`AlertController`, `AlertRuleController`, `Ale
### Env-scoped (user-facing data & config)
- `AppController` — `/api/v1/environments/{envSlug}/apps`. GET list / POST create / GET `{appSlug}` / DELETE `{appSlug}` / GET `{appSlug}/versions` / POST `{appSlug}/versions` (JAR upload) / PUT `{appSlug}/container-config` / GET `{appSlug}/dirty-state` (returns `DirtyStateResponse{dirty, lastSuccessfulDeploymentId, differences}` — compares current JAR+config against last RUNNING deployment snapshot; dirty=true when no snapshot exists). App slug uniqueness is per-env (`(env, app_slug)` is the natural key). `CreateAppRequest` body has no env (path), validates slug regex. Injects `DirtyStateCalculator` bean (registered in `RuntimeBeanConfig`, requires `ObjectMapper` with `JavaTimeModule`).
- `DeploymentController` — `/api/v1/environments/{envSlug}/apps/{appSlug}/deployments`. GET list / POST create (body `{ appVersionId }`) / POST `{id}/stop` / POST `{id}/promote` (body `{ targetEnvironment: slug }` — target app slug must exist in target env) / GET `{id}/logs`.
- `DeploymentController` — `/api/v1/environments/{envSlug}/apps/{appSlug}/deployments`. GET list / POST create (body `{ appVersionId }`) / POST `{id}/stop` / POST `{id}/promote` (body `{ targetEnvironment: slug }` — target app slug must exist in target env) / GET `{id}/logs`. All lifecycle ops (`POST /` deploy, `POST /{id}/stop`, `POST /{id}/promote`) audited under `AuditCategory.DEPLOYMENT`. Action codes: `deploy_app`, `stop_deployment`, `promote_deployment`. Acting user resolved via the `user:` prefix-strip convention; both SUCCESS and FAILURE branches write audit rows. `created_by` (TEXT, nullable) populated from `SecurityContextHolder` and surfaced on the `Deployment` DTO.
- `ApplicationConfigController` — `/api/v1/environments/{envSlug}`. GET `/config` (list), GET/PUT `/apps/{appSlug}/config`, GET `/apps/{appSlug}/processor-routes`, POST `/apps/{appSlug}/config/test-expression`. PUT accepts `?apply=staged|live` (default `live`). `live` saves to DB and pushes `CONFIG_UPDATE` SSE to live agents in this env (existing behavior); `staged` saves to DB only, skipping the SSE push — used by the unified app deployment page. Audit action is `stage_app_config` for staged writes, `update_app_config` for live. Invalid `apply` values return 400.
- `AppSettingsController` — `/api/v1/environments/{envSlug}`. GET `/app-settings` (list), GET/PUT/DELETE `/apps/{appSlug}/settings`. ADMIN/OPERATOR only.
- `SearchController` — `/api/v1/environments/{envSlug}`. GET `/executions`, POST `/executions/search`, GET `/stats`, `/stats/timeseries`, `/stats/timeseries/by-app`, `/stats/timeseries/by-route`, `/stats/punchcard`, `/attributes/keys`, `/errors/top`.
- `LogQueryController` — GET `/api/v1/environments/{envSlug}/logs` (filters: source (multi, comma-split, OR-joined), level (multi, comma-split, OR-joined), application, agentId, exchangeId, logger, q, time range; sort asc/desc). Cursor-paginated, returns `{ data, nextCursor, hasMore, levelCounts }`; cursor is base64url of `"{timestampIso}|{insert_id_uuid}"` — same-millisecond tiebreak via the `insert_id` UUID column on `logs`.
- `SearchController` — `/api/v1/environments/{envSlug}`. GET `/executions`, POST `/executions/search`, GET `/stats`, `/stats/timeseries`, `/stats/timeseries/by-app`, `/stats/timeseries/by-route`, `/stats/punchcard`, `/attributes/keys`, `/errors/top`. GET `/executions` accepts repeat `attr` query params: `attr=order` (key-exists), `attr=order:47` (exact), `attr=order:4*` (wildcard — `*` maps to SQL LIKE `%`). First `:` splits key/value; later colons stay in the value. Invalid keys → 400. POST `/executions/search` accepts the same filters via `SearchRequest.attributeFilters` in the body.
- `LogQueryController` — GET `/api/v1/environments/{envSlug}/logs` (filters: source (multi, comma-split, OR-joined), level (multi, comma-split, OR-joined), application, agentId, exchangeId, logger, q, time range, instanceIds (multi, comma-split, AND-joined as WHERE instance_id IN (...) — used by the Checkpoint detail drawer to scope logs to a deployment's replicas); sort asc/desc). Cursor-paginated, returns `{ data, nextCursor, hasMore, levelCounts }`; cursor is base64url of `"{timestampIso}|{insert_id_uuid}"` — same-millisecond tiebreak via the `insert_id` UUID column on `logs`.
- `RouteCatalogController` — GET `/api/v1/environments/{envSlug}/routes` (merged route catalog from registry + ClickHouse; env filter unconditional).
- `RouteMetricsController` — GET `/api/v1/environments/{envSlug}/routes/metrics`, GET `/api/v1/environments/{envSlug}/routes/metrics/processors`.
- `AgentListController` — GET `/api/v1/environments/{envSlug}/agents` (registered agents with runtime metrics, filtered to env).
- `AgentEventsController` — GET `/api/v1/environments/{envSlug}/agents/events` (lifecycle events; cursor-paginated, returns `{ data, nextCursor, hasMore }`; order `(timestamp DESC, insert_id DESC)`; cursor is base64url of `"{timestampIso}|{insert_id_uuid}"` — `insert_id` is a stable UUID column used as a same-millisecond tiebreak).
- `AgentMetricsController` — GET `/api/v1/environments/{envSlug}/agents/{agentId}/metrics` (JVM/Camel metrics). Rejects cross-env agents (404) as defence-in-depth.
- `DiagramRenderController` — GET `/api/v1/environments/{envSlug}/apps/{appSlug}/routes/{routeId}/diagram` (env-scoped lookup). Also GET `/api/v1/diagrams/{contentHash}/render` (flat — content hashes are globally unique).
- `DiagramRenderController` — GET `/api/v1/environments/{envSlug}/apps/{appSlug}/routes/{routeId}/diagram` returns the most recent diagram for (app, env, route) via `DiagramStore.findLatestContentHashForAppRoute`. Registry-independent — routes whose publishing agents were removed still resolve. Also GET `/api/v1/diagrams/{contentHash}/render` (flat — content hashes are globally unique), the point-in-time path consumed by the exchange viewer via `ExecutionDetail.diagramContentHash`.
- `AlertRuleController` — `/api/v1/environments/{envSlug}/alerts/rules`. GET list / POST create / GET `{id}` / PUT `{id}` / DELETE `{id}` / POST `{id}/enable` / POST `{id}/disable` / POST `{id}/render-preview` / POST `{id}/test-evaluate`. OPERATOR+ for mutations, VIEWER+ for reads. CRITICAL: attribute keys in `ExchangeMatchCondition.filter.attributes` are validated at rule-save time against `^[a-zA-Z0-9._-]+$` — they are later inlined into ClickHouse SQL. `AgentLifecycleCondition` is allowlist-only — the `AgentLifecycleEventType` enum (REGISTERED / RE_REGISTERED / DEREGISTERED / WENT_STALE / WENT_DEAD / RECOVERED) plus the record compact ctor (non-empty `eventTypes`, `withinSeconds ≥ 1`) do the validation; custom agent-emitted event types are tracked in backlog issue #145. Webhook validation: verifies `outboundConnectionId` exists and `isAllowedInEnvironment`. Null notification templates default to `""` (NOT NULL constraint). Audit: `ALERT_RULE_CHANGE`.
- `AlertController` — `/api/v1/environments/{envSlug}/alerts`. GET list (inbox filtered by userId/groupIds/roleNames via `InAppInboxQuery`; optional multi-value `state`, `severity`, tri-state `acked`, tri-state `read` query params; soft-deleted rows always excluded) / GET `/unread-count` / GET `{id}` / POST `{id}/ack` / POST `{id}/read` / POST `/bulk-read` / POST `/bulk-ack` (VIEWER+) / DELETE `{id}` (OPERATOR+, soft-delete) / POST `/bulk-delete` (OPERATOR+) / POST `{id}/restore` (OPERATOR+, clears `deleted_at`). `requireLiveInstance` helper returns 404 on soft-deleted rows; `restore` explicitly fetches regardless of `deleted_at`. `BulkIdsRequest` is the shared body for bulk-read/ack/delete (`{ instanceIds }`). `AlertDto` includes `readAt`; `deletedAt` is intentionally NOT on the wire. Inbox SQL: `? = ANY(target_user_ids) OR target_group_ids && ? OR target_role_names && ?` — requires at least one matching target (no broadcast concept).
- `AlertSilenceController` — `/api/v1/environments/{envSlug}/alerts/silences`. GET list / POST create / DELETE `{id}`. 422 if `endsAt <= startsAt`. OPERATOR+ for mutations, VIEWER+ for list. Audit: `ALERT_SILENCE_CHANGE`.
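The keyset cursor used by the log and agent-event endpoints above, base64url of `"{timestampIso}|{insert_id_uuid}"`, could be encoded and decoded roughly like this. `LogCursorSketch` is a hypothetical name, and omitting the base64 padding is an assumption of this sketch rather than something the rules file states.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.Base64;
import java.util.UUID;

// Hypothetical cursor value: the timestamp plus the insert_id tiebreak.
record LogCursorSketch(Instant timestamp, UUID insertId) {

    // base64url("{timestampIso}|{insert_id_uuid}"), without padding (assumption).
    String encode() {
        String raw = timestamp + "|" + insertId;
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    static LogCursorSketch decode(String cursor) {
        String raw = new String(Base64.getUrlDecoder().decode(cursor), StandardCharsets.UTF_8);
        int bar = raw.indexOf('|');
        if (bar < 0) {
            throw new IllegalArgumentException("malformed cursor");
        }
        return new LogCursorSketch(
                Instant.parse(raw.substring(0, bar)),
                UUID.fromString(raw.substring(bar + 1)));
    }
}
```

Carrying the UUID alongside the timestamp is what makes the ordering total when several rows share the same millisecond.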
@@ -109,6 +109,7 @@ Env-scoped read-path controllers (`AlertController`, `AlertRuleController`, `Ale
- `UsageAnalyticsController` — GET `/api/v1/admin/usage` (ClickHouse `usage_events`).
- `ClickHouseAdminController` — GET `/api/v1/admin/clickhouse/**` (conditional on `infrastructureendpoints` flag).
- `DatabaseAdminController` — GET `/api/v1/admin/database/**` (conditional on `infrastructureendpoints` flag).
- `ServerMetricsAdminController` — `/api/v1/admin/server-metrics/**`. GET `/catalog`, GET `/instances`, POST `/query`. Generic read API over the `server_metrics` ClickHouse table so SaaS dashboards don't need direct CH access. Delegates to `ServerMetricsQueryStore` (impl `ClickHouseServerMetricsQueryStore`). Visibility matches ClickHouse/Database admin: `@ConditionalOnProperty(infrastructureendpoints, matchIfMissing=true)` + class-level `@PreAuthorize("hasRole('ADMIN')")`. Validation: metric/tag regex `^[a-zA-Z0-9._]+$`, statistic regex `^[a-z_]+$`, `to - from ≤ 31 days`, stepSeconds ∈ [10, 3600], response capped at 500 series. `IllegalArgumentException` → 400. `/query` supports `raw` + `delta` modes (delta does per-`server_instance_id` positive-clipped differences, then aggregates across instances). Derived `statistic=mean` for timers computes `sum(total|total_time)/sum(count)` per bucket.
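The `delta` mode described above can be illustrated in plain Java. This is only an in-memory sketch of the idea, not the ClickHouse query the store builds: `DeltaModeSketch` is a hypothetical name, summation is assumed as the cross-instance aggregate, and the first bucket is assumed to contribute zero because it has no predecessor.

```java
import java.util.Map;

final class DeltaModeSketch {

    // samplesByInstance: cumulative counter values per server_instance_id,
    // one value per time bucket, ordered by time.
    static double[] delta(Map<String, double[]> samplesByInstance, int buckets) {
        double[] out = new double[buckets];
        for (double[] series : samplesByInstance.values()) {
            for (int i = 1; i < buckets && i < series.length; i++) {
                // Positive clipping: a counter reset (restart) yields 0, not a negative value.
                out[i] += Math.max(0.0, series[i] - series[i - 1]);
            }
        }
        return out;
    }
}
```

Differencing per instance before aggregating matters because a restart of one server resets only that instance's counters; differencing the summed series would turn that reset into a spurious drop.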
### Other (flat)
@@ -118,10 +119,10 @@ Env-scoped read-path controllers (`AlertController`, `AlertRuleController`, `Ale
## runtime/ — Docker orchestration
- `DockerRuntimeOrchestrator` — implements RuntimeOrchestrator; Docker Java client (zerodep transport), container lifecycle
- `DeploymentExecutor` — @Async staged deploy: PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE. Container names are `{tenantId}-{envSlug}-{appSlug}-{replicaIndex}` (globally unique on Docker daemon). Sets per-replica `CAMELEER_AGENT_INSTANCEID` env var to `{envSlug}-{appSlug}-{replicaIndex}`.
- `DeploymentExecutor` — @Async staged deploy: PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE. Container names are `{tenantId}-{envSlug}-{appSlug}-{replicaIndex}-{generation}`, where `generation` is the first 8 chars of the deployment UUID — old and new replicas coexist during a blue/green swap. Per-replica `CAMELEER_AGENT_INSTANCEID` env var is `{envSlug}-{appSlug}-{replicaIndex}-{generation}`. Branches on `DeploymentStrategy.fromWire(config.deploymentStrategy())`: **blue-green** (default) starts all N → waits for all healthy → stops old (partial health = FAILED, preserves old untouched); **rolling** replaces replicas one at a time with rollback only for in-flight new containers (already-replaced old stay stopped; un-replaced old keep serving). DEGRADED is now only set by `DockerEventMonitor` post-deploy, never by the executor.
- `DockerNetworkManager` — ensures bridge networks (cameleer-traefik, cameleer-env-{slug}), connects containers
- `DockerEventMonitor` — persistent Docker event stream listener (die, oom, start, stop), updates deployment status
- `TraefikLabelBuilder` — generates Traefik Docker labels for path-based or subdomain routing. Also emits `cameleer.replica` and `cameleer.instance-id` labels per container for labels-first identity.
- `TraefikLabelBuilder` — generates Traefik Docker labels for path-based or subdomain routing. Per-container identity labels: `cameleer.replica` (index), `cameleer.generation` (deployment-scoped 8-char id — for Prometheus/Grafana deploy-boundary annotations), `cameleer.instance-id` (`{envSlug}-{appSlug}-{replicaIndex}-{generation}`). Router/service label keys are generation-agnostic so load balancing spans old + new replicas during a blue/green overlap.
- `PrometheusLabelBuilder` — generates Prometheus Docker labels (`prometheus.scrape/path/port`) per runtime type for `docker_sd_configs` auto-discovery
- `ContainerLogForwarder` — streams Docker container stdout/stderr to ClickHouse with `source='container'`. One follow-stream thread per container, batches lines every 2s/50 lines via `ClickHouseLogStore.insertBufferedBatch()`. 60-second max capture timeout.
- `DisabledRuntimeOrchestrator` — no-op when runtime not enabled
@@ -129,6 +130,8 @@ Env-scoped read-path controllers (`AlertController`, `AlertRuleController`, `Ale
## metrics/ — Prometheus observability
- `ServerMetrics` — centralized business metrics: gauges (agents by state, SSE connections, buffer depths), counters (ingestion drops, agent transitions, deployment outcomes, auth failures), timers (flush duration, deployment duration). Exposed via `/api/v1/prometheus`.
- `ServerInstanceIdConfig` — `@Configuration`, exposes `@Bean("serverInstanceId") String`. Resolution precedence: `cameleer.server.instance-id` property → `HOSTNAME` env → `InetAddress.getLocalHost()` → random UUID. Fixed at boot; rotates across restarts so counters restart cleanly.
- `ServerMetricsSnapshotScheduler` — `@Scheduled(fixedDelayString = "${cameleer.server.self-metrics.interval-ms:60000}")`. Walks `MeterRegistry.getMeters()` each tick, emits one `ServerMetricSample` per `Measurement` (Timer/DistributionSummary produce multiple rows per meter — one per Micrometer `Statistic`). Skips non-finite values; logs and swallows store failures. Disabled via `cameleer.server.self-metrics.enabled=false` (`@ConditionalOnProperty`). Write-only — no query endpoint yet; inspect via `/api/v1/admin/clickhouse/query`.
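The resolution precedence listed for `ServerInstanceIdConfig` could be sketched as a plain function. `ServerInstanceIdSketch.resolve` is a hypothetical helper; treating blank values as absent and using `getHostName()` on the local address are assumptions of this sketch.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.UUID;

final class ServerInstanceIdSketch {

    // property:    value of cameleer.server.instance-id (may be null)
    // hostnameEnv: value of the HOSTNAME environment variable (may be null)
    static String resolve(String property, String hostnameEnv) {
        if (property != null && !property.isBlank()) {
            return property;
        }
        if (hostnameEnv != null && !hostnameEnv.isBlank()) {
            return hostnameEnv;
        }
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            // Last resort: a random id, stable only for this process lifetime.
            return UUID.randomUUID().toString();
        }
    }
}
```

A caller would typically pass `System.getenv("HOSTNAME")` as the second argument and keep the result for the lifetime of the process.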
## storage/ — PostgreSQL repositories (JdbcTemplate)
@@ -145,6 +148,8 @@ Env-scoped read-path controllers (`AlertController`, `AlertRuleController`, `Ale
- `ClickHouseDiagramStore`, `ClickHouseAgentEventRepository`
- `ClickHouseUsageTracker` — usage_events for billing
- `ClickHouseRouteCatalogStore` — persistent route catalog with first_seen cache, warm-loaded on startup
- `ClickHouseServerMetricsStore` — periodic dumps of the server's own Micrometer registry into the `server_metrics` table. Tenant-stamped (bound at the scheduler, not the bean); no `environment` column (server straddles envs). Batch-insert via `JdbcTemplate.batchUpdate` with `Map(String, String)` tag binding. Written by `ServerMetricsSnapshotScheduler`.
- `ClickHouseServerMetricsQueryStore` — read side of `server_metrics` for dashboards. Implements `ServerMetricsQueryStore`. `catalog(from,to)` returns name+type+statistics+tagKeys, `listInstances(from,to)` returns server_instance_ids with first/last seen, `query(request)` builds bucketed time-series with `raw` or `delta` mode and supports a derived `mean` statistic for timers. All identifier inputs regex-validated; tenant_id always bound; max range 31 days; series count capped at 500. Exposed via `ServerMetricsAdminController`.
## search/ — ClickHouse search and log stores


@@ -8,8 +8,11 @@ paths:
# CI/CD & Deployment
- CI workflow: `.gitea/workflows/ci.yml` — build -> docker -> deploy on push to main or feature branches
- CI workflow: `.gitea/workflows/ci.yml` — build -> docker -> deploy on push to main or feature branches. `paths-ignore` skips the whole pipeline for docs-only / `.planning/` / `.claude/` / `*.md` changes (push and PR triggers).
- Build step skips integration tests (`-DskipITs`) — Testcontainers needs Docker daemon
- Build caches (parallel `actions/cache@v4` steps in the `build` job): `~/.m2/repository` (key on all `pom.xml`), `~/.npm` (key on `ui/package-lock.json`), `ui/node_modules/.vite` (key on `ui/package-lock.json` + `ui/vite.config.ts`). UI install uses `npm ci --prefer-offline --no-audit --fund=false` so the npm cache is the primary source.
- Maven build performance (set in `pom.xml` and `cameleer-server-app/pom.xml`): `useIncrementalCompilation=true` on the compiler plugin; Surefire uses `forkCount=1C` + `reuseForks=true` (one JVM per CPU core, reused across test classes); Failsafe keeps `forkCount=1` + `reuseForks=true`. Unit tests must not rely on per-class JVM isolation.
- UI build script (`ui/package.json`): `build` is `vite build` only — the type-check pass was split out into `npm run typecheck` (run separately when you want a full `tsc --noEmit` sweep).
- Docker: multi-stage build (`Dockerfile`), `$BUILDPLATFORM` for native Maven on ARM64 runner, amd64 runtime. `docker-entrypoint.sh` imports `/certs/ca.pem` into JVM truststore before starting the app (supports custom CAs for OIDC discovery without `CAMELEER_SERVER_SECURITY_OIDCTLSSKIPVERIFY`).
- `REGISTRY_TOKEN` build arg required for `cameleer-common` dependency resolution
- Registry: `gitea.siegeln.net/cameleer/cameleer-server` (container images)


@@ -28,15 +28,16 @@ paths:
- `AppVersion` — record: id, appId, version, jarPath, detectedRuntimeType, detectedMainClass
- `Environment` — record: id, slug, displayName, production, enabled, defaultContainerConfig, jarRetentionCount, color, createdAt. `color` is one of the 8 preset palette values validated by `EnvironmentColor.VALUES` and CHECK-constrained in PostgreSQL (V2 migration).
- `EnvironmentColor` — constants: `DEFAULT = "slate"`, `VALUES = {slate,red,amber,green,teal,blue,purple,pink}`, `isValid(String)`.
- `Deployment` — record: id, appId, appVersionId, environmentId, status, targetState, deploymentStrategy, replicaStates (JSONB), deployStage, containerId, containerName
- `DeploymentStatus` — enum: STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED
- `Deployment` — record: id, appId, appVersionId, environmentId, status, targetState, deploymentStrategy, replicaStates (JSONB), deployStage, containerId, containerName, createdBy (String, user_id reference; nullable for pre-V4 historical rows)
- `DeploymentStatus` — enum: STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED. `DEGRADED` is reserved for post-deploy drift (a replica died after RUNNING); `DeploymentExecutor` now marks partial-healthy deploys FAILED, not DEGRADED.
- `DeployStage` — enum: PRE_FLIGHT, PULL_IMAGE, CREATE_NETWORK, START_REPLICAS, HEALTH_CHECK, SWAP_TRAFFIC, COMPLETE
- `DeploymentService` — createDeployment (deletes terminal deployments first), markRunning, markFailed, markStopped
- `DeploymentStrategy` — enum: BLUE_GREEN, ROLLING. Stored on `ResolvedContainerConfig.deploymentStrategy` as kebab-case string (`"blue-green"` / `"rolling"`). `fromWire(String)` is the only conversion entry point; unknown/null inputs fall back to BLUE_GREEN so the executor dispatch site never null-checks or throws.
- `DeploymentService` — createDeployment (calls `deleteFailedByAppAndEnvironment` first so FAILED rows don't pile up; STOPPED rows are preserved as restorable checkpoints), markRunning, markFailed, markStopped
- `RuntimeType` — enum: AUTO, SPRING_BOOT, QUARKUS, PLAIN_JAVA, NATIVE
- `RuntimeDetector` — probes JAR files at upload time: detects runtime from manifest Main-Class (Spring Boot loader, Quarkus entry point, plain Java) or native binary (non-ZIP magic bytes)
- `ContainerRequest` — record: 20 fields for Docker container creation (includes runtimeType, customArgs, mainClass)
- `ContainerStatus` — record: state, running, exitCode, error
- `ResolvedContainerConfig` — record: typed config with memoryLimitMb, memoryReserveMb, cpuRequest, cpuLimit, appPort, exposedPorts, customEnvVars, stripPathPrefix, sslOffloading, routingMode, routingDomain, serverUrl, replicas, deploymentStrategy, routeControlEnabled, replayEnabled, runtimeType, customArgs, extraNetworks
- `ResolvedContainerConfig` — record: typed config with memoryLimitMb, memoryReserveMb, cpuRequest, cpuLimit, appPort, exposedPorts, customEnvVars, stripPathPrefix, sslOffloading, routingMode, routingDomain, serverUrl, replicas, deploymentStrategy, routeControlEnabled, replayEnabled, runtimeType, customArgs, extraNetworks, externalRouting (default `true`; when `false`, `TraefikLabelBuilder` strips all `traefik.*` labels so the container is not publicly routed), certResolver (server-wide, sourced from `CAMELEER_SERVER_RUNTIME_CERTRESOLVER`; when blank the `tls.certresolver` label is omitted — use for dev installs with a static TLS store)
- `RoutingMode` — enum for routing strategies
- `ConfigMerger` — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig
- `RuntimeOrchestrator` — interface: startContainer, stopContainer, getContainerStatus, getLogs, startLogCapture, stopLogCapture
@@ -46,14 +47,15 @@ paths:
## search/ — Execution search and stats
- `SearchService` — search, count, stats, statsForApp, statsForRoute, timeseries, timeseriesForApp, timeseriesForRoute, timeseriesGroupedByApp, timeseriesGroupedByRoute, slaCompliance, slaCountsByApp, slaCountsByRoute, topErrors, activeErrorTypes, punchcard, distinctAttributeKeys. `statsForRoute`/`timeseriesForRoute` take `(routeId, applicationId)` — app filter is applied to `stats_1m_route`.
- `SearchRequest` / `SearchResult` — search DTOs
- `SearchRequest` / `SearchResult` — search DTOs. `SearchRequest.attributeFilters: List<AttributeFilter>` carries structured facet filters for execution attributes — key-only (exists), exact (key=value), or wildcard (`*` in value). The 21-arg legacy ctor is preserved for call-site churn; the compact ctor normalises null → `List.of()`.
- `AttributeFilter(key, value)` — record with key regex `^[a-zA-Z0-9._-]+$` (inlined into SQL, same constraint as alerting), `value == null` means key-exists, `value` containing `*` becomes a SQL LIKE pattern via `toLikePattern()`.
- `ExecutionStats`, `ExecutionSummary` — stats aggregation records
- `StatsTimeseries`, `TopError` — timeseries and error DTOs
- `LogSearchRequest` / `LogSearchResponse` — log search DTOs. `LogSearchRequest.sources` / `levels` are `List<String>` (null-normalized, multi-value OR); `cursor` + `limit` + `sort` drive keyset pagination. Response carries `nextCursor` + `hasMore` + per-level `levelCounts`.
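
As an illustration of the `AttributeFilter` rules listed above, here is a minimal sketch, not the project's actual source. The key regex is quoted from the notes. The helper methods `isKeyOnly` and `isWildcard`, and the choice to escape `%`, `_` and `\` before mapping `*` to `%`, are assumptions made for this example.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of the filter record; the real class may differ. */
record AttributeFilter(String key, String value) {

    // Key constraint quoted from the notes; keys are inlined into SQL, so they must be validated.
    private static final Pattern KEY = Pattern.compile("^[a-zA-Z0-9._-]+$");

    AttributeFilter {
        if (key == null || !KEY.matcher(key).matches()) {
            throw new IllegalArgumentException("invalid attribute key: " + key);
        }
    }

    /** A null value means "the key exists". (Helper name is an assumption.) */
    boolean isKeyOnly() {
        return value == null;
    }

    /** A value containing '*' is treated as a wildcard. (Helper name is an assumption.) */
    boolean isWildcard() {
        return value != null && value.contains("*");
    }

    /** Assumed behaviour: escape LIKE metacharacters first, then map '*' to '%'. */
    String toLikePattern() {
        if (value == null) {
            throw new IllegalStateException("key-only filter has no pattern");
        }
        return value
                .replace("\\", "\\\\")
                .replace("%", "\\%")
                .replace("_", "\\_")
                .replace("*", "%");
    }
}
```

For example, `new AttributeFilter("order.id", "ord_*")` would yield a pattern in which the literal underscore is escaped and the trailing `*` becomes `%`.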
## storage/ — Storage abstractions
- `ExecutionStore`, `MetricsStore`, `MetricsQueryStore`, `StatsStore`, `DiagramStore`, `RouteCatalogStore`, `SearchIndex`, `LogIndex` — interfaces
- `ExecutionStore`, `MetricsStore`, `MetricsQueryStore`, `StatsStore`, `DiagramStore`, `RouteCatalogStore`, `SearchIndex`, `LogIndex` — interfaces. `DiagramStore.findLatestContentHashForAppRoute(appId, routeId, env)` resolves the latest diagram by (app, env, route) without consulting the agent registry, so routes whose publishing agents were removed between app versions still resolve. `findContentHashForRoute(route, instance)` is retained for the ingestion path that stamps a per-execution `diagramContentHash` at ingest time (point-in-time link from `ExecutionDetail`/`ExecutionSummary`).
- `RouteCatalogEntry` — record: applicationId, routeId, environment, firstSeen, lastSeen
- `LogEntryResult` — log query result record
- `model/` — `ExecutionDocument`, `MetricTimeSeries`, `MetricsSnapshot`
@@ -79,7 +81,7 @@ paths:
- `AppSettings`, `AppSettingsRepository` — per-app-per-env settings config and persistence. Record carries `(applicationId, environment, …)`; repository methods are `findByApplicationAndEnvironment`, `findByEnvironment`, `save`, `delete(appId, env)`. `AppSettings.defaults(appId, env)` produces a default instance scoped to an environment.
- `ThresholdConfig`, `ThresholdRepository` — alerting threshold config and persistence
- `AuditService` — audit logging facade
- `AuditRecord`, `AuditResult`, `AuditCategory` (enum: `INFRA, AUTH, USER_MGMT, CONFIG, RBAC, AGENT, OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE`), `AuditRepository` — audit trail records and persistence
- `AuditRecord`, `AuditResult`, `AuditCategory` (enum: `INFRA, AUTH, USER_MGMT, CONFIG, RBAC, AGENT, OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE, ALERT_RULE_CHANGE, ALERT_SILENCE_CHANGE, DEPLOYMENT`), `AuditRepository` — audit trail records and persistence
## http/ — Outbound HTTP primitives (cross-cutting)

@@ -13,19 +13,28 @@ paths:
When deployed via the cameleer-saas platform, this server orchestrates customer app containers using Docker. Key components:
- **ConfigMerger** (`core/runtime/ConfigMerger.java`) — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig. Three-layer merge: global (application.yml) -> environment (defaultContainerConfig JSONB) -> app (containerConfig JSONB). Includes `runtimeType` (default `"auto"`) and `customArgs` (default `""`).
- **TraefikLabelBuilder** (`app/runtime/TraefikLabelBuilder.java`) — generates Traefik Docker labels for path-based (`/{envSlug}/{appSlug}/`) or subdomain-based (`{appSlug}-{envSlug}.{domain}`) routing. Supports strip-prefix and SSL offloading toggles. Also sets per-replica identity labels: `cameleer.replica` (index) and `cameleer.instance-id` (`{envSlug}-{appSlug}-{replicaIndex}`). Internal processing uses labels (not container name parsing) for extensibility.
- **TraefikLabelBuilder** (`app/runtime/TraefikLabelBuilder.java`) — generates Traefik Docker labels for path-based (`/{envSlug}/{appSlug}/`) or subdomain-based (`{appSlug}-{envSlug}.{domain}`) routing. Supports strip-prefix and SSL offloading toggles. Per-replica identity labels: `cameleer.replica` (index), `cameleer.generation` (8-char deployment UUID prefix — pin Prometheus/Grafana deploy boundaries with this), `cameleer.instance-id` (`{envSlug}-{appSlug}-{replicaIndex}-{generation}`). Traefik router/service keys deliberately omit the generation so load balancing spans old + new replicas during a blue/green overlap. When `ResolvedContainerConfig.externalRouting()` is `false` (UI: Resources → External Routing, default `true`), the builder emits ONLY the identity labels (`managed-by`, `cameleer.*`) and skips every `traefik.*` label — the container stays on `cameleer-traefik` and the per-env network (so sibling containers can still reach it via Docker DNS) but is invisible to Traefik. The `tls.certresolver` label is emitted only when `CAMELEER_SERVER_RUNTIME_CERTRESOLVER` is set to a non-blank resolver name (matching a resolver configured in the Traefik static config). When unset (dev installs backed by a static TLS store) only `tls=true` is emitted and Traefik serves the default cert from the TLS store.
- **PrometheusLabelBuilder** (`app/runtime/PrometheusLabelBuilder.java`) — generates Prometheus `docker_sd_configs` labels per resolved runtime type: Spring Boot `/actuator/prometheus:8081`, Quarkus/native `/q/metrics:9000`, plain Java `/metrics:9464`. Labels merged into container metadata alongside Traefik labels at deploy time.
- **DockerNetworkManager** (`app/runtime/DockerNetworkManager.java`) — manages two Docker network tiers:
- `cameleer-traefik` — shared network; Traefik, server, and all app containers attach here. Server joined via docker-compose with `cameleer-server` DNS alias.
- `cameleer-env-{slug}` — per-environment isolated network; containers in the same environment discover each other via Docker DNS. In SaaS mode, env networks are tenant-scoped: `cameleer-env-{tenantId}-{envSlug}` (overloaded `envNetworkName(tenantId, envSlug)` method) to prevent cross-tenant collisions when multiple tenants have identically-named environments.
- **DockerEventMonitor** (`app/runtime/DockerEventMonitor.java`) — persistent Docker event stream listener for containers with `managed-by=cameleer-server` label. Detects die/oom/start/stop events and updates deployment replica states. Periodic reconciliation (@Scheduled every 30s) inspects actual container state and corrects deployment status mismatches (fixes stale DEGRADED with all replicas healthy).
- **DeploymentProgress** (`ui/src/components/DeploymentProgress.tsx`) — UI step indicator showing 7 deploy stages with amber active/green completed styling.
- **ContainerLogForwarder** (`app/runtime/ContainerLogForwarder.java`) — streams Docker container stdout/stderr to ClickHouse `logs` table with `source='container'`. Uses `docker logs --follow` per container, batches lines every 2s or 50 lines. Parses Docker timestamp prefix, infers log level via regex. `DeploymentExecutor` starts capture after each replica launches with the replica's `instanceId` (`{envSlug}-{appSlug}-{replicaIndex}`); `DockerEventMonitor` stops capture on die/oom. 60-second max capture timeout with 30s cleanup scheduler. Thread pool of 10 daemon threads. Container logs use the same `instanceId` as the agent (set via `CAMELEER_AGENT_INSTANCEID` env var) for unified log correlation at the instance level.
- **ContainerLogForwarder** (`app/runtime/ContainerLogForwarder.java`) — streams Docker container stdout/stderr to ClickHouse `logs` table with `source='container'`. Uses `docker logs --follow` per container, batches lines every 2s or 50 lines. Parses Docker timestamp prefix, infers log level via regex. `DeploymentExecutor` starts capture after each replica launches with the replica's `instanceId` (`{envSlug}-{appSlug}-{replicaIndex}-{generation}`); `DockerEventMonitor` stops capture on die/oom. 60-second max capture timeout with 30s cleanup scheduler. Thread pool of 10 daemon threads. Container logs use the same `instanceId` as the agent (set via `CAMELEER_AGENT_INSTANCEID` env var) for unified log correlation at the instance level. Instance-id changes per deployment — cross-deploy queries aggregate on `application + environment` (and optionally `replica_index`).
- **StartupLogPanel** (`ui/src/components/StartupLogPanel.tsx`) — collapsible log panel rendered below `DeploymentProgress`. Queries `/api/v1/logs?source=container&application={appSlug}&environment={envSlug}`. Auto-polls every 3s while deployment is STARTING; shows green "live" badge during polling, red "stopped" badge on FAILED. Uses `useStartupLogs` hook and `LogViewer` (design system).
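
The three-layer merge described for `ConfigMerger` above (global, then environment, then app) might look roughly like the sketch below. It works on plain maps rather than the typed `ResolvedContainerConfig`, and it assumes that a later layer overrides an earlier one only when it supplies a non-null value. The class name `LayeredConfig` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of a global -> environment -> app merge. */
final class LayeredConfig {
    private LayeredConfig() {}

    /** Pure function: the inputs are never mutated; a new map is returned. */
    static Map<String, Object> resolve(Map<String, Object> globalDefaults,
                                       Map<String, Object> envConfig,
                                       Map<String, Object> appConfig) {
        Map<String, Object> merged = new LinkedHashMap<>(globalDefaults);
        overlay(merged, envConfig);
        overlay(merged, appConfig);
        return merged;
    }

    // Assumption: a null layer or a null value means "not set here", so the lower layer wins.
    private static void overlay(Map<String, Object> target, Map<String, Object> layer) {
        if (layer == null) {
            return;
        }
        layer.forEach((key, value) -> {
            if (value != null) {
                target.put(key, value);
            }
        });
    }
}
```

With a global `runtimeType` of `"auto"`, an environment-level `replicas` of 2 and an app-level `customArgs`, the result carries all three keys, with each key taken from the most specific layer that sets it.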
## DeploymentExecutor Details
Primary network for app containers is set via `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK` env var (in SaaS mode: `cameleer-tenant-{slug}`); apps also connect to `cameleer-traefik` (routing) and `cameleer-env-{tenantId}-{envSlug}` (per-environment discovery) as additional networks. Resolves `runtimeType: auto` to concrete type from `AppVersion.detectedRuntimeType` at PRE_FLIGHT (fails deployment if unresolvable). Builds Docker entrypoint per runtime type (all JVM types use `-javaagent:/app/agent.jar -jar`, plain Java uses `-cp` with main class, native runs binary directly). Sets per-replica `CAMELEER_AGENT_INSTANCEID` env var to `{envSlug}-{appSlug}-{replicaIndex}` so container logs and agent logs share the same instance identity. Sets `CAMELEER_AGENT_*` env vars from `ResolvedContainerConfig` (routeControlEnabled, replayEnabled, health port). These are startup-only agent properties — changing them requires redeployment.
Primary network for app containers is set via `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK` env var (in SaaS mode: `cameleer-tenant-{slug}`); apps also connect to `cameleer-traefik` (routing) and `cameleer-env-{tenantId}-{envSlug}` (per-environment discovery) as additional networks. Resolves `runtimeType: auto` to concrete type from `AppVersion.detectedRuntimeType` at PRE_FLIGHT (fails deployment if unresolvable). Builds Docker entrypoint per runtime type (all JVM types use `-javaagent:/app/agent.jar -jar`, plain Java uses `-cp` with main class, native runs binary directly). Sets per-replica `CAMELEER_AGENT_INSTANCEID` env var to `{envSlug}-{appSlug}-{replicaIndex}-{generation}` so container logs and agent logs share the same instance identity. Sets `CAMELEER_AGENT_*` env vars from `ResolvedContainerConfig` (routeControlEnabled, replayEnabled, health port). These are startup-only agent properties — changing them requires redeployment.
**Container naming** — `{tenantId}-{envSlug}-{appSlug}-{replicaIndex}-{generation}`, where `generation` is the first 8 characters of the deployment UUID. The generation suffix lets old + new replicas coexist during a blue/green swap (deterministic names without a generation used to fail with a 409 name conflict). All lookups across the executor, `DockerEventMonitor`, and `ContainerLogForwarder` key on container **id**, not name — the name is operator-visibility only.
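
The naming scheme above can be sketched as follows. The formats come from the text; the class and method names (`ReplicaNaming`, `generation`, `instanceId`, `containerName`) are hypothetical and only serve the illustration.

```java
import java.util.UUID;

/** Hypothetical helper illustrating the documented naming scheme. */
final class ReplicaNaming {
    private ReplicaNaming() {}

    /** The generation is the first 8 characters of the deployment UUID. */
    static String generation(UUID deploymentId) {
        return deploymentId.toString().substring(0, 8);
    }

    /** {envSlug}-{appSlug}-{replicaIndex}-{generation} */
    static String instanceId(String envSlug, String appSlug, int replicaIndex, String generation) {
        return envSlug + "-" + appSlug + "-" + replicaIndex + "-" + generation;
    }

    /** {tenantId}-{envSlug}-{appSlug}-{replicaIndex}-{generation} */
    static String containerName(String tenantId, String envSlug, String appSlug,
                                int replicaIndex, String generation) {
        return tenantId + "-" + instanceId(envSlug, appSlug, replicaIndex, generation);
    }
}
```

Because the generation differs between two deployments of the same app, replica 0 of the old deployment and replica 0 of the new one get distinct names and instance ids.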
**Strategy dispatch** — `DeploymentStrategy.fromWire(config.deploymentStrategy())` branches the executor. Unknown values fall back to BLUE_GREEN so misconfiguration never throws at runtime.
- **Blue/green** (default): start all N new replicas → wait for ALL healthy → stop the previous deployment. Resource peak ≈ 2× replicas for the health-check window. Partial health aborts with status FAILED; the previous deployment is preserved untouched (user's safety net).
- **Rolling**: replace replicas one at a time — start new[i] → wait healthy → stop old[i] → next. Resource peak = replicas + 1. Mid-rollout health failure stops in-flight new containers and aborts; already-replaced old replicas are NOT restored (not reversible) but un-replaced old[i+1..N] keep serving traffic. User redeploys to recover.
Traffic routing is implicit: Traefik labels (`cameleer.app`, `cameleer.environment`) are generation-agnostic, so new replicas attract load balancing as soon as they come up healthy — no explicit swap step.
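
A minimal sketch of the fallback behaviour and the resource peaks described above. The wire strings `"blue-green"` and `"rolling"` and the `peakContainers` helper are assumptions for this example; only the fallback to `BLUE_GREEN` and the peak figures (about 2× replicas versus replicas + 1) come from the text.

```java
import java.util.Locale;

/** Hypothetical sketch; the real enum's wire values may differ. */
enum DeploymentStrategy {
    BLUE_GREEN("blue-green"),
    ROLLING("rolling");

    private final String wire;

    DeploymentStrategy(String wire) {
        this.wire = wire;
    }

    /** Unknown or null values fall back to BLUE_GREEN instead of throwing. */
    static DeploymentStrategy fromWire(String value) {
        if (value != null) {
            String normalized = value.trim().toLowerCase(Locale.ROOT);
            for (DeploymentStrategy s : values()) {
                if (s.wire.equals(normalized)) {
                    return s;
                }
            }
        }
        return BLUE_GREEN;
    }

    /** Peak container count during a deploy: 2N for blue/green, N + 1 for rolling. */
    int peakContainers(int replicas) {
        return this == BLUE_GREEN ? 2 * replicas : replicas + 1;
    }
}
```

For three replicas this gives a peak of six containers under blue/green and four under rolling, which is the trade-off between a fully reversible swap and a smaller resource footprint.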
## Deployment Status Model
@@ -34,17 +43,13 @@ Primary network for app containers is set via `CAMELEER_SERVER_RUNTIME_DOCKERNET
| `STOPPED` | Intentionally stopped or initial state |
| `STARTING` | Deploy in progress |
| `RUNNING` | All replicas healthy and serving |
| `DEGRADED` | Some replicas healthy, some dead |
| `DEGRADED` | Post-deploy: a replica died after the deploy was marked RUNNING. Set by `DockerEventMonitor` reconciliation, never by `DeploymentExecutor` directly. |
| `STOPPING` | Graceful shutdown in progress |
| `FAILED` | Terminal failure (pre-flight, health check, or crash) |
| `FAILED` | Terminal failure (pre-flight, health check, or crash). Partial-healthy deploys now mark FAILED — DEGRADED is reserved for post-deploy drift. |
**Replica support**: deployments can specify a replica count. `DEGRADED` is used when at least one but not all replicas are healthy.
**Deploy stages** (`DeployStage`): PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE (or FAILED at any stage). Rolling reuses the same stage labels inside the per-replica loop; the UI progress bar shows the most recent stage.
**Deploy stages** (`DeployStage`): PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE (or FAILED at any stage).
**Blue/green strategy**: when re-deploying, new replicas are started and health-checked before old ones are stopped, minimising downtime.
**Deployment uniqueness**: `DeploymentService.createDeployment()` deletes any STOPPED/FAILED deployments for the same app+environment before creating a new one, preventing duplicate rows.
**Deployment retention**: `DeploymentService.createDeployment()` deletes FAILED deployments for the same app+environment before creating a new one, preventing failed-attempt buildup. STOPPED deployments are preserved as restorable checkpoints — the UI Checkpoints disclosure lists every deployment with a non-null `deployed_config_snapshot` (RUNNING, DEGRADED, STOPPED) minus the current one.
## JAR Management

@@ -8,7 +8,9 @@ paths:
# Prometheus Metrics
Server exposes `/api/v1/prometheus` (unauthenticated, Prometheus text format). Spring Boot Actuator provides JVM, GC, thread pool, and `http.server.requests` metrics automatically. Business metrics via `ServerMetrics` component:
Server exposes `/api/v1/prometheus` (unauthenticated, Prometheus text format). Spring Boot Actuator provides JVM, GC, thread pool, and `http.server.requests` metrics automatically. Business metrics via `ServerMetrics` component.
The same `MeterRegistry` is also snapshotted to ClickHouse every 60 s by `ServerMetricsSnapshotScheduler` (see "Server self-metrics persistence" at the bottom of this file) — so historical server-health data survives restarts without an external Prometheus.
## Gauges (auto-polled)
@@ -83,3 +85,23 @@ Mean processing time = `camel.route.policy.total_time / camel.route.policy.count
| `cameleer.sse.reconnects.count` | counter | `instanceId` |
| `cameleer.taps.evaluated.count` | counter | `instanceId` |
| `cameleer.metrics.exported.count` | counter | `instanceId` |
## Server self-metrics persistence
`ServerMetricsSnapshotScheduler` walks `MeterRegistry.getMeters()` every 60 s (configurable via `cameleer.server.self-metrics.interval-ms`) and writes one row per Micrometer `Measurement` to the ClickHouse `server_metrics` table. Full registry is captured — Spring Boot Actuator series (`jvm.*`, `process.*`, `http.server.requests`, `hikaricp.*`, `jdbc.*`, `tomcat.*`, `logback.events`, `system.*`) plus `cameleer.*` and `alerting_*`.
**Table** (`cameleer-server-app/src/main/resources/clickhouse/init.sql`):
```
server_metrics(tenant_id, collected_at, server_instance_id,
metric_name, metric_type, statistic, metric_value,
tags Map(String,String), server_received_at)
```
- `metric_type` — lowercase Micrometer `Meter.Type` (counter, gauge, timer, distribution_summary, long_task_timer, other)
- `statistic` — Micrometer `Statistic.getTagValueRepresentation()` (value, count, total, total_time, max, mean, active_tasks, duration). Timers emit 3 rows per tick (count + total_time + max); gauges/counters emit 1 (`statistic='value'` or `'count'`).
- No `environment` column — the server is env-agnostic.
- `tenant_id` threaded from `cameleer.server.tenant.id` (single-tenant per server).
- `server_instance_id` resolved once at boot by `ServerInstanceIdConfig` (property → HOSTNAME → localhost → UUID fallback). Rotates across restarts so counter resets are unambiguous.
- TTL: 90 days (vs 365 for `agent_metrics`). Write-only in v1 — no query endpoint or UI page. Inspect via ClickHouse admin: `/api/v1/admin/clickhouse/query` or direct SQL.
- Toggle off entirely with `cameleer.server.self-metrics.enabled=false` (uses `@ConditionalOnProperty`).
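
The `server_instance_id` fallback chain listed above (property, then HOSTNAME, then localhost, then a UUID) follows a "first non-blank candidate wins" pattern. The sketch below shows only that pattern. The class `InstanceIdResolver`, its method, and the idea of passing candidates as suppliers are assumptions, and how `ServerInstanceIdConfig` actually obtains each candidate is not shown in the source.

```java
import java.util.List;
import java.util.UUID;
import java.util.function.Supplier;

/** Hypothetical sketch of a first-non-blank fallback chain. */
final class InstanceIdResolver {
    private InstanceIdResolver() {}

    /**
     * Tries each candidate in order and returns the first non-blank value.
     * A candidate that throws is skipped; if none yields a value, a random UUID is used.
     */
    static String firstNonBlank(List<Supplier<String>> candidates) {
        for (Supplier<String> candidate : candidates) {
            String value;
            try {
                value = candidate.get();
            } catch (RuntimeException e) {
                continue; // e.g. a failed hostname lookup wrapped by the caller
            }
            if (value != null && !value.isBlank()) {
                return value.trim();
            }
        }
        return UUID.randomUUID().toString();
    }
}
```

A caller could pass suppliers for a configured property, `System.getenv("HOSTNAME")` and a local hostname lookup, in that order; the UUID fallback explains why the id can change across restarts.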

@@ -14,13 +14,14 @@ The UI has 4 main tabs: **Exchanges**, **Dashboard**, **Runtime**, **Deployments
- Routes: `/apps` (list, `AppListView` in `AppsTab.tsx`), `/apps/new` + `/apps/:slug` (both render `AppDeploymentPage`).
- Identity & Artifact section always visible; name editable pre-first-deploy, read-only after. JAR picker client-stages; new JAR + any form edits flip the primary button from `Save` to `Redeploy`. Environment fixed to the currently-selected env (no selector).
- Config sub-tabs: **Monitoring | Resources | Variables | Sensitive Keys | Deployment | ● Traces & Taps | ● Route Recording**. The four staged tabs feed dirty detection; the `●` live tabs apply in real-time (amber LiveBanner + default `?apply=live` on their writes) and never mark dirty.
- Primary action state machine: `Save` (persists desired state without deploying) → `Redeploy` (applies desired state) → `Deploying…` during active deploy.
- Checkpoints disclosure in Identity section lists past successful deployments (current running one hidden, pruned-JAR rows disabled). Restore hydrates the form from `deployments.deployed_config_snapshot` for Save + Redeploy.
- Deployment tab: `StatusCard` + `DeploymentProgress` (during STARTING / FAILED) + flex-grow `StartupLogPanel` (no fixed maxHeight) + `HistoryDisclosure`. Auto-activates when a deploy starts.
- Primary action state machine: `Save` → `Uploading… N%` (during JAR upload; button shows percent with a tinted progress-fill overlay) → `Redeploy` → `Deploying…` during active deploy. Upload progress sourced from `useUploadJar` (XHR `upload.onprogress` → page-level `uploadPct` state). The button is disabled during `uploading` and `deploying`.
- Checkpoints render as a collapsible `CheckpointsTable` (default **collapsed**) **inside the Identity & Artifact `configGrid`** as an in-grid row (`Checkpoints | ▸ Expand (N)` / `▾ Collapse (N)`). `CheckpointsTable` returns a React.Fragment of grid-ready children so the label + trigger align with the other identity rows; when opened, a third grid child spans both columns via `grid-column: 1 / -1` so the 7-column table gets full width. Wired through `IdentitySection.checkpointsSlot` — `CheckpointDetailDrawer` stays in `IdentitySection.children` because it portals. Columns: Version · JAR (filename) · Deployed by · Deployed (relative `timeAgo` + user-locale sub-line via `new Date(iso).toLocaleString()`) · Strategy · Outcome · . Row click opens the drawer. Drawer tabs are ordered **Config | Logs** with `Config` as the default. Config panel has Snapshot / Diff vs current view modes. Replica filter in the Logs panel uses DS `Select`. Restore lives in the drawer footer (forces review). Visible row cap = `Environment.jarRetentionCount` (default 10 if 0/null); older rows accessible via "Show older (N)" expander. Currently-running deployment is excluded — represented separately by `StatusCard`. The empty-checkpoints case returns `null` (no row). The legacy `Checkpoints.tsx` row-list component is gone.
- Deployment tab: `StatusCard` + `DeploymentProgress` (during STARTING / FAILED) + flex-grow `StartupLogPanel` (no fixed maxHeight). Auto-activates when a deploy starts. The former `HistoryDisclosure` is retired — per-deployment config and logs live in the Checkpoints drawer. `StartupLogPanel` header mirrors the Runtime Application Log pattern: title + live/stopped badge + `N entries` + sort toggle (↑/↓, default **desc**) + refresh icon (`RefreshCw`). Sort drives the backend fetch via `useStartupLogs(…, sort)` so the 500-line limit returns the window closest to the user's interest; display order matches fetch order. Refresh scrolls to the latest edge (top for desc, bottom for asc). Sort + refresh buttons disable while a refetch is in flight. 3s polling while STARTING is unchanged.
- Unsaved-change router blocker uses DS `AlertDialog` (not `window.beforeunload`). Env switch intentionally discards edits without warning.
**Admin pages** (ADMIN-only, under `/admin/`):
- **Sensitive Keys** (`ui/src/pages/Admin/SensitiveKeysPage.tsx`) — global sensitive key masking config. Shows agent built-in defaults as outlined Badge reference, editable Tag pills for custom keys, amber-highlighted push-to-agents toggle. Keys add to (not replace) agent defaults. Per-app sensitive key additions managed via `ApplicationConfigController` API. Note: `AppConfigDetailPage.tsx` exists but is not routed in `router.tsx`.
- **Server Metrics** (`ui/src/pages/Admin/ServerMetricsAdminPage.tsx`) — dashboard over the `server_metrics` ClickHouse table. Visibility matches Database/ClickHouse pages: gated on `capabilities.infrastructureEndpoints` in `buildAdminTreeNodes`; backend is `@ConditionalOnProperty(infrastructureendpoints) + @PreAuthorize('hasRole(ADMIN)')`. Uses the generic `/api/v1/admin/server-metrics/{catalog,instances,query}` API via `ui/src/api/queries/admin/serverMetrics.ts` hooks (`useServerMetricsCatalog`, `useServerMetricsInstances`, `useServerMetricsSeries`), all three of which take a `ServerMetricsRange = { from: Date; to: Date }`. Time range is driven by the global TopBar picker via `useGlobalFilters()` — no page-local selector; bucket size auto-scales through `stepSecondsFor(windowSeconds)` (10 s up to 1 h buckets). Toolbar is just server-instance badges. Sections: Server health (agents/ingestion/auth), JVM (memory/CPU/GC/threads), HTTP & DB pools, Alerting (conditional on catalog), Deployments (conditional on catalog). Each panel is a `ThemedChart` with `Line`/`Area` children from the design system; multi-series responses are flattened into overlap rows by bucket timestamp. Alerting and Deployments rows are hidden when their metrics aren't in the catalog (zero-deploy / alerting-disabled installs).
## Key UI Files
@@ -39,6 +40,7 @@ The UI has 4 main tabs: **Exchanges**, **Dashboard**, **Runtime**, **Deployments
- `ui/src/api/queries/agents.ts` — `useAgents` for agent list, `useInfiniteAgentEvents` for cursor-paginated timeline stream
- `ui/src/hooks/useInfiniteStream.ts` — tanstack `useInfiniteQuery` wrapper with top-gated auto-refetch, flattened `items[]`, and `refresh()` invalidator
- `ui/src/components/InfiniteScrollArea.tsx` — scrollable container with IntersectionObserver top/bottom sentinels. Streaming log/event views use this + `useInfiniteStream`. Bounded views (LogTab, StartupLogPanel) keep `useLogs`/`useStartupLogs`
- `ui/src/components/SideDrawer.tsx` — project-local right-slide drawer (DS has Modal but no Drawer). Portal-rendered, ESC + transparent-backdrop click closes, sticky header/footer, sizes md/lg/xl. Currently consumed only by `CheckpointDetailDrawer` — promote to `@cameleer/design-system` once a second consumer appears.
## Alerts

@@ -5,8 +5,20 @@ on:
branches: [main, 'feature/**', 'fix/**', 'feat/**']
tags-ignore:
- 'v*'
paths-ignore:
- '.planning/**'
- 'docs/**'
- '**/*.md'
- '.claude/**'
- 'AGENTS.md'
- 'CLAUDE.md'
pull_request:
branches: [main]
paths-ignore:
- '.planning/**'
- 'docs/**'
- '**/*.md'
- '.claude/**'
delete:
jobs:
@@ -45,11 +57,25 @@ jobs:
key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
restore-keys: ${{ runner.os }}-maven-
- name: Cache npm registry
uses: actions/cache@v4
with:
path: ~/.npm
key: ${{ runner.os }}-npm-${{ hashFiles('ui/package-lock.json') }}
restore-keys: ${{ runner.os }}-npm-
- name: Cache Vite build artifacts
uses: actions/cache@v4
with:
path: ui/node_modules/.vite
key: ${{ runner.os }}-vite-${{ hashFiles('ui/package-lock.json', 'ui/vite.config.ts') }}
restore-keys: ${{ runner.os }}-vite-
- name: Build UI
working-directory: ui
run: |
echo '//gitea.siegeln.net/api/packages/cameleer/npm/:_authToken=${REGISTRY_TOKEN}' >> .npmrc
npm ci
npm ci --prefer-offline --no-audit --fund=false
npm run build
env:
REGISTRY_TOKEN: ${{ secrets.REGISTRY_TOKEN }}

@@ -1,7 +1,7 @@
<!-- gitnexus:start -->
# GitNexus — Code Intelligence
This project is indexed by GitNexus as **cameleer-server** (9095 symbols, 23495 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
This project is indexed by GitNexus as **cameleer-server** (9731 symbols, 24987 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
> If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first.

@@ -22,8 +22,19 @@ Cameleer Server — observability server that receives, stores, and serves Camel
```bash
mvn clean compile # Compile all modules
mvn clean verify # Full build with tests
mvn clean verify -DskipITs # Fast: unit tests only (no Testcontainers)
```
### Faster local builds
- **Surefire reuses forks** (`cameleer-server-app/pom.xml`): unit tests run with `forkCount=1C` + `reuseForks=true` — one JVM per CPU core, reused across classes. Test classes that mutate static state must clean up after themselves.
- **Testcontainers reuse** — opt-in per developer. Add to `~/.testcontainers.properties`:
```
testcontainers.reuse.enable=true
```
Then `AbstractPostgresIT` containers persist across `mvn verify` runs (saves ~20s per run). Stop them manually when you need a clean DB: `docker rm -f $(docker ps -aq --filter label=org.testcontainers.reuse=true)`.
- **UI build** no longer runs `tsc --noEmit` as part of `npm run build`. Vite/esbuild only transpiles TypeScript during bundling and does not type-check it, so run `npm run typecheck` explicitly when you want a full type-check pass.
## Run
```bash
@@ -85,7 +96,7 @@ When adding, removing, or renaming classes, controllers, endpoints, UI component
<!-- gitnexus:start -->
# GitNexus — Code Intelligence
This project is indexed by GitNexus as **cameleer-server** (9095 symbols, 23495 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
This project is indexed by GitNexus as **cameleer-server** (9731 symbols, 24987 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
> If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first.

@@ -499,6 +499,7 @@ Key settings in `cameleer-server-app/src/main/resources/application.yml`. All cu
| `cameleer.server.runtime.routingmode` | `path` | `CAMELEER_SERVER_RUNTIME_ROUTINGMODE` | `path` or `subdomain` Traefik routing |
| `cameleer.server.runtime.routingdomain` | `localhost` | `CAMELEER_SERVER_RUNTIME_ROUTINGDOMAIN` | Domain for Traefik routing labels |
| `cameleer.server.runtime.serverurl` | *(empty)* | `CAMELEER_SERVER_RUNTIME_SERVERURL` | Server URL injected into app containers |
| `cameleer.server.runtime.certresolver` | *(empty)* | `CAMELEER_SERVER_RUNTIME_CERTRESOLVER` | Traefik TLS cert resolver name (e.g. `letsencrypt`). Blank = omit the `tls.certresolver` label and let Traefik serve the default TLS-store cert |
| `cameleer.server.runtime.agenthealthport` | `9464` | `CAMELEER_SERVER_RUNTIME_AGENTHEALTHPORT` | Agent health check port |
| `cameleer.server.runtime.healthchecktimeout` | `60` | `CAMELEER_SERVER_RUNTIME_HEALTHCHECKTIMEOUT` | Health check timeout (seconds) |
| `cameleer.server.runtime.container.memorylimit` | `512m` | `CAMELEER_SERVER_RUNTIME_CONTAINER_MEMORYLIMIT` | Default memory limit for app containers |

@@ -189,8 +189,8 @@
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<configuration>
<forkCount>1</forkCount>
<reuseForks>false</reuseForks>
<forkCount>1C</forkCount>
<reuseForks>true</reuseForks>
</configuration>
</plugin>
<plugin>

View File

@@ -61,7 +61,8 @@ public class LogPatternEvaluator implements ConditionEvaluator<LogPatternConditi
to,
null, // cursor
1, // limit (count query; value irrelevant)
"desc" // sort
"desc", // sort
null // instanceIds
);
return logStore.countLogs(req);
});

View File

@@ -9,6 +9,8 @@ import com.cameleer.server.app.storage.ClickHouseRouteCatalogStore;
import com.cameleer.server.core.storage.RouteCatalogStore;
import com.cameleer.server.app.storage.ClickHouseMetricsQueryStore;
import com.cameleer.server.app.storage.ClickHouseMetricsStore;
import com.cameleer.server.app.storage.ClickHouseServerMetricsQueryStore;
import com.cameleer.server.app.storage.ClickHouseServerMetricsStore;
import com.cameleer.server.app.storage.ClickHouseStatsStore;
import com.cameleer.server.core.admin.AuditRepository;
import com.cameleer.server.core.admin.AuditService;
@@ -67,6 +69,19 @@ public class StorageBeanConfig {
return new ClickHouseMetricsQueryStore(tenantProperties.getId(), clickHouseJdbc);
}
@Bean
public ServerMetricsStore clickHouseServerMetricsStore(
@Qualifier("clickHouseJdbcTemplate") JdbcTemplate clickHouseJdbc) {
return new ClickHouseServerMetricsStore(clickHouseJdbc);
}
@Bean
public ServerMetricsQueryStore clickHouseServerMetricsQueryStore(
TenantProperties tenantProperties,
@Qualifier("clickHouseJdbcTemplate") JdbcTemplate clickHouseJdbc) {
return new ClickHouseServerMetricsQueryStore(tenantProperties.getId(), clickHouseJdbc);
}
// ── Execution Store ──────────────────────────────────────────────────
@Bean

View File

@@ -196,7 +196,16 @@ public class CatalogController {
}
Set<String> routeIds = routesByApp.getOrDefault(slug, Set.of());
List<String> agentIds = agents.stream().map(AgentInfo::instanceId).toList();
// Resolve the env slug for this row early so fromUri can survive
// cross-env queries (env==null) against managed apps.
String rowEnvSlug = envSlug;
if (app != null && rowEnvSlug.isEmpty()) {
try {
rowEnvSlug = envService.getById(app.environmentId()).slug();
} catch (Exception ignored) {}
}
final String resolvedEnvSlug = rowEnvSlug;
// Routes
List<RouteSummary> routeSummaries = routeIds.stream()
@@ -204,7 +213,7 @@ public class CatalogController {
String key = slug + "/" + routeId;
long count = routeExchangeCounts.getOrDefault(key, 0L);
Instant lastSeen = routeLastSeen.get(key);
String fromUri = resolveFromEndpointUri(routeId, agentIds);
String fromUri = resolveFromEndpointUri(slug, routeId, resolvedEnvSlug);
String state = routeStateRegistry.getState(slug, routeId).name().toLowerCase();
String routeState = "started".equals(state) ? null : state;
return new RouteSummary(routeId, count, lastSeen, fromUri, routeState);
@@ -258,15 +267,9 @@ public class CatalogController {
String healthTooltip = buildHealthTooltip(app != null, deployStatus, agentHealth, agents.size());
String displayName = app != null ? app.displayName() : slug;
String appEnvSlug = envSlug;
if (app != null && appEnvSlug.isEmpty()) {
try {
appEnvSlug = envService.getById(app.environmentId()).slug();
} catch (Exception ignored) {}
}
catalog.add(new CatalogApp(
slug, displayName, app != null, appEnvSlug,
slug, displayName, app != null, resolvedEnvSlug,
health, healthTooltip, agents.size(), routeSummaries, agentSummaries,
totalExchanges, deploymentSummary
));
@@ -275,8 +278,11 @@ public class CatalogController {
return ResponseEntity.ok(catalog);
}
private String resolveFromEndpointUri(String routeId, List<String> agentIds) {
return diagramStore.findContentHashForRouteByAgents(routeId, agentIds)
private String resolveFromEndpointUri(String applicationId, String routeId, String environment) {
if (environment == null || environment.isBlank()) {
return null;
}
return diagramStore.findLatestContentHashForAppRoute(applicationId, routeId, environment)
.flatMap(diagramStore::findByContentHash)
.map(RouteGraph::getRoot)
.map(root -> root.getEndpointUri())

View File

@@ -2,8 +2,13 @@ package com.cameleer.server.app.controller;
import com.cameleer.server.app.runtime.DeploymentExecutor;
import com.cameleer.server.app.web.EnvPath;
import com.cameleer.server.core.admin.AuditCategory;
import com.cameleer.server.core.admin.AuditResult;
import com.cameleer.server.core.admin.AuditService;
import com.cameleer.server.core.runtime.App;
import com.cameleer.server.core.runtime.AppService;
import com.cameleer.server.core.runtime.AppVersion;
import com.cameleer.server.core.runtime.AppVersionRepository;
import com.cameleer.server.core.runtime.Deployment;
import com.cameleer.server.core.runtime.DeploymentService;
import com.cameleer.server.core.runtime.Environment;
@@ -12,14 +17,18 @@ import com.cameleer.server.core.runtime.RuntimeOrchestrator;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.responses.ApiResponse;
import io.swagger.v3.oas.annotations.tags.Tag;
import jakarta.servlet.http.HttpServletRequest;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.security.core.context.SecurityContextHolder;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.server.ResponseStatusException;
import java.util.List;
import java.util.Map;
@@ -42,17 +51,23 @@ public class DeploymentController {
private final RuntimeOrchestrator orchestrator;
private final AppService appService;
private final EnvironmentService environmentService;
private final AuditService auditService;
private final AppVersionRepository appVersionRepository;
public DeploymentController(DeploymentService deploymentService,
DeploymentExecutor deploymentExecutor,
RuntimeOrchestrator orchestrator,
AppService appService,
EnvironmentService environmentService) {
EnvironmentService environmentService,
AuditService auditService,
AppVersionRepository appVersionRepository) {
this.deploymentService = deploymentService;
this.deploymentExecutor = deploymentExecutor;
this.orchestrator = orchestrator;
this.appService = appService;
this.environmentService = environmentService;
this.auditService = auditService;
this.appVersionRepository = appVersionRepository;
}
@GetMapping
@@ -86,13 +101,25 @@ public class DeploymentController {
@ApiResponse(responseCode = "202", description = "Deployment accepted and starting")
public ResponseEntity<Deployment> deploy(@EnvPath Environment env,
@PathVariable String appSlug,
@RequestBody DeployRequest request) {
@RequestBody DeployRequest request,
HttpServletRequest httpRequest) {
try {
App app = appService.getByEnvironmentAndSlug(env.id(), appSlug);
Deployment deployment = deploymentService.createDeployment(app.id(), request.appVersionId(), env.id());
AppVersion appVersion = appVersionRepository.findById(request.appVersionId())
.orElseThrow(() -> new IllegalArgumentException("AppVersion not found: " + request.appVersionId()));
Deployment deployment = deploymentService.createDeployment(app.id(), request.appVersionId(), env.id(), currentUserId());
deploymentExecutor.executeAsync(deployment);
auditService.log("deploy_app", AuditCategory.DEPLOYMENT, deployment.id().toString(),
Map.of("appSlug", appSlug, "envSlug", env.slug(),
"appVersionId", request.appVersionId().toString(),
"jarFilename", appVersion.jarFilename() != null ? appVersion.jarFilename() : "",
"version", appVersion.version()),
AuditResult.SUCCESS, httpRequest);
return ResponseEntity.accepted().body(deployment);
} catch (IllegalArgumentException e) {
auditService.log("deploy_app", AuditCategory.DEPLOYMENT, null,
Map.of("appSlug", appSlug, "envSlug", env.slug(), "error", e.getMessage()),
AuditResult.FAILURE, httpRequest);
return ResponseEntity.notFound().build();
}
}
@@ -103,12 +130,19 @@ public class DeploymentController {
@ApiResponse(responseCode = "404", description = "Deployment not found")
public ResponseEntity<Deployment> stop(@EnvPath Environment env,
@PathVariable String appSlug,
@PathVariable UUID deploymentId) {
@PathVariable UUID deploymentId,
HttpServletRequest httpRequest) {
try {
Deployment deployment = deploymentService.getById(deploymentId);
deploymentExecutor.stopDeployment(deployment);
auditService.log("stop_deployment", AuditCategory.DEPLOYMENT, deploymentId.toString(),
Map.of("appSlug", appSlug, "envSlug", env.slug()),
AuditResult.SUCCESS, httpRequest);
return ResponseEntity.ok(deploymentService.getById(deploymentId));
} catch (IllegalArgumentException e) {
auditService.log("stop_deployment", AuditCategory.DEPLOYMENT, deploymentId.toString(),
Map.of("appSlug", appSlug, "envSlug", env.slug(), "error", e.getMessage()),
AuditResult.FAILURE, httpRequest);
return ResponseEntity.notFound().build();
}
}
@@ -122,18 +156,26 @@ public class DeploymentController {
public ResponseEntity<?> promote(@EnvPath Environment env,
@PathVariable String appSlug,
@PathVariable UUID deploymentId,
@RequestBody PromoteRequest request) {
@RequestBody PromoteRequest request,
HttpServletRequest httpRequest) {
try {
App sourceApp = appService.getByEnvironmentAndSlug(env.id(), appSlug);
Deployment source = deploymentService.getById(deploymentId);
Environment targetEnv = environmentService.getBySlug(request.targetEnvironment());
// Target must also have the app with the same slug
App targetApp = appService.getByEnvironmentAndSlug(targetEnv.id(), appSlug);
Deployment promoted = deploymentService.promote(targetApp.id(), source.appVersionId(), targetEnv.id());
Deployment promoted = deploymentService.promote(targetApp.id(), source.appVersionId(), targetEnv.id(), currentUserId());
deploymentExecutor.executeAsync(promoted);
auditService.log("promote_deployment", AuditCategory.DEPLOYMENT, promoted.id().toString(),
Map.of("sourceEnv", env.slug(), "targetEnv", request.targetEnvironment(),
"appSlug", appSlug, "appVersionId", source.appVersionId().toString()),
AuditResult.SUCCESS, httpRequest);
return ResponseEntity.accepted().body(promoted);
} catch (IllegalArgumentException e) {
return ResponseEntity.status(org.springframework.http.HttpStatus.NOT_FOUND)
auditService.log("promote_deployment", AuditCategory.DEPLOYMENT, deploymentId.toString(),
Map.of("sourceEnv", env.slug(), "targetEnv", request.targetEnvironment(),
"appSlug", appSlug, "error", e.getMessage()),
AuditResult.FAILURE, httpRequest);
return ResponseEntity.status(HttpStatus.NOT_FOUND)
.body(Map.of("error", e.getMessage()));
}
}
@@ -157,6 +199,15 @@ public class DeploymentController {
}
}
private String currentUserId() {
var auth = SecurityContextHolder.getContext().getAuthentication();
if (auth == null || auth.getName() == null) {
throw new ResponseStatusException(HttpStatus.UNAUTHORIZED, "No authentication");
}
String name = auth.getName();
return name.startsWith("user:") ? name.substring(5) : name;
}
public record DeployRequest(UUID appVersionId) {}
public record PromoteRequest(String targetEnvironment) {}
}

@@ -2,8 +2,6 @@ package com.cameleer.server.app.controller;
import com.cameleer.common.graph.RouteGraph;
import com.cameleer.server.app.web.EnvPath;
import com.cameleer.server.core.agent.AgentInfo;
import com.cameleer.server.core.agent.AgentRegistryService;
import com.cameleer.server.core.diagram.DiagramLayout;
import com.cameleer.server.core.diagram.DiagramRenderer;
import com.cameleer.server.core.runtime.Environment;
@@ -21,7 +19,6 @@ import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import java.util.List;
import java.util.Optional;
/**
@@ -42,14 +39,11 @@ public class DiagramRenderController {
private final DiagramStore diagramStore;
private final DiagramRenderer diagramRenderer;
private final AgentRegistryService registryService;
public DiagramRenderController(DiagramStore diagramStore,
DiagramRenderer diagramRenderer,
AgentRegistryService registryService) {
DiagramRenderer diagramRenderer) {
this.diagramStore = diagramStore;
this.diagramRenderer = diagramRenderer;
this.registryService = registryService;
}
@GetMapping("/api/v1/diagrams/{contentHash}/render")
@@ -90,8 +84,8 @@ public class DiagramRenderController {
@GetMapping("/api/v1/environments/{envSlug}/apps/{appSlug}/routes/{routeId}/diagram")
@Operation(summary = "Find the latest diagram for this app's route in this environment",
description = "Resolves agents in this env for this app, then looks up the latest diagram for the route "
+ "they reported. Env scope prevents a dev route from returning a prod diagram.")
description = "Returns the most recently stored diagram for (app, env, route). Independent of the "
+ "agent registry, so routes removed from the current app version still resolve.")
@ApiResponse(responseCode = "200", description = "Diagram layout returned")
@ApiResponse(responseCode = "404", description = "No diagram found")
public ResponseEntity<DiagramLayout> findByAppAndRoute(
@@ -99,15 +93,7 @@ public class DiagramRenderController {
@PathVariable String appSlug,
@PathVariable String routeId,
@RequestParam(defaultValue = "LR") String direction) {
List<String> agentIds = registryService.findByApplicationAndEnvironment(appSlug, env.slug()).stream()
.map(AgentInfo::instanceId)
.toList();
if (agentIds.isEmpty()) {
return ResponseEntity.notFound().build();
}
Optional<String> contentHash = diagramStore.findContentHashForRouteByAgents(routeId, agentIds);
Optional<String> contentHash = diagramStore.findLatestContentHashForAppRoute(appSlug, routeId, env.slug());
if (contentHash.isEmpty()) {
return ResponseEntity.notFound().build();
}

@@ -44,6 +44,7 @@ public class LogQueryController {
@RequestParam(required = false) String exchangeId,
@RequestParam(required = false) String logger,
@RequestParam(required = false) String source,
@RequestParam(required = false) String instanceIds,
@RequestParam(required = false) String from,
@RequestParam(required = false) String to,
@RequestParam(required = false) String cursor,
@@ -69,12 +70,21 @@ public class LogQueryController {
.toList();
}
List<String> instanceIdList = List.of();
if (instanceIds != null && !instanceIds.isEmpty()) {
instanceIdList = Arrays.stream(instanceIds.split(","))
.map(String::trim)
.filter(s -> !s.isEmpty())
.toList();
}
Instant fromInstant = from != null ? Instant.parse(from) : null;
Instant toInstant = to != null ? Instant.parse(to) : null;
LogSearchRequest request = new LogSearchRequest(
searchText, levels, application, instanceId, exchangeId,
logger, env.slug(), sources, fromInstant, toInstant, cursor, limit, sort);
logger, env.slug(), sources, fromInstant, toInstant, cursor, limit, sort,
instanceIdList);
LogSearchResponse result = logIndex.search(request);

@@ -132,13 +132,12 @@ public class RouteCatalogController {
List<AgentInfo> agents = agentsByApp.getOrDefault(appId, List.of());
Set<String> routeIds = routesByApp.getOrDefault(appId, Set.of());
List<String> agentIds = agents.stream().map(AgentInfo::instanceId).toList();
List<RouteSummary> routeSummaries = routeIds.stream()
.map(routeId -> {
String key = appId + "/" + routeId;
long count = routeExchangeCounts.getOrDefault(key, 0L);
Instant lastSeen = routeLastSeen.get(key);
String fromUri = resolveFromEndpointUri(routeId, agentIds);
String fromUri = resolveFromEndpointUri(appId, routeId, envSlug);
String state = routeStateRegistry.getState(appId, routeId).name().toLowerCase();
String routeState = "started".equals(state) ? null : state;
return new RouteSummary(routeId, count, lastSeen, fromUri, routeState);
@@ -160,8 +159,8 @@ public class RouteCatalogController {
return ResponseEntity.ok(catalog);
}
private String resolveFromEndpointUri(String routeId, List<String> agentIds) {
return diagramStore.findContentHashForRouteByAgents(routeId, agentIds)
private String resolveFromEndpointUri(String applicationId, String routeId, String environment) {
return diagramStore.findLatestContentHashForAppRoute(applicationId, routeId, environment)
.flatMap(diagramStore::findByContentHash)
.map(RouteGraph::getRoot)
.map(root -> root.getEndpointUri())

@@ -4,6 +4,7 @@ import com.cameleer.server.app.web.EnvPath;
import com.cameleer.server.core.admin.AppSettings;
import com.cameleer.server.core.admin.AppSettingsRepository;
import com.cameleer.server.core.runtime.Environment;
import com.cameleer.server.core.search.AttributeFilter;
import com.cameleer.server.core.search.ExecutionStats;
import com.cameleer.server.core.search.ExecutionSummary;
import com.cameleer.server.core.search.SearchRequest;
@@ -14,6 +15,7 @@ import com.cameleer.server.core.search.TopError;
import com.cameleer.server.core.storage.StatsStore;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.tags.Tag;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
@@ -21,8 +23,10 @@ import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.server.ResponseStatusException;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
@@ -57,11 +61,19 @@ public class SearchController {
@RequestParam(name = "agentId", required = false) String instanceId,
@RequestParam(required = false) String processorType,
@RequestParam(required = false) String application,
@RequestParam(name = "attr", required = false) List<String> attr,
@RequestParam(defaultValue = "0") int offset,
@RequestParam(defaultValue = "50") int limit,
@RequestParam(required = false) String sortField,
@RequestParam(required = false) String sortDir) {
List<AttributeFilter> attributeFilters;
try {
attributeFilters = parseAttrParams(attr);
} catch (IllegalArgumentException e) {
throw new ResponseStatusException(HttpStatus.BAD_REQUEST, e.getMessage(), e);
}
SearchRequest request = new SearchRequest(
status, timeFrom, timeTo,
null, null,
@@ -72,12 +84,36 @@ public class SearchController {
offset, limit,
sortField, sortDir,
null,
env.slug()
env.slug(),
attributeFilters
);
return ResponseEntity.ok(searchService.search(request));
}
/**
* Parses {@code attr} query params of the form {@code key} (key-only) or {@code key:value}
* (exact or wildcard via {@code *}). Splits on the first {@code :}; later colons are part of
* the value. Blank / null list → empty result. Key validation is delegated to
* {@link AttributeFilter}'s compact constructor, which throws {@link IllegalArgumentException}
* on invalid keys (mapped to 400 by the caller).
*/
static List<AttributeFilter> parseAttrParams(List<String> raw) {
if (raw == null || raw.isEmpty()) return List.of();
List<AttributeFilter> out = new ArrayList<>(raw.size());
for (String entry : raw) {
if (entry == null || entry.isBlank()) continue;
int colon = entry.indexOf(':');
if (colon < 0) {
out.add(new AttributeFilter(entry.trim(), null));
} else {
out.add(new AttributeFilter(entry.substring(0, colon).trim(),
entry.substring(colon + 1)));
}
}
return out;
}
@PostMapping("/executions/search")
@Operation(summary = "Advanced search with all filters",
description = "Env from the path overrides any environment field in the body.")
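
The `attr` parsing rule documented on `parseAttrParams` (split on the first `:` only, key-only entries carry a null value, blank entries are skipped) can be tried in isolation. The sketch below is illustrative only: `AttrParamSketch` and its nested `Filter` record are hypothetical stand-ins, and the key validation that the real `AttributeFilter` performs in its compact constructor is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

final class AttrParamSketch {
    // Hypothetical stand-in for AttributeFilter; no key validation here.
    record Filter(String key, String value) {}

    // Same rule as the Javadoc: only the first ':' separates key from value.
    static List<Filter> parse(List<String> raw) {
        if (raw == null || raw.isEmpty()) return List.of();
        List<Filter> out = new ArrayList<>(raw.size());
        for (String entry : raw) {
            if (entry == null || entry.isBlank()) continue; // skip blank params
            int colon = entry.indexOf(':');
            if (colon < 0) {
                out.add(new Filter(entry.trim(), null));    // key-only filter
            } else {
                // Later colons stay in the value, so URLs survive intact.
                out.add(new Filter(entry.substring(0, colon).trim(),
                        entry.substring(colon + 1)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("orderId", "region:eu-*", "url:http://host:8080")));
    }
}
```

With the inputs in `main`, the third entry keeps `http://host:8080` as its value because only the first colon is treated as the separator.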

@@ -0,0 +1,148 @@
package com.cameleer.server.app.controller;
import com.cameleer.server.core.storage.ServerMetricsQueryStore;
import com.cameleer.server.core.storage.model.ServerInstanceInfo;
import com.cameleer.server.core.storage.model.ServerMetricCatalogEntry;
import com.cameleer.server.core.storage.model.ServerMetricQueryRequest;
import com.cameleer.server.core.storage.model.ServerMetricQueryResponse;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.tags.Tag;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.http.ResponseEntity;
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import java.time.Instant;
import java.util.List;
import java.util.Map;
/**
* Generic read API over the ClickHouse {@code server_metrics} table. Lets
* SaaS control planes build server-health dashboards without requiring direct
* ClickHouse access.
*
* <p>Three endpoints cover all 17 panels in {@code docs/server-self-metrics.md}:
* <ul>
* <li>{@code GET /catalog} — discover available metric names, types, statistics, and tags</li>
* <li>{@code POST /query} — generic time-series query with aggregation, grouping, filtering, and counter-delta mode</li>
* <li>{@code GET /instances} — list server instances (useful for partitioning counter math)</li>
* </ul>
*
* <p>Visibility matches {@code ClickHouseAdminController} / {@code DatabaseAdminController}:
* <ul>
* <li>Conditional on {@code cameleer.server.security.infrastructureendpoints=true} (default).</li>
* <li>Class-level {@code @PreAuthorize("hasRole('ADMIN')")} on top of the
* {@code /api/v1/admin/**} catch-all in {@code SecurityConfig}.</li>
* </ul>
*/
@ConditionalOnProperty(
name = "cameleer.server.security.infrastructureendpoints",
havingValue = "true",
matchIfMissing = true
)
@RestController
@RequestMapping("/api/v1/admin/server-metrics")
@PreAuthorize("hasRole('ADMIN')")
@Tag(name = "Server Self-Metrics",
description = "Read API over the server's own Micrometer registry snapshots (ADMIN only)")
public class ServerMetricsAdminController {
/** Default lookback window for catalog/instances when from/to are omitted. */
private static final long DEFAULT_LOOKBACK_SECONDS = 3_600L;
private final ServerMetricsQueryStore store;
public ServerMetricsAdminController(ServerMetricsQueryStore store) {
this.store = store;
}
@GetMapping("/catalog")
@Operation(summary = "List metric names observed in the window",
description = "For each metric_name, returns metric_type, the set of statistics emitted, and the union of tag keys.")
public ResponseEntity<List<ServerMetricCatalogEntry>> catalog(
@RequestParam(required = false) String from,
@RequestParam(required = false) String to) {
Instant[] window = resolveWindow(from, to);
return ResponseEntity.ok(store.catalog(window[0], window[1]));
}
@GetMapping("/instances")
@Operation(summary = "List server_instance_id values observed in the window",
description = "Returns first/last seen timestamps — use to partition counter-delta computations.")
public ResponseEntity<List<ServerInstanceInfo>> instances(
@RequestParam(required = false) String from,
@RequestParam(required = false) String to) {
Instant[] window = resolveWindow(from, to);
return ResponseEntity.ok(store.listInstances(window[0], window[1]));
}
@PostMapping("/query")
@Operation(summary = "Generic time-series query",
description = "Returns bucketed series for a single metric_name. Supports aggregation (avg/sum/max/min/latest), group-by-tag, filter-by-tag, counter delta mode, and a derived 'mean' statistic for timers.")
public ResponseEntity<ServerMetricQueryResponse> query(@RequestBody QueryBody body) {
ServerMetricQueryRequest request = new ServerMetricQueryRequest(
body.metric(),
body.statistic(),
parseInstant(body.from(), "from"),
parseInstant(body.to(), "to"),
body.stepSeconds(),
body.groupByTags(),
body.filterTags(),
body.aggregation(),
body.mode(),
body.serverInstanceIds());
return ResponseEntity.ok(store.query(request));
}
@ExceptionHandler(IllegalArgumentException.class)
public ResponseEntity<Map<String, String>> handleBadRequest(IllegalArgumentException e) {
return ResponseEntity.badRequest().body(Map.of("error", e.getMessage()));
}
private static Instant[] resolveWindow(String from, String to) {
Instant toI = to != null ? parseInstant(to, "to") : Instant.now();
Instant fromI = from != null
? parseInstant(from, "from")
: toI.minusSeconds(DEFAULT_LOOKBACK_SECONDS);
if (!fromI.isBefore(toI)) {
throw new IllegalArgumentException("from must be strictly before to");
}
return new Instant[]{fromI, toI};
}
private static Instant parseInstant(String raw, String field) {
if (raw == null || raw.isBlank()) {
throw new IllegalArgumentException(field + " is required");
}
try {
return Instant.parse(raw);
} catch (Exception e) {
throw new IllegalArgumentException(
field + " must be an ISO-8601 instant (e.g. 2026-04-23T10:00:00Z)");
}
}
/**
* Request body for {@link #query(QueryBody)}. Uses ISO-8601 strings on
* the wire so the OpenAPI schema stays language-neutral.
*/
public record QueryBody(
String metric,
String statistic,
String from,
String to,
Integer stepSeconds,
List<String> groupByTags,
Map<String, String> filterTags,
String aggregation,
String mode,
List<String> serverInstanceIds
) {
}
}
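
The window defaulting used by `/catalog` and `/instances` (`to` defaults to now, `from` defaults to one hour before `to`, and `from` must be strictly earlier) is easier to reason about when the clock is passed in. The following is a sketch under that assumption: `WindowSketch` and the explicit `now` parameter are hypothetical and not part of the controller, and the friendlier error messages produced by `parseInstant` are omitted.

```java
import java.time.Instant;

final class WindowSketch {
    static final long DEFAULT_LOOKBACK_SECONDS = 3_600L;

    // 'now' is injected for testability; the controller calls Instant.now() itself.
    static Instant[] resolveWindow(String from, String to, Instant now) {
        Instant toI = to != null ? Instant.parse(to) : now;
        Instant fromI = from != null
                ? Instant.parse(from)
                : toI.minusSeconds(DEFAULT_LOOKBACK_SECONDS);
        if (!fromI.isBefore(toI)) {
            throw new IllegalArgumentException("from must be strictly before to");
        }
        return new Instant[]{fromI, toI};
    }

    public static void main(String[] args) {
        Instant[] w = resolveWindow(null, "2026-04-23T10:00:00Z", Instant.now());
        System.out.println(w[0] + " .. " + w[1]);
    }
}
```

Note that an omitted `from` is anchored to `to`, not to the current time, so a historical `to` still yields a one-hour window ending at that instant.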

@@ -6,8 +6,10 @@ import com.cameleer.server.core.admin.AuditService;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import org.springframework.stereotype.Component;
import org.springframework.util.AntPathMatcher;
import org.springframework.web.servlet.HandlerInterceptor;
import java.util.List;
import java.util.Map;
import java.util.Set;
@@ -22,7 +24,9 @@ import java.util.Set;
public class AuditInterceptor implements HandlerInterceptor {
private static final Set<String> AUDITABLE_METHODS = Set.of("POST", "PUT", "DELETE");
private static final Set<String> EXCLUDED_PATHS = Set.of("/api/v1/search/executions");
private static final List<String> EXCLUDED_PATH_PATTERNS = List.of(
"/api/v1/environments/*/executions/search");
private static final AntPathMatcher PATH_MATCHER = new AntPathMatcher();
private final AuditService auditService;
@@ -41,8 +45,10 @@ public class AuditInterceptor implements HandlerInterceptor {
}
String path = request.getRequestURI();
if (EXCLUDED_PATHS.contains(path)) {
return;
for (String pattern : EXCLUDED_PATH_PATTERNS) {
if (PATH_MATCHER.match(pattern, path)) {
return;
}
}
AuditResult result = response.getStatus() < 400 ? AuditResult.SUCCESS : AuditResult.FAILURE;

@@ -0,0 +1,63 @@
package com.cameleer.server.app.metrics;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.UUID;
/**
* Resolves a stable identifier for this server process, used as the
* {@code server_instance_id} on every server_metrics sample. The value is
* fixed at boot, so counters restart cleanly whenever the id rotates.
*
* <p>Precedence:
* <ol>
* <li>{@code cameleer.server.instance-id} property / {@code CAMELEER_SERVER_INSTANCE_ID} env
* <li>{@code HOSTNAME} env (populated by Docker/Kubernetes)
* <li>{@link InetAddress#getLocalHost()} hostname
* <li>Random UUID (fallback — only hit when DNS and env are both silent)
* </ol>
*/
@Configuration
public class ServerInstanceIdConfig {
private static final Logger log = LoggerFactory.getLogger(ServerInstanceIdConfig.class);
@Bean("serverInstanceId")
public String serverInstanceId(
@Value("${cameleer.server.instance-id:}") String configuredId) {
if (!isBlank(configuredId)) {
log.info("Server instance id resolved from configuration: {}", configuredId);
return configuredId;
}
String hostnameEnv = System.getenv("HOSTNAME");
if (!isBlank(hostnameEnv)) {
log.info("Server instance id resolved from HOSTNAME env: {}", hostnameEnv);
return hostnameEnv;
}
try {
String localHost = InetAddress.getLocalHost().getHostName();
if (!isBlank(localHost)) {
log.info("Server instance id resolved from localhost lookup: {}", localHost);
return localHost;
}
} catch (UnknownHostException e) {
log.debug("InetAddress.getLocalHost() failed, falling back to UUID: {}", e.getMessage());
}
String fallback = UUID.randomUUID().toString();
log.warn("Server instance id could not be resolved; using random UUID {}", fallback);
return fallback;
}
private static boolean isBlank(String s) {
return s == null || s.isBlank();
}
}
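
The precedence list in the Javadoc (configured id, then `HOSTNAME`, then the local host name, then a random UUID) can be expressed as a pure function, which makes each fallback step checkable without touching the environment or DNS. This is only a sketch: `InstanceIdPrecedenceSketch` and the `Supplier` used for the host lookup are hypothetical names introduced for illustration, and the logging done by the real bean is omitted.

```java
import java.util.UUID;
import java.util.function.Supplier;

final class InstanceIdPrecedenceSketch {
    // localHostLookup is only invoked when both earlier sources are blank.
    static String resolve(String configured, String hostnameEnv,
                          Supplier<String> localHostLookup) {
        if (!isBlank(configured)) return configured;
        if (!isBlank(hostnameEnv)) return hostnameEnv;
        String localHost = localHostLookup.get();
        if (!isBlank(localHost)) return localHost;
        // Last resort: a fresh id per process start.
        return UUID.randomUUID().toString();
    }

    private static boolean isBlank(String s) {
        return s == null || s.isBlank();
    }

    public static void main(String[] args) {
        System.out.println(resolve("", "pod-7", () -> "ignored"));
    }
}
```

Because the last fallback generates a new UUID on every start, an id resolved that way changes across restarts, which is the rotation case the Javadoc mentions for counters.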

@@ -0,0 +1,106 @@
package com.cameleer.server.app.metrics;
import com.cameleer.server.core.storage.ServerMetricsStore;
import com.cameleer.server.core.storage.model.ServerMetricSample;
import io.micrometer.core.instrument.Measurement;
import io.micrometer.core.instrument.Meter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tag;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
/**
* Periodically snapshots every meter in the server's {@link MeterRegistry}
* and writes the result to ClickHouse via {@link ServerMetricsStore}. This
* gives us historical server-health data (buffer depths, agent transitions,
* flush latency, JVM memory, HTTP response counts, etc.) without requiring
* an external Prometheus.
*
* <p>Each Micrometer {@link Meter#measure() measurement} becomes one row, so
* a single Timer produces rows for {@code count}, {@code total_time}, and
* {@code max} each tick. Counter values are cumulative since meter
* registration (Prometheus convention) — callers compute rate() themselves.
*
* <p>Disabled via {@code cameleer.server.self-metrics.enabled=false}.
*/
@Component
@ConditionalOnProperty(
prefix = "cameleer.server.self-metrics",
name = "enabled",
havingValue = "true",
matchIfMissing = true)
public class ServerMetricsSnapshotScheduler {
private static final Logger log = LoggerFactory.getLogger(ServerMetricsSnapshotScheduler.class);
private final MeterRegistry registry;
private final ServerMetricsStore store;
private final String tenantId;
private final String serverInstanceId;
public ServerMetricsSnapshotScheduler(
MeterRegistry registry,
ServerMetricsStore store,
@Value("${cameleer.server.tenant.id:default}") String tenantId,
@Qualifier("serverInstanceId") String serverInstanceId) {
this.registry = registry;
this.store = store;
this.tenantId = tenantId;
this.serverInstanceId = serverInstanceId;
}
@Scheduled(fixedDelayString = "${cameleer.server.self-metrics.interval-ms:60000}",
initialDelayString = "${cameleer.server.self-metrics.interval-ms:60000}")
public void snapshot() {
try {
Instant now = Instant.now();
List<ServerMetricSample> batch = new ArrayList<>();
for (Meter meter : registry.getMeters()) {
Meter.Id id = meter.getId();
Map<String, String> tags = flattenTags(id.getTagsAsIterable());
String type = id.getType().name().toLowerCase();
for (Measurement m : meter.measure()) {
double v = m.getValue();
if (!Double.isFinite(v)) continue;
batch.add(new ServerMetricSample(
tenantId,
now,
serverInstanceId,
id.getName(),
type,
m.getStatistic().getTagValueRepresentation(),
v,
tags));
}
}
if (!batch.isEmpty()) {
store.insertBatch(batch);
log.debug("Persisted {} server self-metric samples", batch.size());
}
} catch (Exception e) {
log.warn("Server self-metrics snapshot failed: {}", e.getMessage());
}
}
private static Map<String, String> flattenTags(Iterable<Tag> tags) {
Map<String, String> out = new LinkedHashMap<>();
for (Tag t : tags) {
out.put(t.getKey(), t.getValue());
}
return out;
}
}
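
Since counter rows are stored as cumulative values, a consumer has to turn consecutive samples into a rate itself. A minimal sketch of that step follows, assuming the samples belong to a single `server_instance_id` and metric, are sorted by time, and that a drop in value means the counter restarted (the usual Prometheus-style treatment). `CounterRateSketch` and its `Sample` record are hypothetical and do not mirror `ServerMetricSample`.

```java
import java.util.ArrayList;
import java.util.List;

final class CounterRateSketch {
    // Hypothetical reduced sample: one cumulative counter value at a point in time.
    record Sample(long epochSeconds, double value) {}

    // Per-second rate between consecutive samples.
    static List<Double> perSecondRates(List<Sample> samples) {
        List<Double> rates = new ArrayList<>();
        for (int i = 1; i < samples.size(); i++) {
            Sample prev = samples.get(i - 1);
            Sample cur = samples.get(i);
            long dt = cur.epochSeconds() - prev.epochSeconds();
            if (dt <= 0) continue; // ignore duplicate or out-of-order timestamps
            // A decrease is treated as a restart: count from zero again.
            double delta = cur.value() >= prev.value()
                    ? cur.value() - prev.value()
                    : cur.value();
            rates.add(delta / dt);
        }
        return rates;
    }

    public static void main(String[] args) {
        System.out.println(perSecondRates(List.of(
                new Sample(0, 100), new Sample(60, 160), new Sample(120, 40))));
    }
}
```

Mixing samples from different server instances in one series would make ordinary increments look like resets, so series are best partitioned by instance id before applying this.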

@@ -62,6 +62,9 @@ public class DeploymentExecutor {
@Value("${cameleer.server.runtime.serverurl:}")
private String globalServerUrl;
@Value("${cameleer.server.runtime.certresolver:}")
private String globalCertResolver;
@Value("${cameleer.server.runtime.jardockervolume:}")
private String jarDockerVolume;
@@ -89,6 +92,34 @@ public class DeploymentExecutor {
this.applicationConfigRepository = applicationConfigRepository;
}
/** Deployment-scoped id suffix — distinguishes container names and
* CAMELEER_AGENT_INSTANCEID across redeploys so old + new replicas can
* coexist during a blue/green swap. First 8 chars of the deployment UUID. */
static String generationOf(Deployment deployment) {
return deployment.id().toString().substring(0, 8);
}
/**
* Per-deployment context assembled once at the top of executeAsync and passed
* into strategy handlers. Keeps the strategy methods readable instead of
* threading 12 positional args.
*/
private record DeployCtx(
Deployment deployment,
App app,
Environment env,
ResolvedContainerConfig config,
String jarPath,
String resolvedRuntimeType,
String mainClass,
String generation,
String primaryNetwork,
List<String> additionalNets,
Map<String, String> baseEnvVars,
Map<String, String> prometheusLabels,
long deployStart
) {}
@Async("deploymentTaskExecutor")
public void executeAsync(Deployment deployment) {
long deployStart = System.currentTimeMillis();
@@ -96,13 +127,15 @@ public class DeploymentExecutor {
App app = appService.getById(deployment.appId());
Environment env = envService.getById(deployment.environmentId());
String jarPath = appService.resolveJarPath(deployment.appVersionId());
String generation = generationOf(deployment);
var globalDefaults = new ConfigMerger.GlobalRuntimeDefaults(
parseMemoryLimitMb(globalMemoryLimit),
globalCpuShares,
globalRoutingMode,
globalRoutingDomain,
globalServerUrl.isBlank() ? "http://cameleer-server:8081" : globalServerUrl
globalServerUrl.isBlank() ? "http://cameleer-server:8081" : globalServerUrl,
globalCertResolver.isBlank() ? null : globalCertResolver
);
ResolvedContainerConfig config = ConfigMerger.resolve(
globalDefaults, env.defaultContainerConfig(), app.containerConfig());
@@ -144,7 +177,6 @@ public class DeploymentExecutor {
updateStage(deployment.id(), DeployStage.CREATE_NETWORK);
// Primary network: use configured CAMELEER_DOCKER_NETWORK (tenant-isolated in SaaS mode)
String primaryNetwork = dockerNetwork;
String envNet = null;
List<String> additionalNets = new ArrayList<>();
if (networkManager != null) {
networkManager.ensureNetwork(primaryNetwork);
@@ -152,7 +184,7 @@ public class DeploymentExecutor {
networkManager.ensureNetwork(DockerNetworkManager.TRAEFIK_NETWORK);
additionalNets.add(DockerNetworkManager.TRAEFIK_NETWORK);
// Per-environment network scoped to tenant to prevent cross-tenant collisions
envNet = DockerNetworkManager.envNetworkName(tenantId, env.slug());
String envNet = DockerNetworkManager.envNetworkName(tenantId, env.slug());
networkManager.ensureNetwork(envNet);
additionalNets.add(envNet);
}
@@ -167,135 +199,21 @@ public class DeploymentExecutor {
}
}
// === STOP PREVIOUS ACTIVE DEPLOYMENT ===
// Container names are deterministic ({tenant}-{env}-{app}-{replica}), so a
// previous active deployment holds the Docker names we need. Stop + remove
// it before starting new replicas to avoid a 409 name conflict. Excluding
// the current deployment id by SQL (not Java) because the newly created
// row already has status=STARTING and would otherwise be picked by
// findActiveByAppIdAndEnvironmentId ORDER BY created_at DESC LIMIT 1.
Optional<Deployment> previous = deploymentRepository.findActiveByAppIdAndEnvironmentIdExcluding(
deployment.appId(), deployment.environmentId(), deployment.id());
if (previous.isPresent()) {
log.info("Stopping previous deployment {} before starting new replicas", previous.get().id());
stopDeploymentContainers(previous.get());
deploymentService.markStopped(previous.get().id());
DeployCtx ctx = new DeployCtx(
deployment, app, env, config, jarPath,
resolvedRuntimeType, mainClass, generation,
primaryNetwork, additionalNets,
buildEnvVars(app, env, config),
PrometheusLabelBuilder.build(resolvedRuntimeType),
deployStart);
// Dispatch on strategy. Unknown values fall back to BLUE_GREEN via fromWire.
DeploymentStrategy strategy = DeploymentStrategy.fromWire(config.deploymentStrategy());
switch (strategy) {
case BLUE_GREEN -> deployBlueGreen(ctx);
case ROLLING -> deployRolling(ctx);
}
// === START REPLICAS ===
updateStage(deployment.id(), DeployStage.START_REPLICAS);
Map<String, String> baseEnvVars = buildEnvVars(app, env, config);
Map<String, String> prometheusLabels = PrometheusLabelBuilder.build(resolvedRuntimeType);
List<Map<String, Object>> replicaStates = new ArrayList<>();
List<String> newContainerIds = new ArrayList<>();
for (int i = 0; i < config.replicas(); i++) {
String instanceId = env.slug() + "-" + app.slug() + "-" + i;
String containerName = tenantId + "-" + instanceId;
// Per-replica labels (include replica index and instance-id)
Map<String, String> labels = TraefikLabelBuilder.build(app.slug(), env.slug(), tenantId, config, i);
labels.putAll(prometheusLabels);
// Per-replica env vars (set agent instance ID to match container log identity)
Map<String, String> replicaEnvVars = new LinkedHashMap<>(baseEnvVars);
replicaEnvVars.put("CAMELEER_AGENT_INSTANCEID", instanceId);
String volumeName = jarDockerVolume != null && !jarDockerVolume.isBlank() ? jarDockerVolume : null;
ContainerRequest request = new ContainerRequest(
containerName, baseImage, jarPath,
volumeName, jarStoragePath,
primaryNetwork,
additionalNets,
replicaEnvVars, labels,
config.memoryLimitBytes(), config.memoryReserveBytes(),
config.dockerCpuShares(), config.dockerCpuQuota(),
config.exposedPorts(), agentHealthPort,
"on-failure", 3,
resolvedRuntimeType, config.customArgs(), mainClass
);
String containerId = orchestrator.startContainer(request);
newContainerIds.add(containerId);
// Connect to additional networks after container is started
for (String net : additionalNets) {
if (networkManager != null) {
networkManager.connectContainer(containerId, net);
}
}
orchestrator.startLogCapture(containerId, instanceId, app.slug(), env.slug(), tenantId);
replicaStates.add(Map.of(
"index", i,
"containerId", containerId,
"containerName", containerName,
"status", "STARTING"
));
}
pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
// === HEALTH CHECK ===
updateStage(deployment.id(), DeployStage.HEALTH_CHECK);
int healthyCount = waitForAnyHealthy(newContainerIds, healthCheckTimeout);
if (healthyCount == 0) {
for (String cid : newContainerIds) {
try { orchestrator.stopContainer(cid); orchestrator.removeContainer(cid); }
catch (Exception e) { log.warn("Cleanup failed for {}: {}", cid, e.getMessage()); }
}
pgDeployRepo.updateDeployStage(deployment.id(), null);
deploymentService.markFailed(deployment.id(), "No replicas passed health check within " + healthCheckTimeout + "s");
serverMetrics.recordDeploymentOutcome("FAILED");
serverMetrics.recordDeploymentDuration(deployStart);
return;
}
replicaStates = updateReplicaHealth(replicaStates, newContainerIds);
pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
// === SWAP TRAFFIC ===
// Traffic is routed via Traefik Docker labels, so the "swap" happens
// implicitly once the new replicas are healthy and the old containers
// are gone. The old deployment was already stopped before START_REPLICAS
// to free the deterministic container names.
updateStage(deployment.id(), DeployStage.SWAP_TRAFFIC);
// === COMPLETE ===
updateStage(deployment.id(), DeployStage.COMPLETE);
// Capture config snapshot before marking RUNNING
ApplicationConfig agentConfig = applicationConfigRepository
.findByApplicationAndEnvironment(app.slug(), env.slug())
.orElse(null);
List<String> snapshotSensitiveKeys = agentConfig != null ? agentConfig.getSensitiveKeys() : null;
DeploymentConfigSnapshot snapshot = new DeploymentConfigSnapshot(
deployment.appVersionId(),
agentConfig,
app.containerConfig(),
snapshotSensitiveKeys
);
pgDeployRepo.saveDeployedConfigSnapshot(deployment.id(), snapshot);
String primaryContainerId = newContainerIds.get(0);
DeploymentStatus finalStatus = healthyCount == config.replicas()
? DeploymentStatus.RUNNING : DeploymentStatus.DEGRADED;
deploymentService.markRunning(deployment.id(), primaryContainerId);
if (finalStatus == DeploymentStatus.DEGRADED) {
deploymentRepository.updateStatus(deployment.id(), DeploymentStatus.DEGRADED,
primaryContainerId, null);
}
pgDeployRepo.updateDeployStage(deployment.id(), null);
serverMetrics.recordDeploymentOutcome(finalStatus.name());
serverMetrics.recordDeploymentDuration(deployStart);
log.info("Deployment {} is {} ({}/{} replicas healthy)",
deployment.id(), finalStatus, healthyCount, config.replicas());
} catch (Exception e) {
log.error("Deployment {} FAILED: {}", deployment.id(), e.getMessage(), e);
pgDeployRepo.updateDeployStage(deployment.id(), null);
@@ -305,6 +223,262 @@ public class DeploymentExecutor {
}
}
/**
* Blue/green strategy: start all N new replicas (coexisting with the old
* ones thanks to the gen-suffixed container names), wait for ALL healthy,
* then stop the previous deployment. Strict all-healthy — partial failure
* preserves the previous deployment untouched.
*/
private void deployBlueGreen(DeployCtx ctx) {
ResolvedContainerConfig config = ctx.config();
Deployment deployment = ctx.deployment();
// === START REPLICAS ===
updateStage(deployment.id(), DeployStage.START_REPLICAS);
List<Map<String, Object>> replicaStates = new ArrayList<>();
List<String> newContainerIds = new ArrayList<>();
for (int i = 0; i < config.replicas(); i++) {
Map<String, Object> state = new LinkedHashMap<>();
String containerId = startReplica(ctx, i, state);
newContainerIds.add(containerId);
replicaStates.add(state);
}
pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
// === HEALTH CHECK ===
updateStage(deployment.id(), DeployStage.HEALTH_CHECK);
int healthyCount = waitForAllHealthy(newContainerIds, healthCheckTimeout);
if (healthyCount < config.replicas()) {
// Strict abort: tear down new replicas, leave the previous deployment untouched.
for (String cid : newContainerIds) {
try { orchestrator.stopContainer(cid); orchestrator.removeContainer(cid); }
catch (Exception e) { log.warn("Cleanup failed for {}: {}", cid, e.getMessage()); }
}
pgDeployRepo.updateDeployStage(deployment.id(), null);
String reason = String.format(
"blue-green: %d/%d replicas healthy within %ds; preserving previous deployment",
healthyCount, config.replicas(), healthCheckTimeout);
deploymentService.markFailed(deployment.id(), reason);
serverMetrics.recordDeploymentOutcome("FAILED");
serverMetrics.recordDeploymentDuration(ctx.deployStart());
return;
}
replicaStates = updateReplicaHealth(replicaStates, newContainerIds);
pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
// === SWAP TRAFFIC ===
// All new replicas are healthy; Traefik labels are already attracting
// traffic to them. Stop the previous deployment now — the swap is
// implicit in the label-driven load balancer.
updateStage(deployment.id(), DeployStage.SWAP_TRAFFIC);
Optional<Deployment> previous = deploymentRepository.findActiveByAppIdAndEnvironmentIdExcluding(
deployment.appId(), deployment.environmentId(), deployment.id());
if (previous.isPresent()) {
log.info("blue-green: stopping previous deployment {} now that new replicas are healthy",
previous.get().id());
stopDeploymentContainers(previous.get());
deploymentService.markStopped(previous.get().id());
}
// === COMPLETE ===
updateStage(deployment.id(), DeployStage.COMPLETE);
persistSnapshotAndMarkRunning(ctx, newContainerIds.get(0));
log.info("Deployment {} is RUNNING (blue-green, {}/{} replicas healthy)",
deployment.id(), healthyCount, config.replicas());
}
/**
* Rolling strategy: replace replicas one at a time: start new[i], wait
* until healthy, stop old[i]. If any replica fails its health check, stop
* every new container started so far (including the in-flight one), leave
* the not-yet-replaced old replicas serving, and mark the deployment
* FAILED. Old containers that were already replaced are not restored;
* the user must redeploy to recover.
*
* Resource peak: replicas + 1 (briefly while a new replica warms up
* before its counterpart is stopped).
*/
private void deployRolling(DeployCtx ctx) {
ResolvedContainerConfig config = ctx.config();
Deployment deployment = ctx.deployment();
// Capture previous deployment's per-index container ids up front.
Optional<Deployment> previousOpt = deploymentRepository.findActiveByAppIdAndEnvironmentIdExcluding(
deployment.appId(), deployment.environmentId(), deployment.id());
Map<Integer, String> oldContainerByIndex = new LinkedHashMap<>();
if (previousOpt.isPresent() && previousOpt.get().replicaStates() != null) {
for (Map<String, Object> r : previousOpt.get().replicaStates()) {
Object idx = r.get("index");
Object cid = r.get("containerId");
if (idx instanceof Number n && cid instanceof String s) {
oldContainerByIndex.put(n.intValue(), s);
}
}
}
// === START REPLICAS ===
updateStage(deployment.id(), DeployStage.START_REPLICAS);
List<Map<String, Object>> replicaStates = new ArrayList<>();
List<String> newContainerIds = new ArrayList<>();
for (int i = 0; i < config.replicas(); i++) {
// Start new replica i (gen-suffixed name; coexists with old[i]).
Map<String, Object> state = new LinkedHashMap<>();
String newCid = startReplica(ctx, i, state);
newContainerIds.add(newCid);
replicaStates.add(state);
pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
// === HEALTH CHECK (per-replica) ===
updateStage(deployment.id(), DeployStage.HEALTH_CHECK);
boolean healthy = waitForOneHealthy(newCid, healthCheckTimeout);
if (!healthy) {
// Abort: stop this in-flight new replica AND any new replicas
// started so far. Already-stopped old replicas stay stopped
// (rolling is not reversible). Remaining un-replaced old
// replicas keep serving traffic.
for (String cid : newContainerIds) {
try { orchestrator.stopContainer(cid); orchestrator.removeContainer(cid); }
catch (Exception e) { log.warn("Cleanup failed for {}: {}", cid, e.getMessage()); }
}
pgDeployRepo.updateDeployStage(deployment.id(), null);
String reason = String.format(
"rolling: replica %d failed to reach healthy within %ds; %d previous replicas still running",
i, healthCheckTimeout, oldContainerByIndex.size());
deploymentService.markFailed(deployment.id(), reason);
serverMetrics.recordDeploymentOutcome("FAILED");
serverMetrics.recordDeploymentDuration(ctx.deployStart());
return;
}
// Health check passed: update replica status to RUNNING, stop the
// corresponding old[i] if present, and continue with replica i+1.
replicaStates = updateReplicaHealth(replicaStates, newContainerIds);
pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
String oldCid = oldContainerByIndex.remove(i);
if (oldCid != null) {
try {
orchestrator.stopContainer(oldCid);
orchestrator.removeContainer(oldCid);
log.info("rolling: replaced replica {} (old={}, new={})", i, oldCid, newCid);
} catch (Exception e) {
log.warn("rolling: failed to stop old replica {} ({}): {}", i, oldCid, e.getMessage());
}
}
}
// === SWAP TRAFFIC ===
// Any old replicas with indices >= new.replicas (e.g., when replica
// count shrank) are still running; sweep them now so the old
// deployment can be marked STOPPED.
updateStage(deployment.id(), DeployStage.SWAP_TRAFFIC);
for (Map.Entry<Integer, String> e : oldContainerByIndex.entrySet()) {
try {
orchestrator.stopContainer(e.getValue());
orchestrator.removeContainer(e.getValue());
log.info("rolling: stopped leftover old replica {} ({})", e.getKey(), e.getValue());
} catch (Exception ex) {
log.warn("rolling: failed to stop leftover old replica {}: {}", e.getKey(), ex.getMessage());
}
}
if (previousOpt.isPresent()) {
deploymentService.markStopped(previousOpt.get().id());
}
// === COMPLETE ===
updateStage(deployment.id(), DeployStage.COMPLETE);
persistSnapshotAndMarkRunning(ctx, newContainerIds.get(0));
log.info("Deployment {} is RUNNING (rolling, {}/{} replicas replaced)",
deployment.id(), config.replicas(), config.replicas());
}
/** Poll a single container until healthy or the timeout expires. Returns
* true on healthy, false on timeout or thread interrupt. */
private boolean waitForOneHealthy(String containerId, int timeoutSeconds) {
long deadline = System.currentTimeMillis() + (timeoutSeconds * 1000L);
while (System.currentTimeMillis() < deadline) {
ContainerStatus status = orchestrator.getContainerStatus(containerId);
if ("healthy".equals(status.state())) return true;
try { Thread.sleep(2000); } catch (InterruptedException e) {
Thread.currentThread().interrupt();
return false;
}
}
return false;
}
/** Start one replica container with the gen-suffixed name and return its
* container id. Fills `stateOut` with the replicaStates JSONB row. */
private String startReplica(DeployCtx ctx, int i, Map<String, Object> stateOut) {
Environment env = ctx.env();
App app = ctx.app();
ResolvedContainerConfig config = ctx.config();
String instanceId = env.slug() + "-" + app.slug() + "-" + i + "-" + ctx.generation();
String containerName = tenantId + "-" + instanceId;
Map<String, String> labels = TraefikLabelBuilder.build(
app.slug(), env.slug(), tenantId, config, i, ctx.generation());
labels.putAll(ctx.prometheusLabels());
Map<String, String> replicaEnvVars = new LinkedHashMap<>(ctx.baseEnvVars());
replicaEnvVars.put("CAMELEER_AGENT_INSTANCEID", instanceId);
String volumeName = jarDockerVolume != null && !jarDockerVolume.isBlank() ? jarDockerVolume : null;
ContainerRequest request = new ContainerRequest(
containerName, baseImage, ctx.jarPath(),
volumeName, jarStoragePath,
ctx.primaryNetwork(),
ctx.additionalNets(),
replicaEnvVars, labels,
config.memoryLimitBytes(), config.memoryReserveBytes(),
config.dockerCpuShares(), config.dockerCpuQuota(),
config.exposedPorts(), agentHealthPort,
"on-failure", 3,
ctx.resolvedRuntimeType(), config.customArgs(), ctx.mainClass()
);
String containerId = orchestrator.startContainer(request);
// Connect to additional networks after container is started
for (String net : ctx.additionalNets()) {
if (networkManager != null) {
networkManager.connectContainer(containerId, net);
}
}
orchestrator.startLogCapture(containerId, instanceId, app.slug(), env.slug(), tenantId);
stateOut.put("index", i);
stateOut.put("containerId", containerId);
stateOut.put("containerName", containerName);
stateOut.put("status", "STARTING");
return containerId;
}
/** Persist the deployment snapshot and mark the deployment RUNNING.
* Finalizes the deploy in a single place shared by all strategy paths. */
private void persistSnapshotAndMarkRunning(DeployCtx ctx, String primaryContainerId) {
Deployment deployment = ctx.deployment();
ApplicationConfig agentConfig = applicationConfigRepository
.findByApplicationAndEnvironment(ctx.app().slug(), ctx.env().slug())
.orElse(null);
List<String> snapshotSensitiveKeys = agentConfig != null ? agentConfig.getSensitiveKeys() : null;
DeploymentConfigSnapshot snapshot = new DeploymentConfigSnapshot(
deployment.appVersionId(),
agentConfig,
ctx.app().containerConfig(),
snapshotSensitiveKeys);
pgDeployRepo.saveDeployedConfigSnapshot(deployment.id(), snapshot);
deploymentService.markRunning(deployment.id(), primaryContainerId);
pgDeployRepo.updateDeployStage(deployment.id(), null);
serverMetrics.recordDeploymentOutcome("RUNNING");
serverMetrics.recordDeploymentDuration(ctx.deployStart());
}
public void stopDeployment(Deployment deployment) {
pgDeployRepo.updateTargetState(deployment.id(), "STOPPED");
deploymentRepository.updateStatus(deployment.id(), DeploymentStatus.STOPPING,
@@ -370,7 +544,10 @@ public class DeploymentExecutor {
return envVars;
}
private int waitForAnyHealthy(List<String> containerIds, int timeoutSeconds) {
/** Poll until all containers are healthy or the timeout expires. Returns
* the healthy count at return time: containerIds.size() on full success,
* fewer if the timeout expired first. */
private int waitForAllHealthy(List<String> containerIds, int timeoutSeconds) {
long deadline = System.currentTimeMillis() + (timeoutSeconds * 1000L);
int lastHealthy = 0;
while (System.currentTimeMillis() < deadline) {
@@ -432,6 +609,10 @@ public class DeploymentExecutor {
map.put("runtimeType", config.runtimeType());
map.put("customArgs", config.customArgs());
map.put("extraNetworks", config.extraNetworks());
map.put("externalRouting", config.externalRouting());
if (config.certResolver() != null) {
map.put("certResolver", config.certResolver());
}
return map;
}
}
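
The diff shows only the opening lines of waitForAllHealthy. The following is a minimal sketch of what such a poll-until-all-healthy loop could look like, not the project's actual implementation. HealthPoller and the isHealthy predicate are hypothetical stand-ins for the orchestrator status lookup, and the poll interval is a parameter here rather than the fixed 2000 ms sleep seen in waitForOneHealthy.

```java
import java.util.List;
import java.util.function.Predicate;

/** Illustrative only: HealthPoller and isHealthy are hypothetical names. */
final class HealthPoller {

    /**
     * Polls until every container id reports healthy or the deadline passes.
     * Returns the healthy count observed on the last poll, so a caller can
     * compare it against the expected replica count.
     */
    static int waitForAllHealthy(List<String> containerIds,
                                 Predicate<String> isHealthy,
                                 long timeoutMillis,
                                 long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            int healthy = 0;
            for (String id : containerIds) {
                if (isHealthy.test(id)) healthy++;
            }
            // Full success: stop polling immediately.
            if (healthy == containerIds.size()) return healthy;
            // Timeout: report the partial count.
            if (System.currentTimeMillis() >= deadline) return healthy;
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up with the partial count.
                Thread.currentThread().interrupt();
                return healthy;
            }
        }
    }
}
```

A strict blue/green caller would treat any result below the replica count as a failure and tear down the new containers, as deployBlueGreen does above.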


@@ -10,19 +10,28 @@ public final class TraefikLabelBuilder {
private TraefikLabelBuilder() {}
public static Map<String, String> build(String appSlug, String envSlug, String tenantId,
ResolvedContainerConfig config, int replicaIndex) {
ResolvedContainerConfig config, int replicaIndex,
String generation) {
// Traefik router/service keys stay generation-agnostic so load balancing
// spans old + new replicas during a blue/green overlap. instance-id and
// the new generation label carry the per-deploy identity.
String svc = envSlug + "-" + appSlug;
String instanceId = envSlug + "-" + appSlug + "-" + replicaIndex;
String instanceId = envSlug + "-" + appSlug + "-" + replicaIndex + "-" + generation;
Map<String, String> labels = new LinkedHashMap<>();
labels.put("traefik.enable", "true");
labels.put("managed-by", "cameleer-server");
labels.put("cameleer.tenant", tenantId);
labels.put("cameleer.app", appSlug);
labels.put("cameleer.environment", envSlug);
labels.put("cameleer.replica", String.valueOf(replicaIndex));
labels.put("cameleer.generation", generation);
labels.put("cameleer.instance-id", instanceId);
if (!config.externalRouting()) {
return labels;
}
labels.put("traefik.enable", "true");
labels.put("traefik.http.services." + svc + ".loadbalancer.server.port",
String.valueOf(config.appPort()));
@@ -46,7 +55,10 @@ public final class TraefikLabelBuilder {
if (config.sslOffloading()) {
labels.put("traefik.http.routers." + svc + ".tls", "true");
labels.put("traefik.http.routers." + svc + ".tls.certresolver", "default");
if (config.certResolver() != null && !config.certResolver().isBlank()) {
labels.put("traefik.http.routers." + svc + ".tls.certresolver",
config.certResolver());
}
}
return labels;
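
The comment in this hunk explains that the router/service key stays generation-agnostic while the instance id carries the generation. A trimmed-down sketch of that idea follows. GenerationLabels is a hypothetical class, the generation values "g1" and "g2" are made up, and only a few of the labels from the real builder are reproduced.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: a reduced stand-in for the label builder in the diff. */
final class GenerationLabels {

    static Map<String, String> build(String appSlug, String envSlug,
                                     int replicaIndex, String generation, int appPort) {
        // The service key has no generation component, so old and new
        // replicas register under the same Traefik service during an overlap.
        String svc = envSlug + "-" + appSlug;
        // The instance id does include the generation, so each deploy's
        // replicas remain distinguishable.
        String instanceId = svc + "-" + replicaIndex + "-" + generation;

        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("cameleer.generation", generation);
        labels.put("cameleer.instance-id", instanceId);
        labels.put("traefik.enable", "true");
        labels.put("traefik.http.services." + svc + ".loadbalancer.server.port",
                String.valueOf(appPort));
        return labels;
    }
}
```

Two calls that differ only in generation produce the same service-port label key but different cameleer.instance-id values.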


@@ -122,6 +122,14 @@ public class ClickHouseLogStore implements LogIndex {
baseParams.add(request.instanceId());
}
if (!request.instanceIds().isEmpty()) {
String placeholders = String.join(", ", Collections.nCopies(request.instanceIds().size(), "?"));
baseConditions.add("instance_id IN (" + placeholders + ")");
for (String id : request.instanceIds()) {
baseParams.add(id);
}
}
if (request.exchangeId() != null && !request.exchangeId().isEmpty()) {
baseConditions.add("(exchange_id = ?" +
" OR (mapContains(mdc, 'cameleer.exchangeId') AND mdc['cameleer.exchangeId'] = ?)" +
@@ -281,6 +289,14 @@ public class ClickHouseLogStore implements LogIndex {
params.add(request.instanceId());
}
if (!request.instanceIds().isEmpty()) {
String placeholders = String.join(", ", Collections.nCopies(request.instanceIds().size(), "?"));
conditions.add("instance_id IN (" + placeholders + ")");
for (String id : request.instanceIds()) {
params.add(id);
}
}
if (request.exchangeId() != null && !request.exchangeId().isEmpty()) {
conditions.add("(exchange_id = ?" +
" OR (mapContains(mdc, 'cameleer.exchangeId') AND mdc['cameleer.exchangeId'] = ?)" +

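
Both hunks above build an IN clause with one ? placeholder per instance id and bind the ids as parameters. The same pattern, extracted into a small helper for illustration (InClause and appendIn are hypothetical names, not part of the diff):

```java
import java.util.Collections;
import java.util.List;

/** Illustrative only: InClause is a hypothetical helper. */
final class InClause {

    /**
     * Appends "column IN (?, ?, ...)" to conditions and the values to params.
     * Does nothing for an empty list, because "IN ()" is not valid SQL.
     */
    static void appendIn(String column, List<String> values,
                         List<String> conditions, List<Object> params) {
        if (values.isEmpty()) return;
        // One "?" per value; the values themselves never enter the SQL text.
        String placeholders = String.join(", ", Collections.nCopies(values.size(), "?"));
        conditions.add(column + " IN (" + placeholders + ")");
        params.addAll(values);
    }
}
```

The column name is concatenated into the SQL, so it must come from code, never from user input.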

@@ -1,6 +1,7 @@
package com.cameleer.server.app.search;
import com.cameleer.server.core.alerting.AlertMatchSpec;
import com.cameleer.server.core.search.AttributeFilter;
import com.cameleer.server.core.search.ExecutionSummary;
import com.cameleer.server.core.search.SearchRequest;
import com.cameleer.server.core.search.SearchResult;
@@ -256,6 +257,23 @@ public class ClickHouseSearchIndex implements SearchIndex {
params.add(likeTerm);
}
// Structured attribute filters. Keys were validated at AttributeFilter construction
// time against ^[a-zA-Z0-9._-]+$ so they are safe to single-quote-inline; the JSON path
// argument of JSONExtractString does not accept a ? placeholder in ClickHouse JDBC
// (same constraint as countExecutionsForAlerting below). Values are parameter-bound.
for (AttributeFilter filter : request.attributeFilters()) {
String escapedKey = filter.key().replace("'", "\\'");
if (filter.isKeyOnly()) {
conditions.add("JSONHas(attributes, '" + escapedKey + "')");
} else if (filter.isWildcard()) {
conditions.add("JSONExtractString(attributes, '" + escapedKey + "') LIKE ?");
params.add(filter.toLikePattern());
} else {
conditions.add("JSONExtractString(attributes, '" + escapedKey + "') = ?");
params.add(filter.value());
}
}
return String.join(" AND ", conditions);
}
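
The comment above relies on keys having been validated against ^[a-zA-Z0-9._-]+$ before they are inlined into the JSON path. A minimal sketch of that split between an inlined, validated key and a parameter-bound value follows. AttributeConditions is a hypothetical class; here the validation happens at the point of use rather than at AttributeFilter construction time, and the wildcard case is omitted because toLikePattern is not shown in the diff.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative only: AttributeConditions is a hypothetical helper. */
final class AttributeConditions {

    // Same character class as the regex quoted in the diff comment.
    private static final Pattern SAFE_KEY = Pattern.compile("^[a-zA-Z0-9._-]+$");

    /** Key-only filter: matches rows whose attributes JSON contains the key. */
    static String keyOnly(String key) {
        return "JSONHas(attributes, '" + requireSafe(key) + "')";
    }

    /** Exact-match filter: the key is inlined, the value is bound via "?". */
    static String exact(String key, String value, List<Object> params) {
        String condition = "JSONExtractString(attributes, '" + requireSafe(key) + "') = ?";
        params.add(value);
        return condition;
    }

    /** Rejects any key that could break out of the single-quoted literal. */
    static String requireSafe(String key) {
        if (key == null || !SAFE_KEY.matcher(key).matches()) {
            throw new IllegalArgumentException("unsafe attribute key: " + key);
        }
        return key;
    }
}
```

Because the allowed character class contains no quote or backslash, a validated key cannot terminate the string literal it is placed in.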


@@ -16,8 +16,6 @@ import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.sql.Timestamp;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.List;
@@ -57,6 +55,12 @@ public class ClickHouseDiagramStore implements DiagramStore {
ORDER BY created_at DESC LIMIT 1
""";
private static final String SELECT_HASH_FOR_APP_ROUTE = """
SELECT content_hash FROM route_diagrams
WHERE tenant_id = ? AND application_id = ? AND environment = ? AND route_id = ?
ORDER BY created_at DESC LIMIT 1
""";
private static final String SELECT_DEFINITIONS_FOR_APP = """
SELECT DISTINCT route_id, definition FROM route_diagrams
WHERE tenant_id = ? AND application_id = ? AND environment = ?
@@ -68,6 +72,8 @@ public class ClickHouseDiagramStore implements DiagramStore {
// (routeId + "\0" + instanceId) → contentHash
private final ConcurrentHashMap<String, String> hashCache = new ConcurrentHashMap<>();
// (applicationId + "\0" + environment + "\0" + routeId) → most recent contentHash
private final ConcurrentHashMap<String, String> appRouteHashCache = new ConcurrentHashMap<>();
// contentHash → deserialized RouteGraph
private final ConcurrentHashMap<String, RouteGraph> graphCache = new ConcurrentHashMap<>();
@@ -92,12 +98,37 @@ public class ClickHouseDiagramStore implements DiagramStore {
} catch (Exception e) {
log.warn("Failed to warm diagram hash cache — lookups will fall back to ClickHouse: {}", e.getMessage());
}
try {
jdbc.query(
"SELECT application_id, environment, route_id, " +
"argMax(content_hash, created_at) AS content_hash " +
"FROM route_diagrams WHERE tenant_id = ? " +
"GROUP BY application_id, environment, route_id",
rs -> {
String key = appRouteCacheKey(
rs.getString("application_id"),
rs.getString("environment"),
rs.getString("route_id"));
appRouteHashCache.put(key, rs.getString("content_hash"));
},
tenantId);
log.info("Diagram app-route cache warmed: {} entries", appRouteHashCache.size());
} catch (Exception e) {
log.warn("Failed to warm diagram app-route cache — lookups will fall back to ClickHouse: {}", e.getMessage());
}
}
private static String cacheKey(String routeId, String instanceId) {
return routeId + "\0" + instanceId;
}
private static String appRouteCacheKey(String applicationId, String environment, String routeId) {
return (applicationId != null ? applicationId : "") + "\0"
+ (environment != null ? environment : "") + "\0"
+ (routeId != null ? routeId : "");
}
@Override
public void store(TaggedDiagram diagram) {
try {
@@ -122,6 +153,7 @@ public class ClickHouseDiagramStore implements DiagramStore {
// Update caches
hashCache.put(cacheKey(routeId, agentId), contentHash);
appRouteHashCache.put(appRouteCacheKey(applicationId, environment, routeId), contentHash);
graphCache.put(contentHash, graph);
log.debug("Stored diagram for route={} agent={} with hash={}", routeId, agentId, contentHash);
@@ -170,33 +202,29 @@ public class ClickHouseDiagramStore implements DiagramStore {
}
@Override
public Optional<String> findContentHashForRouteByAgents(String routeId, List<String> agentIds) {
if (agentIds == null || agentIds.isEmpty()) {
public Optional<String> findLatestContentHashForAppRoute(String applicationId,
String routeId,
String environment) {
if (applicationId == null || applicationId.isBlank()
|| routeId == null || routeId.isBlank()
|| environment == null || environment.isBlank()) {
return Optional.empty();
}
// Try cache first — return first hit
for (String agentId : agentIds) {
String cached = hashCache.get(cacheKey(routeId, agentId));
if (cached != null) {
return Optional.of(cached);
}
String key = appRouteCacheKey(applicationId, environment, routeId);
String cached = appRouteHashCache.get(key);
if (cached != null) {
return Optional.of(cached);
}
// Fall back to ClickHouse
String placeholders = String.join(", ", Collections.nCopies(agentIds.size(), "?"));
String sql = "SELECT content_hash FROM route_diagrams " +
"WHERE tenant_id = ? AND route_id = ? AND instance_id IN (" + placeholders + ") " +
"ORDER BY created_at DESC LIMIT 1";
var params = new ArrayList<Object>();
params.add(tenantId);
params.add(routeId);
params.addAll(agentIds);
List<Map<String, Object>> rows = jdbc.queryForList(sql, params.toArray());
List<Map<String, Object>> rows = jdbc.queryForList(
SELECT_HASH_FOR_APP_ROUTE, tenantId, applicationId, environment, routeId);
if (rows.isEmpty()) {
return Optional.empty();
}
return Optional.of((String) rows.get(0).get("content_hash"));
String hash = (String) rows.get(0).get("content_hash");
appRouteHashCache.put(key, hash);
return Optional.of(hash);
}
@Override
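
The new lookup is a read-through cache: check the in-memory map, fall back to the database on a miss, then remember the result. A self-contained sketch of that pattern follows. AppRouteHashCache is a hypothetical class, the loader stands in for the ClickHouse query, and the null-to-empty-string handling of the real appRouteCacheKey is left out.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Illustrative only: AppRouteHashCache is a hypothetical class. */
final class AppRouteHashCache {

    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();

    /** Joins the parts with NUL, following the cache-key style used in the diff. */
    static String key(String applicationId, String environment, String routeId) {
        return applicationId + "\0" + environment + "\0" + routeId;
    }

    /**
     * Returns the cached hash if present; otherwise asks the loader and
     * caches a non-empty result. Empty results are not cached, so a later
     * store() can still make the hash visible.
     */
    Optional<String> find(String applicationId, String environment, String routeId,
                          Supplier<Optional<String>> loader) {
        String k = key(applicationId, environment, routeId);
        String cached = cache.get(k);
        if (cached != null) return Optional.of(cached);
        Optional<String> loaded = loader.get();
        loaded.ifPresent(hash -> cache.put(k, hash));
        return loaded;
    }
}
```

With a loader that counts its invocations, two consecutive lookups for the same key hit the loader only once.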


@@ -0,0 +1,408 @@
package com.cameleer.server.app.storage;
import com.cameleer.server.core.storage.ServerMetricsQueryStore;
import com.cameleer.server.core.storage.model.ServerInstanceInfo;
import com.cameleer.server.core.storage.model.ServerMetricCatalogEntry;
import com.cameleer.server.core.storage.model.ServerMetricPoint;
import com.cameleer.server.core.storage.model.ServerMetricQueryRequest;
import com.cameleer.server.core.storage.model.ServerMetricQueryResponse;
import com.cameleer.server.core.storage.model.ServerMetricSeries;
import org.springframework.jdbc.core.JdbcTemplate;
import java.sql.Array;
import java.sql.Timestamp;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;
/**
* ClickHouse-backed {@link ServerMetricsQueryStore}.
*
* <p>Safety rules for every query:
* <ul>
* <li>tenant_id always bound as a parameter — no cross-tenant reads.</li>
* <li>Identifier-like inputs (metric name, statistic, tag keys,
* aggregation, mode) are regex-validated. Tag keys flow through the
* query as JDBC parameter-bound values of {@code tags[?]} map lookups,
* so even with a "safe" regex they cannot inject SQL.</li>
* <li>Literal values ({@code from}, {@code to}, tag filter values,
* server_instance_id allow-list) always go through {@code ?}.</li>
* <li>The time range is capped at {@link #MAX_RANGE}.</li>
* <li>Result cardinality is capped at {@link #MAX_SERIES} series.</li>
* </ul>
*/
public class ClickHouseServerMetricsQueryStore implements ServerMetricsQueryStore {
private static final Pattern SAFE_IDENTIFIER = Pattern.compile("^[a-zA-Z0-9._]+$");
private static final Pattern SAFE_STATISTIC = Pattern.compile("^[a-z_]+$");
private static final Set<String> AGGREGATIONS = Set.of("avg", "sum", "max", "min", "latest");
private static final Set<String> MODES = Set.of("raw", "delta");
/** Maximum {@code to - from} window accepted by the API. */
static final Duration MAX_RANGE = Duration.ofDays(31);
/** Clamp bounds and default for {@code stepSeconds}. */
static final int MIN_STEP = 10;
static final int MAX_STEP = 3600;
static final int DEFAULT_STEP = 60;
/** Defence against group-by explosion — limit the series count per response. */
static final int MAX_SERIES = 500;
private final String tenantId;
private final JdbcTemplate jdbc;
public ClickHouseServerMetricsQueryStore(String tenantId, JdbcTemplate jdbc) {
this.tenantId = tenantId;
this.jdbc = jdbc;
}
// ── catalog ─────────────────────────────────────────────────────────
@Override
public List<ServerMetricCatalogEntry> catalog(Instant from, Instant to) {
requireRange(from, to);
String sql = """
SELECT
metric_name,
any(metric_type) AS metric_type,
arraySort(groupUniqArray(statistic)) AS statistics,
arraySort(arrayDistinct(arrayFlatten(groupArray(mapKeys(tags))))) AS tag_keys
FROM server_metrics
WHERE tenant_id = ?
AND collected_at >= ?
AND collected_at < ?
GROUP BY metric_name
ORDER BY metric_name
""";
return jdbc.query(sql, (rs, n) -> new ServerMetricCatalogEntry(
rs.getString("metric_name"),
rs.getString("metric_type"),
arrayToStringList(rs.getArray("statistics")),
arrayToStringList(rs.getArray("tag_keys"))
), tenantId, Timestamp.from(from), Timestamp.from(to));
}
// ── instances ───────────────────────────────────────────────────────
@Override
public List<ServerInstanceInfo> listInstances(Instant from, Instant to) {
requireRange(from, to);
String sql = """
SELECT
server_instance_id,
min(collected_at) AS first_seen,
max(collected_at) AS last_seen
FROM server_metrics
WHERE tenant_id = ?
AND collected_at >= ?
AND collected_at < ?
GROUP BY server_instance_id
ORDER BY last_seen DESC
""";
return jdbc.query(sql, (rs, n) -> new ServerInstanceInfo(
rs.getString("server_instance_id"),
rs.getTimestamp("first_seen").toInstant(),
rs.getTimestamp("last_seen").toInstant()
), tenantId, Timestamp.from(from), Timestamp.from(to));
}
// ── query ───────────────────────────────────────────────────────────
@Override
public ServerMetricQueryResponse query(ServerMetricQueryRequest request) {
if (request == null) throw new IllegalArgumentException("request is required");
String metric = requireSafeIdentifier(request.metric(), "metric");
requireRange(request.from(), request.to());
String aggregation = request.aggregation() != null ? request.aggregation().toLowerCase() : "avg";
if (!AGGREGATIONS.contains(aggregation)) {
throw new IllegalArgumentException("aggregation must be one of " + AGGREGATIONS);
}
String mode = request.mode() != null ? request.mode().toLowerCase() : "raw";
if (!MODES.contains(mode)) {
throw new IllegalArgumentException("mode must be one of " + MODES);
}
int step = request.stepSeconds() != null ? request.stepSeconds() : DEFAULT_STEP;
if (step < MIN_STEP || step > MAX_STEP) {
throw new IllegalArgumentException(
"stepSeconds must be in [" + MIN_STEP + "," + MAX_STEP + "]");
}
String statistic = request.statistic();
if (statistic != null && !SAFE_STATISTIC.matcher(statistic).matches()) {
throw new IllegalArgumentException("statistic contains unsafe characters");
}
List<String> groupByTags = request.groupByTags() != null
? request.groupByTags() : List.of();
for (String t : groupByTags) requireSafeIdentifier(t, "groupByTag");
Map<String, String> filterTags = request.filterTags() != null
? request.filterTags() : Map.of();
for (String t : filterTags.keySet()) requireSafeIdentifier(t, "filterTag key");
List<String> instanceAllowList = request.serverInstanceIds() != null
? request.serverInstanceIds() : List.of();
boolean isDelta = "delta".equals(mode);
boolean isMean = "mean".equals(statistic);
String sql = isDelta
? buildDeltaSql(step, groupByTags, filterTags, instanceAllowList, statistic, isMean)
: buildRawSql(step, groupByTags, filterTags, instanceAllowList,
statistic, aggregation, isMean);
List<Object> params = buildParams(groupByTags, metric, statistic, isMean,
request.from(), request.to(),
filterTags, instanceAllowList);
List<Row> rows = jdbc.query(sql, (rs, n) -> {
int idx = 1;
Instant bucket = rs.getTimestamp(idx++).toInstant();
List<String> tagValues = new ArrayList<>(groupByTags.size());
for (int g = 0; g < groupByTags.size(); g++) {
tagValues.add(rs.getString(idx++));
}
double value = rs.getDouble(idx);
return new Row(bucket, tagValues, value);
}, params.toArray());
return assembleSeries(rows, metric, statistic, aggregation, mode, step, groupByTags);
}
// ── SQL builders ────────────────────────────────────────────────────
/**
* Builds a single-pass SQL for raw mode:
* <pre>{@code
* SELECT bucket, tag0, ..., <agg>(metric_value) AS value
* FROM server_metrics WHERE ...
* GROUP BY bucket, tag0, ...
* ORDER BY bucket, tag0, ...
* }</pre>
* For {@code statistic=mean}, replaces the aggregate with
* {@code sumIf(metric_value, statistic IN ('total','total_time')) / nullIf(sumIf(metric_value, statistic='count'), 0)}.
*/
private String buildRawSql(int step, List<String> groupByTags,
Map<String, String> filterTags,
List<String> instanceAllowList,
String statistic, String aggregation, boolean isMean) {
StringBuilder s = new StringBuilder(512);
s.append("SELECT\n toDateTime64(toStartOfInterval(collected_at, INTERVAL ")
.append(step).append(" SECOND), 3) AS bucket");
for (int i = 0; i < groupByTags.size(); i++) {
s.append(",\n tags[?] AS tag").append(i);
}
s.append(",\n ").append(isMean ? meanExpr() : scalarAggExpr(aggregation))
.append(" AS value\nFROM server_metrics\n");
appendWhereClause(s, filterTags, instanceAllowList, statistic, isMean);
s.append("GROUP BY bucket");
for (int i = 0; i < groupByTags.size(); i++) s.append(", tag").append(i);
s.append("\nORDER BY bucket");
for (int i = 0; i < groupByTags.size(); i++) s.append(", tag").append(i);
return s.toString();
}
/**
* Builds a three-level SQL for delta mode. The inner query yields one
* (bucket, instance, tag-group) row via {@code max(metric_value)} (or the
* mean expression when {@code statistic=mean}); the middle query computes
* positive-clipped per-instance differences via a window function; the
* outer query sums across instances.
*/
private String buildDeltaSql(int step, List<String> groupByTags,
Map<String, String> filterTags,
List<String> instanceAllowList,
String statistic, boolean isMean) {
StringBuilder s = new StringBuilder(1024);
s.append("SELECT bucket");
for (int i = 0; i < groupByTags.size(); i++) s.append(", tag").append(i);
s.append(", sum(delta) AS value FROM (\n");
// Middle: per-instance positive-clipped delta using window.
s.append(" SELECT bucket");
for (int i = 0; i < groupByTags.size(); i++) s.append(", tag").append(i);
s.append(", server_instance_id, greatest(0, value - coalesce(any(value) OVER (")
.append("PARTITION BY server_instance_id");
for (int i = 0; i < groupByTags.size(); i++) s.append(", tag").append(i);
s.append(" ORDER BY bucket ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING), value)) AS delta FROM (\n");
// Inner: one representative value per (bucket, instance, tag-group).
s.append(" SELECT\n toDateTime64(toStartOfInterval(collected_at, INTERVAL ")
.append(step).append(" SECOND), 3) AS bucket,\n server_instance_id");
for (int i = 0; i < groupByTags.size(); i++) {
s.append(",\n tags[?] AS tag").append(i);
}
s.append(",\n ").append(isMean ? meanExpr() : "max(metric_value)")
.append(" AS value\n FROM server_metrics\n");
appendWhereClause(s, filterTags, instanceAllowList, statistic, isMean);
s.append(" GROUP BY bucket, server_instance_id");
for (int i = 0; i < groupByTags.size(); i++) s.append(", tag").append(i);
s.append("\n ) AS bucketed\n) AS deltas\n");
s.append("GROUP BY bucket");
for (int i = 0; i < groupByTags.size(); i++) s.append(", tag").append(i);
s.append("\nORDER BY bucket");
for (int i = 0; i < groupByTags.size(); i++) s.append(", tag").append(i);
return s.toString();
}
/**
* WHERE clause shared by both raw and delta SQL shapes. Appended at the
* correct indent under either the single {@code FROM server_metrics}
* (raw) or the innermost one (delta).
*/
private void appendWhereClause(StringBuilder s, Map<String, String> filterTags,
List<String> instanceAllowList,
String statistic, boolean isMean) {
s.append(" WHERE tenant_id = ?\n")
.append(" AND metric_name = ?\n");
if (isMean) {
s.append(" AND statistic IN ('count', 'total', 'total_time')\n");
} else if (statistic != null) {
s.append(" AND statistic = ?\n");
}
s.append(" AND collected_at >= ?\n")
.append(" AND collected_at < ?\n");
for (int i = 0; i < filterTags.size(); i++) {
s.append(" AND tags[?] = ?\n");
}
if (!instanceAllowList.isEmpty()) {
s.append(" AND server_instance_id IN (")
.append("?,".repeat(instanceAllowList.size() - 1)).append("?)\n");
}
}
/**
* SQL-positional params for both raw and delta queries (same relative
* order because the WHERE clause is emitted by {@link #appendWhereClause}
* only once, with the {@code tags[?]} select-list placeholders appearing
* earlier in the SQL text).
*/
private List<Object> buildParams(List<String> groupByTags, String metric,
String statistic, boolean isMean,
Instant from, Instant to,
Map<String, String> filterTags,
List<String> instanceAllowList) {
List<Object> params = new ArrayList<>();
// SELECT-list tags[?] placeholders
params.addAll(groupByTags);
// WHERE
params.add(tenantId);
params.add(metric);
if (!isMean && statistic != null) params.add(statistic);
params.add(Timestamp.from(from));
params.add(Timestamp.from(to));
for (Map.Entry<String, String> e : filterTags.entrySet()) {
params.add(e.getKey());
params.add(e.getValue());
}
params.addAll(instanceAllowList);
return params;
}
private static String scalarAggExpr(String aggregation) {
return switch (aggregation) {
case "avg" -> "avg(metric_value)";
case "sum" -> "sum(metric_value)";
case "max" -> "max(metric_value)";
case "min" -> "min(metric_value)";
case "latest" -> "argMax(metric_value, collected_at)";
default -> throw new IllegalStateException("unreachable: " + aggregation);
};
}
private static String meanExpr() {
return "sumIf(metric_value, statistic IN ('total', 'total_time'))"
+ " / nullIf(sumIf(metric_value, statistic = 'count'), 0)";
}
// ── response assembly ───────────────────────────────────────────────
private ServerMetricQueryResponse assembleSeries(
List<Row> rows, String metric, String statistic,
String aggregation, String mode, int step, List<String> groupByTags) {
Map<List<String>, List<ServerMetricPoint>> bySignature = new LinkedHashMap<>();
for (Row r : rows) {
if (Double.isNaN(r.value) || Double.isInfinite(r.value)) continue;
bySignature.computeIfAbsent(r.tagValues, k -> new ArrayList<>())
.add(new ServerMetricPoint(r.bucket, r.value));
}
if (bySignature.size() > MAX_SERIES) {
throw new IllegalArgumentException(
"query produced " + bySignature.size()
+ " series; reduce groupByTags or tighten filterTags (max "
+ MAX_SERIES + ")");
}
List<ServerMetricSeries> series = new ArrayList<>(bySignature.size());
for (Map.Entry<List<String>, List<ServerMetricPoint>> e : bySignature.entrySet()) {
Map<String, String> tags = new LinkedHashMap<>();
for (int i = 0; i < groupByTags.size(); i++) {
tags.put(groupByTags.get(i), e.getKey().get(i));
}
series.add(new ServerMetricSeries(Collections.unmodifiableMap(tags), e.getValue()));
}
return new ServerMetricQueryResponse(metric,
statistic != null ? statistic : "value",
aggregation, mode, step, series);
}
// ── helpers ─────────────────────────────────────────────────────────
private static void requireRange(Instant from, Instant to) {
if (from == null || to == null) {
throw new IllegalArgumentException("from and to are required");
}
if (!from.isBefore(to)) {
throw new IllegalArgumentException("from must be strictly before to");
}
if (Duration.between(from, to).compareTo(MAX_RANGE) > 0) {
throw new IllegalArgumentException(
"time range exceeds maximum of " + MAX_RANGE.toDays() + " days");
}
}
private static String requireSafeIdentifier(String value, String field) {
if (value == null || value.isBlank()) {
throw new IllegalArgumentException(field + " is required");
}
if (!SAFE_IDENTIFIER.matcher(value).matches()) {
throw new IllegalArgumentException(
field + " contains unsafe characters (allowed: [a-zA-Z0-9._])");
}
return value;
}
private static List<String> arrayToStringList(Array array) {
if (array == null) return List.of();
try {
Object[] values = (Object[]) array.getArray();
Set<String> sorted = new TreeSet<>();
for (Object v : values) {
if (v != null) sorted.add(v.toString());
}
return List.copyOf(sorted);
} catch (Exception e) {
return List.of();
} finally {
try { array.free(); } catch (Exception ignore) { }
}
}
private record Row(Instant bucket, List<String> tagValues, double value) {
}
}
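
The delta-mode Javadoc above describes per-instance, positive-clipped differences that are then summed across instances. A minimal in-memory sketch of that arithmetic may make the intent easier to follow. It is not the query store's code: the class and method names are hypothetical, every instance is assumed to report a value in every bucket, and the first bucket is assumed to contribute 0 because it has no predecessor (which is what the `coalesce(..., value)` in the SQL appears to intend).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the delta-mode arithmetic; not part of the diff.
public class ClippedDeltaSketch {

    /**
     * For each instance, computes max(0, value[b] - value[b-1]) per bucket
     * (the first bucket yields 0), then sums those deltas across instances.
     */
    public static List<Double> sumClippedDeltas(Map<String, List<Double>> valuesByInstance,
                                                int buckets) {
        double[] sums = new double[buckets];
        for (List<Double> series : valuesByInstance.values()) {
            for (int b = 0; b < series.size() && b < buckets; b++) {
                // No predecessor in the first bucket: fall back to the value itself.
                double prev = (b == 0) ? series.get(b) : series.get(b - 1);
                // A counter reset (value drops) is clipped to 0 instead of going negative.
                sums[b] += Math.max(0, series.get(b) - prev);
            }
        }
        List<Double> out = new ArrayList<>(buckets);
        for (double s : sums) out.add(s);
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Double>> counters = Map.of(
                "srv-a", List.of(10.0, 15.0, 3.0),      // resets in the third bucket
                "srv-b", List.of(100.0, 104.0, 110.0));
        System.out.println(sumClippedDeltas(counters, 3)); // prints [0.0, 9.0, 6.0]
    }
}
```

Clipping per instance before summing is the reason the SQL partitions the window by `server_instance_id`: a restart on one instance then cannot cancel out real growth on another.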

View File

@@ -0,0 +1,46 @@
package com.cameleer.server.app.storage;
import com.cameleer.server.core.storage.ServerMetricsStore;
import com.cameleer.server.core.storage.model.ServerMetricSample;
import org.springframework.jdbc.core.JdbcTemplate;
import java.sql.Timestamp;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class ClickHouseServerMetricsStore implements ServerMetricsStore {
private final JdbcTemplate jdbc;
public ClickHouseServerMetricsStore(JdbcTemplate jdbc) {
this.jdbc = jdbc;
}
@Override
public void insertBatch(List<ServerMetricSample> samples) {
if (samples.isEmpty()) return;
jdbc.batchUpdate("""
INSERT INTO server_metrics
(tenant_id, collected_at, server_instance_id, metric_name,
metric_type, statistic, metric_value, tags)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
""",
samples.stream().map(s -> new Object[]{
s.tenantId(),
Timestamp.from(s.collectedAt()),
s.serverInstanceId(),
s.metricName(),
s.metricType(),
s.statistic(),
s.value(),
tagsToClickHouseMap(s.tags())
}).toList());
}
private Map<String, String> tagsToClickHouseMap(Map<String, String> tags) {
if (tags == null || tags.isEmpty()) return new HashMap<>();
return new HashMap<>(tags);
}
}
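
Both stores bind values to `?` placeholders strictly by position, so the parameter list has to be assembled in the same order in which the placeholders appear in the SQL text. That is the contract between the query store's SQL builders and its `buildParams`. The sketch below is a simplified, hypothetical reduction of that idea (the class name, the trimmed-down SQL, and the `countPlaceholders` helper are assumptions for illustration), showing select-list `tags[?]` parameters first and WHERE parameters after.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified illustration of positional placeholder ordering.
public class PlaceholderOrderSketch {

    /** Emits select-list tags[?] placeholders first, then the WHERE placeholders. */
    public static String buildSql(List<String> groupByTags, Map<String, String> filterTags) {
        StringBuilder s = new StringBuilder("SELECT bucket");
        for (int i = 0; i < groupByTags.size(); i++) {
            s.append(", tags[?] AS tag").append(i);
        }
        s.append(" FROM server_metrics WHERE tenant_id = ? AND metric_name = ?");
        for (int i = 0; i < filterTags.size(); i++) {
            s.append(" AND tags[?] = ?");
        }
        return s.toString();
    }

    /** Adds parameters in exactly the order buildSql emitted the placeholders. */
    public static List<Object> buildParams(String tenantId, String metric,
                                           List<String> groupByTags,
                                           Map<String, String> filterTags) {
        List<Object> params = new ArrayList<>(groupByTags); // select-list tags[?]
        params.add(tenantId);                               // WHERE tenant_id = ?
        params.add(metric);                                 // AND metric_name = ?
        filterTags.forEach((k, v) -> {                      // AND tags[?] = ?
            params.add(k);
            params.add(v);
        });
        return params;
    }

    /** Counts '?' characters; only valid here because no literal contains one. */
    public static long countPlaceholders(String sql) {
        return sql.chars().filter(c -> c == '?').count();
    }
}
```

A cheap guard in a unit test is to assert that `countPlaceholders(sql)` equals `params.size()` for a few combinations of group-by and filter tags. A mismatch would otherwise only surface as a driver error at query time.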

View File

@@ -22,7 +22,7 @@ public class PostgresDeploymentRepository implements DeploymentRepository {
private static final String SELECT_COLS =
"id, app_id, app_version_id, environment_id, status, target_state, deployment_strategy, " +
"replica_states, deploy_stage, container_id, container_name, error_message, " +
- "resolved_config, deployed_config_snapshot, deployed_at, stopped_at, created_at";
+ "resolved_config, deployed_config_snapshot, deployed_at, stopped_at, created_at, created_by";
private final JdbcTemplate jdbc;
private final ObjectMapper objectMapper;
@@ -81,10 +81,10 @@ public class PostgresDeploymentRepository implements DeploymentRepository {
}
@Override
- public UUID create(UUID appId, UUID appVersionId, UUID environmentId, String containerName) {
+ public UUID create(UUID appId, UUID appVersionId, UUID environmentId, String containerName, String createdBy) {
UUID id = UUID.randomUUID();
- jdbc.update("INSERT INTO deployments (id, app_id, app_version_id, environment_id, container_name) VALUES (?, ?, ?, ?, ?)",
- id, appId, appVersionId, environmentId, containerName);
+ jdbc.update("INSERT INTO deployments (id, app_id, app_version_id, environment_id, container_name, created_by) VALUES (?, ?, ?, ?, ?, ?)",
+ id, appId, appVersionId, environmentId, containerName, createdBy);
return id;
}
@@ -126,8 +126,8 @@ public class PostgresDeploymentRepository implements DeploymentRepository {
}
@Override
- public void deleteTerminalByAppAndEnvironment(UUID appId, UUID environmentId) {
- jdbc.update("DELETE FROM deployments WHERE app_id = ? AND environment_id = ? AND status IN ('STOPPED', 'FAILED')",
+ public void deleteFailedByAppAndEnvironment(UUID appId, UUID environmentId) {
+ jdbc.update("DELETE FROM deployments WHERE app_id = ? AND environment_id = ? AND status = 'FAILED'",
appId, environmentId);
}
@@ -216,7 +216,8 @@ public class PostgresDeploymentRepository implements DeploymentRepository {
deployedConfigSnapshot,
deployedAt != null ? deployedAt.toInstant() : null,
stoppedAt != null ? stoppedAt.toInstant() : null,
- rs.getTimestamp("created_at").toInstant()
+ rs.getTimestamp("created_at").toInstant(),
+ rs.getString("created_by")
);
}
}

View File

@@ -55,6 +55,7 @@ cameleer:
routingmode: ${CAMELEER_SERVER_RUNTIME_ROUTINGMODE:path}
routingdomain: ${CAMELEER_SERVER_RUNTIME_ROUTINGDOMAIN:localhost}
serverurl: ${CAMELEER_SERVER_RUNTIME_SERVERURL:}
certresolver: ${CAMELEER_SERVER_RUNTIME_CERTRESOLVER:}
jardockervolume: ${CAMELEER_SERVER_RUNTIME_JARDOCKERVOLUME:}
indexer:
debouncems: ${CAMELEER_SERVER_INDEXER_DEBOUNCEMS:2000}
@@ -111,6 +112,10 @@ cameleer:
url: ${CAMELEER_SERVER_CLICKHOUSE_URL:jdbc:clickhouse://localhost:8123/cameleer}
username: ${CAMELEER_SERVER_CLICKHOUSE_USERNAME:default}
password: ${CAMELEER_SERVER_CLICKHOUSE_PASSWORD:}
self-metrics:
enabled: ${CAMELEER_SERVER_SELFMETRICS_ENABLED:true}
interval-ms: ${CAMELEER_SERVER_SELFMETRICS_INTERVALMS:60000}
instance-id: ${CAMELEER_SERVER_INSTANCE_ID:}
springdoc:
api-docs:

View File

@@ -401,6 +401,29 @@ CREATE TABLE IF NOT EXISTS route_catalog (
ENGINE = ReplacingMergeTree(last_seen)
ORDER BY (tenant_id, environment, application_id, route_id);
-- ── Server Self-Metrics ────────────────────────────────────────────────
-- Periodic snapshot of the server's own Micrometer registry (written by
-- ServerMetricsSnapshotScheduler). No `environment` column — the server
-- straddles environments. `statistic` distinguishes Timer/DistributionSummary
-- sub-measurements (count, total_time, max, mean) from plain counter/gauge values.
CREATE TABLE IF NOT EXISTS server_metrics (
tenant_id LowCardinality(String) DEFAULT 'default',
collected_at DateTime64(3),
server_instance_id LowCardinality(String),
metric_name LowCardinality(String),
metric_type LowCardinality(String),
statistic LowCardinality(String) DEFAULT 'value',
metric_value Float64,
tags Map(String, String) DEFAULT map(),
server_received_at DateTime64(3) DEFAULT now64(3)
)
ENGINE = MergeTree()
PARTITION BY (tenant_id, toYYYYMM(collected_at))
ORDER BY (tenant_id, collected_at, server_instance_id, metric_name, statistic)
TTL toDateTime(collected_at) + INTERVAL 90 DAY DELETE
SETTINGS index_granularity = 8192;
-- insert_id tiebreak for keyset pagination (fixes same-millisecond cursor collision).
-- IF NOT EXISTS on ADD COLUMN is idempotent. MATERIALIZE COLUMN is a background mutation,
-- effectively a no-op once all parts are already materialized.
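
The `statistic` column of `server_metrics` is what lets a mean be derived at query time: rows for the same metric carry `count` and `total`/`total_time` values, and the mean is their ratio. The following sketch mirrors that computation in plain Java under stated assumptions. The `Sample` record and class name are hypothetical, and `OptionalDouble.empty()` stands in for the SQL `NULL` that `nullIf(..., 0)` produces when no count was recorded.

```java
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical illustration of deriving a mean from per-statistic rows.
public class MeanFromStatistics {

    /** One row of a metric, reduced to the two columns that matter here. */
    public record Sample(String statistic, double value) { }

    /**
     * Sums 'total'/'total_time' rows and divides by the summed 'count' rows.
     * Other statistics (for example 'max') are ignored. Returns empty when
     * the count is zero, to avoid a division by zero.
     */
    public static OptionalDouble mean(List<Sample> samples) {
        double total = 0;
        double count = 0;
        for (Sample s : samples) {
            switch (s.statistic()) {
                case "total", "total_time" -> total += s.value();
                case "count" -> count += s.value();
                default -> { /* not part of the mean */ }
            }
        }
        return count == 0 ? OptionalDouble.empty() : OptionalDouble.of(total / count);
    }
}
```

For example, a timer snapshot with `count = 4` and `total_time = 2.0` yields a mean of `0.5`, regardless of any `max` row stored alongside it.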

View File

@@ -0,0 +1,8 @@
-- V4: add created_by column to deployments for audit trail
-- Captures which user initiated a deployment. Nullable for backwards compatibility;
-- pre-V4 historical deployments will have NULL.
ALTER TABLE deployments
ADD COLUMN created_by TEXT REFERENCES users(user_id);
CREATE INDEX idx_deployments_created_by ON deployments (created_by);

View File

@@ -21,10 +21,12 @@ public abstract class AbstractPostgresIT {
postgres = new PostgreSQLContainer<>("postgres:16")
.withDatabaseName("cameleer")
.withUsername("cameleer")
- .withPassword("test");
+ .withPassword("test")
+ .withReuse(true);
postgres.start();
- clickhouse = new ClickHouseContainer("clickhouse/clickhouse-server:24.12");
+ clickhouse = new ClickHouseContainer("clickhouse/clickhouse-server:24.12")
+ .withReuse(true);
clickhouse.start();
}

View File

@@ -48,7 +48,7 @@ class DeploymentStateEvaluatorTest {
private Deployment deployment(DeploymentStatus status) {
return new Deployment(DEP_ID, APP_ID, UUID.randomUUID(), ENV_ID, status,
null, null, List.of(), null, null, "orders-0", null,
- Map.of(), null, NOW.minusSeconds(60), null, NOW.minusSeconds(120));
+ Map.of(), null, NOW.minusSeconds(60), null, NOW.minusSeconds(120), "test-user");
}
@Test

View File

@@ -52,10 +52,14 @@ class SchemaBootstrapIT extends AbstractPostgresIT {
@Test
void alerting_enums_exist() {
+ // Scope to current schema's namespace — Testcontainers reuse can otherwise
+ // expose enums from a previous run's tenant_default schema alongside public.
var enums = jdbcTemplate.queryForList("""
- SELECT typname FROM pg_type
- WHERE typname IN ('severity_enum','condition_kind_enum','alert_state_enum',
- 'target_kind_enum','notification_status_enum')
+ SELECT t.typname FROM pg_type t
+ JOIN pg_namespace n ON n.oid = t.typnamespace
+ WHERE t.typname IN ('severity_enum','condition_kind_enum','alert_state_enum',
+ 'target_kind_enum','notification_status_enum')
+ AND n.nspname = current_schema()
""", String.class);
assertThat(enums).containsExactlyInAnyOrder(
"severity_enum", "condition_kind_enum", "alert_state_enum",
@@ -86,6 +90,7 @@ class SchemaBootstrapIT extends AbstractPostgresIT {
SELECT column_name FROM information_schema.columns
WHERE table_name = 'alert_instances'
AND column_name IN ('read_at','deleted_at')
AND table_schema = current_schema()
""", String.class);
assertThat(cols).containsExactlyInAnyOrder("read_at", "deleted_at");
}
@@ -96,13 +101,16 @@ class SchemaBootstrapIT extends AbstractPostgresIT {
SELECT COUNT(*)::int FROM pg_indexes
WHERE indexname = 'alert_instances_open_rule_uq'
AND tablename = 'alert_instances'
AND schemaname = current_schema()
""", Integer.class);
assertThat(count).isEqualTo(1);
Boolean isUnique = jdbcTemplate.queryForObject("""
SELECT indisunique FROM pg_index
- JOIN pg_class ON pg_class.oid = pg_index.indexrelid
- WHERE pg_class.relname = 'alert_instances_open_rule_uq'
+ JOIN pg_class c ON c.oid = pg_index.indexrelid
+ JOIN pg_namespace n ON n.oid = c.relnamespace
+ WHERE c.relname = 'alert_instances_open_rule_uq'
+ AND n.nspname = current_schema()
""", Boolean.class);
assertThat(isUnique).isTrue();
}

View File

@@ -65,6 +65,10 @@ class AppDirtyStateIT extends AbstractPostgresIT {
jdbcTemplate.update("DELETE FROM app_versions");
jdbcTemplate.update("DELETE FROM apps");
jdbcTemplate.update("DELETE FROM application_config WHERE environment = 'default'");
// Ensure test-operator exists in users table (required for deployments.created_by FK)
jdbcTemplate.update(
"INSERT INTO users (user_id, provider, display_name) VALUES ('test-operator', 'local', 'Test Operator') ON CONFLICT (user_id) DO NOTHING");
}
// -----------------------------------------------------------------------

View File

@@ -0,0 +1,253 @@
package com.cameleer.server.app.controller;
import com.cameleer.server.app.AbstractPostgresIT;
import com.cameleer.server.app.TestSecurityHelper;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.web.client.TestRestTemplate;
import org.springframework.core.io.ByteArrayResource;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.util.LinkedMultiValueMap;
import org.springframework.util.MultiValueMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
class DeploymentControllerAuditIT extends AbstractPostgresIT {
@Autowired
private TestRestTemplate restTemplate;
@Autowired
private ObjectMapper objectMapper;
@Autowired
private TestSecurityHelper securityHelper;
private String aliceJwt;
private String adminJwt;
private String appSlug;
private String versionId;
@BeforeEach
void setUp() throws Exception {
// Mint JWT for alice (OPERATOR) — subject must start with "user:" for JwtAuthenticationFilter
aliceJwt = securityHelper.createToken("user:alice", "user", List.of("OPERATOR"));
adminJwt = securityHelper.adminToken();
// Clean up deployment-related tables and test-created environments
jdbcTemplate.update("DELETE FROM deployments");
jdbcTemplate.update("DELETE FROM app_versions");
jdbcTemplate.update("DELETE FROM apps");
jdbcTemplate.update("DELETE FROM environments WHERE slug LIKE 'promote-target-%'");
jdbcTemplate.update("DELETE FROM audit_log");
// Ensure alice exists in the users table (required for deployments.created_by FK)
jdbcTemplate.update(
"INSERT INTO users (user_id, provider, display_name) VALUES ('alice', 'local', 'Alice Test') ON CONFLICT (user_id) DO NOTHING");
// Create app in the seeded "default" environment
appSlug = "audit-test-" + UUID.randomUUID().toString().substring(0, 8);
String appJson = String.format("""
{"slug": "%s", "displayName": "Audit Test App"}
""", appSlug);
ResponseEntity<String> appResponse = restTemplate.exchange(
"/api/v1/environments/default/apps", HttpMethod.POST,
new HttpEntity<>(appJson, authHeaders(aliceJwt)),
String.class);
assertThat(appResponse.getStatusCode()).isEqualTo(HttpStatus.CREATED);
// Upload a JAR version
byte[] jarContent = "fake-jar-for-audit-test".getBytes();
ByteArrayResource resource = new ByteArrayResource(jarContent) {
@Override
public String getFilename() {
return "audit-test.jar";
}
};
MultiValueMap<String, Object> body = new LinkedMultiValueMap<>();
body.add("file", resource);
HttpHeaders headers = new HttpHeaders();
headers.set("Authorization", "Bearer " + aliceJwt);
headers.set("X-Cameleer-Protocol-Version", "1");
headers.setContentType(MediaType.MULTIPART_FORM_DATA);
ResponseEntity<String> versionResponse = restTemplate.exchange(
"/api/v1/environments/default/apps/" + appSlug + "/versions", HttpMethod.POST,
new HttpEntity<>(body, headers),
String.class);
assertThat(versionResponse.getStatusCode().is2xxSuccessful()).isTrue();
versionId = objectMapper.readTree(versionResponse.getBody()).path("id").asText();
}
@Test
void deploy_writes_audit_row_with_DEPLOYMENT_category_and_alice_actor() throws Exception {
String json = String.format("""
{"appVersionId": "%s"}
""", versionId);
ResponseEntity<String> response = restTemplate.exchange(
"/api/v1/environments/default/apps/" + appSlug + "/deployments", HttpMethod.POST,
new HttpEntity<>(json, authHeaders(aliceJwt)),
String.class);
assertThat(response.getStatusCode()).isEqualTo(HttpStatus.ACCEPTED);
Map<String, Object> row = queryAuditRow("deploy_app");
assertThat(row).isNotNull();
assertThat(row.get("username")).isEqualTo("alice");
assertThat(row.get("action")).isEqualTo("deploy_app");
assertThat(row.get("category")).isEqualTo("DEPLOYMENT");
assertThat(row.get("result")).isEqualTo("SUCCESS");
assertThat(row.get("target")).isNotNull();
assertThat(row.get("target").toString()).isNotBlank();
}
@Test
void stop_writes_audit_row() throws Exception {
// First deploy
String deployJson = String.format("""
{"appVersionId": "%s"}
""", versionId);
ResponseEntity<String> deployResponse = restTemplate.exchange(
"/api/v1/environments/default/apps/" + appSlug + "/deployments", HttpMethod.POST,
new HttpEntity<>(deployJson, authHeaders(aliceJwt)),
String.class);
assertThat(deployResponse.getStatusCode()).isEqualTo(HttpStatus.ACCEPTED);
String deploymentId = objectMapper.readTree(deployResponse.getBody()).path("id").asText();
// Clear audit log to isolate stop audit row
jdbcTemplate.update("DELETE FROM audit_log");
// Stop the deployment
ResponseEntity<String> stopResponse = restTemplate.exchange(
"/api/v1/environments/default/apps/" + appSlug + "/deployments/" + deploymentId + "/stop",
HttpMethod.POST,
new HttpEntity<>(authHeadersNoBody(aliceJwt)),
String.class);
assertThat(stopResponse.getStatusCode()).isEqualTo(HttpStatus.OK);
Map<String, Object> row = queryAuditRow("stop_deployment");
assertThat(row).isNotNull();
assertThat(row.get("username")).isEqualTo("alice");
assertThat(row.get("action")).isEqualTo("stop_deployment");
assertThat(row.get("category")).isEqualTo("DEPLOYMENT");
assertThat(row.get("result")).isEqualTo("SUCCESS");
assertThat(row.get("target").toString()).isEqualTo(deploymentId);
}
@Test
void promote_writes_audit_row() throws Exception {
// Create a second environment for promotion target
String targetEnvSlug = "promote-target-" + UUID.randomUUID().toString().substring(0, 8);
String envJson = String.format("""
{"slug": "%s", "displayName": "Promote Target Env"}
""", targetEnvSlug);
ResponseEntity<String> envResponse = restTemplate.exchange(
"/api/v1/admin/environments", HttpMethod.POST,
new HttpEntity<>(envJson, authHeaders(adminJwt)),
String.class);
assertThat(envResponse.getStatusCode()).isEqualTo(HttpStatus.CREATED);
// Create the same app slug in the target environment
String appJson = String.format("""
{"slug": "%s", "displayName": "Audit Test App (target)"}
""", appSlug);
ResponseEntity<String> targetAppResponse = restTemplate.exchange(
"/api/v1/environments/" + targetEnvSlug + "/apps", HttpMethod.POST,
new HttpEntity<>(appJson, authHeaders(aliceJwt)),
String.class);
assertThat(targetAppResponse.getStatusCode()).isEqualTo(HttpStatus.CREATED);
// Deploy in source (default) env
String deployJson = String.format("""
{"appVersionId": "%s"}
""", versionId);
ResponseEntity<String> deployResponse = restTemplate.exchange(
"/api/v1/environments/default/apps/" + appSlug + "/deployments", HttpMethod.POST,
new HttpEntity<>(deployJson, authHeaders(aliceJwt)),
String.class);
assertThat(deployResponse.getStatusCode()).isEqualTo(HttpStatus.ACCEPTED);
String deploymentId = objectMapper.readTree(deployResponse.getBody()).path("id").asText();
// Clear audit log to isolate promote audit row
jdbcTemplate.update("DELETE FROM audit_log");
// Promote to target env
String promoteJson = String.format("""
{"targetEnvironment": "%s"}
""", targetEnvSlug);
ResponseEntity<String> promoteResponse = restTemplate.exchange(
"/api/v1/environments/default/apps/" + appSlug + "/deployments/" + deploymentId + "/promote",
HttpMethod.POST,
new HttpEntity<>(promoteJson, authHeaders(aliceJwt)),
String.class);
assertThat(promoteResponse.getStatusCode()).isEqualTo(HttpStatus.ACCEPTED);
Map<String, Object> row = queryAuditRow("promote_deployment");
assertThat(row).isNotNull();
assertThat(row.get("username")).isEqualTo("alice");
assertThat(row.get("action")).isEqualTo("promote_deployment");
assertThat(row.get("category")).isEqualTo("DEPLOYMENT");
assertThat(row.get("result")).isEqualTo("SUCCESS");
assertThat(row.get("target")).isNotNull();
assertThat(row.get("target").toString()).isNotBlank();
}
@Test
void deploy_with_unknown_appVersion_writes_FAILURE_audit_row() throws Exception {
String unknownVersionId = UUID.randomUUID().toString();
String json = String.format("""
{"appVersionId": "%s"}
""", unknownVersionId);
ResponseEntity<String> response = restTemplate.exchange(
"/api/v1/environments/default/apps/" + appSlug + "/deployments", HttpMethod.POST,
new HttpEntity<>(json, authHeaders(aliceJwt)),
String.class);
assertThat(response.getStatusCode()).isEqualTo(HttpStatus.NOT_FOUND);
Map<String, Object> row = queryAuditRow("deploy_app");
assertThat(row).isNotNull();
assertThat(row.get("username")).isEqualTo("alice");
assertThat(row.get("action")).isEqualTo("deploy_app");
assertThat(row.get("category")).isEqualTo("DEPLOYMENT");
assertThat(row.get("result")).isEqualTo("FAILURE");
}
// ---- helpers ----
private HttpHeaders authHeaders(String jwt) {
HttpHeaders headers = new HttpHeaders();
headers.set("Authorization", "Bearer " + jwt);
headers.set("X-Cameleer-Protocol-Version", "1");
headers.setContentType(MediaType.APPLICATION_JSON);
return headers;
}
private HttpHeaders authHeadersNoBody(String jwt) {
HttpHeaders headers = new HttpHeaders();
headers.set("Authorization", "Bearer " + jwt);
headers.set("X-Cameleer-Protocol-Version", "1");
return headers;
}
/** Query the most recent audit_log row for the given action. Returns null if not found. */
private Map<String, Object> queryAuditRow(String action) {
List<Map<String, Object>> rows = jdbcTemplate.queryForList(
"SELECT username, action, category, target, result FROM audit_log WHERE action = ? ORDER BY timestamp DESC LIMIT 1",
action);
return rows.isEmpty() ? null : rows.get(0);
}
}

View File

@@ -48,6 +48,10 @@ class DeploymentControllerIT extends AbstractPostgresIT {
jdbcTemplate.update("DELETE FROM app_versions");
jdbcTemplate.update("DELETE FROM apps");
// Ensure test-operator exists in users table (required for deployments.created_by FK)
jdbcTemplate.update(
"INSERT INTO users (user_id, provider, display_name) VALUES ('test-operator', 'local', 'Test Operator') ON CONFLICT (user_id) DO NOTHING");
// Get default environment ID
ResponseEntity<String> envResponse = restTemplate.exchange(
"/api/v1/admin/environments", HttpMethod.GET,

View File

@@ -166,6 +166,157 @@ class DiagramRenderControllerIT extends AbstractPostgresIT {
assertThat(response.getStatusCode()).isEqualTo(HttpStatus.NOT_FOUND);
}
@Test
void findByAppAndRoute_returnsLatestDiagram_noLiveAgentPrereq() {
// The env-scoped /routes/{routeId}/diagram endpoint no longer depends
// on the agent registry — routes whose publishing agents have been
// removed must still resolve. The seed step stored a diagram for
// route "render-test-route" under app "test-group" / env "default",
// so the same lookup must succeed even though the registry-driven
// "find agents for app" path used to be a hard 404 prerequisite.
HttpHeaders headers = securityHelper.authHeadersNoBody(viewerJwt);
headers.set("Accept", "application/json");
ResponseEntity<String> response = restTemplate.exchange(
"/api/v1/environments/default/apps/test-group/routes/render-test-route/diagram",
HttpMethod.GET,
new HttpEntity<>(headers),
String.class);
assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
assertThat(response.getBody()).contains("nodes");
assertThat(response.getBody()).contains("edges");
}
@Test
void findByAppAndRoute_returns404ForUnknownRoute() {
HttpHeaders headers = securityHelper.authHeadersNoBody(viewerJwt);
headers.set("Accept", "application/json");
ResponseEntity<String> response = restTemplate.exchange(
"/api/v1/environments/default/apps/test-group/routes/nonexistent-route/diagram",
HttpMethod.GET,
new HttpEntity<>(headers),
String.class);
assertThat(response.getStatusCode()).isEqualTo(HttpStatus.NOT_FOUND);
}
@Test
void exchangeDiagramHash_pinsPointInTimeEvenAfterNewerVersion() throws Exception {
// Point-in-time guarantee: an execution's stored diagramContentHash
// must keep resolving to the route shape captured at execution time,
// even after a newer diagram version for the same route is stored.
// Content-hash addressing + never-delete of route_diagrams makes this
// automatic — this test locks the invariant in.
HttpHeaders viewerHeaders = securityHelper.authHeadersNoBody(viewerJwt);
viewerHeaders.set("Accept", "application/json");
// Snapshot the pinned v1 render via the flat content-hash endpoint
// BEFORE a newer version is stored, so the post-v2 fetch can compare
// byte-for-byte.
ResponseEntity<String> pinnedBefore = restTemplate.exchange(
"/api/v1/diagrams/{hash}/render",
HttpMethod.GET,
new HttpEntity<>(viewerHeaders),
String.class,
contentHash);
assertThat(pinnedBefore.getStatusCode()).isEqualTo(HttpStatus.OK);
// Also snapshot the by-route "latest" render for the same route.
ResponseEntity<String> latestBefore = restTemplate.exchange(
"/api/v1/environments/default/apps/test-group/routes/render-test-route/diagram",
HttpMethod.GET,
new HttpEntity<>(viewerHeaders),
String.class);
assertThat(latestBefore.getStatusCode()).isEqualTo(HttpStatus.OK);
// Store a materially different v2 for the same (app, env, route).
// The renderer walks the `root` tree (not the legacy flat `nodes`
// list that the seed payload uses), so v2 uses the tree shape and
// will render non-empty output — letting us detect the version flip.
String newerDiagramJson = """
{
"routeId": "render-test-route",
"description": "v2 with extra step",
"version": 2,
"root": {
"id": "n1",
"type": "ENDPOINT",
"label": "timer:tick-v2",
"children": [
{
"id": "n2",
"type": "BEAN",
"label": "myBeanV2",
"children": [
{
"id": "n3",
"type": "TO",
"label": "log:out-v2",
"children": [
{"id": "n4", "type": "TO", "label": "log:audit"}
]
}
]
}
]
},
"edges": [
{"source": "n1", "target": "n2", "edgeType": "FLOW"},
{"source": "n2", "target": "n3", "edgeType": "FLOW"},
{"source": "n3", "target": "n4", "edgeType": "FLOW"}
]
}
""";
restTemplate.postForEntity(
"/api/v1/data/diagrams",
new HttpEntity<>(newerDiagramJson, securityHelper.authHeaders(jwt)),
String.class);
// Invariant 1: The execution's stored diagramContentHash must not
// drift — exchanges stay pinned to the version captured at ingest.
ResponseEntity<String> detailAfter = restTemplate.exchange(
"/api/v1/environments/default/executions?correlationId=render-probe-corr",
HttpMethod.GET,
new HttpEntity<>(viewerHeaders),
String.class);
JsonNode search = objectMapper.readTree(detailAfter.getBody());
String execId = search.get("data").get(0).get("executionId").asText();
ResponseEntity<String> exec = restTemplate.exchange(
"/api/v1/executions/" + execId,
HttpMethod.GET,
new HttpEntity<>(viewerHeaders),
String.class);
JsonNode execBody = objectMapper.readTree(exec.getBody());
assertThat(execBody.path("diagramContentHash").asText()).isEqualTo(contentHash);
// Invariant 2: The pinned render (by H1) must be byte-identical
// before and after v2 is stored — content-hash addressing is stable.
ResponseEntity<String> pinnedAfter = restTemplate.exchange(
"/api/v1/diagrams/{hash}/render",
HttpMethod.GET,
new HttpEntity<>(viewerHeaders),
String.class,
contentHash);
assertThat(pinnedAfter.getStatusCode()).isEqualTo(HttpStatus.OK);
assertThat(pinnedAfter.getBody()).isEqualTo(pinnedBefore.getBody());
// Invariant 3: The by-route "latest" endpoint must now surface v2,
// so its body differs from the pre-v2 snapshot. Poll for up to 20 s
// because the diagram-ingest flush is not immediate.
await().atMost(20, SECONDS).untilAsserted(() -> {
ResponseEntity<String> latestAfter = restTemplate.exchange(
"/api/v1/environments/default/apps/test-group/routes/render-test-route/diagram",
HttpMethod.GET,
new HttpEntity<>(viewerHeaders),
String.class);
assertThat(latestAfter.getStatusCode()).isEqualTo(HttpStatus.OK);
assertThat(latestAfter.getBody()).isNotEqualTo(latestBefore.getBody());
assertThat(latestAfter.getBody()).contains("myBeanV2");
});
}
@Test
void getWithNoAcceptHeader_defaultsToSvg() {
HttpHeaders headers = securityHelper.authHeadersNoBody(viewerJwt);

@@ -166,6 +166,42 @@ class SearchControllerIT extends AbstractPostgresIT {
""", i, i, i, i, i));
}
// Executions 11-12: carry structured attributes used by the attribute-filter tests.
ingest("""
{
"exchangeId": "ex-search-attr-1",
"applicationId": "test-group",
"instanceId": "test-agent-search-it",
"routeId": "search-route-attr-1",
"correlationId": "corr-attr-alpha",
"status": "COMPLETED",
"startTime": "2026-03-12T10:00:00Z",
"endTime": "2026-03-12T10:00:00.050Z",
"durationMs": 50,
"attributes": {"order": "12345", "tenant": "acme"},
"chunkSeq": 0,
"final": true,
"processors": []
}
""");
ingest("""
{
"exchangeId": "ex-search-attr-2",
"applicationId": "test-group",
"instanceId": "test-agent-search-it",
"routeId": "search-route-attr-2",
"correlationId": "corr-attr-beta",
"status": "COMPLETED",
"startTime": "2026-03-12T10:01:00Z",
"endTime": "2026-03-12T10:01:00.050Z",
"durationMs": 50,
"attributes": {"order": "99999"},
"chunkSeq": 0,
"final": true,
"processors": []
}
""");
// Wait for async ingestion + search indexing via REST (no raw SQL).
// Probe the last seeded execution to avoid false positives from
// other test classes that may have written into the shared CH tables.
@@ -174,6 +210,11 @@ class SearchControllerIT extends AbstractPostgresIT {
JsonNode body = objectMapper.readTree(r.getBody());
assertThat(body.get("total").asLong()).isGreaterThanOrEqualTo(1);
});
await().atMost(30, SECONDS).untilAsserted(() -> {
ResponseEntity<String> r = searchGet("?correlationId=corr-attr-beta");
JsonNode body = objectMapper.readTree(r.getBody());
assertThat(body.get("total").asLong()).isGreaterThanOrEqualTo(1);
});
}
@Test
@@ -371,6 +412,69 @@ class SearchControllerIT extends AbstractPostgresIT {
assertThat(body.get("limit").asInt()).isEqualTo(50);
}
@Test
void attrParam_exactMatch_filtersToMatchingExecution() throws Exception {
ResponseEntity<String> response = searchGet("?attr=order:12345&correlationId=corr-attr-alpha");
assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
JsonNode body = objectMapper.readTree(response.getBody());
assertThat(body.get("total").asLong()).isEqualTo(1);
assertThat(body.get("data").get(0).get("correlationId").asText()).isEqualTo("corr-attr-alpha");
}
@Test
void attrParam_keyOnly_matchesAnyExecutionCarryingTheKey() throws Exception {
ResponseEntity<String> response = searchGet("?attr=tenant&correlationId=corr-attr-alpha");
assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
JsonNode body = objectMapper.readTree(response.getBody());
assertThat(body.get("total").asLong()).isEqualTo(1);
assertThat(body.get("data").get(0).get("correlationId").asText()).isEqualTo("corr-attr-alpha");
}
@Test
void attrParam_multipleValues_produceIntersection() throws Exception {
// order:99999 AND tenant=* should yield zero: ex-search-attr-2 has order=99999 but no tenant.
ResponseEntity<String> response = searchGet("?attr=order:99999&attr=tenant");
assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
JsonNode body = objectMapper.readTree(response.getBody());
assertThat(body.get("total").asLong()).isZero();
}
@Test
void attrParam_invalidKey_returns400() throws Exception {
ResponseEntity<String> response = searchGet("?attr=bad%20key:x");
assertThat(response.getStatusCode()).isEqualTo(HttpStatus.BAD_REQUEST);
}
@Test
void attributeFilters_inPostBody_filtersCorrectly() throws Exception {
ResponseEntity<String> response = searchPost("""
{
"attributeFilters": [
{"key": "order", "value": "12345"}
],
"correlationId": "corr-attr-alpha"
}
""");
assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
JsonNode body = objectMapper.readTree(response.getBody());
assertThat(body.get("total").asLong()).isEqualTo(1);
assertThat(body.get("data").get(0).get("correlationId").asText()).isEqualTo("corr-attr-alpha");
}
@Test
void attrParam_wildcardValue_matchesOnPrefix() throws Exception {
ResponseEntity<String> response = searchGet("?attr=order:1*&correlationId=corr-attr-alpha");
assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
JsonNode body = objectMapper.readTree(response.getBody());
assertThat(body.get("total").asLong()).isEqualTo(1);
assertThat(body.get("data").get(0).get("correlationId").asText()).isEqualTo("corr-attr-alpha");
}
// --- Helper methods ---
private void ingest(String json) {
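
The attribute-filter tests in this hunk exercise three forms of the `attr` query parameter: `key:value` (exact match), a bare `key` (key present), and `key:prefix*` (prefix wildcard), with repeated parameters combined as an intersection. A minimal sketch of how such a parameter could be parsed and evaluated follows. The names `AttrParamSketch`, `AttrFilter`, `Kind`, and the key pattern are hypothetical; the server's actual parser and validation rules are not part of this diff.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class AttrParamSketch {
    // Assumed key rule (letters, digits, dot, underscore, hyphen); the real rule may differ.
    private static final Pattern KEY = Pattern.compile("[A-Za-z0-9._-]+");

    enum Kind { EXISTS, EXACT, PREFIX }

    record AttrFilter(String key, String value, Kind kind) {}

    /** Parses "key", "key:value" or "key:prefix*"; rejects keys outside the assumed pattern. */
    static AttrFilter parse(String raw) {
        int sep = raw.indexOf(':');
        String key = sep < 0 ? raw : raw.substring(0, sep);
        if (!KEY.matcher(key).matches()) {
            // A controller would typically translate this into HTTP 400.
            throw new IllegalArgumentException("invalid attribute key: " + key);
        }
        if (sep < 0) {
            return new AttrFilter(key, null, Kind.EXISTS);
        }
        String value = raw.substring(sep + 1);
        if (value.endsWith("*")) {
            return new AttrFilter(key, value.substring(0, value.length() - 1), Kind.PREFIX);
        }
        return new AttrFilter(key, value, Kind.EXACT);
    }

    /** True when the attribute map satisfies a single filter. */
    static boolean matches(AttrFilter f, Map<String, String> attrs) {
        String actual = attrs.get(f.key());
        if (actual == null) {
            return false;
        }
        return switch (f.kind()) {
            case EXISTS -> true;
            case EXACT -> actual.equals(f.value());
            case PREFIX -> actual.startsWith(f.value());
        };
    }

    /** Multiple filters are combined with AND (intersection). */
    static boolean matchesAll(List<AttrFilter> filters, Map<String, String> attrs) {
        return filters.stream().allMatch(f -> matches(f, attrs));
    }

    public static void main(String[] args) {
        Map<String, String> second = Map.of("order", "99999");
        List<AttrFilter> filters = List.of(parse("order:99999"), parse("tenant"));
        // The second seeded execution has no "tenant" attribute, so the intersection is empty.
        System.out.println(matchesAll(filters, second));
    }
}
```

In this sketch a key containing a space, as in `attr=bad%20key:x`, fails the pattern check before any query is built, which is one way to arrive at the 400 response the test expects.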

@@ -0,0 +1,314 @@
package com.cameleer.server.app.controller;
import com.cameleer.server.app.AbstractPostgresIT;
import com.cameleer.server.app.TestSecurityHelper;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.web.client.TestRestTemplate;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import java.sql.Timestamp;
import java.time.Instant;
import java.util.Map;
import static org.assertj.core.api.Assertions.assertThat;
class ServerMetricsAdminControllerIT extends AbstractPostgresIT {
@Autowired
private TestRestTemplate restTemplate;
@Autowired
private TestSecurityHelper securityHelper;
private final ObjectMapper mapper = new ObjectMapper();
private HttpHeaders adminJson;
private HttpHeaders adminGet;
private HttpHeaders viewerGet;
@BeforeEach
void seedAndAuth() {
adminJson = securityHelper.adminHeaders();
adminGet = securityHelper.authHeadersNoBody(securityHelper.adminToken());
viewerGet = securityHelper.authHeadersNoBody(securityHelper.viewerToken());
// Fresh rows for each test. The ClickHouse JdbcTemplate is a separate
// bean, so it is looked up by name (see clickhouseJdbc()) to write
// through the same template the store uses.
org.springframework.jdbc.core.JdbcTemplate ch = clickhouseJdbc();
ch.execute("TRUNCATE TABLE server_metrics");
Instant t0 = Instant.parse("2026-04-23T10:00:00Z");
// Gauge: cameleer.agents.connected, two states, two buckets.
insert(ch, "default", t0, "srv-A", "cameleer.agents.connected", "gauge", "value", 3.0,
Map.of("state", "live"));
insert(ch, "default", t0.plusSeconds(60), "srv-A", "cameleer.agents.connected", "gauge", "value", 4.0,
Map.of("state", "live"));
insert(ch, "default", t0, "srv-A", "cameleer.agents.connected", "gauge", "value", 1.0,
Map.of("state", "stale"));
insert(ch, "default", t0.plusSeconds(60), "srv-A", "cameleer.agents.connected", "gauge", "value", 0.0,
Map.of("state", "stale"));
// Counter: cumulative drops, +5 per minute on srv-A.
insert(ch, "default", t0, "srv-A", "cameleer.ingestion.drops", "counter", "count", 0.0, Map.of("reason", "buffer_full"));
insert(ch, "default", t0.plusSeconds(60), "srv-A", "cameleer.ingestion.drops", "counter", "count", 5.0, Map.of("reason", "buffer_full"));
insert(ch, "default", t0.plusSeconds(120), "srv-A", "cameleer.ingestion.drops", "counter", "count", 10.0, Map.of("reason", "buffer_full"));
// Simulated restart to srv-B: counter resets to 0, then climbs to 2.
insert(ch, "default", t0.plusSeconds(180), "srv-B", "cameleer.ingestion.drops", "counter", "count", 0.0, Map.of("reason", "buffer_full"));
insert(ch, "default", t0.plusSeconds(240), "srv-B", "cameleer.ingestion.drops", "counter", "count", 2.0, Map.of("reason", "buffer_full"));
// Timer mean inputs: two buckets (count=2, total_time=30; then count=4, total_time=100).
insert(ch, "default", t0, "srv-A", "cameleer.ingestion.flush.duration", "timer", "count", 2.0, Map.of("type", "execution"));
insert(ch, "default", t0, "srv-A", "cameleer.ingestion.flush.duration", "timer", "total_time", 30.0, Map.of("type", "execution"));
insert(ch, "default", t0.plusSeconds(60), "srv-A", "cameleer.ingestion.flush.duration", "timer", "count", 4.0, Map.of("type", "execution"));
insert(ch, "default", t0.plusSeconds(60), "srv-A", "cameleer.ingestion.flush.duration", "timer", "total_time", 100.0, Map.of("type", "execution"));
}
// ── catalog ─────────────────────────────────────────────────────────
@Test
void catalog_listsSeededMetricsWithStatisticsAndTagKeys() throws Exception {
ResponseEntity<String> r = restTemplate.exchange(
"/api/v1/admin/server-metrics/catalog?from=2026-04-23T09:00:00Z&to=2026-04-23T11:00:00Z",
HttpMethod.GET, new HttpEntity<>(adminGet), String.class);
assertThat(r.getStatusCode()).isEqualTo(HttpStatus.OK);
JsonNode body = mapper.readTree(r.getBody());
assertThat(body.isArray()).isTrue();
JsonNode drops = findByField(body, "metricName", "cameleer.ingestion.drops");
assertThat(drops.get("metricType").asText()).isEqualTo("counter");
assertThat(asStringList(drops.get("statistics"))).contains("count");
assertThat(asStringList(drops.get("tagKeys"))).contains("reason");
JsonNode timer = findByField(body, "metricName", "cameleer.ingestion.flush.duration");
assertThat(asStringList(timer.get("statistics"))).contains("count", "total_time");
}
// ── instances ───────────────────────────────────────────────────────
@Test
void instances_listsDistinctServerInstanceIdsWithFirstAndLastSeen() throws Exception {
ResponseEntity<String> r = restTemplate.exchange(
"/api/v1/admin/server-metrics/instances?from=2026-04-23T09:00:00Z&to=2026-04-23T11:00:00Z",
HttpMethod.GET, new HttpEntity<>(adminGet), String.class);
assertThat(r.getStatusCode()).isEqualTo(HttpStatus.OK);
JsonNode body = mapper.readTree(r.getBody());
assertThat(body.isArray()).isTrue();
assertThat(body.size()).isEqualTo(2);
// Ordered by last_seen DESC — srv-B saw a later row.
assertThat(body.get(0).get("serverInstanceId").asText()).isEqualTo("srv-B");
assertThat(body.get(1).get("serverInstanceId").asText()).isEqualTo("srv-A");
}
// ── query — gauge with group-by-tag ─────────────────────────────────
@Test
void query_gaugeWithGroupByTag_returnsSeriesPerTagValue() throws Exception {
String requestBody = """
{
"metric": "cameleer.agents.connected",
"statistic": "value",
"from": "2026-04-23T09:59:00Z",
"to": "2026-04-23T10:02:00Z",
"stepSeconds": 60,
"groupByTags": ["state"],
"aggregation": "avg",
"mode": "raw"
}
""";
ResponseEntity<String> r = restTemplate.postForEntity(
"/api/v1/admin/server-metrics/query",
new HttpEntity<>(requestBody, adminJson), String.class);
assertThat(r.getStatusCode()).isEqualTo(HttpStatus.OK);
JsonNode body = mapper.readTree(r.getBody());
assertThat(body.get("metric").asText()).isEqualTo("cameleer.agents.connected");
assertThat(body.get("statistic").asText()).isEqualTo("value");
assertThat(body.get("mode").asText()).isEqualTo("raw");
assertThat(body.get("stepSeconds").asInt()).isEqualTo(60);
JsonNode series = body.get("series");
assertThat(series.isArray()).isTrue();
assertThat(series.size()).isEqualTo(2);
JsonNode live = findByTag(series, "state", "live");
assertThat(live.get("points").size()).isEqualTo(2);
assertThat(live.get("points").get(0).get("v").asDouble()).isEqualTo(3.0);
assertThat(live.get("points").get(1).get("v").asDouble()).isEqualTo(4.0);
}
// ── query — counter delta across instance rotation ──────────────────
@Test
void query_counterDelta_clipsNegativesAcrossInstanceRotation() throws Exception {
String requestBody = """
{
"metric": "cameleer.ingestion.drops",
"statistic": "count",
"from": "2026-04-23T09:59:00Z",
"to": "2026-04-23T10:05:00Z",
"stepSeconds": 60,
"groupByTags": ["reason"],
"aggregation": "sum",
"mode": "delta"
}
""";
ResponseEntity<String> r = restTemplate.postForEntity(
"/api/v1/admin/server-metrics/query",
new HttpEntity<>(requestBody, adminJson), String.class);
assertThat(r.getStatusCode()).isEqualTo(HttpStatus.OK);
JsonNode body = mapper.readTree(r.getBody());
JsonNode reason = findByTag(body.get("series"), "reason", "buffer_full");
// Deltas: 0 (first bucket on srv-A), 5, 5, 0 (first on srv-B, clipped), 2.
// Sum across the window should be 12 if we tally all positive deltas.
double sum = 0;
for (JsonNode p : reason.get("points")) sum += p.get("v").asDouble();
assertThat(sum).isEqualTo(12.0);
// No individual point may be negative.
for (JsonNode p : reason.get("points")) {
assertThat(p.get("v").asDouble()).isGreaterThanOrEqualTo(0.0);
}
}
// ── query — derived 'mean' statistic for timers ─────────────────────
@Test
void query_timerMeanStatistic_computesTotalOverCountPerBucket() throws Exception {
String requestBody = """
{
"metric": "cameleer.ingestion.flush.duration",
"statistic": "mean",
"from": "2026-04-23T09:59:00Z",
"to": "2026-04-23T10:02:00Z",
"stepSeconds": 60,
"groupByTags": ["type"],
"aggregation": "avg",
"mode": "raw"
}
""";
ResponseEntity<String> r = restTemplate.postForEntity(
"/api/v1/admin/server-metrics/query",
new HttpEntity<>(requestBody, adminJson), String.class);
assertThat(r.getStatusCode()).isEqualTo(HttpStatus.OK);
JsonNode body = mapper.readTree(r.getBody());
JsonNode points = findByTag(body.get("series"), "type", "execution").get("points");
// Bucket 0: 30 / 2 = 15.0
// Bucket 1: 100 / 4 = 25.0
assertThat(points.get(0).get("v").asDouble()).isEqualTo(15.0);
assertThat(points.get(1).get("v").asDouble()).isEqualTo(25.0);
}
// ── query — input validation ────────────────────────────────────────
@Test
void query_rejectsUnsafeMetricName() {
String requestBody = """
{
"metric": "cameleer.agents; DROP TABLE server_metrics",
"from": "2026-04-23T09:59:00Z",
"to": "2026-04-23T10:02:00Z"
}
""";
ResponseEntity<String> r = restTemplate.postForEntity(
"/api/v1/admin/server-metrics/query",
new HttpEntity<>(requestBody, adminJson), String.class);
assertThat(r.getStatusCode()).isEqualTo(HttpStatus.BAD_REQUEST);
}
@Test
void query_rejectsRangeBeyondMax() {
String requestBody = """
{
"metric": "cameleer.agents.connected",
"from": "2026-01-01T00:00:00Z",
"to": "2026-04-23T00:00:00Z"
}
""";
ResponseEntity<String> r = restTemplate.postForEntity(
"/api/v1/admin/server-metrics/query",
new HttpEntity<>(requestBody, adminJson), String.class);
assertThat(r.getStatusCode()).isEqualTo(HttpStatus.BAD_REQUEST);
}
// ── authorization ───────────────────────────────────────────────────
@Test
void allEndpoints_requireAdminRole() {
ResponseEntity<String> catalog = restTemplate.exchange(
"/api/v1/admin/server-metrics/catalog",
HttpMethod.GET, new HttpEntity<>(viewerGet), String.class);
assertThat(catalog.getStatusCode()).isEqualTo(HttpStatus.FORBIDDEN);
ResponseEntity<String> instances = restTemplate.exchange(
"/api/v1/admin/server-metrics/instances",
HttpMethod.GET, new HttpEntity<>(viewerGet), String.class);
assertThat(instances.getStatusCode()).isEqualTo(HttpStatus.FORBIDDEN);
HttpHeaders viewerPost = securityHelper.authHeaders(securityHelper.viewerToken());
ResponseEntity<String> query = restTemplate.exchange(
"/api/v1/admin/server-metrics/query",
HttpMethod.POST, new HttpEntity<>("{}", viewerPost), String.class);
assertThat(query.getStatusCode()).isEqualTo(HttpStatus.FORBIDDEN);
}
// ── helpers ─────────────────────────────────────────────────────────
private org.springframework.jdbc.core.JdbcTemplate clickhouseJdbc() {
return org.springframework.test.util.AopTestUtils.getTargetObject(
applicationContext.getBean("clickHouseJdbcTemplate"));
}
@Autowired
private org.springframework.context.ApplicationContext applicationContext;
private static void insert(org.springframework.jdbc.core.JdbcTemplate jdbc,
String tenantId, Instant collectedAt, String serverInstanceId,
String metricName, String metricType, String statistic,
double value, Map<String, String> tags) {
jdbc.update("""
INSERT INTO server_metrics
(tenant_id, collected_at, server_instance_id,
metric_name, metric_type, statistic, metric_value, tags)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
""",
tenantId, Timestamp.from(collectedAt), serverInstanceId,
metricName, metricType, statistic, value, tags);
}
private static JsonNode findByField(JsonNode array, String field, String value) {
for (JsonNode n : array) {
if (value.equals(n.path(field).asText())) return n;
}
throw new AssertionError("no element with " + field + "=" + value);
}
private static JsonNode findByTag(JsonNode seriesArray, String tagKey, String tagValue) {
for (JsonNode s : seriesArray) {
if (tagValue.equals(s.path("tags").path(tagKey).asText())) return s;
}
throw new AssertionError("no series with tag " + tagKey + "=" + tagValue);
}
private static java.util.List<String> asStringList(JsonNode arr) {
java.util.List<String> out = new java.util.ArrayList<>();
if (arr != null) for (JsonNode n : arr) out.add(n.asText());
return out;
}
}
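
The counter test above expects per-bucket deltas of 0, 5, 5, 0, 2 from the cumulative readings 0, 5, 10, 0, 2, with the reset after the switch to srv-B clipped to zero instead of producing -10. A minimal sketch of that clipping rule follows. The class and method names are hypothetical, and the sketch works on an in-memory list, whereas the real endpoint presumably computes this in its ClickHouse query.

```java
import java.util.ArrayList;
import java.util.List;

class CounterDeltaSketch {
    /**
     * Converts cumulative counter readings (one per bucket, in time order)
     * into per-bucket deltas. The first bucket has no predecessor and yields 0;
     * a drop in the cumulative value (a counter reset) is clipped to 0.
     */
    static List<Double> clippedDeltas(List<Double> cumulative) {
        List<Double> deltas = new ArrayList<>();
        Double previous = null;
        for (double current : cumulative) {
            double delta = previous == null ? 0.0 : Math.max(0.0, current - previous);
            deltas.add(delta);
            previous = current;
        }
        return deltas;
    }

    public static void main(String[] args) {
        // Readings seeded by the test: three on srv-A, then two on srv-B after the reset.
        List<Double> deltas = clippedDeltas(List.of(0.0, 5.0, 10.0, 0.0, 2.0));
        double sum = deltas.stream().mapToDouble(Double::doubleValue).sum();
        System.out.println(deltas + " sum=" + sum); // [0.0, 5.0, 5.0, 0.0, 2.0] sum=12.0
    }
}
```

Clipping trades accuracy for safety: any increments that happened between the last reading of the old instance and the first reading of the new one are lost, but a restart never shows up as a large negative spike.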

@@ -0,0 +1,130 @@
package com.cameleer.server.app.metrics;
import com.cameleer.server.core.storage.ServerMetricsStore;
import com.cameleer.server.core.storage.model.ServerMetricSample;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import org.junit.jupiter.api.Test;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import static org.assertj.core.api.Assertions.assertThat;
class ServerMetricsSnapshotSchedulerTest {
@Test
void snapshot_capturesCounterGaugeAndTimerMeasurements() {
MeterRegistry registry = new SimpleMeterRegistry();
Counter counter = Counter.builder("cameleer.test.counter")
.tag("env", "dev")
.register(registry);
counter.increment(3);
AtomicInteger gaugeSource = new AtomicInteger(42);
Gauge.builder("cameleer.test.gauge", gaugeSource, AtomicInteger::doubleValue)
.register(registry);
Timer timer = Timer.builder("cameleer.test.timer").register(registry);
timer.record(Duration.ofMillis(5));
timer.record(Duration.ofMillis(15));
RecordingStore store = new RecordingStore();
ServerMetricsSnapshotScheduler scheduler =
new ServerMetricsSnapshotScheduler(registry, store, "tenant-7", "server-A");
scheduler.snapshot();
assertThat(store.batches).hasSize(1);
List<ServerMetricSample> samples = store.batches.get(0);
// Every sample is stamped with tenant + instance + finite value
assertThat(samples).allSatisfy(s -> {
assertThat(s.tenantId()).isEqualTo("tenant-7");
assertThat(s.serverInstanceId()).isEqualTo("server-A");
assertThat(Double.isFinite(s.value())).isTrue();
assertThat(s.collectedAt()).isNotNull();
});
// Counter -> 1 row with statistic=count, value=3, tag propagated
List<ServerMetricSample> counterRows = samples.stream()
.filter(s -> s.metricName().equals("cameleer.test.counter"))
.toList();
assertThat(counterRows).hasSize(1);
assertThat(counterRows.get(0).statistic()).isEqualTo("count");
assertThat(counterRows.get(0).metricType()).isEqualTo("counter");
assertThat(counterRows.get(0).value()).isEqualTo(3.0);
assertThat(counterRows.get(0).tags()).containsEntry("env", "dev");
// Gauge -> 1 row with statistic=value
List<ServerMetricSample> gaugeRows = samples.stream()
.filter(s -> s.metricName().equals("cameleer.test.gauge"))
.toList();
assertThat(gaugeRows).hasSize(1);
assertThat(gaugeRows.get(0).statistic()).isEqualTo("value");
assertThat(gaugeRows.get(0).metricType()).isEqualTo("gauge");
assertThat(gaugeRows.get(0).value()).isEqualTo(42.0);
// Timer -> emits multiple statistics (count, total_time, max)
List<ServerMetricSample> timerRows = samples.stream()
.filter(s -> s.metricName().equals("cameleer.test.timer"))
.toList();
assertThat(timerRows).isNotEmpty();
// SimpleMeterRegistry emits Statistic.TOTAL ("total"); other registries (Prometheus)
// emit TOTAL_TIME ("total_time"). Accept either so the test isn't registry-coupled.
assertThat(timerRows).extracting(ServerMetricSample::statistic)
.contains("count", "max");
assertThat(timerRows).extracting(ServerMetricSample::statistic)
.containsAnyOf("total_time", "total");
assertThat(timerRows).allSatisfy(s ->
assertThat(s.metricType()).isEqualTo("timer"));
ServerMetricSample count = timerRows.stream()
.filter(s -> s.statistic().equals("count"))
.findFirst().orElseThrow();
assertThat(count.value()).isEqualTo(2.0);
}
@Test
void snapshot_withEmptyRegistry_doesNotWriteBatch() {
MeterRegistry registry = new SimpleMeterRegistry();
// SimpleMeterRegistry registers no meters by default, so the registry is empty here.
RecordingStore store = new RecordingStore();
ServerMetricsSnapshotScheduler scheduler =
new ServerMetricsSnapshotScheduler(registry, store, "t", "s");
scheduler.snapshot();
assertThat(store.batches).isEmpty();
}
@Test
void snapshot_swallowsStoreFailures() {
MeterRegistry registry = new SimpleMeterRegistry();
Counter.builder("cameleer.test").register(registry).increment();
ServerMetricsStore throwingStore = batch -> {
throw new RuntimeException("clickhouse down");
};
ServerMetricsSnapshotScheduler scheduler =
new ServerMetricsSnapshotScheduler(registry, throwingStore, "t", "s");
// Must not propagate: the scheduler thread would otherwise die.
org.assertj.core.api.Assertions.assertThatCode(scheduler::snapshot)
.doesNotThrowAnyException();
}
private static final class RecordingStore implements ServerMetricsStore {
final List<List<ServerMetricSample>> batches = new ArrayList<>();
@Override
public void insertBatch(List<ServerMetricSample> samples) {
batches.add(List.copyOf(samples));
}
}
}

@@ -34,6 +34,10 @@ class OutboundConnectionAdminControllerIT extends AbstractPostgresIT {
@org.junit.jupiter.api.AfterEach
void cleanupRows() {
jdbcTemplate.update("DELETE FROM outbound_connections WHERE tenant_id = 'default'");
// Delete deployments created by our test users first: sibling ITs
// (DeploymentControllerIT etc.) may have left rows that FK-block user deletion.
jdbcTemplate.update(
"DELETE FROM deployments WHERE created_by IN ('test-admin','test-operator','test-viewer')");
jdbcTemplate.update("DELETE FROM users WHERE user_id IN ('test-admin','test-operator','test-viewer')");
}

@@ -0,0 +1,194 @@
package com.cameleer.server.app.runtime;
import com.cameleer.server.app.AbstractPostgresIT;
import com.cameleer.server.app.TestSecurityHelper;
import com.cameleer.server.app.storage.PostgresDeploymentRepository;
import com.cameleer.server.core.runtime.ContainerStatus;
import com.cameleer.server.core.runtime.Deployment;
import com.cameleer.server.core.runtime.DeploymentStatus;
import com.cameleer.server.core.runtime.RuntimeOrchestrator;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.mock.mockito.MockBean;
import org.springframework.boot.test.web.client.TestRestTemplate;
import org.springframework.core.io.ByteArrayResource;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.http.MediaType;
import org.springframework.test.context.TestPropertySource;
import org.springframework.util.LinkedMultiValueMap;
import org.springframework.util.MultiValueMap;
import java.util.UUID;
import java.util.concurrent.TimeUnit;
import static org.assertj.core.api.Assertions.assertThat;
import static org.awaitility.Awaitility.await;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.never;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;
/**
* Verifies the blue-green deployment strategy: start all new → health-check
* all → stop old. Strict all-healthy — partial failure preserves the previous
* deployment untouched.
*/
@TestPropertySource(properties = "cameleer.server.runtime.healthchecktimeout=2")
class BlueGreenStrategyIT extends AbstractPostgresIT {
@MockBean
RuntimeOrchestrator runtimeOrchestrator;
@Autowired private TestRestTemplate restTemplate;
@Autowired private ObjectMapper objectMapper;
@Autowired private TestSecurityHelper securityHelper;
@Autowired private PostgresDeploymentRepository deploymentRepository;
private String operatorJwt;
private String appSlug;
private String versionId;
@BeforeEach
void setUp() throws Exception {
operatorJwt = securityHelper.operatorToken();
jdbcTemplate.update("DELETE FROM deployments");
jdbcTemplate.update("DELETE FROM app_versions");
jdbcTemplate.update("DELETE FROM apps");
jdbcTemplate.update("DELETE FROM application_config WHERE environment = 'default'");
// Ensure test-operator exists in users table (required for deployments.created_by FK)
jdbcTemplate.update(
"INSERT INTO users (user_id, provider, display_name) VALUES ('test-operator', 'local', 'Test Operator') ON CONFLICT (user_id) DO NOTHING");
when(runtimeOrchestrator.isEnabled()).thenReturn(true);
appSlug = "bg-" + UUID.randomUUID().toString().substring(0, 8);
post("/api/v1/environments/default/apps", String.format("""
{"slug": "%s", "displayName": "BG App"}
""", appSlug), operatorJwt);
put("/api/v1/environments/default/apps/" + appSlug + "/container-config", """
{"runtimeType": "spring-boot", "appPort": 8081, "replicas": 2, "deploymentStrategy": "blue-green"}
""", operatorJwt);
versionId = uploadJar(appSlug, ("bg-jar-" + appSlug).getBytes());
}
@Test
void blueGreen_allHealthy_stopsOldAfterNew() throws Exception {
when(runtimeOrchestrator.startContainer(any()))
.thenReturn("old-0", "old-1", "new-0", "new-1");
ContainerStatus healthy = new ContainerStatus("healthy", true, 0, null);
when(runtimeOrchestrator.getContainerStatus("old-0")).thenReturn(healthy);
when(runtimeOrchestrator.getContainerStatus("old-1")).thenReturn(healthy);
when(runtimeOrchestrator.getContainerStatus("new-0")).thenReturn(healthy);
when(runtimeOrchestrator.getContainerStatus("new-1")).thenReturn(healthy);
String firstDeployId = triggerDeploy();
awaitStatus(firstDeployId, DeploymentStatus.RUNNING);
String secondDeployId = triggerDeploy();
awaitStatus(secondDeployId, DeploymentStatus.RUNNING);
// Previous deployment was stopped once new was healthy
Deployment first = deploymentRepository.findById(UUID.fromString(firstDeployId)).orElseThrow();
assertThat(first.status()).isEqualTo(DeploymentStatus.STOPPED);
verify(runtimeOrchestrator).stopContainer("old-0");
verify(runtimeOrchestrator).stopContainer("old-1");
verify(runtimeOrchestrator, never()).stopContainer("new-0");
verify(runtimeOrchestrator, never()).stopContainer("new-1");
// New deployment has both new replicas recorded
Deployment second = deploymentRepository.findById(UUID.fromString(secondDeployId)).orElseThrow();
assertThat(second.replicaStates()).hasSize(2);
}
@Test
void blueGreen_partialHealthy_preservesOldAndMarksFailed() throws Exception {
when(runtimeOrchestrator.startContainer(any()))
.thenReturn("old-0", "old-1", "new-0", "new-1");
ContainerStatus healthy = new ContainerStatus("healthy", true, 0, null);
ContainerStatus starting = new ContainerStatus("starting", true, 0, null);
when(runtimeOrchestrator.getContainerStatus("old-0")).thenReturn(healthy);
when(runtimeOrchestrator.getContainerStatus("old-1")).thenReturn(healthy);
when(runtimeOrchestrator.getContainerStatus("new-0")).thenReturn(healthy);
when(runtimeOrchestrator.getContainerStatus("new-1")).thenReturn(starting);
String firstDeployId = triggerDeploy();
awaitStatus(firstDeployId, DeploymentStatus.RUNNING);
String secondDeployId = triggerDeploy();
awaitStatus(secondDeployId, DeploymentStatus.FAILED);
Deployment second = deploymentRepository.findById(UUID.fromString(secondDeployId)).orElseThrow();
assertThat(second.errorMessage())
.contains("blue-green")
.contains("1/2");
// Previous deployment stays RUNNING — blue-green's safety promise.
Deployment first = deploymentRepository.findById(UUID.fromString(firstDeployId)).orElseThrow();
assertThat(first.status()).isEqualTo(DeploymentStatus.RUNNING);
verify(runtimeOrchestrator, never()).stopContainer("old-0");
verify(runtimeOrchestrator, never()).stopContainer("old-1");
// Cleanup ran on both new replicas.
verify(runtimeOrchestrator).stopContainer("new-0");
verify(runtimeOrchestrator).stopContainer("new-1");
}
// ---- helpers ----
private String triggerDeploy() throws Exception {
JsonNode deployResponse = post(
"/api/v1/environments/default/apps/" + appSlug + "/deployments",
String.format("{\"appVersionId\": \"%s\"}", versionId), operatorJwt);
return deployResponse.path("id").asText();
}
private void awaitStatus(String deployId, DeploymentStatus expected) {
await().atMost(30, TimeUnit.SECONDS)
.pollInterval(500, TimeUnit.MILLISECONDS)
.untilAsserted(() -> {
Deployment d = deploymentRepository.findById(UUID.fromString(deployId))
.orElseThrow(() -> new AssertionError("Deployment not found: " + deployId));
assertThat(d.status()).isEqualTo(expected);
});
}
private JsonNode post(String path, String json, String jwt) throws Exception {
HttpHeaders headers = securityHelper.authHeaders(jwt);
var response = restTemplate.exchange(path, HttpMethod.POST,
new HttpEntity<>(json, headers), String.class);
return objectMapper.readTree(response.getBody());
}
private void put(String path, String json, String jwt) {
HttpHeaders headers = securityHelper.authHeaders(jwt);
restTemplate.exchange(path, HttpMethod.PUT,
new HttpEntity<>(json, headers), String.class);
}
private String uploadJar(String appSlug, byte[] content) throws Exception {
ByteArrayResource resource = new ByteArrayResource(content) {
@Override public String getFilename() { return "app.jar"; }
};
MultiValueMap<String, Object> body = new LinkedMultiValueMap<>();
body.add("file", resource);
HttpHeaders headers = new HttpHeaders();
headers.set("Authorization", "Bearer " + operatorJwt);
headers.set("X-Cameleer-Protocol-Version", "1");
headers.setContentType(MediaType.MULTIPART_FORM_DATA);
var response = restTemplate.exchange(
"/api/v1/environments/default/apps/" + appSlug + "/versions",
HttpMethod.POST, new HttpEntity<>(body, headers), String.class);
JsonNode versionNode = objectMapper.readTree(response.getBody());
return versionNode.path("id").asText();
}
}
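
The two tests above pin down the blue-green contract: start every new replica, require all of them to be healthy, and only then stop the old set; on a partial failure the new replicas are cleaned up and the old ones are left untouched. A minimal sketch of that control flow follows. The `Orchestrator` interface here is a hypothetical stand-in, not the project's `RuntimeOrchestrator`, and the sketch checks health once rather than polling until a health-check timeout.

```java
import java.util.ArrayList;
import java.util.List;

class BlueGreenSketch {
    /** Hypothetical, simplified stand-in for a container orchestrator. */
    interface Orchestrator {
        String startContainer(int replicaIndex);
        boolean isHealthy(String containerId);
        void stopContainer(String containerId);
    }

    /**
     * Returns true when the new replica set replaced the old one.
     * Returns false when any new replica was unhealthy; in that case the new
     * replicas are stopped and the old ones keep running.
     */
    static boolean deploy(Orchestrator orchestrator, List<String> oldIds, int replicas) {
        // Phase 1: start all new replicas alongside the old ones.
        List<String> newIds = new ArrayList<>();
        for (int i = 0; i < replicas; i++) {
            newIds.add(orchestrator.startContainer(i));
        }
        // Phase 2: strict all-healthy gate.
        long healthy = newIds.stream().filter(orchestrator::isHealthy).count();
        if (healthy < replicas) {
            newIds.forEach(orchestrator::stopContainer); // clean up the failed attempt
            return false;
        }
        // Phase 3: only now retire the previous deployment.
        oldIds.forEach(orchestrator::stopContainer);
        return true;
    }

    public static void main(String[] args) {
        List<String> stopped = new ArrayList<>();
        Orchestrator partiallyHealthy = new Orchestrator() {
            public String startContainer(int i) { return "new-" + i; }
            public boolean isHealthy(String id) { return !id.equals("new-1"); }
            public void stopContainer(String id) { stopped.add(id); }
        };
        boolean replaced = deploy(partiallyHealthy, List.of("old-0", "old-1"), 2);
        System.out.println(replaced + " stopped=" + stopped);
    }
}
```

The cost of this strategy is that both sets run at the same time during the rollout, so capacity for twice the replica count is needed; in exchange, a failed rollout never touches the running deployment.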

@@ -69,6 +69,10 @@ class DeploymentSnapshotIT extends AbstractPostgresIT {
jdbcTemplate.update("DELETE FROM app_versions");
jdbcTemplate.update("DELETE FROM apps");
jdbcTemplate.update("DELETE FROM application_config WHERE environment = 'default'");
// Ensure test-operator exists in users table (required for deployments.created_by FK)
jdbcTemplate.update(
"INSERT INTO users (user_id, provider, display_name) VALUES ('test-operator', 'local', 'Test Operator') ON CONFLICT (user_id) DO NOTHING");
}
// -----------------------------------------------------------------------

@@ -0,0 +1,198 @@
package com.cameleer.server.app.runtime;
import com.cameleer.server.app.AbstractPostgresIT;
import com.cameleer.server.app.TestSecurityHelper;
import com.cameleer.server.app.storage.PostgresDeploymentRepository;
import com.cameleer.server.core.runtime.ContainerStatus;
import com.cameleer.server.core.runtime.Deployment;
import com.cameleer.server.core.runtime.DeploymentStatus;
import com.cameleer.server.core.runtime.RuntimeOrchestrator;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.mockito.InOrder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.mock.mockito.MockBean;
import org.springframework.boot.test.web.client.TestRestTemplate;
import org.springframework.core.io.ByteArrayResource;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.http.MediaType;
import org.springframework.test.context.TestPropertySource;
import org.springframework.util.LinkedMultiValueMap;
import org.springframework.util.MultiValueMap;
import java.util.UUID;
import java.util.concurrent.TimeUnit;
import static org.assertj.core.api.Assertions.assertThat;
import static org.awaitility.Awaitility.await;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.inOrder;
import static org.mockito.Mockito.never;
import static org.mockito.Mockito.times;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;
/**
* Verifies the rolling deployment strategy: per-replica start → health → stop
* old. Mid-rollout health failure preserves remaining un-replaced old replicas;
* already-stopped old replicas are not restored.
*/
@TestPropertySource(properties = "cameleer.server.runtime.healthchecktimeout=2")
class RollingStrategyIT extends AbstractPostgresIT {
@MockBean
RuntimeOrchestrator runtimeOrchestrator;
@Autowired private TestRestTemplate restTemplate;
@Autowired private ObjectMapper objectMapper;
@Autowired private TestSecurityHelper securityHelper;
@Autowired private PostgresDeploymentRepository deploymentRepository;
private String operatorJwt;
private String appSlug;
private String versionId;
@BeforeEach
void setUp() throws Exception {
operatorJwt = securityHelper.operatorToken();
jdbcTemplate.update("DELETE FROM deployments");
jdbcTemplate.update("DELETE FROM app_versions");
jdbcTemplate.update("DELETE FROM apps");
jdbcTemplate.update("DELETE FROM application_config WHERE environment = 'default'");
// Ensure test-operator exists in users table (required for deployments.created_by FK)
jdbcTemplate.update(
"INSERT INTO users (user_id, provider, display_name) VALUES ('test-operator', 'local', 'Test Operator') ON CONFLICT (user_id) DO NOTHING");
when(runtimeOrchestrator.isEnabled()).thenReturn(true);
appSlug = "roll-" + UUID.randomUUID().toString().substring(0, 8);
post("/api/v1/environments/default/apps", String.format("""
{"slug": "%s", "displayName": "Rolling App"}
""", appSlug), operatorJwt);
put("/api/v1/environments/default/apps/" + appSlug + "/container-config", """
{"runtimeType": "spring-boot", "appPort": 8081, "replicas": 2, "deploymentStrategy": "rolling"}
""", operatorJwt);
versionId = uploadJar(appSlug, ("roll-jar-" + appSlug).getBytes());
}
@Test
void rolling_allHealthy_replacesOneByOne() throws Exception {
when(runtimeOrchestrator.startContainer(any()))
.thenReturn("old-0", "old-1", "new-0", "new-1");
ContainerStatus healthy = new ContainerStatus("healthy", true, 0, null);
when(runtimeOrchestrator.getContainerStatus("old-0")).thenReturn(healthy);
when(runtimeOrchestrator.getContainerStatus("old-1")).thenReturn(healthy);
when(runtimeOrchestrator.getContainerStatus("new-0")).thenReturn(healthy);
when(runtimeOrchestrator.getContainerStatus("new-1")).thenReturn(healthy);
String firstDeployId = triggerDeploy();
awaitStatus(firstDeployId, DeploymentStatus.RUNNING);
String secondDeployId = triggerDeploy();
awaitStatus(secondDeployId, DeploymentStatus.RUNNING);
// Rolling invariant: old-0 is stopped BEFORE old-1 (replicas replaced
// one at a time, not all at once). Checking stop order is sufficient —
// a blue-green path would have both stops adjacent at the end with no
// interleaved starts; rolling interleaves starts between stops.
InOrder inOrder = inOrder(runtimeOrchestrator);
inOrder.verify(runtimeOrchestrator).stopContainer("old-0");
inOrder.verify(runtimeOrchestrator).stopContainer("old-1");
// Total of 4 startContainer calls: 2 for first deploy, 2 for rolling.
verify(runtimeOrchestrator, times(4)).startContainer(any());
// New replicas were not stopped — they're the running ones now.
verify(runtimeOrchestrator, never()).stopContainer("new-0");
verify(runtimeOrchestrator, never()).stopContainer("new-1");
Deployment first = deploymentRepository.findById(UUID.fromString(firstDeployId)).orElseThrow();
assertThat(first.status()).isEqualTo(DeploymentStatus.STOPPED);
}
@Test
void rolling_failsMidRollout_preservesRemainingOld() throws Exception {
when(runtimeOrchestrator.startContainer(any()))
.thenReturn("old-0", "old-1", "new-0", "new-1");
ContainerStatus healthy = new ContainerStatus("healthy", true, 0, null);
ContainerStatus starting = new ContainerStatus("starting", true, 0, null);
when(runtimeOrchestrator.getContainerStatus("old-0")).thenReturn(healthy);
when(runtimeOrchestrator.getContainerStatus("old-1")).thenReturn(healthy);
when(runtimeOrchestrator.getContainerStatus("new-0")).thenReturn(healthy);
when(runtimeOrchestrator.getContainerStatus("new-1")).thenReturn(starting);
String firstDeployId = triggerDeploy();
awaitStatus(firstDeployId, DeploymentStatus.RUNNING);
String secondDeployId = triggerDeploy();
awaitStatus(secondDeployId, DeploymentStatus.FAILED);
Deployment second = deploymentRepository.findById(UUID.fromString(secondDeployId)).orElseThrow();
assertThat(second.errorMessage())
.contains("rolling")
.contains("replica 1");
// old-0 was replaced before the failure; old-1 was never touched.
verify(runtimeOrchestrator).stopContainer("old-0");
verify(runtimeOrchestrator, never()).stopContainer("old-1");
// Cleanup stops both new replicas started so far.
verify(runtimeOrchestrator).stopContainer("new-0");
verify(runtimeOrchestrator).stopContainer("new-1");
}
// ---- helpers (same pattern as BlueGreenStrategyIT) ----
private String triggerDeploy() throws Exception {
JsonNode deployResponse = post(
"/api/v1/environments/default/apps/" + appSlug + "/deployments",
String.format("{\"appVersionId\": \"%s\"}", versionId), operatorJwt);
return deployResponse.path("id").asText();
}
private void awaitStatus(String deployId, DeploymentStatus expected) {
await().atMost(30, TimeUnit.SECONDS)
.pollInterval(500, TimeUnit.MILLISECONDS)
.untilAsserted(() -> {
Deployment d = deploymentRepository.findById(UUID.fromString(deployId))
.orElseThrow(() -> new AssertionError("Deployment not found: " + deployId));
assertThat(d.status()).isEqualTo(expected);
});
}
private JsonNode post(String path, String json, String jwt) throws Exception {
HttpHeaders headers = securityHelper.authHeaders(jwt);
var response = restTemplate.exchange(path, HttpMethod.POST,
new HttpEntity<>(json, headers), String.class);
return objectMapper.readTree(response.getBody());
}
private void put(String path, String json, String jwt) {
HttpHeaders headers = securityHelper.authHeaders(jwt);
restTemplate.exchange(path, HttpMethod.PUT,
new HttpEntity<>(json, headers), String.class);
}
private String uploadJar(String appSlug, byte[] content) throws Exception {
ByteArrayResource resource = new ByteArrayResource(content) {
@Override public String getFilename() { return "app.jar"; }
};
MultiValueMap<String, Object> body = new LinkedMultiValueMap<>();
body.add("file", resource);
HttpHeaders headers = new HttpHeaders();
headers.set("Authorization", "Bearer " + operatorJwt);
headers.set("X-Cameleer-Protocol-Version", "1");
headers.setContentType(MediaType.MULTIPART_FORM_DATA);
var response = restTemplate.exchange(
"/api/v1/environments/default/apps/" + appSlug + "/versions",
HttpMethod.POST, new HttpEntity<>(body, headers), String.class);
JsonNode versionNode = objectMapper.readTree(response.getBody());
return versionNode.path("id").asText();
}
}
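
The rolling behaviour these tests pin down (start a new replica, wait for it to be healthy, then stop the old replica it replaces; on a failed health check, stop the new replicas started so far and leave the untouched old replicas running) could be sketched roughly as follows. This is only an illustration under assumptions, not the project's deployment code: `Orchestrator`, `RollingSketch`, and `roll` are hypothetical names, and health is a single boolean check rather than a timed poll.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, minimal stand-in for a container runtime.
// This is NOT the real RuntimeOrchestrator interface.
interface Orchestrator {
    String start(int replicaIndex);        // returns the id of the new container
    boolean isHealthy(String containerId); // simplified: one check, no timeout
    void stop(String containerId);
}

final class RollingSketch {
    /**
     * Replaces the given old replicas one at a time.
     * Returns true when every replica was replaced. On the first unhealthy
     * new replica it stops every new replica started so far, leaves the
     * not-yet-replaced old replicas running, and returns false.
     */
    static boolean roll(Orchestrator orch, List<String> oldIds) {
        List<String> started = new ArrayList<>();
        for (int i = 0; i < oldIds.size(); i++) {
            String newId = orch.start(i);
            started.add(newId);
            if (!orch.isHealthy(newId)) {
                // Clean up new replicas; old replicas already stopped are not restored.
                for (String id : started) {
                    orch.stop(id);
                }
                return false;
            }
            // Only stop the old replica once its replacement is healthy.
            orch.stop(oldIds.get(i));
        }
        return true;
    }
}
```

With old replicas `old-0` and `old-1` and an unhealthy `new-1`, this sketch stops `old-0`, `new-0`, and `new-1` but never `old-1`, which is the outcome `rolling_failsMidRollout_preservesRemainingOld` asserts.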

@@ -0,0 +1,90 @@
package com.cameleer.server.app.runtime;
import com.cameleer.server.core.runtime.ResolvedContainerConfig;
import org.junit.jupiter.api.Test;
import java.util.List;
import java.util.Map;
import static org.junit.jupiter.api.Assertions.*;
class TraefikLabelBuilderTest {
private static ResolvedContainerConfig config(boolean externalRouting, String certResolver) {
return new ResolvedContainerConfig(
512, null, 500, null,
8080, List.of(), Map.of(),
true, true,
"path", "example.com", "https://cameleer.example.com",
1, "blue-green",
true, true,
"spring-boot", "", List.of(),
externalRouting,
certResolver
);
}
@Test
void build_emitsTraefikLabelsWhenExternalRoutingEnabled() {
Map<String, String> labels = TraefikLabelBuilder.build(
"myapp", "dev", "acme", config(true, null), 0, "abcdef01");
assertEquals("true", labels.get("traefik.enable"));
assertEquals("8080", labels.get("traefik.http.services.dev-myapp.loadbalancer.server.port"));
assertEquals("PathPrefix(`/dev/myapp/`)", labels.get("traefik.http.routers.dev-myapp.rule"));
}
@Test
void build_omitsAllTraefikLabelsWhenExternalRoutingDisabled() {
Map<String, String> labels = TraefikLabelBuilder.build(
"myapp", "dev", "acme", config(false, null), 0, "abcdef01");
long traefikLabelCount = labels.keySet().stream()
.filter(k -> k.startsWith("traefik."))
.count();
assertEquals(0, traefikLabelCount, "expected no traefik.* labels but found: " + labels);
}
@Test
void build_preservesIdentityLabelsWhenExternalRoutingDisabled() {
Map<String, String> labels = TraefikLabelBuilder.build(
"myapp", "dev", "acme", config(false, null), 2, "abcdef01");
assertEquals("cameleer-server", labels.get("managed-by"));
assertEquals("acme", labels.get("cameleer.tenant"));
assertEquals("myapp", labels.get("cameleer.app"));
assertEquals("dev", labels.get("cameleer.environment"));
assertEquals("2", labels.get("cameleer.replica"));
assertEquals("abcdef01", labels.get("cameleer.generation"));
assertEquals("dev-myapp-2-abcdef01", labels.get("cameleer.instance-id"));
}
@Test
void build_emitsCertResolverLabelWhenConfigured() {
Map<String, String> labels = TraefikLabelBuilder.build(
"myapp", "dev", "acme", config(true, "letsencrypt"), 0, "abcdef01");
assertEquals("true", labels.get("traefik.http.routers.dev-myapp.tls"));
assertEquals("letsencrypt", labels.get("traefik.http.routers.dev-myapp.tls.certresolver"));
}
@Test
void build_omitsCertResolverLabelWhenNull() {
Map<String, String> labels = TraefikLabelBuilder.build(
"myapp", "dev", "acme", config(true, null), 0, "abcdef01");
assertEquals("true", labels.get("traefik.http.routers.dev-myapp.tls"),
"sslOffloading=true should still mark the router TLS-enabled");
assertNull(labels.get("traefik.http.routers.dev-myapp.tls.certresolver"),
"cert resolver label must be omitted when none is configured");
}
@Test
void build_omitsCertResolverLabelWhenBlank() {
Map<String, String> labels = TraefikLabelBuilder.build(
"myapp", "dev", "acme", config(true, " "), 0, "abcdef01");
assertNull(labels.get("traefik.http.routers.dev-myapp.tls.certresolver"),
"whitespace-only cert resolver must be treated as unset");
}
}
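
Read together, the assertions above suggest a label builder along the following lines. This is a sketch reconstructed from the test expectations only, not the actual `TraefikLabelBuilder`: the class name `LabelSketch` and its flattened parameter list are hypothetical, and it assumes path-based routing with SSL offloading always on, which is the one configuration the tests exercise.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LabelSketch {
    // Hypothetical signature: the real builder takes a ResolvedContainerConfig.
    static Map<String, String> build(String app, String env, String tenant, int port,
                                     boolean externalRouting, String certResolver,
                                     int replica, String generation) {
        Map<String, String> labels = new LinkedHashMap<>();

        // Identity labels are always emitted, even without external routing.
        labels.put("managed-by", "cameleer-server");
        labels.put("cameleer.tenant", tenant);
        labels.put("cameleer.app", app);
        labels.put("cameleer.environment", env);
        labels.put("cameleer.replica", String.valueOf(replica));
        labels.put("cameleer.generation", generation);
        labels.put("cameleer.instance-id", env + "-" + app + "-" + replica + "-" + generation);

        if (!externalRouting) {
            return labels; // no traefik.* labels at all
        }

        String name = env + "-" + app;
        labels.put("traefik.enable", "true");
        labels.put("traefik.http.services." + name + ".loadbalancer.server.port", String.valueOf(port));
        labels.put("traefik.http.routers." + name + ".rule", "PathPrefix(`/" + env + "/" + app + "/`)");
        // Assumption: SSL offloading is enabled, so the router is TLS-enabled.
        labels.put("traefik.http.routers." + name + ".tls", "true");

        // A null or whitespace-only resolver is treated as "not configured".
        if (certResolver != null && !certResolver.isBlank()) {
            labels.put("traefik.http.routers." + name + ".tls.certresolver", certResolver);
        }
        return labels;
    }
}
```

Returning early when external routing is off keeps the identity labels in one place, so they cannot drift between the routed and unrouted cases.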

@@ -79,7 +79,8 @@ class ClickHouseLogStoreCountIT {
base.plusSeconds(30),
null,
100,
"desc"));
"desc",
null));
assertThat(count).isEqualTo(3);
}
@@ -102,7 +103,8 @@ class ClickHouseLogStoreCountIT {
base.plusSeconds(30),
null,
100,
"desc"));
"desc",
null));
assertThat(count).isZero();
}
@@ -120,7 +122,7 @@ class ClickHouseLogStoreCountIT {
null, List.of("ERROR"), "orders", null, null, null,
"dev", List.of(),
base.minusSeconds(1), base.plusSeconds(60),
null, 100, "desc"));
null, 100, "desc", null));
assertThat(devCount).isEqualTo(2);
}

@@ -53,7 +53,7 @@ class ClickHouseLogStoreIT {
}
private LogSearchRequest req(String application) {
return new LogSearchRequest(null, null, application, null, null, null, null, null, null, null, null, 100, "desc");
return new LogSearchRequest(null, null, application, null, null, null, null, null, null, null, null, 100, "desc", null);
}
// ── Tests ─────────────────────────────────────────────────────────────
@@ -99,7 +99,7 @@ class ClickHouseLogStoreIT {
));
LogSearchResponse result = store.search(new LogSearchRequest(
null, List.of("ERROR"), "my-app", null, null, null, null, null, null, null, null, 100, "desc"));
null, List.of("ERROR"), "my-app", null, null, null, null, null, null, null, null, 100, "desc", null));
assertThat(result.data()).hasSize(1);
assertThat(result.data().get(0).level()).isEqualTo("ERROR");
@@ -116,7 +116,7 @@ class ClickHouseLogStoreIT {
));
LogSearchResponse result = store.search(new LogSearchRequest(
null, List.of("WARN", "ERROR"), "my-app", null, null, null, null, null, null, null, null, 100, "desc"));
null, List.of("WARN", "ERROR"), "my-app", null, null, null, null, null, null, null, null, 100, "desc", null));
assertThat(result.data()).hasSize(2);
}
@@ -130,7 +130,7 @@ class ClickHouseLogStoreIT {
));
LogSearchResponse result = store.search(new LogSearchRequest(
"order #12345", null, "my-app", null, null, null, null, null, null, null, null, 100, "desc"));
"order #12345", null, "my-app", null, null, null, null, null, null, null, null, 100, "desc", null));
assertThat(result.data()).hasSize(1);
assertThat(result.data().get(0).message()).contains("order #12345");
@@ -147,7 +147,7 @@ class ClickHouseLogStoreIT {
));
LogSearchResponse result = store.search(new LogSearchRequest(
null, null, "my-app", null, "exchange-abc", null, null, null, null, null, null, 100, "desc"));
null, null, "my-app", null, "exchange-abc", null, null, null, null, null, null, 100, "desc", null));
assertThat(result.data()).hasSize(1);
assertThat(result.data().get(0).message()).isEqualTo("msg with exchange");
@@ -170,7 +170,7 @@ class ClickHouseLogStoreIT {
Instant to = Instant.parse("2026-03-31T13:00:00Z");
LogSearchResponse result = store.search(new LogSearchRequest(
null, null, "my-app", null, null, null, null, null, from, to, null, 100, "desc"));
null, null, "my-app", null, null, null, null, null, from, to, null, 100, "desc", null));
assertThat(result.data()).hasSize(1);
assertThat(result.data().get(0).message()).isEqualTo("noon");
@@ -188,7 +188,7 @@ class ClickHouseLogStoreIT {
// No application filter — should return both
LogSearchResponse result = store.search(new LogSearchRequest(
null, null, null, null, null, null, null, null, null, null, null, 100, "desc"));
null, null, null, null, null, null, null, null, null, null, null, 100, "desc", null));
assertThat(result.data()).hasSize(2);
}
@@ -202,7 +202,7 @@ class ClickHouseLogStoreIT {
));
LogSearchResponse result = store.search(new LogSearchRequest(
null, null, "my-app", null, null, "OrderProcessor", null, null, null, null, null, 100, "desc"));
null, null, "my-app", null, null, "OrderProcessor", null, null, null, null, null, 100, "desc", null));
assertThat(result.data()).hasSize(1);
assertThat(result.data().get(0).loggerName()).contains("OrderProcessor");
@@ -221,7 +221,7 @@ class ClickHouseLogStoreIT {
// Page 1: limit 2
LogSearchResponse page1 = store.search(new LogSearchRequest(
null, null, "my-app", null, null, null, null, null, null, null, null, 2, "desc"));
null, null, "my-app", null, null, null, null, null, null, null, null, 2, "desc", null));
assertThat(page1.data()).hasSize(2);
assertThat(page1.hasMore()).isTrue();
@@ -230,7 +230,7 @@ class ClickHouseLogStoreIT {
// Page 2: use cursor
LogSearchResponse page2 = store.search(new LogSearchRequest(
null, null, "my-app", null, null, null, null, null, null, null, page1.nextCursor(), 2, "desc"));
null, null, "my-app", null, null, null, null, null, null, null, page1.nextCursor(), 2, "desc", null));
assertThat(page2.data()).hasSize(2);
assertThat(page2.hasMore()).isTrue();
@@ -238,7 +238,7 @@ class ClickHouseLogStoreIT {
// Page 3: last page
LogSearchResponse page3 = store.search(new LogSearchRequest(
null, null, "my-app", null, null, null, null, null, null, null, page2.nextCursor(), 2, "desc"));
null, null, "my-app", null, null, null, null, null, null, null, page2.nextCursor(), 2, "desc", null));
assertThat(page3.data()).hasSize(1);
assertThat(page3.hasMore()).isFalse();
@@ -257,7 +257,7 @@ class ClickHouseLogStoreIT {
// Filter for ERROR only, but counts should include all levels
LogSearchResponse result = store.search(new LogSearchRequest(
null, List.of("ERROR"), "my-app", null, null, null, null, null, null, null, null, 100, "desc"));
null, List.of("ERROR"), "my-app", null, null, null, null, null, null, null, null, 100, "desc", null));
assertThat(result.data()).hasSize(1);
assertThat(result.levelCounts()).containsEntry("INFO", 2L);
@@ -275,7 +275,7 @@ class ClickHouseLogStoreIT {
));
LogSearchResponse result = store.search(new LogSearchRequest(
null, null, "my-app", null, null, null, null, null, null, null, null, 100, "asc"));
null, null, "my-app", null, null, null, null, null, null, null, null, 100, "asc", null));
assertThat(result.data()).hasSize(3);
assertThat(result.data().get(0).message()).isEqualTo("msg-1");
@@ -340,7 +340,7 @@ class ClickHouseLogStoreIT {
LogSearchResponse result = store.search(new LogSearchRequest(
null, null, "my-app", null, null, null, null,
List.of("container"), null, null, null, 100, "desc"));
List.of("container"), null, null, null, 100, "desc", null));
assertThat(result.data()).hasSize(1);
assertThat(result.data().get(0).message()).isEqualTo("container msg");
@@ -365,7 +365,7 @@ class ClickHouseLogStoreIT {
LogSearchResponse result = store.search(new LogSearchRequest(
null, null, "my-app", null, null, null, null,
List.of("app", "container"), null, null, null, 100, "desc"));
List.of("app", "container"), null, null, null, 100, "desc", null));
assertThat(result.data()).hasSize(2);
assertThat(result.data()).extracting(LogEntryResult::message)
@@ -388,7 +388,7 @@ class ClickHouseLogStoreIT {
for (int page = 0; page < 10; page++) {
LogSearchResponse resp = store.search(new LogSearchRequest(
null, null, "my-app", null, null, null, null, null,
null, null, cursor, 2, "desc"));
null, null, cursor, 2, "desc", null));
for (LogEntryResult r : resp.data()) {
assertThat(seen.add(r.message())).as("duplicate row returned: " + r.message()).isTrue();
}

@@ -0,0 +1,196 @@
package com.cameleer.server.app.search;
import com.cameleer.server.core.ingestion.BufferedLogEntry;
import com.cameleer.server.core.search.LogSearchRequest;
import com.cameleer.server.core.search.LogSearchResponse;
import com.cameleer.common.model.LogEntry;
import com.cameleer.server.app.ClickHouseTestHelper;
import com.zaxxer.hikari.HikariDataSource;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.jdbc.core.JdbcTemplate;
import org.testcontainers.clickhouse.ClickHouseContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import java.time.Instant;
import java.util.List;
import static org.assertj.core.api.Assertions.assertThat;
/**
* Integration test for the {@code instanceIds} multi-value filter on
* {@link ClickHouseLogStore#search(LogSearchRequest)}.
*
* <p>Three rows are seeded with distinct {@code instance_id} values:
* <ul>
* <li>{@code prod-app1-0-aaa11111} — included in filter</li>
* <li>{@code prod-app1-1-aaa11111} — included in filter</li>
* <li>{@code prod-app1-0-bbb22222} — excluded from filter</li>
* </ul>
*/
@Testcontainers
class ClickHouseLogStoreInstanceIdsIT {
@Container
static final ClickHouseContainer clickhouse =
new ClickHouseContainer("clickhouse/clickhouse-server:24.12");
private JdbcTemplate jdbc;
private ClickHouseLogStore store;
private static final String TENANT = "default";
private static final String ENV = "prod";
private static final String APP = "app1";
private static final String INST_A = "prod-app1-0-aaa11111";
private static final String INST_B = "prod-app1-1-aaa11111";
private static final String INST_C = "prod-app1-0-bbb22222";
@BeforeEach
void setUp() throws Exception {
HikariDataSource ds = new HikariDataSource();
ds.setJdbcUrl(clickhouse.getJdbcUrl());
ds.setUsername(clickhouse.getUsername());
ds.setPassword(clickhouse.getPassword());
jdbc = new JdbcTemplate(ds);
ClickHouseTestHelper.executeInitSql(jdbc);
jdbc.execute("TRUNCATE TABLE logs");
store = new ClickHouseLogStore(TENANT, jdbc);
Instant base = Instant.parse("2026-04-23T09:00:00Z");
seedLog(INST_A, base, "msg-from-replica-0-gen-aaa");
seedLog(INST_B, base.plusSeconds(1), "msg-from-replica-1-gen-aaa");
seedLog(INST_C, base.plusSeconds(2), "msg-from-replica-0-gen-bbb");
}
@AfterEach
void tearDown() {
jdbc.execute("TRUNCATE TABLE logs");
}
private void seedLog(String instanceId, Instant ts, String message) {
LogEntry entry = new LogEntry(ts, "INFO", "com.example.Svc", message, "main", null, null);
store.insertBufferedBatch(List.of(
new BufferedLogEntry(TENANT, ENV, instanceId, APP, entry)));
}
// ── Tests ─────────────────────────────────────────────────────────────
@Test
void search_instanceIds_returnsOnlyMatchingInstances() {
LogSearchResponse result = store.search(new LogSearchRequest(
null,
List.of(),
APP,
null,
null,
null,
ENV,
List.of(),
null,
null,
null,
100,
"desc",
List.of(INST_A, INST_B)));
assertThat(result.data()).hasSize(2);
assertThat(result.data())
.extracting(r -> r.instanceId())
.containsExactlyInAnyOrder(INST_A, INST_B);
assertThat(result.data())
.extracting(r -> r.instanceId())
.doesNotContain(INST_C);
}
@Test
void search_emptyInstanceIds_returnsAllRows() {
LogSearchResponse result = store.search(new LogSearchRequest(
null,
List.of(),
APP,
null,
null,
null,
ENV,
List.of(),
null,
null,
null,
100,
"desc",
List.of()));
assertThat(result.data()).hasSize(3);
}
@Test
void search_nullInstanceIds_returnsAllRows() {
LogSearchResponse result = store.search(new LogSearchRequest(
null,
List.of(),
APP,
null,
null,
null,
ENV,
List.of(),
null,
null,
null,
100,
"desc",
null));
assertThat(result.data()).hasSize(3);
}
@Test
void search_instanceIds_singleValue_filtersToOneReplica() {
LogSearchResponse result = store.search(new LogSearchRequest(
null,
List.of(),
APP,
null,
null,
null,
ENV,
List.of(),
null,
null,
null,
100,
"desc",
List.of(INST_C)));
assertThat(result.data()).hasSize(1);
assertThat(result.data().get(0).instanceId()).isEqualTo(INST_C);
assertThat(result.data().get(0).message()).isEqualTo("msg-from-replica-0-gen-bbb");
}
@Test
void search_instanceIds_doesNotConflictWithSingularInstanceId() {
// Singular instanceId=INST_A AND instanceIds=[INST_B] → intersection = empty
// (both conditions apply: instance_id = A AND instance_id IN (B))
LogSearchResponse result = store.search(new LogSearchRequest(
null,
List.of(),
APP,
INST_A, // singular
null,
null,
ENV,
List.of(),
null,
null,
null,
100,
"desc",
List.of(INST_B))); // plural — no overlap
assertThat(result.data()).isEmpty();
}
}
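
The filter semantics these tests describe (a null or empty `instanceIds` list adds no condition, a non-empty list becomes an `IN` condition, and the singular `instanceId` is ANDed with it) might translate into a clause builder like the one below. It is a sketch under assumptions, not the code in `ClickHouseLogStore`: `InstanceFilterSketch` and `conditions` are hypothetical names; only the `instance_id` column name comes from the test's Javadoc.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class InstanceFilterSketch {
    /**
     * Returns the instance-related WHERE conditions joined with AND, and
     * appends the matching bind values to {@code params} in placeholder order.
     * Returns an empty string when neither filter is set.
     */
    static String conditions(String instanceId, List<String> instanceIds, List<Object> params) {
        List<String> parts = new ArrayList<>();
        if (instanceId != null && !instanceId.isEmpty()) {
            parts.add("instance_id = ?");
            params.add(instanceId);
        }
        if (instanceIds != null && !instanceIds.isEmpty()) {
            // One placeholder per id keeps every value bound, never concatenated.
            String placeholders = String.join(", ", Collections.nCopies(instanceIds.size(), "?"));
            parts.add("instance_id IN (" + placeholders + ")");
            params.addAll(instanceIds);
        }
        return String.join(" AND ", parts);
    }
}
```

Because both conditions are ANDed, passing a singular id that is absent from the list yields an empty result set, matching `search_instanceIds_doesNotConflictWithSingularInstanceId`.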

@@ -2,6 +2,7 @@ package com.cameleer.server.app.search;
import com.cameleer.server.app.storage.ClickHouseExecutionStore;
import com.cameleer.server.core.ingestion.MergedExecution;
import com.cameleer.server.core.search.AttributeFilter;
import com.cameleer.server.core.search.ExecutionSummary;
import com.cameleer.server.core.search.SearchRequest;
import com.cameleer.server.core.search.SearchResult;
@@ -62,7 +63,7 @@ class ClickHouseSearchIndexIT {
500L,
"", "", "", "", "", "",
"hash-abc", "FULL",
"{\"order\":\"12345\"}", "", "", "", "", "", "{\"env\":\"prod\"}",
"", "", "", "", "", "", "{\"order\":\"12345\",\"tenant\":\"acme\"}",
"", "",
false, false,
null, null
@@ -79,7 +80,7 @@ class ClickHouseSearchIndexIT {
"java.lang.NPE\n at Foo.bar(Foo.java:42)",
"NullPointerException", "RUNTIME", "", "",
"", "FULL",
"", "", "", "", "", "", "",
"", "", "", "", "", "", "{\"order\":\"99999\"}",
"", "",
false, false,
null, null
@@ -309,4 +310,59 @@ class ClickHouseSearchIndexIT {
assertThat(result.total()).isEqualTo(1);
assertThat(result.data().get(0).executionId()).isEqualTo("exec-1");
}
@Test
void search_byAttributeFilter_exactMatch_matchesExec1() {
SearchRequest request = new SearchRequest(
null, null, null, null, null, null, null, null, null, null,
null, null, null, null, null, 0, 50, null, null, null, null,
List.of(new AttributeFilter("order", "12345")));
SearchResult<ExecutionSummary> result = searchIndex.search(request);
assertThat(result.total()).isEqualTo(1);
assertThat(result.data().get(0).executionId()).isEqualTo("exec-1");
}
@Test
void search_byAttributeFilter_keyOnly_matchesExec1AndExec2() {
SearchRequest request = new SearchRequest(
null, null, null, null, null, null, null, null, null, null,
null, null, null, null, null, 0, 50, null, null, null, null,
List.of(new AttributeFilter("order", null)));
SearchResult<ExecutionSummary> result = searchIndex.search(request);
assertThat(result.total()).isEqualTo(2);
assertThat(result.data()).extracting(ExecutionSummary::executionId)
.containsExactlyInAnyOrder("exec-1", "exec-2");
}
@Test
void search_byAttributeFilter_wildcardValue_matchesExec1Only() {
SearchRequest request = new SearchRequest(
null, null, null, null, null, null, null, null, null, null,
null, null, null, null, null, 0, 50, null, null, null, null,
List.of(new AttributeFilter("order", "123*")));
SearchResult<ExecutionSummary> result = searchIndex.search(request);
assertThat(result.total()).isEqualTo(1);
assertThat(result.data().get(0).executionId()).isEqualTo("exec-1");
}
@Test
void search_byAttributeFilter_multipleFiltersAreAnded() {
SearchRequest request = new SearchRequest(
null, null, null, null, null, null, null, null, null, null,
null, null, null, null, null, 0, 50, null, null, null, null,
List.of(
new AttributeFilter("order", "12345"),
new AttributeFilter("tenant", "acme")));
SearchResult<ExecutionSummary> result = searchIndex.search(request);
assertThat(result.total()).isEqualTo(1);
assertThat(result.data().get(0).executionId()).isEqualTo("exec-1");
}
}
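
The four attribute-filter tests imply three matching modes (exact value, key-only when the value is null, and `*` wildcards) with multiple filters combined by AND. The in-memory matcher below illustrates those semantics; it is a hedged sketch and not how the store evaluates them, since the real implementation presumably runs in ClickHouse SQL. `AttrFilter` and `AttributeMatchSketch` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for the AttributeFilter type used in the tests.
record AttrFilter(String key, String value) {}

final class AttributeMatchSketch {
    /** True when every filter matches the attribute map (filters are ANDed). */
    static boolean matches(Map<String, String> attributes, List<AttrFilter> filters) {
        for (AttrFilter f : filters) {
            String actual = attributes.get(f.key());
            if (actual == null) {
                return false;            // key must exist in every mode
            }
            if (f.value() == null) {
                continue;                // key-only filter: any value is accepted
            }
            if (!globToPattern(f.value()).matcher(actual).matches()) {
                return false;            // exact or wildcard mismatch
            }
        }
        return true;
    }

    /** Converts a glob where '*' means "any characters" into a regex; other text is literal. */
    static Pattern globToPattern(String glob) {
        String[] parts = glob.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }
}
```

A value without `*` compiles to a fully quoted literal, so the exact-match case falls out of the wildcard path without a separate branch.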

@@ -155,21 +155,51 @@ class ClickHouseDiagramStoreIT {
}
@Test
void findContentHashForRouteByAgents_returnsHash() {
RouteGraph graph = buildGraph("route-4", "node-z");
store.store(tagged("agent-10", "app-b", graph));
store.store(tagged("agent-20", "app-b", graph));
void findLatestContentHashForAppRoute_returnsLatestAcrossInstances() throws InterruptedException {
// v1 published by one agent, v2 by a different agent. The app+env+route
// resolver must pick v2 regardless of which instance produced it, and
// must keep working even if neither instance is "live" anywhere.
RouteGraph v1 = buildGraph("evolving-route", "n-a");
v1.setDescription("v1");
RouteGraph v2 = buildGraph("evolving-route", "n-a", "n-b");
v2.setDescription("v2");
Optional<String> result = store.findContentHashForRouteByAgents(
"route-4", java.util.List.of("agent-10", "agent-20"));
store.store(new TaggedDiagram("publisher-old", "versioned-app", "default", v1));
Thread.sleep(10);
store.store(new TaggedDiagram("publisher-new", "versioned-app", "default", v2));
assertThat(result).isPresent();
Optional<String> hashOpt = store.findLatestContentHashForAppRoute(
"versioned-app", "evolving-route", "default");
assertThat(hashOpt).isPresent();
RouteGraph retrieved = store.findByContentHash(hashOpt.get()).orElseThrow();
assertThat(retrieved.getDescription()).isEqualTo("v2");
}
@Test
void findContentHashForRouteByAgents_emptyListReturnsEmpty() {
Optional<String> result = store.findContentHashForRouteByAgents("route-x", java.util.List.of());
assertThat(result).isEmpty();
void findLatestContentHashForAppRoute_isolatesByAppAndEnv() {
RouteGraph graph = buildGraph("shared-route", "node-1");
store.store(new TaggedDiagram("a1", "app-alpha", "dev", graph));
store.store(new TaggedDiagram("a2", "app-beta", "prod", graph));
// Same route id exists across two (app, env) combos. The resolver must
// return empty for a mismatch on either dimension.
assertThat(store.findLatestContentHashForAppRoute("app-alpha", "shared-route", "dev"))
.isPresent();
assertThat(store.findLatestContentHashForAppRoute("app-alpha", "shared-route", "prod"))
.isEmpty();
assertThat(store.findLatestContentHashForAppRoute("app-beta", "shared-route", "dev"))
.isEmpty();
assertThat(store.findLatestContentHashForAppRoute("app-gamma", "shared-route", "dev"))
.isEmpty();
}
@Test
void findLatestContentHashForAppRoute_emptyInputsReturnEmpty() {
assertThat(store.findLatestContentHashForAppRoute(null, "r", "default")).isEmpty();
assertThat(store.findLatestContentHashForAppRoute("app", null, "default")).isEmpty();
assertThat(store.findLatestContentHashForAppRoute("app", "r", null)).isEmpty();
assertThat(store.findLatestContentHashForAppRoute("", "r", "default")).isEmpty();
}
@Test
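
The resolver contract exercised by the three `findLatestContentHashForAppRoute` tests (latest stored diagram wins regardless of publishing instance, lookups are isolated by app and environment, null or empty inputs return empty) can be illustrated with an in-memory version. This is a sketch under assumptions, not the ClickHouse query: `StoredDiagram`, `LatestHashSketch`, and the `storedAt` timestamp field are hypothetical.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical row shape; the publishing instance is deliberately not part of the lookup.
record StoredDiagram(String app, String env, String routeId, String contentHash, Instant storedAt) {}

final class LatestHashSketch {
    /** Argument order (app, route, env) mirrors the calls in the tests. */
    static Optional<String> findLatest(List<StoredDiagram> rows, String app, String routeId, String env) {
        if (isNullOrEmpty(app) || isNullOrEmpty(routeId) || isNullOrEmpty(env)) {
            return Optional.empty();
        }
        return rows.stream()
                .filter(r -> r.app().equals(app) && r.env().equals(env) && r.routeId().equals(routeId))
                .max(Comparator.comparing(StoredDiagram::storedAt)) // newest row wins
                .map(StoredDiagram::contentHash);
    }

    private static boolean isNullOrEmpty(String s) {
        return s == null || s.isEmpty();
    }
}
```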

@@ -0,0 +1,117 @@
package com.cameleer.server.app.storage;
import com.cameleer.server.core.storage.model.ServerMetricSample;
import com.zaxxer.hikari.HikariDataSource;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.jdbc.core.JdbcTemplate;
import org.testcontainers.clickhouse.ClickHouseContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import static org.assertj.core.api.Assertions.assertThat;
@Testcontainers
class ClickHouseServerMetricsStoreIT {
@Container
static final ClickHouseContainer clickhouse =
new ClickHouseContainer("clickhouse/clickhouse-server:24.12");
private JdbcTemplate jdbc;
private ClickHouseServerMetricsStore store;
@BeforeEach
void setUp() {
HikariDataSource ds = new HikariDataSource();
ds.setJdbcUrl(clickhouse.getJdbcUrl());
ds.setUsername(clickhouse.getUsername());
ds.setPassword(clickhouse.getPassword());
jdbc = new JdbcTemplate(ds);
jdbc.execute("""
CREATE TABLE IF NOT EXISTS server_metrics (
tenant_id LowCardinality(String) DEFAULT 'default',
collected_at DateTime64(3),
server_instance_id LowCardinality(String),
metric_name LowCardinality(String),
metric_type LowCardinality(String),
statistic LowCardinality(String) DEFAULT 'value',
metric_value Float64,
tags Map(String, String) DEFAULT map(),
server_received_at DateTime64(3) DEFAULT now64(3)
)
ENGINE = MergeTree()
ORDER BY (tenant_id, collected_at, server_instance_id, metric_name, statistic)
""");
jdbc.execute("TRUNCATE TABLE server_metrics");
store = new ClickHouseServerMetricsStore(jdbc);
}
@Test
void insertBatch_roundTripsAllColumns() {
Instant ts = Instant.parse("2026-04-23T12:00:00Z");
store.insertBatch(List.of(
new ServerMetricSample("tenant-a", ts, "srv-1",
"cameleer.ingestion.drops", "counter", "count", 17.0,
Map.of("reason", "buffer_full")),
new ServerMetricSample("tenant-a", ts, "srv-1",
"jvm.memory.used", "gauge", "value", 1_048_576.0,
Map.of("area", "heap", "id", "G1 Eden Space"))
));
Integer count = jdbc.queryForObject(
"SELECT count() FROM server_metrics WHERE tenant_id = 'tenant-a'",
Integer.class);
assertThat(count).isEqualTo(2);
Double dropsValue = jdbc.queryForObject(
"""
SELECT metric_value FROM server_metrics
WHERE tenant_id = 'tenant-a'
AND server_instance_id = 'srv-1'
AND metric_name = 'cameleer.ingestion.drops'
AND statistic = 'count'
""",
Double.class);
assertThat(dropsValue).isEqualTo(17.0);
String heapArea = jdbc.queryForObject(
"""
SELECT tags['area'] FROM server_metrics
WHERE tenant_id = 'tenant-a'
AND metric_name = 'jvm.memory.used'
""",
String.class);
assertThat(heapArea).isEqualTo("heap");
}
@Test
void insertBatch_emptyList_doesNothing() {
store.insertBatch(List.of());
Integer count = jdbc.queryForObject(
"SELECT count() FROM server_metrics", Integer.class);
assertThat(count).isEqualTo(0);
}
@Test
void insertBatch_nullTags_storesEmptyMap() {
store.insertBatch(List.of(
new ServerMetricSample("default", Instant.parse("2026-04-23T12:00:00Z"),
"srv-2", "process.cpu.usage", "gauge", "value", 0.12, null)
));
Integer count = jdbc.queryForObject(
"SELECT count() FROM server_metrics WHERE server_instance_id = 'srv-2'",
Integer.class);
assertThat(count).isEqualTo(1);
}
}

View File

@@ -0,0 +1,77 @@
package com.cameleer.server.app.storage;
import com.cameleer.server.app.AbstractPostgresIT;
import com.cameleer.server.core.runtime.Deployment;
import com.cameleer.server.core.runtime.DeploymentService;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcTemplate;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
class PostgresDeploymentRepositoryCreatedByIT extends AbstractPostgresIT {
@Autowired DeploymentService deploymentService;
@Autowired JdbcTemplate jdbc;
private UUID appId;
private UUID envId;
private UUID versionId;
@BeforeEach
void seedAppAndVersion() {
// Clean up to avoid conflicts across test runs
jdbc.update("DELETE FROM deployments");
jdbc.update("DELETE FROM app_versions");
jdbc.update("DELETE FROM apps");
jdbc.update("DELETE FROM users WHERE user_id IN ('alice', 'bob')");
envId = jdbc.queryForObject(
"SELECT id FROM environments WHERE slug = 'default'", UUID.class);
// Seed users (alice, bob) — use the bare user_id convention; provider is NOT NULL
jdbc.update("INSERT INTO users (user_id, provider) VALUES (?, 'LOCAL') " +
"ON CONFLICT (user_id) DO NOTHING", "alice");
jdbc.update("INSERT INTO users (user_id, provider) VALUES (?, 'LOCAL') " +
"ON CONFLICT (user_id) DO NOTHING", "bob");
// Seed app
appId = UUID.randomUUID();
jdbc.update("INSERT INTO apps (id, environment_id, slug, display_name) " +
"VALUES (?, ?, 'test-app', 'Test App')",
appId, envId);
// Seed version
versionId = UUID.randomUUID();
jdbc.update("INSERT INTO app_versions (id, app_id, version, jar_path, jar_checksum) " +
"VALUES (?, ?, 1, '/tmp/x.jar', 'abc')",
versionId, appId);
}
@AfterEach
void cleanup() {
jdbc.update("DELETE FROM deployments");
jdbc.update("DELETE FROM app_versions");
jdbc.update("DELETE FROM apps");
jdbc.update("DELETE FROM users WHERE user_id IN ('alice', 'bob')");
}
@Test
void createDeployment_persists_createdBy_and_returns_it() {
Deployment d = deploymentService.createDeployment(appId, versionId, envId, "alice");
assertThat(d.createdBy()).isEqualTo("alice");
String fromDb = jdbc.queryForObject(
"SELECT created_by FROM deployments WHERE id = ?", String.class, d.id());
assertThat(fromDb).isEqualTo("alice");
}
@Test
void promote_persists_createdBy() {
Deployment promoted = deploymentService.promote(appId, versionId, envId, "bob");
assertThat(promoted.createdBy()).isEqualTo("bob");
}
}

View File

@@ -65,7 +65,8 @@ class PostgresDeploymentRepositoryIT extends AbstractPostgresIT {
null
);
UUID deploymentId = repository.create(appId, appVersionId, envId, "test-container");
// pre-V4 rows: no creator (createdBy is nullable)
UUID deploymentId = repository.create(appId, appVersionId, envId, "test-container", null);
repository.saveDeployedConfigSnapshot(deploymentId, snapshot);
// when — load it back
@@ -80,13 +81,34 @@ class PostgresDeploymentRepositoryIT extends AbstractPostgresIT {
@Test
void deployedConfigSnapshot_nullByDefault() {
// deployments created without a snapshot must return null (not throw)
UUID deploymentId = repository.create(appId, appVersionId, envId, "test-container-null");
UUID deploymentId = repository.create(appId, appVersionId, envId, "test-container-null", null);
Deployment loaded = repository.findById(deploymentId).orElseThrow();
assertThat(loaded.deployedConfigSnapshot()).isNull();
}
@Test
void deleteFailedByAppAndEnvironment_keepsStoppedAndActive() {
// given: one STOPPED (checkpoint), one FAILED, one RUNNING
UUID stoppedId = repository.create(appId, appVersionId, envId, "stopped", null);
repository.updateStatus(stoppedId, com.cameleer.server.core.runtime.DeploymentStatus.STOPPED, null, null);
UUID failedId = repository.create(appId, appVersionId, envId, "failed", null);
repository.updateStatus(failedId, com.cameleer.server.core.runtime.DeploymentStatus.FAILED, null, "boom");
UUID runningId = repository.create(appId, appVersionId, envId, "running", null);
repository.updateStatus(runningId, com.cameleer.server.core.runtime.DeploymentStatus.RUNNING, "c1", null);
// when
repository.deleteFailedByAppAndEnvironment(appId, envId);
// then: STOPPED and RUNNING survive; FAILED is gone
assertThat(repository.findById(stoppedId)).isPresent();
assertThat(repository.findById(runningId)).isPresent();
assertThat(repository.findById(failedId)).isEmpty();
}
@Test
void deployedConfigSnapshot_canBeClearedToNull() {
UUID jarVersionId = UUID.randomUUID();
@@ -97,7 +119,7 @@ class PostgresDeploymentRepositoryIT extends AbstractPostgresIT {
null
);
UUID deploymentId = repository.create(appId, appVersionId, envId, "test-container-clear");
UUID deploymentId = repository.create(appId, appVersionId, envId, "test-container-clear", null);
repository.saveDeployedConfigSnapshot(deploymentId, snapshot);
repository.saveDeployedConfigSnapshot(deploymentId, null);

View File

@@ -0,0 +1,58 @@
package com.cameleer.server.app.storage;
import com.cameleer.server.app.AbstractPostgresIT;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcTemplate;
import java.util.List;
import java.util.Map;
import static org.assertj.core.api.Assertions.assertThat;
class V4DeploymentCreatedByMigrationIT extends AbstractPostgresIT {
@Autowired JdbcTemplate jdbc;
@Test
void created_by_column_exists_with_correct_type_and_nullable() {
// Scope to current schema — Testcontainers reuse can otherwise leave
// a previous run's tenant_default schema visible alongside public.
List<Map<String, Object>> cols = jdbc.queryForList(
"SELECT column_name, data_type, is_nullable " +
"FROM information_schema.columns " +
"WHERE table_name = 'deployments' AND column_name = 'created_by' " +
" AND table_schema = current_schema()"
);
assertThat(cols).hasSize(1);
assertThat(cols.get(0)).containsEntry("data_type", "text");
assertThat(cols.get(0)).containsEntry("is_nullable", "YES");
}
@Test
void created_by_index_exists() {
Integer count = jdbc.queryForObject(
"SELECT count(*)::int FROM pg_indexes " +
"WHERE tablename = 'deployments' AND indexname = 'idx_deployments_created_by' " +
" AND schemaname = current_schema()",
Integer.class
);
assertThat(count).isEqualTo(1);
}
@Test
void created_by_has_fk_to_users() {
Integer count = jdbc.queryForObject(
"SELECT count(*)::int FROM information_schema.table_constraints tc " +
"JOIN information_schema.constraint_column_usage ccu " +
" ON tc.constraint_name = ccu.constraint_name " +
"WHERE tc.table_name = 'deployments' " +
" AND tc.constraint_type = 'FOREIGN KEY' " +
" AND ccu.table_name = 'users' " +
" AND ccu.column_name = 'user_id' " +
" AND tc.table_schema = current_schema()",
Integer.class
);
assertThat(count).isGreaterThanOrEqualTo(1);
}
}

View File

@@ -3,5 +3,6 @@ package com.cameleer.server.core.admin;
public enum AuditCategory {
INFRA, AUTH, USER_MGMT, CONFIG, RBAC, AGENT,
OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE,
ALERT_RULE_CHANGE, ALERT_SILENCE_CHANGE
ALERT_RULE_CHANGE, ALERT_SILENCE_CHANGE,
DEPLOYMENT
}

View File

@@ -33,7 +33,9 @@ public final class ConfigMerger {
boolVal(appConfig, envConfig, "replayEnabled", true),
stringVal(appConfig, envConfig, "runtimeType", "auto"),
stringVal(appConfig, envConfig, "customArgs", ""),
stringList(appConfig, envConfig, "extraNetworks")
stringList(appConfig, envConfig, "extraNetworks"),
boolVal(appConfig, envConfig, "externalRouting", true),
global.certResolver()
);
}
@@ -107,6 +109,7 @@ public final class ConfigMerger {
int cpuRequest,
String routingMode,
String routingDomain,
String serverUrl
String serverUrl,
String certResolver
) {}
}

View File

@@ -22,19 +22,20 @@ public record Deployment(
DeploymentConfigSnapshot deployedConfigSnapshot,
Instant deployedAt,
Instant stoppedAt,
Instant createdAt
Instant createdAt,
String createdBy
) {
public Deployment withStatus(DeploymentStatus newStatus) {
return new Deployment(id, appId, appVersionId, environmentId, newStatus,
targetState, deploymentStrategy, replicaStates, deployStage,
containerId, containerName, errorMessage, resolvedConfig,
deployedConfigSnapshot, deployedAt, stoppedAt, createdAt);
deployedConfigSnapshot, deployedAt, stoppedAt, createdAt, createdBy);
}
public Deployment withDeployedConfigSnapshot(DeploymentConfigSnapshot snapshot) {
return new Deployment(id, appId, appVersionId, environmentId, status,
targetState, deploymentStrategy, replicaStates, deployStage,
containerId, containerName, errorMessage, resolvedConfig,
snapshot, deployedAt, stoppedAt, createdAt);
snapshot, deployedAt, stoppedAt, createdAt, createdBy);
}
}

View File

@@ -10,9 +10,10 @@ public interface DeploymentRepository {
Optional<Deployment> findById(UUID id);
Optional<Deployment> findActiveByAppIdAndEnvironmentId(UUID appId, UUID environmentId);
Optional<Deployment> findActiveByAppIdAndEnvironmentIdExcluding(UUID appId, UUID environmentId, UUID excludeDeploymentId);
UUID create(UUID appId, UUID appVersionId, UUID environmentId, String containerName);
UUID create(UUID appId, UUID appVersionId, UUID environmentId, String containerName, String createdBy);
void updateStatus(UUID id, DeploymentStatus status, String containerId, String errorMessage);
void markDeployed(UUID id);
void markStopped(UUID id);
void deleteTerminalByAppAndEnvironment(UUID appId, UUID environmentId);
/** Delete FAILED deployments for this (app, env). STOPPED deployments are preserved as checkpoints. */
void deleteFailedByAppAndEnvironment(UUID appId, UUID environmentId);
}

View File

@@ -23,19 +23,19 @@ public class DeploymentService {
public Deployment getById(UUID id) { return deployRepo.findById(id).orElseThrow(() -> new IllegalArgumentException("Deployment not found: " + id)); }
/** Create a deployment record. Actual container start is handled by DeploymentExecutor (async). */
public Deployment createDeployment(UUID appId, UUID appVersionId, UUID environmentId) {
public Deployment createDeployment(UUID appId, UUID appVersionId, UUID environmentId, String createdBy) {
App app = appService.getById(appId);
Environment env = envService.getById(environmentId);
String containerName = env.slug() + "-" + app.slug();
deployRepo.deleteTerminalByAppAndEnvironment(appId, environmentId);
UUID deploymentId = deployRepo.create(appId, appVersionId, environmentId, containerName);
deployRepo.deleteFailedByAppAndEnvironment(appId, environmentId);
UUID deploymentId = deployRepo.create(appId, appVersionId, environmentId, containerName, createdBy);
return deployRepo.findById(deploymentId).orElseThrow();
}
/** Promote: deploy the same app version to a different environment. */
public Deployment promote(UUID appId, UUID appVersionId, UUID targetEnvironmentId) {
return createDeployment(appId, appVersionId, targetEnvironmentId);
public Deployment promote(UUID appId, UUID appVersionId, UUID targetEnvironmentId, String createdBy) {
return createDeployment(appId, appVersionId, targetEnvironmentId, createdBy);
}
public void markRunning(UUID deploymentId, String containerId) {

View File

@@ -0,0 +1,31 @@
package com.cameleer.server.core.runtime;
/**
* Supported deployment strategies. Persisted as a kebab-case string on
* ApplicationConfig / ResolvedContainerConfig; {@link #fromWire(String)} is
* the only conversion entry point and falls back to {@link #BLUE_GREEN} for
* unknown or null input so the executor never has to null-check.
*/
public enum DeploymentStrategy {
BLUE_GREEN("blue-green"),
ROLLING("rolling");
private final String wire;
DeploymentStrategy(String wire) {
this.wire = wire;
}
public String toWire() {
return wire;
}
public static DeploymentStrategy fromWire(String value) {
if (value == null) return BLUE_GREEN;
String normalized = value.trim().toLowerCase(java.util.Locale.ROOT);
for (DeploymentStrategy s : values()) {
if (s.wire.equals(normalized)) return s;
}
return BLUE_GREEN;
}
}

View File

@@ -23,8 +23,13 @@ import java.util.UUID;
*/
public class DirtyStateCalculator {
// Live-pushed fields are excluded from the deploy diff: changes to them take effect
// via SSE config-update without a redeploy, so they are not "pending deploy" when they
// differ from the last successful deployment snapshot. See ui/rules: the Traces & Taps
// and Route Recording tabs apply with ?apply=live and "never mark dirty".
private static final Set<String> AGENT_CONFIG_IGNORED_KEYS = Set.of(
"version", "updatedAt", "updatedBy", "environment", "application"
"version", "updatedAt", "updatedBy", "environment", "application",
"taps", "tapVersion", "tracedProcessors", "routeRecording"
);
private final ObjectMapper mapper;
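The effect of the ignored-key set can be illustrated with a minimal sketch. This is not the `DirtyStateCalculator` implementation: the class name `IgnoredKeysDiffSketch` and the method `changedKeys` are hypothetical, and the sketch compares only top-level keys of plain maps, whereas the tests in this change set show the real calculator reporting nested paths such as `agentConfig.sensitiveKeys`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not the project's DirtyStateCalculator.
public class IgnoredKeysDiffSketch {

    // Same keys as AGENT_CONFIG_IGNORED_KEYS in the hunk above.
    static final Set<String> IGNORED = Set.of(
            "version", "updatedAt", "updatedBy", "environment", "application",
            "taps", "tapVersion", "tracedProcessors", "routeRecording");

    /** Returns the sorted top-level keys whose values differ, skipping ignored keys. */
    static List<String> changedKeys(Map<String, Object> deployed, Map<String, Object> desired) {
        Set<String> keys = new TreeSet<>(deployed.keySet());
        keys.addAll(desired.keySet());
        List<String> changed = new ArrayList<>();
        for (String key : keys) {
            if (IGNORED.contains(key)) {
                continue; // live-pushed or bookkeeping field: never "pending deploy"
            }
            if (!Objects.equals(deployed.get(key), desired.get(key))) {
                changed.add(key);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, Object> deployed = Map.of(
                "tapVersion", 1, "sensitiveKeys", List.of("password"));
        Map<String, Object> desired = Map.of(
                "tapVersion", 5, "sensitiveKeys", List.of("password", "secret"));
        // tapVersion differs but is ignored; only the staged field is reported.
        System.out.println(changedKeys(deployed, desired)); // prints [sensitiveKeys]
    }
}
```

With this shape, a config that differs only in live-pushed fields yields an empty list, which corresponds to `dirty=false` in the new `livePushedFields_doNotMarkDirty` test.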

View File

@@ -22,7 +22,9 @@ public record ResolvedContainerConfig(
boolean replayEnabled,
String runtimeType,
String customArgs,
List<String> extraNetworks
List<String> extraNetworks,
boolean externalRouting,
String certResolver
) {
public long memoryLimitBytes() {
return (long) memoryLimitMb * 1024 * 1024;

View File

@@ -0,0 +1,60 @@
package com.cameleer.server.core.search;
import java.util.regex.Pattern;
/**
* Structured attribute filter for execution search.
* <p>
* Value semantics:
* <ul>
* <li>{@code value == null} or blank -> key-exists check</li>
* <li>{@code value} contains {@code *} -> wildcard match (translated to SQL LIKE pattern)</li>
* <li>otherwise -> exact match</li>
* </ul>
* <p>
* Keys must match {@code ^[a-zA-Z0-9._-]+$} — they are later inlined into
* ClickHouse SQL via {@code JSONExtractString}, which does not accept a
* parameter placeholder for the JSON path. Values are always parameter-bound.
*/
public record AttributeFilter(String key, String value) {
private static final Pattern KEY_PATTERN = Pattern.compile("^[a-zA-Z0-9._-]+$");
public AttributeFilter {
if (key == null || !KEY_PATTERN.matcher(key).matches()) {
throw new IllegalArgumentException(
"Invalid attribute key: must match " + KEY_PATTERN.pattern() + ", got: " + key);
}
if (value != null && value.isBlank()) {
value = null;
}
}
public boolean isKeyOnly() {
return value == null;
}
public boolean isWildcard() {
return value != null && value.indexOf('*') >= 0;
}
/**
* Returns a SQL LIKE pattern for wildcard matches with {@code %} / {@code _} / {@code \}
* in the source value escaped, or {@code null} for exact / key-only filters.
*/
public String toLikePattern() {
if (!isWildcard()) return null;
StringBuilder sb = new StringBuilder(value.length() + 4);
for (int i = 0; i < value.length(); i++) {
char c = value.charAt(i);
switch (c) {
case '\\' -> sb.append("\\\\");
case '%' -> sb.append("\\%");
case '_' -> sb.append("\\_");
case '*' -> sb.append('%');
default -> sb.append(c);
}
}
return sb.toString();
}
}
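The three value semantics described in the Javadoc above (key-exists, wildcard, exact) could map to SQL fragments roughly as follows. This is a sketch under stated assumptions, not the store implementation from this change set: the column name `attributes`, the use of ClickHouse `JSONHas` for the key-exists check, and the names `AttributePredicateSketch`, `Predicate`, and `build` are all hypothetical. Only the key pattern and the LIKE escaping are taken from `AttributeFilter` itself.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of how an attribute filter might become a SQL predicate.
public class AttributePredicateSketch {

    private static final Pattern KEY_PATTERN = Pattern.compile("^[a-zA-Z0-9._-]+$");

    /** SQL fragment plus the values to bind to its '?' placeholders. */
    record Predicate(String sql, List<String> params) {}

    /** Same escaping as AttributeFilter.toLikePattern(): '*' becomes '%', LIKE metacharacters are escaped. */
    static String toLikePattern(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 4);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\' -> sb.append("\\\\");
                case '%' -> sb.append("\\%");
                case '_' -> sb.append("\\_");
                case '*' -> sb.append('%');
                default -> sb.append(c);
            }
        }
        return sb.toString();
    }

    static Predicate build(String key, String value) {
        // The key is inlined into the SQL text, so it must pass the whitelist first.
        if (key == null || !KEY_PATTERN.matcher(key).matches()) {
            throw new IllegalArgumentException("Invalid attribute key: " + key);
        }
        if (value == null || value.isBlank()) {
            // Key-exists check (assumed to use JSONHas on an assumed 'attributes' column).
            return new Predicate("JSONHas(attributes, '" + key + "')", List.of());
        }
        String column = "JSONExtractString(attributes, '" + key + "')";
        if (value.indexOf('*') >= 0) {
            return new Predicate(column + " LIKE ?", List.of(toLikePattern(value)));
        }
        return new Predicate(column + " = ?", List.of(value));
    }

    public static void main(String[] args) {
        System.out.println(build("order", null));
        System.out.println(build("order", "47*"));
        System.out.println(build("order", "47"));
    }
}
```

The point of the split is that only the whitelisted key ever reaches the SQL text; the value always travels as a bound parameter, including in the wildcard case.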

View File

@@ -9,7 +9,7 @@ import java.util.List;
* @param q free-text search across message and stack trace
* @param levels log level filter (e.g. ["WARN","ERROR"]), OR-joined
* @param application application ID filter (nullable = all apps)
* @param instanceId agent instance ID filter
* @param instanceId agent instance ID filter (single value; coexists with instanceIds)
* @param exchangeId Camel exchange ID filter
* @param logger logger name substring filter
* @param environment optional environment filter (e.g. "dev", "staging", "prod")
@@ -19,6 +19,9 @@ import java.util.List;
* @param cursor ISO timestamp cursor for keyset pagination
* @param limit page size (1-500, default 100)
* @param sort sort direction: "asc" or "desc" (default "desc")
* @param instanceIds multi-value instance ID filter (IN clause); scopes logs to one deployment's
* replicas when provided. Both instanceId and instanceIds may coexist — both
* conditions apply (AND). Empty/null means no additional filtering.
*/
public record LogSearchRequest(
String q,
@@ -33,7 +36,8 @@ public record LogSearchRequest(
Instant to,
String cursor,
int limit,
String sort
String sort,
List<String> instanceIds
) {
private static final int DEFAULT_LIMIT = 100;
@@ -45,5 +49,6 @@ public record LogSearchRequest(
if (sort == null || !"asc".equalsIgnoreCase(sort)) sort = "desc";
if (levels == null) levels = List.of();
if (sources == null) sources = List.of();
if (instanceIds == null) instanceIds = List.of();
}
}

View File

@@ -54,7 +54,8 @@ public record SearchRequest(
String sortField,
String sortDir,
String afterExecutionId,
String environment
String environment,
List<AttributeFilter> attributeFilters
) {
private static final int DEFAULT_LIMIT = 50;
@@ -83,6 +84,24 @@ public record SearchRequest(
if (offset < 0) offset = 0;
if (sortField == null || !ALLOWED_SORT_FIELDS.contains(sortField)) sortField = "startTime";
if (!"asc".equalsIgnoreCase(sortDir)) sortDir = "desc";
if (attributeFilters == null) attributeFilters = List.of();
}
/** Legacy 21-arg constructor preserved for existing call sites — defaults attributeFilters to empty. */
public SearchRequest(
String status, Instant timeFrom, Instant timeTo,
Long durationMin, Long durationMax, String correlationId,
String text, String textInBody, String textInHeaders, String textInErrors,
String routeId, String instanceId, String processorType,
String applicationId, List<String> instanceIds,
int offset, int limit, String sortField, String sortDir,
String afterExecutionId, String environment
) {
this(status, timeFrom, timeTo, durationMin, durationMax, correlationId,
text, textInBody, textInHeaders, textInErrors,
routeId, instanceId, processorType, applicationId, instanceIds,
offset, limit, sortField, sortDir, afterExecutionId, environment,
List.of());
}
/** Returns the snake_case column name for ORDER BY. */
@@ -96,7 +115,8 @@ public record SearchRequest(
status, timeFrom, timeTo, durationMin, durationMax, correlationId,
text, textInBody, textInHeaders, textInErrors,
routeId, instanceId, processorType, applicationId, resolvedInstanceIds,
offset, limit, sortField, sortDir, afterExecutionId, environment
offset, limit, sortField, sortDir, afterExecutionId, environment,
attributeFilters
);
}
@@ -106,7 +126,8 @@ public record SearchRequest(
status, timeFrom, timeTo, durationMin, durationMax, correlationId,
text, textInBody, textInHeaders, textInErrors,
routeId, instanceId, processorType, applicationId, instanceIds,
offset, limit, sortField, sortDir, afterExecutionId, env
offset, limit, sortField, sortDir, afterExecutionId, env,
attributeFilters
);
}
@@ -122,7 +143,8 @@ public record SearchRequest(
status, ts, timeTo, durationMin, durationMax, correlationId,
text, textInBody, textInHeaders, textInErrors,
routeId, instanceId, processorType, applicationId, instanceIds,
offset, limit, sortField, sortDir, afterExecutionId, environment
offset, limit, sortField, sortDir, afterExecutionId, environment,
attributeFilters
);
}
}

View File

@@ -3,7 +3,6 @@ package com.cameleer.server.core.storage;
import com.cameleer.common.graph.RouteGraph;
import com.cameleer.server.core.ingestion.TaggedDiagram;
import java.util.List;
import java.util.Map;
import java.util.Optional;
@@ -15,7 +14,18 @@ public interface DiagramStore {
Optional<String> findContentHashForRoute(String routeId, String instanceId);
Optional<String> findContentHashForRouteByAgents(String routeId, List<String> instanceIds);
/**
* Return the most recently stored {@code content_hash} for the given
* {@code (applicationId, environment, routeId)} triple, regardless of the
* agent instance that produced it.
*
* <p>Unlike {@link #findContentHashForRoute(String, String)}, this lookup
* is independent of the agent registry — so it keeps working for routes
* whose publishing agents have since been redeployed or removed.
*/
Optional<String> findLatestContentHashForAppRoute(String applicationId,
String routeId,
String environment);
Map<String, String> findProcessorRouteMapping(String applicationId, String environment);
}

View File

@@ -0,0 +1,36 @@
package com.cameleer.server.core.storage;
import com.cameleer.server.core.storage.model.ServerInstanceInfo;
import com.cameleer.server.core.storage.model.ServerMetricCatalogEntry;
import com.cameleer.server.core.storage.model.ServerMetricQueryRequest;
import com.cameleer.server.core.storage.model.ServerMetricQueryResponse;
import java.time.Instant;
import java.util.List;
/**
* Read-side access to the ClickHouse {@code server_metrics} table. Exposed
* to dashboards through {@code /api/v1/admin/server-metrics/**} so SaaS
* control planes don't need direct ClickHouse access.
*/
public interface ServerMetricsQueryStore {
/**
* Catalog of metric names observed in {@code [from, to)} along with their
* type, the set of statistics emitted, and the union of tag keys seen.
*/
List<ServerMetricCatalogEntry> catalog(Instant from, Instant to);
/**
* Distinct {@code server_instance_id} values that wrote at least one
* sample in {@code [from, to)}, with first/last seen timestamps.
*/
List<ServerInstanceInfo> listInstances(Instant from, Instant to);
/**
* Generic time-series query. See {@link ServerMetricQueryRequest} for
* request semantics. Implementations must enforce input validation and
* reject unsafe inputs with {@link IllegalArgumentException}.
*/
ServerMetricQueryResponse query(ServerMetricQueryRequest request);
}

View File

@@ -0,0 +1,16 @@
package com.cameleer.server.core.storage;
import com.cameleer.server.core.storage.model.ServerMetricSample;
import java.util.List;
/**
* Sink for periodic snapshots of the server's own Micrometer meter registry.
* Implementations persist the samples (e.g. to ClickHouse) so server
* self-metrics survive restarts and can be queried historically without an
* external Prometheus.
*/
public interface ServerMetricsStore {
void insertBatch(List<ServerMetricSample> samples);
}

View File

@@ -0,0 +1,15 @@
package com.cameleer.server.core.storage.model;
import java.time.Instant;
/**
* One row of the {@code /api/v1/admin/server-metrics/instances} response.
* Used by dashboards to partition counter-delta computations across server
* process boundaries (each boot rotates the id).
*/
public record ServerInstanceInfo(
String serverInstanceId,
Instant firstSeen,
Instant lastSeen
) {
}

View File

@@ -0,0 +1,17 @@
package com.cameleer.server.core.storage.model;
import java.util.List;
/**
* One row of the {@code /api/v1/admin/server-metrics/catalog} response.
* Surfaces the set of statistics and tag keys observed for a metric across
* the requested window, so dashboards can build selectors without ClickHouse
* access.
*/
public record ServerMetricCatalogEntry(
String metricName,
String metricType,
List<String> statistics,
List<String> tagKeys
) {
}

View File

@@ -0,0 +1,10 @@
package com.cameleer.server.core.storage.model;
import java.time.Instant;
/** One {@code (bucket, value)} point of a server-metrics series. */
public record ServerMetricPoint(
Instant t,
double v
) {
}

View File

@@ -0,0 +1,40 @@
package com.cameleer.server.core.storage.model;
import java.time.Instant;
import java.util.List;
import java.util.Map;
/**
* Request contract for the generic server-metrics time-series query.
*
* <p>{@code aggregation} controls how multiple samples within a bucket
* collapse: {@code avg|sum|max|min|latest}. {@code mode} controls counter
* handling: {@code raw} returns values as stored (cumulative for counters),
* {@code delta} returns per-bucket positive-clipped differences computed
* per {@code server_instance_id}.
*
* <p>{@code statistic} filters which Micrometer sub-measurement to read
* ({@code value} / {@code count} / {@code total_time} / {@code total} /
* {@code max} / {@code mean}). {@code mean} is a derived statistic for
* timers: {@code sum(total_time|total) / sum(count)} per bucket.
*
* <p>{@code groupByTags} splits the output into one series per unique tag
* combination. {@code filterTags} narrows the input to samples whose tag
* map matches every entry.
*
* <p>{@code serverInstanceIds} is an optional allow-list. When null or
* empty all instances observed in the window are included.
*/
public record ServerMetricQueryRequest(
String metric,
String statistic,
Instant from,
Instant to,
Integer stepSeconds,
List<String> groupByTags,
Map<String, String> filterTags,
String aggregation,
String mode,
List<String> serverInstanceIds
) {
}
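The `delta` mode described above ("per-bucket positive-clipped differences computed per `server_instance_id`") can be sketched for a single instance as follows. The query store's implementation is not shown in this section, so this is only an illustration: the names `CounterDeltaSketch` and `positiveClippedDeltas` are hypothetical, and emitting no delta for the first bucket is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of positive-clipped counter deltas for ONE server instance.
public class CounterDeltaSketch {

    /**
     * Takes one cumulative counter value per bucket, in time order, and returns the
     * difference between consecutive buckets. Negative differences (a counter that
     * went backwards) are clipped to zero.
     */
    static List<Double> positiveClippedDeltas(List<Double> cumulative) {
        List<Double> deltas = new ArrayList<>();
        for (int i = 1; i < cumulative.size(); i++) {
            double diff = cumulative.get(i) - cumulative.get(i - 1);
            deltas.add(Math.max(0.0, diff));
        }
        return deltas;
    }

    public static void main(String[] args) {
        // Counter grows 10 -> 15, then drops to 3 and grows to 8.
        System.out.println(positiveClippedDeltas(List.of(10.0, 15.0, 3.0, 8.0)));
    }
}
```

Computing the differences per instance matters because a cumulative counter starts again from zero in a new server process. Mixing samples from two instances into one series would produce spurious drops and jumps, whereas separate series per `server_instance_id` can be differenced independently and then combined.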

View File

@@ -0,0 +1,14 @@
package com.cameleer.server.core.storage.model;
import java.util.List;
/** Response of the generic server-metrics time-series query. */
public record ServerMetricQueryResponse(
String metric,
String statistic,
String aggregation,
String mode,
int stepSeconds,
List<ServerMetricSeries> series
) {
}

View File

@@ -0,0 +1,23 @@
package com.cameleer.server.core.storage.model;
import java.time.Instant;
import java.util.Map;
/**
* A single sample of the server's own Micrometer registry, captured by a
* scheduled snapshot and destined for the ClickHouse {@code server_metrics}
* table. One {@code ServerMetricSample} per Micrometer {@code Measurement},
* so Timers and DistributionSummaries produce multiple samples per tick
* (distinguished by {@link #statistic()}).
*/
public record ServerMetricSample(
String tenantId,
Instant collectedAt,
String serverInstanceId,
String metricName,
String metricType,
String statistic,
double value,
Map<String, String> tags
) {
}
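The "one sample per `Measurement`" rule in the Javadoc above means a single timer produces several rows per tick. A minimal sketch of that flattening, assuming the measurements have already been read into a plain map: the names `SampleFlatteningSketch`, `Sample`, and `flatten`, the `"timer"` type string, and the null-tags handling are assumptions for illustration, not the snapshot code from this change set.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: one meter with several statistics becomes several rows.
public class SampleFlatteningSketch {

    /** Mirrors the field layout of ServerMetricSample for the purpose of this sketch. */
    record Sample(String tenantId, Instant collectedAt, String serverInstanceId,
                  String metricName, String metricType, String statistic,
                  double value, Map<String, String> tags) {}

    /** Emits one Sample per (statistic, value) entry of a single meter. */
    static List<Sample> flatten(String tenantId, Instant collectedAt, String serverInstanceId,
                                String metricName, String metricType,
                                Map<String, Double> measurements, Map<String, String> tags) {
        // Assumption: null tags are treated as an empty map.
        Map<String, String> safeTags = tags == null ? Map.<String, String>of() : tags;
        List<Sample> samples = new ArrayList<>();
        for (Map.Entry<String, Double> m : measurements.entrySet()) {
            samples.add(new Sample(tenantId, collectedAt, serverInstanceId,
                    metricName, metricType, m.getKey(), m.getValue(), safeTags));
        }
        return samples;
    }

    public static void main(String[] args) {
        Map<String, Double> timer = new LinkedHashMap<>();
        timer.put("count", 4.0);
        timer.put("total_time", 0.8);
        timer.put("max", 0.5);
        List<Sample> rows = flatten("tenant-a", Instant.parse("2026-04-23T12:00:00Z"),
                "srv-1", "http.server.requests", "timer", timer, Map.of("uri", "/api"));
        rows.forEach(System.out::println); // three rows, one per statistic
    }
}
```

A gauge or counter with a single measurement yields exactly one row, which is consistent with the `DEFAULT 'value'` on the `statistic` column in the test schema earlier in this diff.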

View File

@@ -0,0 +1,14 @@
package com.cameleer.server.core.storage.model;
import java.util.List;
import java.util.Map;
/**
* One series of the server-metrics query response, identified by its
* {@link #tags} group (empty map when the query had no {@code groupByTags}).
*/
public record ServerMetricSeries(
Map<String, String> tags,
List<ServerMetricPoint> points
) {
}

View File

@@ -9,4 +9,10 @@ class AuditCategoryTest {
assertThat(AuditCategory.valueOf("ALERT_RULE_CHANGE")).isNotNull();
assertThat(AuditCategory.valueOf("ALERT_SILENCE_CHANGE")).isNotNull();
}
@Test
void deploymentCategoryPresent() {
assertThat(AuditCategory.valueOf("DEPLOYMENT"))
.isEqualTo(AuditCategory.DEPLOYMENT);
}
}

View File

@@ -22,7 +22,7 @@ class ChunkAccumulatorTest {
public void store(com.cameleer.server.core.ingestion.TaggedDiagram d) {}
public Optional<com.cameleer.common.graph.RouteGraph> findByContentHash(String h) { return Optional.empty(); }
public Optional<String> findContentHashForRoute(String r, String a) { return Optional.empty(); }
public Optional<String> findContentHashForRouteByAgents(String r, List<String> a) { return Optional.empty(); }
public Optional<String> findLatestContentHashForAppRoute(String app, String r, String env) { return Optional.empty(); }
public Map<String, String> findProcessorRouteMapping(String app, String env) { return Map.of(); }
};

View File

@@ -0,0 +1,34 @@
package com.cameleer.server.core.runtime;
import org.junit.jupiter.api.Test;
import static org.assertj.core.api.Assertions.assertThat;
class DeploymentStrategyTest {
@Test
void fromWire_knownValues() {
assertThat(DeploymentStrategy.fromWire("blue-green")).isEqualTo(DeploymentStrategy.BLUE_GREEN);
assertThat(DeploymentStrategy.fromWire("rolling")).isEqualTo(DeploymentStrategy.ROLLING);
}
@Test
void fromWire_caseInsensitiveAndTrims() {
assertThat(DeploymentStrategy.fromWire("BLUE-GREEN")).isEqualTo(DeploymentStrategy.BLUE_GREEN);
assertThat(DeploymentStrategy.fromWire(" Rolling ")).isEqualTo(DeploymentStrategy.ROLLING);
}
@Test
void fromWire_unknownOrNullFallsBackToBlueGreen() {
assertThat(DeploymentStrategy.fromWire(null)).isEqualTo(DeploymentStrategy.BLUE_GREEN);
assertThat(DeploymentStrategy.fromWire("")).isEqualTo(DeploymentStrategy.BLUE_GREEN);
assertThat(DeploymentStrategy.fromWire("canary")).isEqualTo(DeploymentStrategy.BLUE_GREEN);
}
@Test
void toWire_roundTrips() {
for (DeploymentStrategy s : DeploymentStrategy.values()) {
assertThat(DeploymentStrategy.fromWire(s.toWire())).isEqualTo(s);
}
}
}

View File

@@ -5,6 +5,7 @@ import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;
import org.junit.jupiter.api.Test;
import java.util.List;
import java.util.Map;
import java.util.UUID;
@@ -114,9 +115,9 @@ class DirtyStateCalculatorTest {
DirtyStateCalculator calc = CALC;
ApplicationConfig deployed = new ApplicationConfig();
deployed.setTracedProcessors(Map.of("proc-1", "DEBUG"));
deployed.setSensitiveKeys(List.of("password", "token"));
ApplicationConfig desired = new ApplicationConfig();
desired.setTracedProcessors(Map.of("proc-1", "TRACE"));
desired.setSensitiveKeys(List.of("password", "token", "secret"));
UUID jarId = UUID.randomUUID();
DeploymentConfigSnapshot snap = new DeploymentConfigSnapshot(jarId, deployed, Map.of(), null);
@@ -124,7 +125,29 @@ class DirtyStateCalculatorTest {
assertThat(result.dirty()).isTrue();
assertThat(result.differences()).extracting(DirtyStateResult.Difference::field)
.contains("agentConfig.tracedProcessors.proc-1");
.anyMatch(f -> f.startsWith("agentConfig.sensitiveKeys"));
}
@Test
void livePushedFields_doNotMarkDirty() {
// Taps, tracedProcessors, and routeRecording apply via live SSE push (never redeploy),
// so they must not appear as "pending deploy" when they differ from the last deploy snapshot.
ApplicationConfig deployed = new ApplicationConfig();
deployed.setTracedProcessors(Map.of("proc-1", "DEBUG"));
deployed.setRouteRecording(Map.of("route-a", true));
deployed.setTapVersion(1);
ApplicationConfig desired = new ApplicationConfig();
desired.setTracedProcessors(Map.of("proc-1", "TRACE", "proc-2", "DEBUG"));
desired.setRouteRecording(Map.of("route-a", false, "route-b", true));
desired.setTapVersion(5);
UUID jarId = UUID.randomUUID();
DeploymentConfigSnapshot snap = new DeploymentConfigSnapshot(jarId, deployed, Map.of(), null);
DirtyStateResult result = CALC.compute(jarId, desired, Map.of(), snap);
assertThat(result.dirty()).isFalse();
assertThat(result.differences()).isEmpty();
}
@Test

View File

@@ -0,0 +1,88 @@
package com.cameleer.server.core.search;
import org.junit.jupiter.api.Test;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;
class AttributeFilterTest {
@Test
void keyOnly_blankValue_normalizesToNull() {
AttributeFilter f = new AttributeFilter("order", "");
assertThat(f.value()).isNull();
assertThat(f.isKeyOnly()).isTrue();
assertThat(f.isWildcard()).isFalse();
}
@Test
void keyOnly_nullValue_isKeyOnly() {
AttributeFilter f = new AttributeFilter("order", null);
assertThat(f.isKeyOnly()).isTrue();
}
@Test
void exactValue_isNotWildcard() {
AttributeFilter f = new AttributeFilter("order", "47");
assertThat(f.isKeyOnly()).isFalse();
assertThat(f.isWildcard()).isFalse();
}
@Test
void starInValue_isWildcard() {
AttributeFilter f = new AttributeFilter("order", "47*");
assertThat(f.isWildcard()).isTrue();
}
@Test
void invalidKey_throws() {
assertThatThrownBy(() -> new AttributeFilter("bad key", "x"))
.isInstanceOf(IllegalArgumentException.class)
.hasMessageContaining("attribute key");
}
@Test
void blankKey_throws() {
assertThatThrownBy(() -> new AttributeFilter(" ", null))
.isInstanceOf(IllegalArgumentException.class);
}
@Test
void wildcardPattern_escapesLikeMetaCharacters() {
AttributeFilter f = new AttributeFilter("order", "a_b%c\\d*");
assertThat(f.toLikePattern()).isEqualTo("a\\_b\\%c\\\\d%");
}
@Test
void exactValue_toLikePattern_returnsNull() {
AttributeFilter f = new AttributeFilter("order", "47");
assertThat(f.toLikePattern()).isNull();
}
@Test
void searchRequest_canonicalCtor_acceptsAttributeFilters() {
SearchRequest r = new SearchRequest(
null, null, null, null, null, null, null, null, null, null,
null, null, null, null, null, 0, 50, null, null, null, null,
java.util.List.of(new AttributeFilter("order", "47")));
assertThat(r.attributeFilters()).hasSize(1);
assertThat(r.attributeFilters().get(0).key()).isEqualTo("order");
}
@Test
void searchRequest_legacyCtor_defaultsAttributeFiltersToEmpty() {
SearchRequest r = new SearchRequest(
null, null, null, null, null, null, null, null, null, null,
null, null, null, null, null, 0, 50, null, null, null, null);
assertThat(r.attributeFilters()).isEmpty();
}
@Test
void searchRequest_compactCtor_normalizesNullAttributeFilters() {
SearchRequest r = new SearchRequest(
null, null, null, null, null, null, null, null, null, null,
null, null, null, null, null, 0, 50, null, null, null, null,
null);
assertThat(r.attributeFilters()).isNotNull().isEmpty();
}
}


@@ -204,6 +204,21 @@ All query endpoints require JWT with `VIEWER` role or higher.
| `GET /api/v1/agents/events-log` | Agent lifecycle event history |
| `GET /api/v1/agents/{id}/metrics` | Agent-level metrics time series |
### Server Self-Metrics
The server snapshots its own Micrometer registry into ClickHouse every 60 s (table `server_metrics`) — JVM, HTTP, DB pools, agent/ingestion business metrics, and alerting metrics. Use this instead of running an external Prometheus when building a server-health dashboard. The live scrape endpoint `/api/v1/prometheus` remains available for traditional scraping.
Two ways to consume:
| Consumer | How |
|---|---|
| Web UI (built-in) | `/admin/server-metrics` — 17 panels across Server Health / JVM / HTTP & DB / Alerting / Deployments with a 15 min–7 d time picker. ADMIN-only, hidden when `infrastructureendpoints=false`. |
| Programmatic | Generic REST API under `/api/v1/admin/server-metrics/{catalog,instances,query}`. Same visibility rules. Designed for SaaS control planes that embed server health in their own console. |
Persistence can be disabled entirely with `cameleer.server.self-metrics.enabled=false`. Snapshot cadence via `cameleer.server.self-metrics.interval-ms` (default `60000`).
See [`docs/server-self-metrics.md`](./server-self-metrics.md) for the full metric catalog, API contract, and ready-to-paste query bodies for each panel.
---
## Application Configuration

docs/server-self-metrics.md

@@ -0,0 +1,522 @@
# Server Self-Metrics — Reference for Dashboard Builders
This is the reference for anyone building a server-health dashboard on top of the Cameleer server. It documents the `server_metrics` ClickHouse table, every series you can expect to find in it, and the queries we recommend for each dashboard panel.
> **tl;dr** — Every 60 s, every meter in the server's Micrometer registry (all `cameleer.*`, all `alerting_*`, and the full Spring Boot Actuator set) is written into ClickHouse as one row per `(meter, statistic)` pair. No external Prometheus required.
---
## Built-in admin dashboard
The server ships a ready-to-use dashboard at **`/admin/server-metrics`** in the web UI. It renders the 17 panels listed below using `ThemedChart` from the design system. The window is driven by the app-wide time-range control in the TopBar (same one used by Exchanges, Dashboard, and Runtime), so every panel automatically reflects the range you've selected globally. Visibility mirrors the Database and ClickHouse admin pages:
- Requires the `ADMIN` role.
- Hidden when `cameleer.server.security.infrastructureendpoints=false` (both the backend endpoints and the sidebar entry disappear).
Use this page for single-tenant installs and dev/staging — it's the fastest path to "is the server healthy right now?". For multi-tenant control planes, cross-environment rollups, or embedding metrics inside an existing operations console, call the REST API below instead.
---
## Table schema
```sql
server_metrics (
tenant_id LowCardinality(String) DEFAULT 'default',
collected_at DateTime64(3),
server_instance_id LowCardinality(String),
metric_name LowCardinality(String),
metric_type LowCardinality(String), -- counter|gauge|timer|distribution_summary|long_task_timer|other
statistic LowCardinality(String) DEFAULT 'value',
metric_value Float64,
tags Map(String, String) DEFAULT map(),
server_received_at DateTime64(3) DEFAULT now64(3)
)
ENGINE = MergeTree()
PARTITION BY (tenant_id, toYYYYMM(collected_at))
ORDER BY (tenant_id, collected_at, server_instance_id, metric_name, statistic)
TTL toDateTime(collected_at) + INTERVAL 90 DAY DELETE
```
### What each column means
| Column | Notes |
|---|---|
| `tenant_id` | Always filter by this. One tenant per server deployment. |
| `server_instance_id` | Stable id per server process: property → `HOSTNAME` env → DNS → random UUID. **Rotates on restart**, so counters restart cleanly. |
| `metric_name` | Raw Micrometer meter name. Dots, not underscores. |
| `metric_type` | Lowercase Micrometer `Meter.Type`. |
| `statistic` | Which `Measurement` this row is. Counters/gauges → `value` or `count`. Timers → three rows per tick: `count`, `total_time` (or `total`), `max`. Distribution summaries → same shape. |
| `metric_value` | `Float64`. Non-finite values (NaN / ±∞) are dropped before insert. |
| `tags` | `Map(String, String)`. Micrometer tags copied verbatim. |
### Counter semantics (important)
Counters are **cumulative totals since meter registration**, same convention as Prometheus. To get a rate, compute a delta within a `server_instance_id`:
```sql
SELECT
toStartOfMinute(collected_at) AS minute,
metric_value - any(metric_value) OVER (
PARTITION BY server_instance_id, metric_name, tags
ORDER BY collected_at
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING
) AS per_minute_delta
FROM server_metrics
WHERE tenant_id = 'default'
AND metric_name = 'cameleer.ingestion.drops'
AND statistic = 'count'
ORDER BY minute;
```
On restart the `server_instance_id` rotates, so a lag-style window partitioned by `server_instance_id` (as in the query above) yields monotonic segments without any counter-reset handling.
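The same per-instance delta can also be computed client-side on rows that have already been fetched. The following is a minimal Java sketch, assuming the samples arrive ordered by `collected_at` within each instance. The `Sample` record and the `CounterDeltas` class are hypothetical names for illustration, not part of the server code. Negative differences are clipped to zero, matching the positive-clipped behavior that the REST API's delta mode is described as using.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row shape: one counter sample read from server_metrics.
record Sample(String serverInstanceId, long epochSecond, double value) {}

final class CounterDeltas {
    // Returns one delta for every sample that has a predecessor in the same instance.
    // Assumes the input is ordered by time within each instance.
    static List<Double> deltas(List<Sample> samples) {
        Map<String, Double> previous = new HashMap<>();
        List<Double> out = new ArrayList<>();
        for (Sample s : samples) {
            Double prev = previous.put(s.serverInstanceId(), s.value());
            if (prev != null) {
                // Clip at zero so an unexpected decrease never produces a negative rate.
                out.add(Math.max(0.0, s.value() - prev));
            }
        }
        return out;
    }
}
```

The first sample of each instance produces no delta, which is why a restart (new `server_instance_id`) does not show up as a spike or a negative value.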
### Retention
90 days, TTL-enforced. Long-term trend analysis is out of scope — ship raw data to an external warehouse if you need more.
---
## How to query
Use the REST API — `/api/v1/admin/server-metrics/**`. It does the tenant filter, range bounding, counter-delta math, and input validation for you, so the dashboard never needs direct ClickHouse access. ADMIN role required (standard `/api/v1/admin/**` RBAC gate).
### `GET /catalog`
Enumerate every `metric_name` observed in a window, with its `metric_type`, the set of statistics emitted, and the union of tag keys.
```
GET /api/v1/admin/server-metrics/catalog?from=2026-04-22T00:00:00Z&to=2026-04-23T00:00:00Z
Authorization: Bearer <admin-jwt>
```
```json
[
{
"metricName": "cameleer.agents.connected",
"metricType": "gauge",
"statistics": ["value"],
"tagKeys": ["state"]
},
{
"metricName": "cameleer.ingestion.drops",
"metricType": "counter",
"statistics": ["count"],
"tagKeys": ["reason"]
},
...
]
```
`from`/`to` are optional; default is the last 1 h.
### `GET /instances`
Enumerate the `server_instance_id` values that wrote at least one sample in the window, with `firstSeen` / `lastSeen`. Use this when you need to annotate restarts on a graph or reason about counter-delta partitions.
```
GET /api/v1/admin/server-metrics/instances?from=2026-04-22T00:00:00Z&to=2026-04-23T00:00:00Z
```
```json
[
{ "serverInstanceId": "srv-prod-b", "firstSeen": "2026-04-22T14:30:00Z", "lastSeen": "2026-04-23T00:00:00Z" },
{ "serverInstanceId": "srv-prod-a", "firstSeen": "2026-04-22T00:00:00Z", "lastSeen": "2026-04-22T14:25:00Z" }
]
```
### `POST /query` — generic time-series
The workhorse. One endpoint covers every panel in the dashboard.
```
POST /api/v1/admin/server-metrics/query
Authorization: Bearer <admin-jwt>
Content-Type: application/json
```
Request body:
```json
{
"metric": "cameleer.ingestion.drops",
"statistic": "count",
"from": "2026-04-22T00:00:00Z",
"to": "2026-04-23T00:00:00Z",
"stepSeconds": 60,
"groupByTags": ["reason"],
"filterTags": { },
"aggregation": "sum",
"mode": "delta",
"serverInstanceIds": null
}
```
Response:
```json
{
"metric": "cameleer.ingestion.drops",
"statistic": "count",
"aggregation": "sum",
"mode": "delta",
"stepSeconds": 60,
"series": [
{
"tags": { "reason": "buffer_full" },
"points": [
{ "t": "2026-04-22T00:00:00.000Z", "v": 0.0 },
{ "t": "2026-04-22T00:01:00.000Z", "v": 5.0 },
{ "t": "2026-04-22T00:02:00.000Z", "v": 5.0 }
]
}
]
}
```
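A caller on the JVM could issue this request with the JDK's built-in `java.net.http` client. The sketch below only builds the body and the request; the `ServerMetricsQuery` class, its method names, and the `baseUrl` / `jwt` parameters are assumptions for illustration. The JSON is concatenated by hand to stay dependency-free, so it does no escaping and supports a single `groupByTags` entry; a real client would more likely use a JSON library.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Instant;

final class ServerMetricsQuery {
    // Builds a request body with the fields shown in the example above.
    // No escaping is performed: inputs are assumed to be plain identifiers.
    static String body(String metric, String statistic, Instant from, Instant to,
                       int stepSeconds, String groupByTag, String mode) {
        return "{"
            + "\"metric\":\"" + metric + "\","
            + "\"statistic\":\"" + statistic + "\","
            + "\"from\":\"" + from + "\","
            + "\"to\":\"" + to + "\","
            + "\"stepSeconds\":" + stepSeconds + ","
            + "\"groupByTags\":[\"" + groupByTag + "\"],"
            + "\"mode\":\"" + mode + "\""
            + "}";
    }

    // baseUrl and jwt are placeholders supplied by the caller.
    static HttpRequest request(String baseUrl, String jwt, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/admin/server-metrics/query"))
            .header("Authorization", "Bearer " + jwt)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
            .build();
    }
}
```

Sending it is then a matter of `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` and parsing the `series` array from the response.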
#### Request field reference
| Field | Type | Required | Description |
|---|---|---|---|
| `metric` | string | yes | Metric name. Regex `^[a-zA-Z0-9._]+$`. |
| `statistic` | string | no | `value` / `count` / `total` / `total_time` / `max` / `mean`. `mean` is a derived statistic for timers: `sum(total_time \| total) / sum(count)` per bucket. |
| `from`, `to` | ISO-8601 instant | yes | Half-open window. `to - from ≤ 31 days`. |
| `stepSeconds` | int | no | Bucket size. Clamped to [10, 3600]. Default 60. |
| `groupByTags` | string[] | no | Emit one series per unique combination of these tag values. Tag keys regex `^[a-zA-Z0-9._]+$`. |
| `filterTags` | map<string,string> | no | Narrow to samples whose tag map contains every entry. Values bound via parameter — no injection. |
| `aggregation` | string | no | Within-bucket reducer for raw mode: `avg` (default), `sum`, `max`, `min`, `latest`. For `mode=delta` this controls cross-instance aggregation (defaults to `sum` of per-instance deltas). |
| `mode` | string | no | `raw` (default) or `delta`. Delta mode computes per-`server_instance_id` positive-clipped differences and then aggregates across instances — so you get a rate-like time series that survives server restarts. |
| `serverInstanceIds` | string[] | no | Allow-list. When null or empty, every instance in the window is included. |
#### Validation errors
Any `IllegalArgumentException` surfaces as `400 Bad Request` with `{"error": "…"}`. Triggers:
- unsafe characters in identifiers
- `from ≥ to` or range > 31 days
- `stepSeconds` outside [10, 3600]
- result cardinality > 500 series (reduce `groupByTags` or tighten `filterTags`)
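Dashboards that want to fail fast can mirror the first three checks before sending a request. This is a hedged sketch of such a pre-flight check, not the server's actual validator: the `QueryValidation` class and its error messages are made up for illustration, and it follows the rejection behavior listed above for `stepSeconds`. The series-cardinality limit can only be enforced server-side, so it is not covered.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.regex.Pattern;

final class QueryValidation {
    // Same character class the field reference gives for metric names and tag keys.
    private static final Pattern SAFE = Pattern.compile("^[a-zA-Z0-9._]+$");
    private static final Duration MAX_RANGE = Duration.ofDays(31);

    // Throws IllegalArgumentException on the first rule that fails.
    static void validate(String metric, Instant from, Instant to, int stepSeconds) {
        if (metric == null || !SAFE.matcher(metric).matches()) {
            throw new IllegalArgumentException("unsafe characters in metric name");
        }
        if (!from.isBefore(to)) {
            throw new IllegalArgumentException("from must be before to");
        }
        if (Duration.between(from, to).compareTo(MAX_RANGE) > 0) {
            throw new IllegalArgumentException("range exceeds 31 days");
        }
        if (stepSeconds < 10 || stepSeconds > 3600) {
            throw new IllegalArgumentException("stepSeconds outside [10, 3600]");
        }
    }
}
```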
### Direct ClickHouse (fallback)
If you need something the generic query can't express (complex joins, percentile aggregates, materialized-view rollups), reach for `/api/v1/admin/clickhouse/query` (`infrastructureendpoints=true`, ADMIN) or a dedicated read-only CH user scoped to `server_metrics`. All direct queries must filter by `tenant_id`.
---
## Metric catalog
Every series below is populated. Names follow Micrometer conventions (dots, not underscores). Use these as the starting point for dashboard panels — pick the handful you care about, ignore the rest.
### Cameleer business metrics — agent + ingestion
Source: `cameleer-server-app/.../metrics/ServerMetrics.java`.
| Metric | Type | Statistic | Tags | Meaning |
|---|---|---|---|---|
| `cameleer.agents.connected` | gauge | `value` | `state` (live/stale/dead/shutdown) | Count of agents in each lifecycle state |
| `cameleer.agents.sse.active` | gauge | `value` | — | Active SSE connections (command channel) |
| `cameleer.agents.transitions` | counter | `count` | `transition` (went_stale/went_dead/recovered) | Cumulative lifecycle transitions |
| `cameleer.ingestion.buffer.size` | gauge | `value` | `type` (execution/processor/log/metrics) | Write buffer depth — spikes mean ingestion is lagging |
| `cameleer.ingestion.accumulator.pending` | gauge | `value` | — | Unfinalized execution chunks in the accumulator |
| `cameleer.ingestion.drops` | counter | `count` | `reason` (buffer_full/no_agent/no_identity) | Dropped payloads. Any non-zero rate here is bad. |
| `cameleer.ingestion.flush.duration` | timer | `count`, `total_time`/`total`, `max` | `type` (execution/processor/log) | Flush latency per type |
### Cameleer business metrics — deploy + auth
| Metric | Type | Statistic | Tags | Meaning |
|---|---|---|---|---|
| `cameleer.deployments.outcome` | counter | `count` | `status` (running/failed/degraded) | Deploy outcome tally since boot |
| `cameleer.deployments.duration` | timer | `count`, `total_time`/`total`, `max` | — | End-to-end deploy latency |
| `cameleer.auth.failures` | counter | `count` | `reason` (invalid_token/revoked/oidc_rejected) | Auth failure breakdown — watch for spikes |
### Alerting subsystem metrics
Source: `cameleer-server-app/.../alerting/metrics/AlertingMetrics.java`.
| Metric | Type | Statistic | Tags | Meaning |
|---|---|---|---|---|
| `alerting_rules_total` | gauge | `value` | `state` (enabled/disabled) | Cached 30 s from PostgreSQL `alert_rules` |
| `alerting_instances_total` | gauge | `value` | `state` (firing/resolved/ack'd etc.) | Cached 30 s from PostgreSQL `alert_instances` |
| `alerting_eval_errors_total` | counter | `count` | `kind` (condition kind) | Evaluator exceptions per kind |
| `alerting_circuit_opened_total` | counter | `count` | `kind` | Circuit-breaker open transitions per kind |
| `alerting_eval_duration_seconds` | timer | `count`, `total_time`/`total`, `max` | `kind` | Per-kind evaluation latency |
| `alerting_webhook_delivery_duration_seconds` | timer | `count`, `total_time`/`total`, `max` | — | Outbound webhook POST latency |
| `alerting_notifications_total` | counter | `count` | `status` (sent/failed/retry/giving_up) | Notification outcomes |
### JVM — memory, GC, threads, classes
From Spring Boot Actuator (`JvmMemoryMetrics`, `JvmGcMetrics`, `JvmThreadMetrics`, `ClassLoaderMetrics`).
| Metric | Type | Tags | Meaning |
|---|---|---|---|
| `jvm.memory.used` | gauge | `area` (heap/nonheap), `id` (pool name) | Bytes used per pool |
| `jvm.memory.committed` | gauge | `area`, `id` | Bytes committed per pool |
| `jvm.memory.max` | gauge | `area`, `id` | Pool max |
| `jvm.memory.usage.after.gc` | gauge | `area`, `id` | Usage right after the last collection |
| `jvm.buffer.memory.used` | gauge | `id` (direct/mapped) | NIO buffer bytes |
| `jvm.buffer.count` | gauge | `id` | NIO buffer count |
| `jvm.buffer.total.capacity` | gauge | `id` | NIO buffer capacity |
| `jvm.threads.live` | gauge | — | Current live thread count |
| `jvm.threads.daemon` | gauge | — | Current daemon thread count |
| `jvm.threads.peak` | gauge | — | Peak thread count since start |
| `jvm.threads.started` | counter | — | Cumulative threads started |
| `jvm.threads.states` | gauge | `state` (runnable/blocked/waiting/…) | Threads per state |
| `jvm.classes.loaded` | gauge | — | Currently-loaded classes |
| `jvm.classes.unloaded` | counter | — | Cumulative unloaded classes |
| `jvm.gc.pause` | timer | `action`, `cause` | Stop-the-world pause times — watch `max` |
| `jvm.gc.concurrent.phase.time` | timer | `action`, `cause` | Concurrent-phase durations (G1/ZGC) |
| `jvm.gc.memory.allocated` | counter | — | Bytes allocated in the young gen |
| `jvm.gc.memory.promoted` | counter | — | Bytes promoted to old gen |
| `jvm.gc.overhead` | gauge | — | Fraction of CPU spent in GC (0–1) |
| `jvm.gc.live.data.size` | gauge | — | Live data after last collection |
| `jvm.gc.max.data.size` | gauge | — | Max old-gen size |
| `jvm.info` | gauge | `vendor`, `runtime`, `version` | Constant `1.0`; tags carry the real info |
### Process and system
| Metric | Type | Tags | Meaning |
|---|---|---|---|
| `process.cpu.usage` | gauge | — | CPU share consumed by this JVM (0–1) |
| `process.cpu.time` | gauge | — | Cumulative CPU time (ns) |
| `process.uptime` | gauge | — | ms since start |
| `process.start.time` | gauge | — | Epoch start |
| `process.files.open` | gauge | — | Open FDs |
| `process.files.max` | gauge | — | FD ulimit |
| `system.cpu.count` | gauge | — | Cores visible to the JVM |
| `system.cpu.usage` | gauge | — | System-wide CPU (0–1) |
| `system.load.average.1m` | gauge | — | 1-min load (Unix only) |
| `disk.free` | gauge | `path` | Free bytes on the mount that holds the JAR |
| `disk.total` | gauge | `path` | Total bytes |
### HTTP server
| Metric | Type | Tags | Meaning |
|---|---|---|---|
| `http.server.requests` | timer | `method`, `uri`, `status`, `outcome`, `exception` | Inbound HTTP: count, total_time/total, max |
| `http.server.requests.active` | long_task_timer | `method`, `uri` | In-flight requests — `active_tasks` statistic |
`uri` is the Spring-templated path (`/api/v1/environments/{envSlug}/apps/{appSlug}`), not the raw URL — cardinality stays bounded.
### Tomcat
| Metric | Type | Tags | Meaning |
|---|---|---|---|
| `tomcat.sessions.active.current` | gauge | — | Currently active sessions |
| `tomcat.sessions.active.max` | gauge | — | Max concurrent sessions observed |
| `tomcat.sessions.alive.max` | gauge | — | Longest session lifetime (s) |
| `tomcat.sessions.created` | counter | — | Cumulative session creates |
| `tomcat.sessions.expired` | counter | — | Cumulative expirations |
| `tomcat.sessions.rejected` | counter | — | Session creates refused |
| `tomcat.threads.current` | gauge | `name` | Connector thread count |
| `tomcat.threads.busy` | gauge | `name` | Connector threads currently serving a request |
| `tomcat.threads.config.max` | gauge | `name` | Configured max |
### HikariCP (PostgreSQL pool)
| Metric | Type | Tags | Meaning |
|---|---|---|---|
| `hikaricp.connections` | gauge | `pool` | Total connections |
| `hikaricp.connections.active` | gauge | `pool` | In-use |
| `hikaricp.connections.idle` | gauge | `pool` | Idle |
| `hikaricp.connections.pending` | gauge | `pool` | Threads waiting for a connection |
| `hikaricp.connections.min` | gauge | `pool` | Configured min |
| `hikaricp.connections.max` | gauge | `pool` | Configured max |
| `hikaricp.connections.creation` | timer | `pool` | Time to open a new connection |
| `hikaricp.connections.acquire` | timer | `pool` | Time to acquire from the pool |
| `hikaricp.connections.usage` | timer | `pool` | Time a connection was in use |
| `hikaricp.connections.timeout` | counter | `pool` | Pool acquisition timeouts — any non-zero rate is a problem |
Pools are named. You'll see `HikariPool-1` (PostgreSQL) and a separate pool for ClickHouse (`clickHouseJdbcTemplate`).
### JDBC generic
| Metric | Type | Tags | Meaning |
|---|---|---|---|
| `jdbc.connections.min` | gauge | `name` | Same data as Hikari, surfaced generically |
| `jdbc.connections.max` | gauge | `name` | |
| `jdbc.connections.active` | gauge | `name` | |
| `jdbc.connections.idle` | gauge | `name` | |
### Logging
| Metric | Type | Tags | Meaning |
|---|---|---|---|
| `logback.events` | counter | `level` (error/warn/info/debug/trace) | Log events emitted since start — `{level=error}` is a useful panel |
### Spring Boot lifecycle
| Metric | Type | Tags | Meaning |
|---|---|---|---|
| `application.started.time` | timer | `main.application.class` | Cold-start duration |
| `application.ready.time` | timer | `main.application.class` | Time to ready |
### Flyway
| Metric | Type | Tags | Meaning |
|---|---|---|---|
| `flyway.migrations` | gauge | — | Number of migrations applied (current schema) |
### Executor pools (if any `@Async` executors exist)
When a `ThreadPoolTaskExecutor` bean is registered and tagged, Micrometer adds:
| Metric | Type | Tags | Meaning |
|---|---|---|---|
| `executor.active` | gauge | `name` | Currently-running tasks |
| `executor.queued` | gauge | `name` | Queued tasks |
| `executor.queue.remaining` | gauge | `name` | Queue headroom |
| `executor.pool.size` | gauge | `name` | Current pool size |
| `executor.pool.core` | gauge | `name` | Core size |
| `executor.pool.max` | gauge | `name` | Max size |
| `executor.completed` | counter | `name` | Completed tasks |
---
## Suggested dashboard panels
Below are 17 panels, each expressed as a single `POST /api/v1/admin/server-metrics/query` body. Tenant is implicit in the JWT — the server filters by tenant server-side. `{from}` and `{to}` are dashboard variables.
### Row: server health (top of dashboard)
1. **Agents by state** — stacked area.
```json
{ "metric": "cameleer.agents.connected", "statistic": "value",
"from": "{from}", "to": "{to}", "stepSeconds": 60,
"groupByTags": ["state"], "aggregation": "avg", "mode": "raw" }
```
2. **Ingestion buffer depth by type** — line chart.
```json
{ "metric": "cameleer.ingestion.buffer.size", "statistic": "value",
"from": "{from}", "to": "{to}", "stepSeconds": 60,
"groupByTags": ["type"], "aggregation": "avg", "mode": "raw" }
```
3. **Ingestion drops per minute** — bar chart.
```json
{ "metric": "cameleer.ingestion.drops", "statistic": "count",
"from": "{from}", "to": "{to}", "stepSeconds": 60,
"groupByTags": ["reason"], "mode": "delta" }
```
4. **Auth failures per minute** — same shape as drops, grouped by `reason`.
```json
{ "metric": "cameleer.auth.failures", "statistic": "count",
"from": "{from}", "to": "{to}", "stepSeconds": 60,
"groupByTags": ["reason"], "mode": "delta" }
```
### Row: JVM
5. **Heap used vs committed vs max** — area chart (three overlay queries).
```json
{ "metric": "jvm.memory.used", "statistic": "value",
"from": "{from}", "to": "{to}", "stepSeconds": 60,
"filterTags": { "area": "heap" }, "aggregation": "sum", "mode": "raw" }
```
Repeat with `"metric": "jvm.memory.committed"` and `"metric": "jvm.memory.max"`.
6. **CPU %** — line.
```json
{ "metric": "process.cpu.usage", "statistic": "value",
"from": "{from}", "to": "{to}", "stepSeconds": 60, "aggregation": "avg", "mode": "raw" }
```
Overlay with `"metric": "system.cpu.usage"`.
7. **GC pause — max per cause**.
```json
{ "metric": "jvm.gc.pause", "statistic": "max",
"from": "{from}", "to": "{to}", "stepSeconds": 60,
"groupByTags": ["cause"], "aggregation": "max", "mode": "raw" }
```
8. **Thread count** — three overlay lines: `jvm.threads.live`, `jvm.threads.daemon`, `jvm.threads.peak` each with `statistic=value, aggregation=avg, mode=raw`.
### Row: HTTP + DB
9. **HTTP mean latency by URI** — top-N URIs.
```json
{ "metric": "http.server.requests", "statistic": "mean",
"from": "{from}", "to": "{to}", "stepSeconds": 60,
"groupByTags": ["uri"], "filterTags": { "outcome": "SUCCESS" },
"aggregation": "avg", "mode": "raw" }
```
For a rough p99 proxy, repeat with `"statistic": "max"`.
10. **HTTP error rate** — two queries, divide client-side: total requests and 5xx requests.
```json
{ "metric": "http.server.requests", "statistic": "count",
"from": "{from}", "to": "{to}", "stepSeconds": 60,
"mode": "delta", "aggregation": "sum" }
```
Then for the 5xx series, add `"filterTags": { "outcome": "SERVER_ERROR" }` and divide.
11. **HikariCP pool saturation** — overlay two queries.
```json
{ "metric": "hikaricp.connections.active", "statistic": "value",
"from": "{from}", "to": "{to}", "stepSeconds": 60,
"groupByTags": ["pool"], "aggregation": "avg", "mode": "raw" }
```
Overlay with `"metric": "hikaricp.connections.pending"`.
12. **Hikari acquire timeouts per minute**.
```json
{ "metric": "hikaricp.connections.timeout", "statistic": "count",
"from": "{from}", "to": "{to}", "stepSeconds": 60,
"groupByTags": ["pool"], "mode": "delta" }
```
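Panel 10 leaves the division to the client. A minimal Java sketch of that step follows, assuming both responses were requested with the same `from`, `to`, and `stepSeconds` so their points line up bucket by bucket. The `ErrorRate` class is hypothetical, and returning `0.0` for idle buckets is a rendering choice made here, not something the API prescribes.

```java
import java.util.ArrayList;
import java.util.List;

final class ErrorRate {
    // Divides two equally bucketed series point by point.
    // Buckets with no traffic yield 0.0 rather than NaN.
    static List<Double> ratio(List<Double> errors, List<Double> totals) {
        if (errors.size() != totals.size()) {
            throw new IllegalArgumentException("series must have the same number of buckets");
        }
        List<Double> out = new ArrayList<>(totals.size());
        for (int i = 0; i < totals.size(); i++) {
            double total = totals.get(i);
            out.add(total > 0 ? errors.get(i) / total : 0.0);
        }
        return out;
    }
}
```

If the 5xx query returns fewer points than the total query (for example because no errors occurred in some buckets), the series would need to be aligned by timestamp first; the sketch deliberately rejects that case instead of guessing.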
### Row: alerting (collapsible)
13. **Alerting instances by state** — stacked.
```json
{ "metric": "alerting_instances_total", "statistic": "value",
"from": "{from}", "to": "{to}", "stepSeconds": 60,
"groupByTags": ["state"], "aggregation": "avg", "mode": "raw" }
```
14. **Eval errors per minute by kind**.
```json
{ "metric": "alerting_eval_errors_total", "statistic": "count",
"from": "{from}", "to": "{to}", "stepSeconds": 60,
"groupByTags": ["kind"], "mode": "delta" }
```
15. **Webhook delivery — max per minute**.
```json
{ "metric": "alerting_webhook_delivery_duration_seconds", "statistic": "max",
"from": "{from}", "to": "{to}", "stepSeconds": 60,
"aggregation": "max", "mode": "raw" }
```
### Row: deployments (runtime-enabled only)
16. **Deploy outcomes per hour**.
```json
{ "metric": "cameleer.deployments.outcome", "statistic": "count",
"from": "{from}", "to": "{to}", "stepSeconds": 3600,
"groupByTags": ["status"], "mode": "delta" }
```
17. **Deploy duration mean**.
```json
{ "metric": "cameleer.deployments.duration", "statistic": "mean",
"from": "{from}", "to": "{to}", "stepSeconds": 300,
"aggregation": "avg", "mode": "raw" }
```
For a rough p99 proxy, repeat with `"statistic": "max"`.
---
## Notes for the dashboard implementer
- **Use the REST API.** The server handles tenant filtering, counter deltas, range bounds, and input validation. Direct ClickHouse is a fallback for the handful of cases the generic query can't express.
- **`total_time` vs `total`.** SimpleMeterRegistry and PrometheusMeterRegistry disagree on the tag value for Timer cumulative duration. The server uses PrometheusMeterRegistry in production, so expect `total_time`. The derived `statistic=mean` handles both transparently.
- **Cardinality warning:** `http.server.requests` tags include `uri` and `status`. The server templates URIs, but if someone adds an endpoint that embeds a high-cardinality path segment without `@PathVariable`, you'll see explosion here. The API caps responses at 500 series; you'll get a 400 if you blow past it.
- **The dashboard is read-only.** There's no write path — only the server writes into `server_metrics`.
---
## Changelog
- 2026-04-23 — initial write. Write-only backend.
- 2026-04-23 — added generic REST API (`/api/v1/admin/server-metrics/{catalog,instances,query}`) so dashboards don't need direct ClickHouse access. All 17 suggested panels now expressed as single-endpoint queries.
- 2026-04-24 — shipped the built-in `/admin/server-metrics` UI dashboard. Gated by `infrastructureendpoints` + ADMIN, identical visibility to `/admin/{database,clickhouse}`. Source: `ui/src/pages/Admin/ServerMetricsAdminPage.tsx`.
- 2026-04-24 — dashboard now uses the global time-range control (`useGlobalFilters`) instead of a page-local picker. Bucket size auto-scales with the selected window (10 s → 1 h). Query hooks now take a `ServerMetricsRange = { from: Date; to: Date }` instead of a `windowSeconds` number so they work for any absolute or rolling range the TopBar supplies.

File diff suppressed because it is too large



@@ -0,0 +1,225 @@
# Deployment Strategies (blue-green + rolling) — Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Make `deploymentStrategy` actually affect runtime behavior. Support **blue-green** (all-at-once, default) and **rolling** (per-replica) deployments with correct semantics. Unblock real blue/green by giving each deployment a unique container-name generation suffix so old + new replicas can coexist during the swap.
**Current state (interim fix landed in `f8dccaae`):** strategy field exists but executor doesn't branch on it; a destroy-then-start flow runs regardless. This plan replaces that interim behavior.
**Architecture:**
- Append an 8-char **`gen`** suffix (first 8 chars of `deployment.id`) to container name AND `CAMELEER_AGENT_INSTANCEID`. Unique per deployment; no new DB state.
- Add a `cameleer.generation` Docker label so Grafana/Prometheus can pin deploy boundaries without regex on instance-id.
- Branch `DeploymentExecutor.executeAsync` on strategy:
- **blue-green**: start all N new → health-check all → stop all old. Strict all-healthy: partial = FAILED (old stays running).
- **rolling**: per-replica loop: start new[i] → health-check → stop old[i] → next. Mid-rollout failure → stop failed new[i], leave remaining old[i..n] running, mark FAILED.
- Keep destroy-then-start as the fallback for unknown strategy values (safety net).
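The rolling semantics above can be sketched as a small loop. This is an illustration under stated assumptions, not the planned `deployRolling` implementation: `ReplicaRuntime` is a hypothetical stand-in for the real orchestrator, and the sketch assumes the old and new deployments have the same replica count.

```java
import java.util.List;

// Hypothetical abstraction over the container runtime; not the real orchestrator API.
interface ReplicaRuntime {
    String start(int replicaIndex);          // returns the new container's name
    boolean waitHealthy(String container);
    void stop(String container);
}

final class RollingSketch {
    // oldContainers.get(i) is the previous-generation container for replica i.
    // Returns true on a full rollout, false when the deployment should be marked FAILED.
    static boolean deploy(ReplicaRuntime rt, List<String> oldContainers) {
        for (int i = 0; i < oldContainers.size(); i++) {
            String fresh = rt.start(i);
            if (!rt.waitHealthy(fresh)) {
                // Stop only the failed new replica; old[i..n] keep serving traffic.
                rt.stop(fresh);
                return false;
            }
            rt.stop(oldContainers.get(i));
        }
        return true;
    }
}
```

Note that replicas already swapped before the failure stay on the new generation, so a failed rolling deploy can leave a mixed fleet running.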
**Reference:** interim-fix commit `f8dccaae`; investigation summary in the session log.
---
## File Structure
### Backend (new / modified)
- **Create:** `cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentStrategy.java` — enum `BLUE_GREEN, ROLLING`; `fromWire(String)` with blue-green fallback; `toWire()` → "blue-green" / "rolling".
- **Modify:** `cameleer-server-app/src/main/java/com/cameleer/server/app/runtime/DeploymentExecutor.java` — add `gen` computation, strategy branching, per-strategy START_REPLICAS + HEALTH_CHECK + SWAP_TRAFFIC flows. Rewrite the body of `executeAsync` so stages 4–6 dispatch on strategy. Extract helper methods `deployBlueGreen` and `deployRolling` to keep each path readable.
- **Modify:** `cameleer-server-app/src/main/java/com/cameleer/server/app/runtime/TraefikLabelBuilder.java` — take `gen` argument; emit `cameleer.generation` label; `cameleer.instance-id` becomes `{envSlug}-{appSlug}-{replicaIndex}-{gen}`.
- **Modify:** `cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentService.java` — `containerName` stored on the row becomes `env.slug() + "-" + app.slug()` (unchanged — already just the group-name for DB/operator visibility; real Docker name is computed in the executor).
- **Modify:** `cameleer-server-app/src/test/java/com/cameleer/server/app/controller/DeploymentControllerIT.java` — update the single assertion that pins `container_name` format if any (spotted at line ~112 in the investigation).
- **Create:** `cameleer-server-app/src/test/java/com/cameleer/server/app/runtime/BlueGreenStrategyIT.java` — two tests: all-replicas-healthy path stops old after new, and partial-healthy aborts preserving old.
- **Create:** `cameleer-server-app/src/test/java/com/cameleer/server/app/runtime/RollingStrategyIT.java` — two tests: happy rolling 3→3 replacement, and fail-on-replica-1 preserves remaining old replicas.
### UI
- **Modify:** `ui/src/pages/AppsTab/AppDeploymentPage/ConfigTabs/ResourcesTab.tsx` — confirm the strategy dropdown offers "blue-green" and "rolling" with descriptive labels + a hint line.
- **Modify:** `ui/src/pages/AppsTab/AppDeploymentPage/DeploymentTab/StatusCard.tsx` — surface `deployment.deploymentStrategy` as a small text/badge near the version badge (read-only).
### Docs + rules
- **Modify:** `.claude/rules/docker-orchestration.md` — rewrite the "DeploymentExecutor Details" and "Blue/green strategy" sections to describe the new behavior and the `gen` suffix; retire the interim destroy-then-start note.
- **Modify:** `.claude/rules/app-classes.md` — update the `DeploymentExecutor` bullet under `runtime/`.
- **Modify:** `.claude/rules/core-classes.md` — note new `DeploymentStrategy` enum under `runtime/`.
---
## Phase 1 — Core: DeploymentStrategy enum + gen utility
### Task 1.1: DeploymentStrategy enum
**Files:** Create `cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentStrategy.java`.
- [ ] Create enum with two constants `BLUE_GREEN`, `ROLLING`.
- [ ] Add `toWire()` returning `"blue-green"` / `"rolling"`.
- [ ] Add `fromWire(String)` — case-insensitive match; unknown or null → `BLUE_GREEN` with no throw (safety fallback). Returns enum, never null.
**Verification:** unit test covering known + unknown + null inputs.
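One possible shape for the enum, shown as a sketch only. The package declaration and `public` modifiers are omitted to keep it self-contained, and trimming whitespace before matching is an extra assumption beyond the checklist.

```java
import java.util.Locale;

enum DeploymentStrategy {
    BLUE_GREEN("blue-green"),
    ROLLING("rolling");

    private final String wire;

    DeploymentStrategy(String wire) {
        this.wire = wire;
    }

    String toWire() {
        return wire;
    }

    // Case-insensitive match; unknown or null input falls back to BLUE_GREEN
    // instead of throwing, so the result is never null.
    static DeploymentStrategy fromWire(String value) {
        if (value != null) {
            String normalized = value.trim().toLowerCase(Locale.ROOT);
            for (DeploymentStrategy s : values()) {
                if (s.wire.equals(normalized)) {
                    return s;
                }
            }
        }
        return BLUE_GREEN;
    }
}
```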
### Task 1.2: Generation suffix helper
- [ ] Decide location — inline static helper on `DeploymentExecutor` is fine (`private static String gen(UUID id) { return id.toString().substring(0,8); }`). No new file needed.
---
## Phase 2 — Executor: gen-suffixed naming + `cameleer.generation` label
This phase is purely the naming change; no strategy branching yet. After this phase, redeploy still uses the destroy-then-start interim, but containers carry the new names + label.
### Task 2.1: TraefikLabelBuilder — accept `gen`, emit generation label
**Files:** Modify `TraefikLabelBuilder.java`.
- [ ] Add `String gen` as a new arg on `build(...)`.
- [ ] Change `instanceId` construction: `envSlug + "-" + appSlug + "-" + replicaIndex + "-" + gen`.
- [ ] Add label `cameleer.generation = gen`.
- [ ] Leave the Traefik router/service label keys using `svc = envSlug + "-" + appSlug` (unchanged — routing is generation-agnostic so load balancing across old+new works automatically).
### Task 2.2: DeploymentExecutor — compute gen once, thread through
**Files:** Modify `DeploymentExecutor.executeAsync`.
- [ ] At the top of the try block (after `env`, `app`, `config` resolution), compute `String gen = gen(deployment.id());`.
- [ ] In the replica loop: `String instanceId = env.slug() + "-" + app.slug() + "-" + i + "-" + gen;` and `String containerName = tenantId + "-" + instanceId;`.
- [ ] Pass `gen` to `TraefikLabelBuilder.build(...)`.
- [ ] Set `CAMELEER_AGENT_INSTANCEID=instanceId` (already done, just verify the new value propagates).
- [ ] Leave `replicaStates[].containerName` stored as the new full name.
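The naming scheme from Tasks 1.2 and 2.2 can be checked in isolation. The sketch below collects the three string computations into a hypothetical `ReplicaNaming` helper; in the plan they live inline in `DeploymentExecutor`, so the class itself is an assumption made for illustration.

```java
import java.util.UUID;

final class ReplicaNaming {
    // First 8 characters of the deployment id, as in Task 1.2.
    static String gen(UUID deploymentId) {
        return deploymentId.toString().substring(0, 8);
    }

    // {envSlug}-{appSlug}-{replicaIndex}-{gen}
    static String instanceId(String envSlug, String appSlug, int replicaIndex, String gen) {
        return envSlug + "-" + appSlug + "-" + replicaIndex + "-" + gen;
    }

    // The Docker container name is the instance id prefixed with the tenant id.
    static String containerName(String tenantId, String instanceId) {
        return tenantId + "-" + instanceId;
    }
}
```

Because `gen` differs between deployments, the old and new containers for the same replica index get distinct names and can run side by side during the swap.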
### Task 2.3: Update the one brittle test
**Files:** Modify `DeploymentControllerIT.java`.
- [ ] Relax the container-name assertion to `startsWith("default-default-deploy-test-")` or similar — verify behavior, not exact suffix.
**Verification after Phase 2:**
- `mvn -pl cameleer-server-app -am test -Dtest=DeploymentSnapshotIT,DeploymentControllerIT,PostgresDeploymentRepositoryIT`
- All green; container names now include gen; redeploy still works via the interim destroy-then-start flow (which will be replaced in Phase 3).
---
## Phase 3 — Blue-green strategy (default)
### Task 3.1: Extract `deployBlueGreen(...)` helper
**Files:** Modify `DeploymentExecutor.java`.
- [ ] Move the current START_REPLICAS → HEALTH_CHECK → SWAP_TRAFFIC body into a new `private void deployBlueGreen(...)` method.
- [ ] Signature: take `deployment`, `app`, `env`, `config`, `resolvedRuntimeType`, `mainClass`, `gen`, `primaryNetwork`, `additionalNets`.
### Task 3.2: Reorder for proper blue-green
- [ ] Remove the pre-flight "stop previous" block added in `f8dccaae` (will be replaced by post-health swap).
- [ ] Order: start all new → wait all healthy → find previous active (via `findActiveByAppIdAndEnvironmentIdExcluding`) → stop old containers + mark old row STOPPED.
- [ ] Strict all-healthy: if `healthyCount < config.replicas()`, stop the new containers we just started, mark deployment FAILED with `"blue-green: %d/%d replicas healthy; preserving previous deployment"`. Do **not** touch the old deployment.
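The ordering in Task 3.2 can be sketched as follows. This is an illustration under stated assumptions: `BlueGreenSketch` and its three functional parameters are hypothetical stand-ins for the orchestrator calls, and the real `deployBlueGreen(...)` also updates deployment rows, which is omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.IntFunction;
import java.util.function.Predicate;

/** Hypothetical sketch of the blue-green ordering; not the executor's real API. */
final class BlueGreenSketch {
    private BlueGreenSketch() {}

    /**
     * Returns empty on success, or the FAILED error message when fewer than
     * {@code replicas} new containers became healthy.
     */
    static Optional<String> deploy(int replicas,
                                   List<String> oldContainerIds,
                                   IntFunction<String> startReplica,
                                   Predicate<String> waitHealthy,
                                   Consumer<String> stopAndRemove) {
        // 1. Start every new replica before touching the old generation.
        List<String> started = new ArrayList<>();
        for (int i = 0; i < replicas; i++) {
            started.add(startReplica.apply(i));
        }

        // 2. Count how many of the new replicas reached healthy.
        int healthy = 0;
        for (String id : started) {
            if (waitHealthy.test(id)) {
                healthy++;
            }
        }

        // 3. Strict all-healthy: on any shortfall, clean up the new
        //    containers and leave the old deployment untouched.
        if (healthy < replicas) {
            started.forEach(stopAndRemove);
            return Optional.of(String.format(
                    "blue-green: %d/%d replicas healthy; preserving previous deployment",
                    healthy, replicas));
        }

        // 4. Only now stop the old containers (the post-health swap).
        oldContainerIds.forEach(stopAndRemove);
        return Optional.empty();
    }
}
```

The point of the ordering is that the old containers are only ever stopped in step 4, so a failed deploy cannot reduce serving capacity.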
### Task 3.3: Wire strategy dispatch
- [ ] At the point where `deployBlueGreen` is called, check `DeploymentStrategy.fromWire(config.deploymentStrategy())` and dispatch. For this phase, always call `deployBlueGreen`.
- [ ] `ROLLING` dispatches to `deployRolling(...)`, implemented in Phase 4. Until then, stub it to throw `UnsupportedOperationException`; Task 4.3 replaces the stub.
---
## Phase 4 — Rolling strategy
### Task 4.1: `deployRolling(...)` helper
**Files:** Modify `DeploymentExecutor.java`.
- [ ] Same signature as `deployBlueGreen`.
- [ ] Look up previous deployment once at entry via `findActiveByAppIdAndEnvironmentIdExcluding`. Capture its `replicaStates` into a map keyed by replica index.
- [ ] For `i` from 0 to `config.replicas() - 1`:
  - [ ] Start new replica `i` (with gen-suffixed name).
  - [ ] Wait for this single container to go healthy (per-replica `waitForOneHealthy(containerId, timeoutSeconds)`; reuse `healthCheckTimeout` per replica or introduce a smaller per-replica budget).
  - [ ] On success: stop the corresponding old replica `i` by `containerId` from the previous deployment's replicaStates (if present); log and continue.
  - [ ] On failure: stop + remove all new replicas started so far, mark deployment FAILED with `"rolling: replica %d failed to reach healthy; preserved %d previous replicas"`. Do **not** touch the already-replaced replicas from previous deployment (they're already stopped) or the not-yet-replaced ones (they keep serving).
- [ ] After the loop succeeds for all replicas, mark the previous deployment row STOPPED (its containers are all stopped).
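The per-replica loop in Task 4.1 can be sketched like this. `RollingSketch` and its functional parameters are hypothetical stand-ins for the orchestrator calls. How the `preserved %d` count is computed is an assumption (old replicas not yet replaced when the failure occurs); the plan does not spell it out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.IntFunction;
import java.util.function.Predicate;

/** Hypothetical sketch of the rolling loop; not the executor's real API. */
final class RollingSketch {
    private RollingSketch() {}

    /** Returns empty on success, or the FAILED error message on a mid-rollout failure. */
    static Optional<String> deploy(int replicas,
                                   Map<Integer, String> oldByIndex,
                                   IntFunction<String> startReplica,
                                   Predicate<String> waitForOneHealthy,
                                   Consumer<String> stopAndRemove) {
        List<String> startedNew = new ArrayList<>();
        int replacedOld = 0;

        for (int i = 0; i < replicas; i++) {
            String newId = startReplica.apply(i);
            startedNew.add(newId);

            if (!waitForOneHealthy.test(newId)) {
                // Failure: remove every new replica started so far. Old replicas
                // are not touched; the not-yet-replaced ones keep serving.
                startedNew.forEach(stopAndRemove);
                // Assumption: "preserved" counts old replicas not yet replaced.
                int preserved = oldByIndex.size() - replacedOld;
                return Optional.of(String.format(
                        "rolling: replica %d failed to reach healthy; preserved %d previous replicas",
                        i, preserved));
            }

            // Success: retire the old replica with the same index, if any.
            String oldId = oldByIndex.get(i);
            if (oldId != null) {
                stopAndRemove.accept(oldId);
                replacedOld++;
            }
        }
        // The caller then marks the previous deployment row STOPPED. Old replicas
        // with an index >= replicas (scale-down) are outside this sketch.
        return Optional.empty();
    }
}
```

With three old replicas and a failure at index 1, this yields the sequence start new 0, stop old 0, start new 1, then cleanup of new 0 and new 1, leaving old 1 and old 2 running.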
### Task 4.2: Add `waitForOneHealthy`
- [ ] Variant of `waitForAnyHealthy` that polls a single container id. Returns boolean. Same sleep cadence.
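A possible shape for the single-container poll, as a sketch only. The health probe is passed in as a `Predicate` because the real orchestrator call is not shown in this plan, and the poll interval is a parameter because the existing sleep cadence of `waitForAnyHealthy` is not stated here. `HealthWait` is a hypothetical holder class.

```java
import java.time.Duration;
import java.util.function.Predicate;

/** Hypothetical holder; the plan adds this method to DeploymentExecutor. */
final class HealthWait {
    private HealthWait() {}

    /**
     * Polls one container until it reports healthy or the timeout elapses.
     * Returns true if healthy was observed, false on timeout or interruption.
     */
    static boolean waitForOneHealthy(String containerId,
                                     Predicate<String> isHealthy,
                                     Duration timeout,
                                     Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (isHealthy.test(containerId)) {
                return true;
            }
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000;
            if (remainingMs <= 0) {
                return false;
            }
            try {
                // Never sleep past the deadline.
                Thread.sleep(Math.min(pollInterval.toMillis(), remainingMs));
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and treat it as "not healthy".
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```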
### Task 4.3: Replace the Phase 3 stub
- [ ] `ROLLING` dispatch calls `deployRolling` instead of throwing.
---
## Phase 5 — Integration tests
Each IT extends `AbstractPostgresIT`, uses `@MockBean RuntimeOrchestrator`, and overrides `cameleer.server.runtime.healthchecktimeout=2` via `@TestPropertySource`.
### Task 5.1: BlueGreenStrategyIT
**Files:** Create `BlueGreenStrategyIT.java`.
- [ ] **Test 1 `blueGreen_allHealthy_stopsOldAfterNew`:** seed a previous RUNNING deployment (2 replicas). Trigger redeploy with `containerConfig.deploymentStrategy=blue-green` + replicas=2. Mock orchestrator: new containers return `healthy`. Await new deployment RUNNING. Assert: previous deployment has status STOPPED, its container IDs had `stopContainer`+`removeContainer` called; new deployment replicaStates contain the two new container IDs; `cameleer.generation` label on both new container requests.
- [ ] **Test 2 `blueGreen_partialHealthy_preservesOldAndMarksFailed`:** seed previous RUNNING (2 replicas). New deploy with replicas=2. Mock: container A healthy, container B starting forever. Await new deployment FAILED. Assert: previous deployment still RUNNING; its container IDs were **not** stopped; new deployment errorMessage contains "1/2 replicas healthy".
### Task 5.2: RollingStrategyIT
**Files:** Create `RollingStrategyIT.java`.
- [ ] **Test 1 `rolling_allHealthy_replacesOneByOne`:** seed previous RUNNING (3 replicas). New deploy with strategy=rolling, replicas=3. Mock: new containers all healthy. Use `ArgumentCaptor` on `startContainer` to observe start order. Assert: start[0] → stop[old0] → start[1] → stop[old1] → start[2] → stop[old2]; new deployment RUNNING with 3 replicaStates; old deployment STOPPED.
- [ ] **Test 2 `rolling_failsMidRollout_preservesRemainingOld`:** seed previous RUNNING (3 replicas). New deploy strategy=rolling. Mock: new[0] healthy, new[1] never healthy. Await FAILED. Assert: new[0] was stopped during cleanup; old[0] was stopped (replaced before the failure); old[1] + old[2] still RUNNING; new deployment errorMessage contains "replica 1".
---
## Phase 6 — UI strategy indicator
### Task 6.1: Strategy dropdown polish
**Files:** Modify `ResourcesTab.tsx`.
- [ ] Verify the `<select>` has options `blue-green` and `rolling`.
- [ ] Add a one-line description under the dropdown: "Blue-green: start all new, swap when healthy. Rolling: replace one replica at a time."
### Task 6.2: Strategy on StatusCard
**Files:** Modify `DeploymentTab/StatusCard.tsx`.
- [ ] Add a small subtle text line in the grid: `<span>Strategy</span><span>{deployment.deploymentStrategy}</span>` (read-only, mono text ok).
---
## Phase 7 — Docs + rules updates
### Task 7.1: Update `.claude/rules/docker-orchestration.md`
- [ ] Replace the "DeploymentExecutor Details" section with the new flow (gen suffix, strategy dispatch, per-strategy ordering).
- [ ] Update the "Deployment Status Model" table — `DEGRADED` now means "post-deploy replica crashed"; failed-during-deploy is always `FAILED`.
- [ ] Add a short "Deployment Strategies" section: behavior of blue-green vs rolling, resource peak, failure semantics.
### Task 7.2: Update `.claude/rules/app-classes.md`
- [ ] Under the `runtime/` `DeploymentExecutor` bullet: add "branches on `DeploymentStrategy.fromWire(config.deploymentStrategy())`. Container name format: `{tenantId}-{envSlug}-{appSlug}-{replicaIndex}-{gen}` where gen = 8-char prefix of deployment UUID."
### Task 7.3: Update `.claude/rules/core-classes.md`
- [ ] Add under `runtime/`: `DeploymentStrategy` — enum BLUE_GREEN, ROLLING; `fromWire` falls back to BLUE_GREEN; note stored as kebab-case string on config.
---
## Rollout sequence
1. Phase 1 (enum + helper) — trivial, land as one commit.
2. Phase 2 (naming + generation label) — one commit; interim destroy-then-start still active; regenerates no OpenAPI (no controller change).
3. Phase 3 (blue-green as default) — one commit replacing the interim flow. This is where real behavior changes.
4. Phase 4 (rolling) — one commit.
5. Phase 5 (4 ITs) — one commit; run `mvn test` against affected modules.
6. Phase 6 (UI) — one commit; `npx tsc` clean.
7. Phase 7 (docs) — one commit.
Total: 7 commits, all atomic.
## Acceptance
- Existing `DeploymentSnapshotIT` still passes.
- New `BlueGreenStrategyIT` (2 tests) and `RollingStrategyIT` (2 tests) pass.
- Browser QA: redeploy with `deploymentStrategy=blue-green` vs `rolling` produces the expected container timeline (inspect via `docker ps`); Prometheus metrics show continuity across deploys when queried by `{cameleer_app, cameleer_environment}`; the `cameleer_generation` label flips per deploy.
- `.claude/rules/docker-orchestration.md` reflects the new behavior.
## Non-goals
- Automatic rollback on blue-green partial failure (old is left running; user redeploys).
- Automatic rollback on rolling mid-failure (remaining old replicas keep running; user redeploys).
- Per-replica `HEALTH_CHECK` stage label in the UI progress bar — the 7-stage progress is reused as-is; strategy dictates internal looping.
- Strategy field validation at container-config save time (executor's `fromWire` fallback absorbs unknown values — consider a follow-up for strict validation if it becomes an issue).


@@ -0,0 +1,252 @@
# Checkpoints in the Identity grid + locale time + remove History — design
**Date:** 2026-04-23
**Scope:** three targeted UX changes on the unified app deployment page, follow-up to `2026-04-23-deployment-page-polish-design.md`.
**Status:** Draft — pending user review.
## 1. Motivation
The previous polish shipped a collapsible `CheckpointsTable` as a standalone section below the Identity & Artifact block. That made the visual hierarchy noisy — Checkpoints became a third section between Identity and the config tabs, competing for attention. The proper home for "how many past deployments exist and what were they" is *inside* the Identity panel, as one more row in its config grid.
Three changes:
1. Move the checkpoints section into the Identity & Artifact config grid as an in-grid row.
2. Format the Deployed-column sub-line to the user's locale (replaces the raw ISO string).
3. Remove the redundant `HistoryDisclosure` from the Deployment tab — the checkpoints table covers the same information and the per-deployment log drill-down now lives in the drawer.
## 2. Design
### 2.1 Checkpoints row in the Identity config grid
**Current structure** (`IdentitySection.tsx`):
```tsx
<div className={styles.section}>
<SectionHeader>Identity & Artifact</SectionHeader>
<div className={styles.configGrid}>
... label + value cells (Application Name, Slug, Environment, External URL, Current Version, Application JAR) ...
</div>
{children} {/* CheckpointsTable + CheckpointDetailDrawer currently render here */}
</div>
```
**New structure:**
```tsx
<div className={styles.section}>
<SectionHeader>Identity & Artifact</SectionHeader>
<div className={styles.configGrid}>
... existing label + value cells ...
{checkpointsSlot} {/* NEW: rendered as direct grid children via React.Fragment */}
</div>
{children} {/* still used — for the portal-rendered CheckpointDetailDrawer */}
</div>
```
**Slot contract.** `IdentitySection` gains a new prop:
```ts
interface IdentitySectionProps {
// ... existing props ...
checkpointsSlot?: ReactNode;
children?: ReactNode;
}
```
`checkpointsSlot` is expected to be a React.Fragment whose children are grid-direct cells (spans / divs). React fragments are transparent to CSS grid, so the inner elements become direct children of `configGrid` and flow into grid cells like the existing rows.
**`CheckpointsTable` rewrite.** Instead of wrapping itself in `<div className={styles.checkpointsSection}>`, the component returns a Fragment of grid-ready children:
```tsx
if (checkpoints.length === 0) {
return null;
}
return (
<>
<span className={styles.configLabel}>Checkpoints</span>
<div className={styles.checkpointsTriggerCell}>
<button
type="button"
className={styles.checkpointsTrigger}
onClick={() => setOpen((v) => !v)}
aria-expanded={open}
>
<span className={styles.checkpointsChevron}>{open ? '\u25BE' : '\u25B8'}</span>
{open ? 'Collapse' : 'Expand'} ({checkpoints.length})
</button>
</div>
{open && (
<div className={styles.checkpointsTableFullRow}>
<table>...</table>
{hidden > 0 && !expanded && (
<button type="button" className={styles.showOlderBtn} onClick={...}>
Show older (N) archived, postmortem only
</button>
)}
</div>
)}
</>
);
```
**Why this layout.**
- The trigger button sits in the value column (180px label + 1fr value). When closed, the row reads `Checkpoints ▸ Expand (5)`.
- When opened, a third grid child appears: a div that spans both columns (`grid-column: 1 / -1`) containing the `<table>` + optional "Show older" button. This gives the 7-column table the full grid width so columns don't crush.
- The trigger remains in the value cell of the label row above — collapse/expand stays attached to its label.
**CSS changes** (`AppDeploymentPage.module.css`):
*Add:*
```css
.checkpointsTriggerCell {
display: flex;
align-items: center;
}
.checkpointsTrigger {
display: inline-flex;
align-items: center;
gap: 6px;
background: none;
border: none;
padding: 0;
color: var(--text-primary);
cursor: pointer;
font: inherit;
text-align: left;
}
.checkpointsTrigger:hover {
color: var(--amber);
}
.checkpointsTableFullRow {
grid-column: 1 / -1;
margin-top: 4px;
}
```
*Remove (no longer referenced):*
- `.checkpointsSection`
- `.checkpointsHeader` + `.checkpointsHeader:hover`
- `.checkpointsCount`
*Keep:* `.checkpointsChevron` (still used by the trigger for the arrow). `.checkpointsTable`, `.jarCell`, `.jarName`, `.jarStrike`, `.archivedHint`, `.isoSubline`, `.muted`, `.strategyPill`, `.outcomePill`, `.outcome-*`, `.chevron`, `.showOlderBtn`, `.checkpointArchived` — all still referenced by the table body.
*Also remove* (cleanup — unrelated dead weight from the retired `Checkpoints.tsx` row-list view, safe to delete because no TSX references remain):
- `.checkpointsRow`
- `.disclosureToggle`
- `.checkpointList`
- `.checkpointRow`
- `.checkpointMeta`
- Standalone `.checkpointArchived { color: var(--warning); font-size: 12px; }` (the table-row variant `.checkpointsTable tr.checkpointArchived { opacity: 0.55; }` stays)
- `.historyRow` (see §2.3)
### 2.2 Deployed-column locale sub-line
In `CheckpointsTable.tsx`, the Deployed `<td>` currently renders:
```tsx
<td>
{d.deployedAt && timeAgo(d.deployedAt)}
<div className={styles.isoSubline}>{d.deployedAt}</div>
</td>
```
Replace with:
```tsx
<td>
{d.deployedAt && timeAgo(d.deployedAt)}
<div className={styles.isoSubline}>
{d.deployedAt && new Date(d.deployedAt).toLocaleString()}
</div>
</td>
```
`new Date(iso).toLocaleString()` uses the browser's resolved locale via the Intl API. No locale plumbing, no new util.
Primary "5h ago" display stays unchanged.
### 2.3 Remove the History disclosure from the Deployment tab
`HistoryDisclosure.tsx` renders a collapsible `DataTable` + nested `StartupLogPanel`. It duplicates information now surfaced via `CheckpointsTable` + `CheckpointDetailDrawer` (which has its own LogsPanel).
**Changes:**
- Delete `ui/src/pages/AppsTab/AppDeploymentPage/DeploymentTab/HistoryDisclosure.tsx`.
- `ui/src/pages/AppsTab/AppDeploymentPage/DeploymentTab/DeploymentTab.tsx` — remove the import and the `<HistoryDisclosure ... />` render at the bottom of the tab.
- `ui/src/pages/AppsTab/AppDeploymentPage/AppDeploymentPage.module.css` — drop the `.historyRow` rule (covered in §2.1's CSS cleanup list).
## 3. Page wiring
`ui/src/pages/AppsTab/AppDeploymentPage/index.tsx` currently passes the table + drawer together as `children` to `IdentitySection`:
```tsx
<IdentitySection ...>
{app && (
<>
<CheckpointsTable ... />
{selectedDep && <CheckpointDetailDrawer ... />}
</>
)}
</IdentitySection>
```
After the change:
```tsx
<IdentitySection
...
checkpointsSlot={app ? <CheckpointsTable ... /> : undefined}
>
{app && selectedDep && <CheckpointDetailDrawer ... />}
</IdentitySection>
```
The drawer continues to pass through as `children` because `SideDrawer` uses `createPortal` — it can live at any DOM depth, but conceptually sits outside the Identity grid so it doesn't become a stray grid cell.
## 4. Files touched
| Path | Change |
|------|--------|
| `ui/src/pages/AppsTab/AppDeploymentPage/IdentitySection.tsx` | Add `checkpointsSlot?: ReactNode`; render inside `configGrid` after JAR row |
| `ui/src/pages/AppsTab/AppDeploymentPage/CheckpointsTable.tsx` | Return React.Fragment of grid-ready children; replace header wrapper with `checkpointsTrigger` button; locale sub-line in Deployed cell |
| `ui/src/pages/AppsTab/AppDeploymentPage/CheckpointsTable.test.tsx` | Update `expand()` helper to target `/expand|collapse/i`; add test asserting locale sub-line differs from raw ISO |
| `ui/src/pages/AppsTab/AppDeploymentPage/AppDeploymentPage.module.css` | Add `.checkpointsTriggerCell`, `.checkpointsTrigger`, `.checkpointsTableFullRow`; remove obsolete classes listed in §2.1 |
| `ui/src/pages/AppsTab/AppDeploymentPage/index.tsx` | Split `checkpointsSlot` out of `children`; drawer stays in `children` |
| `ui/src/pages/AppsTab/AppDeploymentPage/DeploymentTab/DeploymentTab.tsx` | Remove `HistoryDisclosure` import + render |
| `ui/src/pages/AppsTab/AppDeploymentPage/DeploymentTab/HistoryDisclosure.tsx` | **Delete** |
## 5. Testing
**Unit (vitest + RTL):**
- Update `CheckpointsTable.test.tsx`:
- `expand()` helper targets `screen.getByRole('button', { name: /expand|collapse/i })`.
- The "defaults to collapsed" test asserts the trigger button exists and reads `Expand (1)`; rows hidden.
- The "clicking header expands" test clicks the button (now labeled `Expand`); after click, button label is `Collapse`; rows visible.
- One new test: render the table with `deployedAt: '2026-04-23T10:35:00Z'`, expand, grab the `.isoSubline` element, assert its text contains neither the raw ISO `T` nor `Z`, i.e. it was parsed into a localized form. (Avoids asserting the exact string — CI locales vary.)
**Manual smoke:**
- Page loads → `Checkpoints | ▸ Expand (N)` as a grid row under Application JAR. Collapsed by default.
- Click trigger → text swaps to `▾ Collapse (N)`; table appears below, spanning full grid width.
- Deployed column sub-line shows a local-format date/time (e.g. `4/23/2026, 12:35:00 PM` in `en-US`).
- Deployment tab no longer shows `▶ History (N)` below `Startup Logs`.
- `CheckpointDetailDrawer` still opens on row click (unaffected).
- Empty state: app with no checkpoints shows no Checkpoints row at all.
## 6. Non-goals
- No changes to `CheckpointDetailDrawer` layout or behavior.
- No changes to `timeAgo` (other components still use it).
- No new locale-formatting helpers; `toLocaleString()` inline at the one callsite.
- Not touching primary Deployed column display (keeps "5h ago").
- No changes to the `CheckpointsTable` columns themselves.
## 7. Open questions
None — all resolved during brainstorming.


@@ -0,0 +1,264 @@
# Checkpoints table redesign + deployment audit gap closure
**Date:** 2026-04-23
**Status:** Spec — pending implementation
**Affects:** App deployment page, deployments backend, audit log
## Context
The Checkpoints disclosure on the unified app deployment page (`ui/src/pages/AppsTab/AppDeploymentPage/Checkpoints.tsx`) currently renders past deployments as a cramped row list — a Badge, a "12m ago" label, and a Restore button. It hides the operator information that matters most when reasoning about a checkpoint: who deployed it, the JAR filename (not just the version number), the deployment outcome, and access to the logs and config snapshot the deployment ran with.
Investigating this also surfaced a **gap in the audit log**: `DeploymentController.deploy / stop / promote` make zero `auditService.log(...)` calls. Container deployments — the most consequential operations the server performs — leave no audit trail today. Closing this gap is in scope because it's prerequisite to the "Deployed by" column.
## Goals
1. Replace the cramped checkpoints list with a real table (DS `DataTable`) showing version, JAR filename, deployer, time, strategy, and outcome.
2. Capture and display "who deployed" — backend gains a `created_by` column on `deployments`, populated from `SecurityContextHolder`.
3. Audit deploy / stop / promote operations under a new `AuditCategory.DEPLOYMENT` value.
4. Provide an in-page detail view (side drawer) where the operator can review the deployment's logs and config snapshot before deciding to restore, with an optional diff against the current live config.
5. Cap the visible checkpoint list at the environment's JAR retention count, since older entries cannot be restored.
## Out of scope
- Sortable column headers (default newest-first is enough)
- Deep-linking via `?checkpoint=<id>` query param
- "Remember last drawer tab" preference
- Bulk actions on checkpoints
- Promoting `SideDrawer` into `@cameleer/design-system` (wait for a second consumer)
## Backend changes
### Audit category
Add `DEPLOYMENT` to `cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java`:
```java
public enum AuditCategory {
INFRA, AUTH, USER_MGMT, CONFIG, RBAC, AGENT,
OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE,
ALERT_RULE_CHANGE, ALERT_SILENCE_CHANGE,
DEPLOYMENT
}
```
The `AuditCategory.valueOf(...)` lookup in `AuditLogController` picks this up automatically. The Admin → Audit page filter dropdown gets one new option in `ui/src/pages/Admin/AuditLogPage.tsx`.
### Audit calls in `DeploymentController`
Add `AuditService` injection and write audit rows on every successful and failed lifecycle operation. Action codes:
| Method | Action | Target | Details |
|---|---|---|---|
| `deploy` | `deploy_app` | `deployment.id().toString()` | `{ appSlug, envSlug, appVersionId, jarFilename, version }` |
| `stop` | `stop_deployment` | `deploymentId.toString()` | `{ appSlug, envSlug }` |
| `promote` | `promote_deployment` | `deploymentId.toString()` | `{ sourceEnv, targetEnv, appSlug, appVersionId }` |
Each `try` branch writes `AuditResult.SUCCESS`; `catch (IllegalArgumentException)` writes `AuditResult.FAILURE` with the exception message in details before returning the existing 404. Pattern matches `OutboundConnectionAdminController`.
### Flyway migration `V2__add_deployment_created_by.sql`
```sql
ALTER TABLE deployments ADD COLUMN created_by TEXT REFERENCES users(user_id);
CREATE INDEX idx_deployments_created_by ON deployments (created_by);
```
Nullable — existing rows stay `NULL` (rendered as `—` in UI). New rows always populated. No backfill: pre-V2 history is unrecoverable, and the column starts paying off from the next deploy onward.
### Service signature change
`DeploymentService.createDeployment(appId, appVersionId, envId, createdBy)` and `promote(targetAppId, sourceVersionId, targetEnvId, createdBy)` both gain a trailing `String createdBy` parameter. `PostgresDeploymentRepository` writes it to the new column.
`DeploymentController` resolves `createdBy` via the existing user-id convention: strip `"user:"` prefix from `SecurityContextHolder.getContext().getAuthentication().getName()`. Same helper pattern as `AlertRuleController` / `OutboundConnectionAdminController`.
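The prefix-stripping step might look like the sketch below. `ActorIds.toUserId` is a hypothetical name; the existing helper in `AlertRuleController` / `OutboundConnectionAdminController` is not shown in this spec, so its exact shape is an assumption. The input would be the value of `getAuthentication().getName()`.

```java
/** Hypothetical helper illustrating the "user:" prefix convention. */
final class ActorIds {
    private static final String USER_PREFIX = "user:";

    private ActorIds() {}

    /** Strips a leading "user:" if present; other names pass through unchanged. */
    static String toUserId(String authenticationName) {
        if (authenticationName == null) {
            return null;
        }
        return authenticationName.startsWith(USER_PREFIX)
                ? authenticationName.substring(USER_PREFIX.length())
                : authenticationName;
    }
}
```

Only the first `user:` prefix is removed, so a name such as `user:oidc:abc` becomes `oidc:abc`.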
### DTO change
`com.cameleer.server.core.runtime.Deployment` record gains `createdBy: String`. UI `Deployment` interface in `ui/src/api/queries/admin/apps.ts` gains `createdBy: string | null`.
### Log filter for the drawer
`LogQueryController.GET /api/v1/environments/{envSlug}/logs` accepts a new multi-value query param `instanceIds` (comma-split, OR-joined). It translates to `WHERE instance_id IN (...)` on `logs.instance_id`, an existing `LowCardinality(String)` column that is already part of the table's `ORDER BY` key.
`LogSearchRequest` gains `instanceIds: List<String>` (null-normalized). Service layer adds the `IN (...)` clause when non-null and non-empty.
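A sketch of the comma-split and null-normalization, plus the resulting predicate. `InstanceIdFilter` and both method names are hypothetical, and the placeholder-based fragment is one possible way to build the clause, not necessarily how the service layer does it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper; illustrates the instanceIds handling only. */
final class InstanceIdFilter {
    private InstanceIdFilter() {}

    /**
     * Accepts repeated and/or comma-separated values, trims each id and
     * drops blanks. Returns null when nothing remains (null-normalized).
     */
    static List<String> normalize(List<String> rawParams) {
        if (rawParams == null) {
            return null;
        }
        List<String> ids = new ArrayList<>();
        for (String raw : rawParams) {
            if (raw == null) {
                continue;
            }
            for (String part : raw.split(",")) {
                String id = part.trim();
                if (!id.isEmpty()) {
                    ids.add(id);
                }
            }
        }
        return ids.isEmpty() ? null : Collections.unmodifiableList(ids);
    }

    /** Builds a parameterized IN predicate; an empty string means "no filter". */
    static String whereFragment(List<String> instanceIds) {
        if (instanceIds == null || instanceIds.isEmpty()) {
            return "";
        }
        String placeholders = String.join(", ", Collections.nCopies(instanceIds.size(), "?"));
        return "instance_id IN (" + placeholders + ")";
    }
}
```

Returning an empty fragment for a missing or empty list keeps the prior query behaviour unchanged, which is what `LogQueryControllerInstanceIdsFilterIT` is meant to assert.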
The drawer client computes the instance_id list from `Deployment.replicaStates`: for each replica, `instance_id = "{envSlug}-{appSlug}-{replicaIndex}-{generation}"` where generation is the first 8 chars of `deployment.id`. This is the documented format from `.claude/rules/docker-orchestration.md` — pure client-side derivation, no extra server endpoint.
## Drawer infrastructure
The design system provides `Modal` but no drawer. Building a project-local component is preferred over submitting to DS first (single consumer; easier to iterate locally).
**File:** `ui/src/components/SideDrawer.tsx` + `SideDrawer.module.css` (~120 LOC total).
**API:**
```tsx
<SideDrawer
open={!!selectedCheckpoint}
onClose={() => setSelectedCheckpoint(null)}
title={`Deployment v${version} · ${jarFilename}`}
size="lg" // 'md'=560px, 'lg'=720px, 'xl'=900px
footer={<Button onClick={handleRestore}>Restore this checkpoint</Button>}
>
{/* scrollable body */}
</SideDrawer>
```
**Behavior:**
- React portal to `document.body` (mirrors DS `Modal`).
- Slides in from right via `transform: translateX(100% → 0)` over 240ms ease-out.
- Click-blocking transparent backdrop (no dim — the parent table stays readable). Clicking outside closes.
- ESC closes.
- Focus trap on open; focus restored to trigger on close.
- Sticky header (title + close ×) and optional sticky footer.
- Body uses `overflow-y: auto`.
- All colors via DS CSS variables (`--bg`, `--border`, `--shadow-lg`).
**Unsaved-changes interaction:** Opening the drawer is unrestricted. The drawer is read-only — only Restore mutates form state, and Restore already triggers the existing unsaved-changes guard via `useUnsavedChangesBlocker`.
## Checkpoints table
**File:** `ui/src/pages/AppsTab/AppDeploymentPage/CheckpointsTable.tsx` — replaces `Checkpoints.tsx`.
**Columns** (left to right):
| Column | Source | Notes |
|---|---|---|
| Version | `versionMap.get(d.appVersionId).version` | Badge "v6" with auto-color (matches existing pattern) |
| JAR | `versionMap.get(d.appVersionId).jarFilename` | Monospace; truncate with tooltip on overflow |
| Deployed by | `d.createdBy` | Bare username; OIDC users show `oidc:<sub>` truncated with tooltip; null shows `—` muted |
| Deployed | `d.deployedAt` | Relative ("12m ago") + ISO subline |
| Strategy | `d.deploymentStrategy` | Small pill: "blue/green" or "rolling" |
| Outcome | `d.status` | Tinted pill: STOPPED (slate), DEGRADED (amber) |
| (chevron) | — | Visual affordance for "row click opens drawer" |
**Interaction:**
- Row click opens `CheckpointDetailDrawer` (no separate "View" button).
- No per-row Restore button — Restore lives inside the drawer to force review before action.
- Pruned-JAR rows (`!versionMap.has(d.appVersionId)`) render at 55% opacity with a strikethrough on the filename and an amber "archived — JAR pruned" hint. Row stays clickable; Restore inside the drawer is disabled with tooltip.
- Currently-running deployment is excluded (already represented by `StatusCard` above).
**Empty state:** When zero checkpoints, render a single full-width muted row: "No past deployments yet."
## Pagination
Visible cap = `Environment.jarRetentionCount` rows (newest first). Anything older has likely been pruned and is not restorable, so it's hidden by default.
- `total ≤ jarRetentionCount` → render all, no expander.
- `total > jarRetentionCount` → render newest `jarRetentionCount` rows + an expander row: **"Show older (N) — archived, postmortem only"**. Expanding renders the full list (older rows already styled as archived).
- `jarRetentionCount === 0` (unlimited or unconfigured) → fall back to a default cap of 10.
`jarRetentionCount` comes from `useEnvironments()` (already in the env-store).
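The cap rules above reduce to a little arithmetic, sketched here in Java purely as a worked example; the component itself is TSX. `CheckpointPaging` is a hypothetical name, and treating negative retention values like `0` is an assumption.

```java
/** Hypothetical helper; worked example of the visible-cap rules. */
final class CheckpointPaging {
    /** Fallback cap when retention is 0 (unlimited or unconfigured). */
    static final int DEFAULT_CAP = 10;

    private CheckpointPaging() {}

    /** Effective cap: the retention count, or the default when it is not positive. */
    static int visibleCap(int jarRetentionCount) {
        // Assumption: negative values are handled like 0.
        return jarRetentionCount > 0 ? jarRetentionCount : DEFAULT_CAP;
    }

    /** Rows rendered before expanding. */
    static int visibleCount(int total, int jarRetentionCount) {
        return Math.min(total, visibleCap(jarRetentionCount));
    }

    /** The N in "Show older (N)"; 0 means no expander row. */
    static int hiddenCount(int total, int jarRetentionCount) {
        return total - visibleCount(total, jarRetentionCount);
    }
}
```

For example, 8 checkpoints with a retention count of 5 give 5 visible rows and an expander for 3; 12 checkpoints with a retention count of 0 give 10 visible rows and an expander for 2.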
## Drawer detail view
**File:** `ui/src/pages/AppsTab/AppDeploymentPage/CheckpointDetailDrawer/index.tsx` plus two panel files: `LogsPanel.tsx` and `ConfigPanel.tsx`. Compare is a view mode inside `ConfigPanel`, not a separate file.
**Header:**
- Version badge + JAR filename + outcome pill.
- Meta line: "Deployed by **{createdBy}** · {relative} ({ISO}) · Strategy: {strategy} · {N} replicas · ran for {duration}".
- Close × top-right.
**Tabs** (DS `Tabs`):
- **Logs** — default on open
- **Config** — read-only render of the live config sub-tabs, with a view-mode toggle for "Snapshot" vs "Diff vs current"
### Logs panel
Reuses `useInfiniteApplicationLogs` with the new `instanceIds` filter. The hook signature gets an optional `instanceIds: string[]` parameter that flows through to the `LogQueryController` query string.
**Filters** (in addition to `instanceIds`):
- Existing source/level multi-select pills
- New replica filter dropdown: "all (N)" / "0" / "1" / ... / "N-1" — narrows to a single replica when troubleshooting blue-green or rolling deploys.
**Default sort:** newest first (matches operator mental model when investigating a stopped deployment).
**Total line count** displayed in the filter bar.
### Config panel
Renders the five existing live config sub-tabs (`Monitoring`, `Resources`, `Variables`, `SensitiveKeys`, `Deployment`) **read-only**, hydrated from `deployedConfigSnapshot`.
Each sub-tab component (`ui/src/pages/AppsTab/AppDeploymentPage/ConfigTabs/*`) gains an optional `readOnly?: boolean` prop. When `readOnly` is set:
- All inputs disabled (`disabled` attribute + visual styling)
- Save / edit buttons hidden
- Live banners (`LiveBanner`) hidden — these are not applicable to a frozen snapshot
If a sub-tab currently mixes derived state with form state in a way that makes a clean `readOnly` toggle awkward, refactor that sub-tab as part of this work. Don't proceed with leaky read-only behavior.
**View-mode toggle:** "Snapshot" / "Diff vs current". Default = Snapshot (full read-only render). Diff mode shows differences only — both old and new values per changed field, with red/green left borders, grouped by sub-tab. Each sub-tab pill shows a change-count badge (e.g. "Resources (2)"); sub-tabs with zero differences are dimmed and render a muted "No differences in this section" message when clicked.
Diff base = current live config, pulled via the existing `useApplicationConfig` hook the live form already uses. Algorithm: deep-equal field-level walk between snapshot and current.
The toggle is hidden entirely when JAR is pruned (the missing JAR makes "current vs snapshot" comparison incomplete and misleading).
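The field-level walk behind Diff mode could look roughly like the sketch below, written in Java over nested maps to show the algorithm only; the actual panel is TSX. `ConfigDiffSketch`, the dotted-path format, and the idea that each top-level key corresponds to one sub-tab are all assumptions, as are the field names used in the example that follows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch of a deep-equal, field-level config diff. */
final class ConfigDiffSketch {
    private ConfigDiffSketch() {}

    /** Returns the dotted path of every leaf whose value differs. */
    static List<String> changedPaths(Map<String, Object> snapshot, Map<String, Object> current) {
        List<String> out = new ArrayList<>();
        walk("", snapshot, current, out);
        return out;
    }

    /** Counts changes per top-level key, e.g. to feed a "Resources (2)" badge. */
    static Map<String, Integer> countsBySection(List<String> changedPaths) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String path : changedPaths) {
            int dot = path.indexOf('.');
            String section = dot < 0 ? path : path.substring(0, dot);
            counts.merge(section, 1, Integer::sum);
        }
        return counts;
    }

    private static void walk(String path, Object a, Object b, List<String> out) {
        if (a instanceof Map && b instanceof Map) {
            Map<?, ?> ma = (Map<?, ?>) a;
            Map<?, ?> mb = (Map<?, ?>) b;
            // Union of keys from both sides, sorted for stable output.
            TreeSet<String> keys = new TreeSet<>();
            for (Object k : ma.keySet()) keys.add(String.valueOf(k));
            for (Object k : mb.keySet()) keys.add(String.valueOf(k));
            for (String key : keys) {
                String child = path.isEmpty() ? key : path + "." + key;
                walk(child, ma.get(key), mb.get(key), out);
            }
            return;
        }
        // Leaf (or a map on one side only): record the path when values differ.
        if (!Objects.equals(a, b)) {
            out.add(path);
        }
    }
}
```

Given a snapshot with `resources.replicas = 2` and `resources.memoryMb = 512` and a current config with `3` and `1024`, the walk reports both paths and `countsBySection` reports 2 for `resources`; sections absent from the result have no differences.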
**Footer:** Sticky. Single primary button "Restore this checkpoint" + helper text "Restoring hydrates the form — you'll still need to Redeploy."
When JAR is pruned: button disabled with tooltip "JAR was pruned by the environment retention policy".
Restore behavior is unchanged from today: closes the drawer + hydrates the form via the existing `onRestore(deploymentId)` callback. No backend call; the eventual Redeploy generates the next `deploy_app` audit row.
## Authorization
`DeploymentController` and `AppController` are already class-level `@PreAuthorize("hasAnyRole('OPERATOR', 'ADMIN')")`, so the deployment page is operator-gated. The new `instanceIds` filter on `LogQueryController` (which is VIEWER+) widens nothing — viewers can already query the same logs by `application + environment`; the filter just narrows.
## Real-time updates
When a new deployment lands, the previous "current" becomes a checkpoint. TanStack Query already polls deployments via the existing `useDeployments(appSlug, envSlug)` hook; the new table consumes the same data — auto-refresh comes for free.
## Tests
**Backend integration tests:**
| Test | What it asserts |
|---|---|
| `V2MigrationIT` | `created_by` column exists, FK valid, index exists |
| `DeploymentServiceCreatedByIT` | `createDeployment(...createdBy)` persists the value |
| `DeploymentControllerAuditIT` | All three lifecycle actions write the expected audit row (action, category, target, details, actor, result) including FAILURE branches |
| `LogQueryControllerInstanceIdsFilterIT` | `?instanceIds=a,b,c` returns only matching rows; empty/missing param preserves prior behavior |
**UI component tests:**
| Test | What it asserts |
|---|---|
| `SideDrawer.test.tsx` | open/close, ESC closes, backdrop click closes, focus trap |
| `CheckpointsTable.test.tsx` | row click opens drawer; pruned-JAR row dimmed + clickable; empty state |
| `CheckpointDetailDrawer.test.tsx` | renders correct logs (mocked instance_id list); Restore disabled when JAR pruned |
| `ConfigPanel.test.tsx` | snapshot mode renders all fields read-only; diff mode counts differences correctly per sub-tab; "no differences" message when section unchanged; toggle hidden when JAR pruned |
## Files touched
**Backend:**
- New: `cameleer-server-app/src/main/resources/db/migration/V2__add_deployment_created_by.sql`
- Modified: `cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java` (add `DEPLOYMENT`)
- Modified: `cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/Deployment.java` (record field)
- Modified: `cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentService.java` (signature + impl)
- Modified: `cameleer-server-app/src/main/java/com/cameleer/server/app/storage/PostgresDeploymentRepository.java` (insert + map)
- Modified: `cameleer-server-app/src/main/java/com/cameleer/server/app/controller/DeploymentController.java` (audit calls + createdBy resolution)
- Modified: `cameleer-server-app/src/main/java/com/cameleer/server/app/controller/LogQueryController.java` (instanceIds param)
- Modified: `cameleer-server-core/src/main/java/com/cameleer/server/core/search/LogSearchRequest.java` (instanceIds field)
- Regenerate: `cameleer-server-app/src/main/resources/openapi.json` (controller change → SPA types)
**UI:**
- New: `ui/src/components/SideDrawer.tsx` + `SideDrawer.module.css`
- New: `ui/src/pages/AppsTab/AppDeploymentPage/CheckpointsTable.tsx`
- New: `ui/src/pages/AppsTab/AppDeploymentPage/CheckpointDetailDrawer/{index,LogsPanel,ConfigPanel}.tsx` (Compare is a view-mode inside ConfigPanel, not a separate file)
- Modified: `ui/src/pages/AppsTab/AppDeploymentPage/IdentitySection.tsx` (swap Checkpoints → CheckpointsTable)
- Deleted: `ui/src/pages/AppsTab/AppDeploymentPage/Checkpoints.tsx`
- Modified: `ui/src/pages/AppsTab/AppDeploymentPage/ConfigTabs/{Monitoring,Resources,Variables,SensitiveKeys,Deployment}Tab.tsx` (add `readOnly?` prop)
- Modified: `ui/src/api/queries/logs.ts` (`useInfiniteApplicationLogs` accepts `instanceIds`)
- Modified: `ui/src/api/queries/admin/apps.ts` (`Deployment.createdBy` field)
- Modified: `ui/src/api/schema.d.ts` + `ui/src/api/openapi.json` (regenerated)
- Modified: `ui/src/pages/Admin/AuditLogPage.tsx` (one new category in filter dropdown)
**Docs / rules:**
- Modified: `.claude/rules/app-classes.md` (DeploymentController audit calls + LogQueryController instanceIds param)
- Modified: `.claude/rules/ui.md` (CheckpointsTable + SideDrawer pattern)
- Modified: `.claude/rules/core-classes.md` (`AuditCategory.DEPLOYMENT`, `Deployment.createdBy`)
## Rollout
Two phases, ideally two PRs:
1. **Backend phase**: V2 migration, `AuditCategory.DEPLOYMENT`, audit calls in `DeploymentController`, `created_by` plumbing through `DeploymentService` / record / repository, `LogQueryController` `instanceIds` param. Ships independently because the column is nullable, the audit category is picked up automatically, and the new log filter is opt-in.
2. **UI phase**: `SideDrawer`, `CheckpointsTable`, `CheckpointDetailDrawer`, `readOnly?` props on the five config sub-tabs, audit-page dropdown entry. Depends on the backend PR being merged + the OpenAPI schema regenerated.
Splitting in this order means production gets the audit trail and `created_by` capture immediately, even before the new UI lands, so the audit gap is closed as quickly as possible.

@@ -0,0 +1,190 @@
# Deployment page polish — design
**Date:** 2026-04-23
**Scope:** six targeted UX improvements on the unified deployment page (`ui/src/pages/AppsTab/AppDeploymentPage/*`)
**Status:** Draft — pending user review
## 1. Motivation
The unified deployment page landed recently. Exercising it surfaced six rough edges:
1. No feedback during JAR upload — user clicks `Save` or `Redeploy`, the button spins, nothing happens visually until the upload finishes. On large JARs this feels broken.
2. Startup logs are fixed ascending with no way to see the newest line first and no manual refresh between polls.
3. The checkpoints table is always visible, pushing the config tabs far down even when the user doesn't care about history right now.
4. The replica dropdown in the checkpoint drawer uses a raw `<select>`, visually out of place vs. the rest of the design system.
5. The drawer opens on `Logs` — but `Config` is the restore-decision content. Users currently always click over.
6. (Shipped as `b3d1dd37`.) The "No past deployments yet." empty-state hint was noise — removed.
This spec covers changes 1 through 5; #6 is listed for completeness only.
## 2. Design
### 2.1 Upload progress inside the primary action button
**Rationale.** Putting upload progress inside the button the user just clicked keeps all state-machine feedback in a single locus. The button already advertises the active action (`Save`, `Redeploy`, `Deploying…`); adding `Uploading…` is a natural extension.
**State machine.**
```
PrimaryActionMode: 'save' | 'redeploy' | 'uploading' | 'deploying'
```
- `save` — config dirty, no active deploy, no active upload
- `redeploy` — server dirty against last deploy, no active upload, no active deploy
- `uploading` — a JAR upload is in flight (applies during both Save and Redeploy paths)
- `deploying` — a deployment row exists with status `STARTING`
**Progress propagation.**
- `useUploadJar` switches from `fetch()` to `XMLHttpRequest`. Fetch gives no upload-progress events; XHR's `upload.onprogress` does.
- `useUploadJar` mutation args gain `onProgress?: (pct: number) => void`.
- `AppDeploymentPage` holds `const [uploadPct, setUploadPct] = useState<number | null>(null)`. `handleSave` and `handleRedeploy` pass `onProgress: setUploadPct` into the mutation and clear to `null` in a `finally` block.
- `computeMode` learns a new input `uploading: boolean` (derived from `uploadPct !== null`). Order: `deploying` > `uploading` > `save|redeploy` choice.
- `PrimaryActionButton` gains an optional `progress?: number` prop and renders a progress overlay when `mode === 'uploading'`.
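The precedence rule above (`deploying` > `uploading` > `save|redeploy`) can be sketched as a pure function. This is a minimal illustration, not the actual `computeMode`: the input shape (`ModeInputs`) and the rule for choosing between `save` and `redeploy` are assumptions, since the spec does not show the real signature.

```typescript
type PrimaryActionMode = 'save' | 'redeploy' | 'uploading' | 'deploying';

// Hypothetical input shape; the real computeMode inputs are not shown in this spec.
interface ModeInputs {
  deploymentStarting: boolean; // a deployment row exists with status STARTING
  uploading: boolean;          // derived from uploadPct !== null
  configDirty: boolean;        // unsaved local config edits
}

function computeMode(inputs: ModeInputs): PrimaryActionMode {
  // An active deployment wins over everything else.
  if (inputs.deploymentStarting) return 'deploying';
  // An in-flight upload wins over the idle save/redeploy choice.
  if (inputs.uploading) return 'uploading';
  // Assumption: local edits mean "save"; otherwise offer "redeploy".
  return inputs.configDirty ? 'save' : 'redeploy';
}
```

Keeping the ordering in one function means the button never has to reason about overlapping states; it only renders whatever mode comes back.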
**Button visual during `uploading`.**
- Label: `Uploading… 42%` (rounded integer).
- Disabled (same UX as `deploying`).
- A tinted-primary fill grows from left to right behind the label. Implementation: wrap the DS `Button` children in a positioned container with an inner `<div>` whose `width: ${pct}%` and background is a translucent primary tint (via `color-mix(in srgb, var(--primary) 30%, transparent)` or equivalent CSS variable). Keeps DS `Button` unmodified.
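As a rough sketch of the visual rules above, the label and the overlay style can both be derived from the same clamped percentage. The helper names (`clampPct`, `uploadingLabel`, `overlayStyle`) are hypothetical and only illustrate the idea; the real component may structure this differently.

```typescript
// Clamp to 0..100 and round, so the label never shows values like 101% or 41.6%.
function clampPct(pct: number): number {
  return Math.min(100, Math.max(0, Math.round(pct)));
}

// Label shown inside the button while an upload is in flight.
function uploadingLabel(pct: number): string {
  return `Uploading… ${clampPct(pct)}%`;
}

// Inline style for the inner overlay <div>; the tint mirrors the color-mix() value named in the spec.
function overlayStyle(pct: number): { width: string; background: string } {
  return {
    width: `${clampPct(pct)}%`,
    background: 'color-mix(in srgb, var(--primary) 30%, transparent)',
  };
}
```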
**Edge cases.**
- Upload fails → `onProgress` stops, the XHR error rejects the mutation, existing `catch` block surfaces the toast, `uploadPct` is cleared in `finally`, button returns to whichever mode `computeMode` picks.
- User navigates away mid-upload → the unsaved-changes blocker already exists and challenges the navigation. Cancellation semantics (whether to `xhr.abort()` when the mutation is superseded) are handled by the mutation's own lifecycle — out of scope for this change.
- No staged JAR (redeploy-only) → `uploadPct` stays `null`, mode goes `redeploy` → `deploying` with no `uploading` in between (unchanged from today).
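The failure path described above relies on `uploadPct` being cleared in a `finally` block regardless of outcome. A minimal sketch of that pattern follows; `runUploadWithProgress` and its callback parameters are hypothetical stand-ins for the page's `handleSave` / `handleRedeploy` wiring, not the actual handlers.

```typescript
// Hypothetical shape of the upload call: it accepts an optional progress callback.
type UploadFn = (args: { onProgress?: (pct: number) => void }) => Promise<void>;

async function runUploadWithProgress(
  upload: UploadFn,
  setUploadPct: (pct: number | null) => void,
  onError: (err: unknown) => void,
): Promise<boolean> {
  try {
    // Progress events flow straight into the state setter.
    await upload({ onProgress: setUploadPct });
    return true;
  } catch (err) {
    // The existing catch block would surface a toast here.
    onError(err);
    return false;
  } finally {
    // Always leave the 'uploading' mode, on success and on failure.
    setUploadPct(null);
  }
}
```

Because the reset lives in `finally`, the button cannot get stuck in `uploading` when the XHR rejects.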
### 2.2 Startup log panel — sort + manual refresh
**Layout.** Mirror the Application Log panel in `AgentHealth.tsx:899-917`:
```
┌──────────────────────────────────────────────────────────────┐
│ STARTUP LOGS ● live polling every 3s 42 entries ↓ ↻ │
├──────────────────────────────────────────────────────────────┤
│ <LogViewer entries...> │
└──────────────────────────────────────────────────────────────┘
```
- Reuse `ui/src/styles/log-panel.module.css` (`logCard`, `logHeader`, `headerActions`).
- Sort toggle: DS `Button variant="ghost" size="sm"` with unicode arrow (`↓` desc / `↑` asc). `title` prop: `"Newest first"` / `"Oldest first"`.
- Refresh: DS `Button variant="ghost" size="sm"` wrapping `<RefreshCw size={14} />` from `lucide-react`. `title="Refresh"`.
**Sort semantics.**
- Default sort is **desc** (newest first). User's pain point is that the interesting lines are the most recent ones.
- `useStartupLogs` signature extends to:
```ts
useStartupLogs(application, environment, deployCreatedAt, isStarting, sort: 'asc' | 'desc')
```
- `sort` is passed straight into `LogSearchParams` so the **backend** fetch respects it. Limit remains 500. The 500-line cap applies from the sort direction, so desc gets the latest 500 and asc gets the oldest 500. (Pre-existing limitation; not addressed here.)
- Display direction matches fetch direction — `LogViewer` is passed whatever order the server returns.
**Refresh behavior.**
- Calls the TanStack Query `refetch()`.
- After the refetch resolves, scroll the panel's content container to the "latest" edge:
- `sort === 'asc'` → scroll to bottom (newest is at the bottom).
- `sort === 'desc'` → scroll to top (newest is at the top).
- Requires a `useRef<HTMLDivElement>` on a new scroll wrapper around `LogViewer` inside `StartupLogPanel`. `LogViewer` itself does not forward a scroll ref — confirmed in DS `index.es.d.ts` — so we add a wrapping `<div ref={scrollRef} className={...}>` with `overflow: auto` and call `scrollRef.current.scrollTo({ top, behavior: 'smooth' })` after the refetch resolves.
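The "scroll to the latest edge" rule can be isolated from React as a small helper. This is a sketch under assumptions: `ScrollTarget` is a hypothetical structural type standing in for the wrapper `<div>`, and `scrollToLatest` is an illustrative name, not an existing function.

```typescript
type SortDir = 'asc' | 'desc';

// Minimal structural stand-in for the scroll wrapper element (assumption for illustration).
interface ScrollTarget {
  scrollHeight: number;
  scrollTo(options: { top: number; behavior: 'smooth' }): void;
}

// With ascending sort the newest line is at the bottom; with descending sort it is at the top.
function latestEdgeTop(sort: SortDir, scrollHeight: number): number {
  return sort === 'asc' ? scrollHeight : 0;
}

function scrollToLatest(el: ScrollTarget | null, sort: SortDir): void {
  if (!el) return; // ref not attached yet
  el.scrollTo({ top: latestEdgeTop(sort, el.scrollHeight), behavior: 'smooth' });
}
```

In the panel this would run after the `refetch()` promise resolves, with `scrollRef.current` passed as `el`.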
**Polling behavior unchanged.** 3-second polling while `isStarting` still happens via `refetchInterval`. Manual refresh is orthogonal.
### 2.3 Checkpoints table collapsible, default collapsed
**Rationale.** Deployments happen infrequently; when the user is on this page, they're usually tuning *current* config, not reviewing history. Collapsing by default reclaims vertical space for the config tabs.
**Behavior.**
- `CheckpointsTable` renders a clickable header row **above** the table body:
```
▸ Checkpoints (7) ← collapsed (default)
▾ Checkpoints (7) ← expanded
```
- Chevron and label are part of the same `<button>` (keyboard-accessible). No separate icon component is needed: the unicode characters `▸`/`▾` match the existing codebase style.
- Local component state: `const [open, setOpen] = useState(false)`.
- When `open === false`, the `<table>` and the "Show older" expander are not rendered — only the header row.
- The existing "no checkpoints" early-return (`null`) is preserved — no header row at all when there's nothing to show.
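The header text follows directly from the two pieces of state described above. A tiny sketch, assuming a hypothetical helper name `checkpointsHeaderLabel`; the real component presumably builds this inline in JSX.

```typescript
// Returns the header text, or null when there are no checkpoints (no header row at all).
function checkpointsHeaderLabel(open: boolean, count: number): string | null {
  if (count === 0) return null;
  const chevron = open ? '▾' : '▸';
  return `${chevron} Checkpoints (${count})`;
}
```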
**Styling.** New `.checkpointsHeader` class in `AppDeploymentPage.module.css`:
- Same horizontal padding as the table cells, small gap between chevron and label, subdued color on hover.
- Muted count `(N)` in `var(--text-muted)`.
### 2.4 Replica dropdown uses DS Select
**Change.** In `CheckpointDetailDrawer/LogsPanel.tsx:36-47`, replace the native `<select>` with `Select` from `@cameleer/design-system`.
```tsx
<Select
value={String(replicaFilter)}
onChange={(e) => {
const v = e.target.value;
setReplicaFilter(v === 'all' ? 'all' : Number(v));
}}
options={[
{ value: 'all', label: `all (${deployment.replicaStates.length})` },
...deployment.replicaStates.map((_, i) => ({ value: String(i), label: String(i) })),
]}
/>
```
- Label stays `Replica:` as a sibling element (DS `Select` doesn't include an inline label slot).
- No behavior change beyond styling.
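Since the DS `Select` works with string values while the filter state is `'all' | number`, the conversion in both directions can be sketched as two helpers. The names `parseReplicaFilter` and `buildReplicaOptions` are hypothetical; they only restate the logic of the snippet above in a testable form.

```typescript
type ReplicaFilter = 'all' | number;

interface SelectOption {
  value: string;
  label: string;
}

// String value from the Select back into filter state.
function parseReplicaFilter(value: string): ReplicaFilter {
  return value === 'all' ? 'all' : Number(value);
}

// One "all (N)" entry followed by one entry per replica index.
function buildReplicaOptions(replicaCount: number): SelectOption[] {
  const perReplica = Array.from({ length: replicaCount }, (_, i) => ({
    value: String(i),
    label: String(i),
  }));
  return [{ value: 'all', label: `all (${replicaCount})` }, ...perReplica];
}
```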
### 2.5 Drawer tabs — Config first, default Config
**Change.** In `CheckpointDetailDrawer/index.tsx`:
- Reverse the `tabs` array in the `<Tabs>` call so `Config` precedes `Logs`.
- Change the initial tab state from `useState<TabId>('logs')` to `useState<TabId>('config')`.
Rationale: `Config` is the restore-decision content (what variables, what resources, what monitoring settings did this checkpoint have). `Logs` is supporting/post-mortem material. The first tab should be the one users land on for the default question.
## 3. Files to touch
| File | Change |
|------|--------|
| `ui/src/pages/AppsTab/AppDeploymentPage/PrimaryActionButton.tsx` | Add `'uploading'` mode + `progress` prop; `computeMode` takes `uploading` input |
| `ui/src/pages/AppsTab/AppDeploymentPage/AppDeploymentPage.module.css` | `.checkpointsHeader`; progress-overlay styles for the primary button |
| `ui/src/pages/AppsTab/AppDeploymentPage/index.tsx` | `uploadPct` state; pass `onProgress` into both upload call sites |
| `ui/src/api/queries/admin/apps.ts` | `useUploadJar` → XHR; `onProgress` mutation arg |
| `ui/src/components/StartupLogPanel.tsx` | Header layout rewrite; sort state; refresh handler; scroll ref |
| `ui/src/components/StartupLogPanel.module.css` | Header-action styles if not covered by shared `log-panel.module.css` |
| `ui/src/api/queries/logs.ts` | `useStartupLogs` adds `sort` parameter |
| `ui/src/pages/AppsTab/AppDeploymentPage/CheckpointsTable.tsx` | Collapse state + header row |
| `ui/src/pages/AppsTab/AppDeploymentPage/CheckpointDetailDrawer/LogsPanel.tsx` | DS `Select` for replica filter |
| `ui/src/pages/AppsTab/AppDeploymentPage/CheckpointDetailDrawer/index.tsx` | Tab reorder + default `'config'` |
## 4. Testing
**Unit.**
- `useUploadJar` — mock XHR, assert `onProgress` fires during `upload.onprogress`, resolves on 2xx, rejects on non-2xx and on `xhr.onerror`.
- `CheckpointsTable` — collapse toggle: header-only when collapsed, full table visible when open; count in header matches `checkpoints.length`.
- `StartupLogPanel` — sort toggle flips the query sort parameter; refresh calls `refetch` and calls `scrollTo` on the right end per sort direction.
**Component rendering.**
- `PrimaryActionButton` — renders `Uploading… 42%` in `'uploading'` mode with the overlay element width bound to `progress`.
**Manual smoke.**
- Save-with-JAR: button transitions `Save` → `Uploading…` → back to whatever the post-save mode is.
- Redeploy-with-JAR: button transitions `Redeploy` → `Uploading…` → `Deploying…`.
- Redeploy-only (no staged JAR): button transitions `Redeploy` → `Deploying…` (no `Uploading…`).
- Upload fails (simulate 500 from backend): button returns to pre-click mode; error toast shown.
- Startup logs: flip sort; refresh; confirm latest line is visible in each direction.
- Checkpoints: collapsed on load; expanding shows the 10-row table + "Show older" expander if applicable.
- Drawer: open it → lands on `Config`; switching to `Logs` works; replica filter looks like DS components.
## 5. Non-goals
- No change to the startup-logs backend endpoint; the 500-line cap and 3s polling stay.
- No change to the checkpoint drawer's footer (`Restore` button), header, or meta line.
- No change to deployment creation, stop, or delete flows.
- No new test-infra scaffolding (XHR mocking uses what's already in Vitest).
- Already-shipped: the empty-checkpoints `null` return (commit `b3d1dd37`) — not touched again.
## 6. Open questions
None — all decisions resolved during brainstorming.

@@ -73,6 +73,7 @@
                     <configuration>
                         <source>${java.version}</source>
                         <target>${java.version}</target>
+                        <useIncrementalCompilation>true</useIncrementalCompilation>
                     </configuration>
                 </plugin>
                 <plugin>

@@ -7,7 +7,8 @@
     "dev": "vite",
     "dev:local": "cross-env VITE_API_TARGET=http://localhost:8081 vite",
     "dev:remote": "cross-env VITE_API_TARGET=http://192.168.50.86:30090 vite",
-    "build": "tsc -p tsconfig.app.json --noEmit && vite build",
+    "build": "vite build",
+    "typecheck": "tsc -p tsconfig.app.json --noEmit",
     "lint": "eslint .",
     "preview": "vite preview",
     "generate-api": "openapi-typescript src/api/openapi.json -o src/api/schema.d.ts",

File diff suppressed because one or more lines are too long

@@ -47,6 +47,7 @@ export interface Deployment {
     containerConfig: Record<string, unknown>;
     sensitiveKeys: string[] | null;
   } | null;
+  createdBy: string | null;
 }
 /**
@@ -139,21 +140,47 @@ export function useAppVersions(envSlug: string | undefined, appSlug: string | un
 export function useUploadJar() {
   const qc = useQueryClient();
   return useMutation({
-    mutationFn: async ({ envSlug, appSlug, file }: { envSlug: string; appSlug: string; file: File }) => {
+    mutationFn: ({ envSlug, appSlug, file, onProgress }: {
+      envSlug: string;
+      appSlug: string;
+      file: File;
+      onProgress?: (pct: number) => void;
+    }) => {
       const token = useAuthStore.getState().accessToken;
       const form = new FormData();
       form.append('file', file);
-      const res = await fetch(
-        `${config.apiBaseUrl}${envBase(envSlug)}/${encodeURIComponent(appSlug)}/versions`, {
-          method: 'POST',
-          headers: {
-            ...(token ? { Authorization: `Bearer ${token}` } : {}),
-            'X-Cameleer-Protocol-Version': '1',
-          },
-          body: form,
+      return new Promise<AppVersion>((resolve, reject) => {
+        const xhr = new XMLHttpRequest();
+        xhr.open(
+          'POST',
+          `${config.apiBaseUrl}${envBase(envSlug)}/${encodeURIComponent(appSlug)}/versions`,
+        );
+        if (token) xhr.setRequestHeader('Authorization', `Bearer ${token}`);
+        xhr.setRequestHeader('X-Cameleer-Protocol-Version', '1');
+        xhr.upload.onprogress = (e) => {
+          if (!onProgress || !e.lengthComputable) return;
+          onProgress(Math.round((e.loaded / e.total) * 100));
+        };
+        xhr.onload = () => {
+          if (xhr.status < 200 || xhr.status >= 300) {
+            reject(new Error(`Upload failed: ${xhr.status}`));
+            return;
+          }
+          try {
+            resolve(JSON.parse(xhr.responseText) as AppVersion);
+          } catch (err) {
+            reject(err instanceof Error ? err : new Error('Invalid response'));
+          }
+        };
+        xhr.onerror = () => reject(new Error('Upload network error'));
+        xhr.onabort = () => reject(new Error('Upload aborted'));
+        xhr.send(form);
       });
-      if (!res.ok) throw new Error(`Upload failed: ${res.status}`);
-      return res.json() as Promise<AppVersion>;
     },
     onSuccess: (_data, { envSlug, appSlug }) =>
       qc.invalidateQueries({ queryKey: ['apps', envSlug, appSlug, 'versions'] }),

@@ -0,0 +1,125 @@
import { useQuery } from '@tanstack/react-query';
import { adminFetch } from './admin-api';
import { useRefreshInterval } from '../use-refresh-interval';
// ── Types ──────────────────────────────────────────────────────────────
export interface ServerMetricCatalogEntry {
metricName: string;
metricType: string;
statistics: string[];
tagKeys: string[];
}
export interface ServerInstanceInfo {
serverInstanceId: string;
firstSeen: string;
lastSeen: string;
}
export interface ServerMetricPoint {
t: string;
v: number;
}
export interface ServerMetricSeries {
tags: Record<string, string>;
points: ServerMetricPoint[];
}
export interface ServerMetricQueryResponse {
metric: string;
statistic: string;
aggregation: string;
mode: string;
stepSeconds: number;
series: ServerMetricSeries[];
}
export interface ServerMetricQueryRequest {
metric: string;
statistic?: string | null;
from: string;
to: string;
stepSeconds?: number | null;
groupByTags?: string[] | null;
filterTags?: Record<string, string> | null;
aggregation?: string | null;
mode?: string | null;
serverInstanceIds?: string[] | null;
}
// ── Range helper ───────────────────────────────────────────────────────
/**
* Time range driving every hook below. Callers pass the window they want
* to render; the hooks never invent their own "now" — that's the job of
* the global time-range control.
*/
export interface ServerMetricsRange {
from: Date;
to: Date;
}
function serializeRange(range: ServerMetricsRange) {
return {
from: range.from.toISOString(),
to: range.to.toISOString(),
};
}
// ── Query Hooks ────────────────────────────────────────────────────────
export function useServerMetricsCatalog(range: ServerMetricsRange) {
const refetchInterval = useRefreshInterval(60_000);
const { from, to } = serializeRange(range);
return useQuery({
queryKey: ['admin', 'server-metrics', 'catalog', from, to],
queryFn: () => {
const params = new URLSearchParams({ from, to });
return adminFetch<ServerMetricCatalogEntry[]>(`/server-metrics/catalog?${params}`);
},
refetchInterval,
});
}
export function useServerMetricsInstances(range: ServerMetricsRange) {
const refetchInterval = useRefreshInterval(60_000);
const { from, to } = serializeRange(range);
return useQuery({
queryKey: ['admin', 'server-metrics', 'instances', from, to],
queryFn: () => {
const params = new URLSearchParams({ from, to });
return adminFetch<ServerInstanceInfo[]>(`/server-metrics/instances?${params}`);
},
refetchInterval,
});
}
/**
* Generic time-series query against the server_metrics table.
*
* The caller owns the window — passing the globally-selected range keeps
* every panel aligned with the app-wide time control and allows inspection
* of historical windows, not just "last N seconds from now".
*/
export function useServerMetricsSeries(
request: Omit<ServerMetricQueryRequest, 'from' | 'to'>,
range: ServerMetricsRange,
opts?: { enabled?: boolean },
) {
const refetchInterval = useRefreshInterval(30_000);
const { from, to } = serializeRange(range);
return useQuery({
queryKey: ['admin', 'server-metrics', 'query', request, from, to],
queryFn: () => {
const body: ServerMetricQueryRequest = { ...request, from, to };
return adminFetch<ServerMetricQueryResponse>('/server-metrics/query', {
method: 'POST',
body: JSON.stringify(body),
});
},
refetchInterval,
enabled: opts?.enabled ?? true,
});
}

@@ -38,14 +38,16 @@ export interface CatalogApp {
   deployment: DeploymentSummary | null;
 }
-export function useCatalog(environment?: string) {
+export function useCatalog(environment?: string, from?: string, to?: string) {
   const refetchInterval = useRefreshInterval(15_000);
   return useQuery({
-    queryKey: ['catalog', environment],
+    queryKey: ['catalog', environment, from, to],
     queryFn: async () => {
       const token = useAuthStore.getState().accessToken;
       const params = new URLSearchParams();
       if (environment) params.set('environment', environment);
+      if (from) params.set('from', from);
+      if (to) params.set('to', to);
       const qs = params.toString();
       const res = await fetch(`${config.apiBaseUrl}/catalog${qs ? `?${qs}` : ''}`, {
         headers: {

@@ -143,13 +143,14 @@ export function useStartupLogs(
   environment: string | undefined,
   deployCreatedAt: string | undefined,
   isStarting: boolean,
+  sort: 'asc' | 'desc' = 'desc',
 ) {
   const params: LogSearchParams = {
     application: application || undefined,
     environment: environment ?? '',
     source: 'container',
     from: deployCreatedAt || undefined,
-    sort: 'asc',
+    sort,
     limit: 500,
   };
@@ -162,8 +163,9 @@ export function useStartupLogs(
 export interface UseInfiniteApplicationLogsArgs {
   application?: string;
   agentId?: string;
-  sources?: string[]; // multi-select, server-side OR
-  levels?: string[]; // multi-select, server-side OR
+  sources?: string[];     // multi-select, server-side OR
+  levels?: string[];      // multi-select, server-side OR
+  instanceIds?: string[]; // multi-select instance_id filter, server-side OR (e.g. drawer scopes to one deployment's replicas)
   exchangeId?: string;
   sort?: 'asc' | 'desc';
   isAtTop: boolean;
@@ -191,8 +193,10 @@ export function useInfiniteApplicationLogs(
   const sortedSources = (args.sources ?? []).slice().sort();
   const sortedLevels = (args.levels ?? []).slice().sort();
+  const sortedInstanceIds = (args.instanceIds ?? []).slice().sort();
   const sourcesParam = sortedSources.join(',');
   const levelsParam = sortedLevels.join(',');
+  const instanceIdsParam = sortedInstanceIds.join(',');
   const pageSize = args.pageSize ?? 100;
   const sort = args.sort ?? 'desc';
@@ -204,6 +208,7 @@ export function useInfiniteApplicationLogs(
       args.agentId ?? '',
       args.exchangeId ?? '',
       sourcesParam,
+      instanceIdsParam,
       levelsParam,
       fromIso ?? '',
       toIso ?? '',
@@ -220,6 +225,7 @@ export function useInfiniteApplicationLogs(
       if (args.exchangeId) qp.set('exchangeId', args.exchangeId);
       if (sourcesParam) qp.set('source', sourcesParam);
       if (levelsParam) qp.set('level', levelsParam);
+      if (instanceIdsParam) qp.set('instanceIds', instanceIdsParam);
       if (fromIso) qp.set('from', fromIso);
       const effectiveTo = isLiveRange ? new Date().toISOString() : toIso;
       if (effectiveTo) qp.set('to', effectiveTo);

Some files were not shown because too many files have changed in this diff.