
Server Self-Metrics — Reference for Dashboard Builders

This is the reference for anyone building a server-health dashboard on top of the Cameleer server. It documents the server_metrics ClickHouse table, every series you can expect to find in it, and the queries we recommend for each dashboard panel.

tl;dr — Every 60 s, every meter in the server's Micrometer registry (all cameleer.*, all alerting_*, and the full Spring Boot Actuator set) is written into ClickHouse as one row per (meter, statistic) pair. No external Prometheus required.


Built-in admin dashboard

The server ships a ready-to-use dashboard at /admin/server-metrics in the web UI. It renders the 17 panels listed below using ThemedChart from the design system. The window is driven by the app-wide time-range control in the TopBar (same one used by Exchanges, Dashboard, and Runtime), so every panel automatically reflects the range you've selected globally. Visibility mirrors the Database and ClickHouse admin pages:

  • Requires the ADMIN role.
  • Hidden when cameleer.server.security.infrastructureendpoints=false (both the backend endpoints and the sidebar entry disappear).

Use this page for single-tenant installs and dev/staging — it's the fastest path to "is the server healthy right now?". For multi-tenant control planes, cross-environment rollups, or embedding metrics inside an existing operations console, call the REST API below instead.
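The page derives its query bucket size from the selected window so every panel keeps a roughly constant point count. A sketch of such a chooser follows; `step_seconds_for` and the intermediate thresholds are illustrative, and only the 10 s floor and 3600 s ceiling come from the API's documented clamp:

```python
def step_seconds_for(window_seconds: int) -> int:
    """Pick a bucket size that keeps a panel near a few hundred points.
    Intermediate thresholds are illustrative; the result always lies in
    the API's documented stepSeconds clamp of [10, 3600]."""
    for limit, step in [
        (30 * 60, 10),     # up to 30 min  -> 10 s buckets
        (3 * 3600, 60),    # up to 3 h     -> 1 min
        (12 * 3600, 300),  # up to 12 h    -> 5 min
        (48 * 3600, 900),  # up to 48 h    -> 15 min
    ]:
        if window_seconds <= limit:
            return step
    return 3600            # multi-day windows -> 1 h buckets (API max)

print(step_seconds_for(15 * 60))    # 10
print(step_seconds_for(24 * 3600))  # 900
```

A 15-minute window gets 10 s buckets (90 points); a week-long window gets 1 h buckets (168 points).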


Table schema

server_metrics (
    tenant_id          LowCardinality(String) DEFAULT 'default',
    collected_at       DateTime64(3),
    server_instance_id LowCardinality(String),
    metric_name        LowCardinality(String),
    metric_type        LowCardinality(String),   -- counter|gauge|timer|distribution_summary|long_task_timer|other
    statistic          LowCardinality(String) DEFAULT 'value',
    metric_value       Float64,
    tags               Map(String, String) DEFAULT map(),
    server_received_at DateTime64(3) DEFAULT now64(3)
)
ENGINE = MergeTree()
PARTITION BY (tenant_id, toYYYYMM(collected_at))
ORDER BY (tenant_id, collected_at, server_instance_id, metric_name, statistic)
TTL toDateTime(collected_at) + INTERVAL 90 DAY DELETE

What each column means

| Column | Notes |
| --- | --- |
| tenant_id | Always filter by this. One tenant per server deployment. |
| server_instance_id | Stable id per server process: property → HOSTNAME env → DNS → random UUID. Rotates on restart, so counters restart cleanly. |
| metric_name | Raw Micrometer meter name. Dots, not underscores. |
| metric_type | Lowercase Micrometer Meter.Type. |
| statistic | Which Measurement this row is. Counters → count, gauges → value. Timers → three rows per tick: count, total_time (or total), max. Distribution summaries → same shape. |
| metric_value | Float64. Non-finite values (NaN / ±∞) are dropped before insert. |
| tags | Map(String, String). Micrometer tags copied verbatim. |

Counter semantics (important)

Counters are cumulative totals since meter registration, same convention as Prometheus. To get a rate, compute a delta within a server_instance_id:

SELECT
    toStartOfMinute(collected_at) AS minute,
    metric_value - any(metric_value) OVER (
        PARTITION BY server_instance_id, metric_name, tags
        ORDER BY collected_at
        ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING
    ) AS per_minute_delta
FROM server_metrics
WHERE tenant_id = 'default'
  AND metric_name = 'cameleer.ingestion.drops'
  AND statistic = 'count'
ORDER BY minute;

On restart the server_instance_id rotates, so a simple previous-row delta (ClickHouse's lagInFrame(), or the any(...) OVER frame trick above) partitioned by server_instance_id gives monotonic segments without fighting counter resets.
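The positive-clipped, per-instance delta that mode=delta applies server-side can be reproduced client-side when post-processing raw samples. A minimal sketch (`counter_deltas` is a hypothetical helper, not the server's code):

```python
from collections import defaultdict

def counter_deltas(rows):
    """rows: (server_instance_id, ts, cumulative_value) tuples, any order.
    Returns [(ts, delta)] summed across instances. Deltas are computed
    within each instance and clipped at 0, so instance rotation on
    restart never produces a spurious negative spike."""
    by_instance = defaultdict(list)
    for instance, ts, value in rows:
        by_instance[instance].append((ts, value))
    totals = defaultdict(float)
    for samples in by_instance.values():
        samples.sort()
        for (_, prev), (ts, cur) in zip(samples, samples[1:]):
            totals[ts] += max(cur - prev, 0.0)  # positive-clipped delta
    return sorted(totals.items())

# A restart at t=3: srv-a stops, srv-b starts its counter from 0.
rows = [
    ("srv-a", 1, 10.0), ("srv-a", 2, 15.0), ("srv-a", 3, 18.0),
    ("srv-b", 3, 0.0), ("srv-b", 4, 4.0),
]
print(counter_deltas(rows))  # [(2, 5.0), (3, 3.0), (4, 4.0)]
```

Note how the t=3 bucket shows the real +3 from srv-a rather than the -18 a naive global diff would produce.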

Retention

90 days, TTL-enforced. Long-term trend analysis is out of scope — ship raw data to an external warehouse if you need more.


How to query

Use the REST API — /api/v1/admin/server-metrics/**. It does the tenant filter, range bounding, counter-delta math, and input validation for you, so the dashboard never needs direct ClickHouse access. ADMIN role required (standard /api/v1/admin/** RBAC gate).

GET /catalog

Enumerate every metric_name observed in a window, with its metric_type, the set of statistics emitted, and the union of tag keys.

GET /api/v1/admin/server-metrics/catalog?from=2026-04-22T00:00:00Z&to=2026-04-23T00:00:00Z
Authorization: Bearer <admin-jwt>
[
  {
    "metricName": "cameleer.agents.connected",
    "metricType": "gauge",
    "statistics": ["value"],
    "tagKeys": ["state"]
  },
  {
    "metricName": "cameleer.ingestion.drops",
    "metricType": "counter",
    "statistics": ["count"],
    "tagKeys": ["reason"]
  },
  ...
]

from/to are optional; default is the last 1 h.

GET /instances

Enumerate the server_instance_id values that wrote at least one sample in the window, with firstSeen / lastSeen. Use this when you need to annotate restarts on a graph or reason about counter-delta partitions.

GET /api/v1/admin/server-metrics/instances?from=2026-04-22T00:00:00Z&to=2026-04-23T00:00:00Z
[
  { "serverInstanceId": "srv-prod-b", "firstSeen": "2026-04-22T14:30:00Z", "lastSeen": "2026-04-23T00:00:00Z" },
  { "serverInstanceId": "srv-prod-a", "firstSeen": "2026-04-22T00:00:00Z", "lastSeen": "2026-04-22T14:25:00Z" }
]
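Restart annotations fall out of this response directly: every instance's firstSeen except the oldest one marks a likely restart or failover boundary. A hypothetical helper over the JSON shape above:

```python
from datetime import datetime

def restart_markers(instances):
    """instances: list of dicts shaped like the /instances response.
    Returns the firstSeen timestamps of every instance except the
    oldest; each is a likely restart/failover boundary to annotate."""
    parse = lambda s: datetime.fromisoformat(s.replace("Z", "+00:00"))
    starts = sorted(parse(i["firstSeen"]) for i in instances)
    return starts[1:]  # the oldest start is just "the window began"

instances = [
    {"serverInstanceId": "srv-prod-b", "firstSeen": "2026-04-22T14:30:00Z",
     "lastSeen": "2026-04-23T00:00:00Z"},
    {"serverInstanceId": "srv-prod-a", "firstSeen": "2026-04-22T00:00:00Z",
     "lastSeen": "2026-04-22T14:25:00Z"},
]
print(restart_markers(instances))  # one marker, at 14:30 on 2026-04-22
```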

POST /query — generic time-series

The workhorse. One endpoint covers every panel in the dashboard.

POST /api/v1/admin/server-metrics/query
Authorization: Bearer <admin-jwt>
Content-Type: application/json

Request body:

{
  "metric":          "cameleer.ingestion.drops",
  "statistic":       "count",
  "from":            "2026-04-22T00:00:00Z",
  "to":              "2026-04-23T00:00:00Z",
  "stepSeconds":     60,
  "groupByTags":     ["reason"],
  "filterTags":      { },
  "aggregation":     "sum",
  "mode":            "delta",
  "serverInstanceIds": null
}

Response:

{
  "metric":      "cameleer.ingestion.drops",
  "statistic":   "count",
  "aggregation": "sum",
  "mode":        "delta",
  "stepSeconds": 60,
  "series": [
    {
      "tags":   { "reason": "buffer_full" },
      "points": [
        { "t": "2026-04-22T00:00:00.000Z", "v": 0.0 },
        { "t": "2026-04-22T00:01:00.000Z", "v": 5.0 },
        { "t": "2026-04-22T00:02:00.000Z", "v": 5.0 }
      ]
    }
  ]
}

Request field reference

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| metric | string | yes | Metric name. Regex ^[a-zA-Z0-9._]+$. |
| statistic | string | no | value / count / total / total_time / max / mean. mean is a derived statistic for timers: sum(total_time \| total) / sum(count) per bucket. |
| from, to | ISO-8601 instant | yes | Half-open window. to − from ≤ 31 days. |
| stepSeconds | int | no | Bucket size. Clamped to [10, 3600]. Default 60. |
| groupByTags | string[] | no | Emit one series per unique combination of these tag values. Tag keys regex ^[a-zA-Z0-9._]+$. |
| filterTags | map<string,string> | no | Narrow to samples whose tag map contains every entry. Values bound via parameter — no injection. |
| aggregation | string | no | Within-bucket reducer for raw mode: avg (default), sum, max, min, latest. For mode=delta this controls cross-instance aggregation (defaults to sum of per-instance deltas). |
| mode | string | no | raw (default) or delta. Delta mode computes per-server_instance_id positive-clipped differences and then aggregates across instances — so you get a rate-like time series that survives server restarts. |
| serverInstanceIds | string[] | no | Allow-list. When null or empty, every instance in the window is included. |

Validation errors

Any IllegalArgumentException surfaces as 400 Bad Request with {"error": "…"}. Triggers:

  • unsafe characters in identifiers
  • from ≥ to or range > 31 days
  • stepSeconds outside [10, 3600]
  • result cardinality > 500 series (reduce groupByTags or tighten filterTags)
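These triggers can be mirrored client-side so an obviously invalid panel fails before the round-trip. A sketch assuming exactly the documented rules (`preflight` is a hypothetical helper, not a server API):

```python
import re
from datetime import datetime, timedelta

IDENT = re.compile(r"^[a-zA-Z0-9._]+$")

def preflight(body: dict) -> dict:
    """Check a /query request body against the documented 400 triggers."""
    if not IDENT.match(body["metric"]):
        raise ValueError("unsafe characters in identifiers")
    for key in body.get("groupByTags") or []:
        if not IDENT.match(key):
            raise ValueError("unsafe characters in identifiers")
    frm = datetime.fromisoformat(body["from"].replace("Z", "+00:00"))
    to = datetime.fromisoformat(body["to"].replace("Z", "+00:00"))
    if frm >= to:
        raise ValueError("from >= to")
    if to - frm > timedelta(days=31):
        raise ValueError("range > 31 days")
    step = body.get("stepSeconds", 60)
    if not 10 <= step <= 3600:
        raise ValueError("stepSeconds outside [10, 3600]")
    return body

body = preflight({"metric": "cameleer.ingestion.drops", "statistic": "count",
                  "from": "2026-04-22T00:00:00Z", "to": "2026-04-23T00:00:00Z"})
print(body["metric"])  # cameleer.ingestion.drops
```

Series-cardinality overflow can only be detected server-side, so that 400 still has to be handled at response time.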

Direct ClickHouse (fallback)

If you need something the generic query can't express (complex joins, percentile aggregates, materialized-view rollups), reach for /api/v1/admin/clickhouse/query (infrastructureendpoints=true, ADMIN) or a dedicated read-only CH user scoped to server_metrics. All direct queries must filter by tenant_id.


Metric catalog

Every series below is populated. Names follow Micrometer conventions (dots, not underscores). Use these as the starting point for dashboard panels — pick the handful you care about, ignore the rest.

Cameleer business metrics — agent + ingestion

Source: cameleer-server-app/.../metrics/ServerMetrics.java.

| Metric | Type | Statistic | Tags | Meaning |
| --- | --- | --- | --- | --- |
| cameleer.agents.connected | gauge | value | state (live/stale/dead/shutdown) | Count of agents in each lifecycle state |
| cameleer.agents.sse.active | gauge | value | | Active SSE connections (command channel) |
| cameleer.agents.transitions | counter | count | transition (went_stale/went_dead/recovered) | Cumulative lifecycle transitions |
| cameleer.ingestion.buffer.size | gauge | value | type (execution/processor/log/metrics) | Write buffer depth — spikes mean ingestion is lagging |
| cameleer.ingestion.accumulator.pending | gauge | value | | Unfinalized execution chunks in the accumulator |
| cameleer.ingestion.drops | counter | count | reason (buffer_full/no_agent/no_identity) | Dropped payloads. Any non-zero rate here is bad. |
| cameleer.ingestion.flush.duration | timer | count, total_time/total, max | type (execution/processor/log) | Flush latency per type |

Cameleer business metrics — deploy + auth

| Metric | Type | Statistic | Tags | Meaning |
| --- | --- | --- | --- | --- |
| cameleer.deployments.outcome | counter | count | status (running/failed/degraded) | Deploy outcome tally since boot |
| cameleer.deployments.duration | timer | count, total_time/total, max | | End-to-end deploy latency |
| cameleer.auth.failures | counter | count | reason (invalid_token/revoked/oidc_rejected) | Auth failure breakdown — watch for spikes |

Alerting subsystem metrics

Source: cameleer-server-app/.../alerting/metrics/AlertingMetrics.java.

| Metric | Type | Statistic | Tags | Meaning |
| --- | --- | --- | --- | --- |
| alerting_rules_total | gauge | value | state (enabled/disabled) | Cached 30 s from PostgreSQL alert_rules |
| alerting_instances_total | gauge | value | state (firing/resolved/ack'd etc.) | Cached 30 s from PostgreSQL alert_instances |
| alerting_eval_errors_total | counter | count | kind (condition kind) | Evaluator exceptions per kind |
| alerting_circuit_opened_total | counter | count | kind | Circuit-breaker open transitions per kind |
| alerting_eval_duration_seconds | timer | count, total_time/total, max | kind | Per-kind evaluation latency |
| alerting_webhook_delivery_duration_seconds | timer | count, total_time/total, max | | Outbound webhook POST latency |
| alerting_notifications_total | counter | count | status (sent/failed/retry/giving_up) | Notification outcomes |

JVM — memory, GC, threads, classes

From Spring Boot Actuator (JvmMemoryMetrics, JvmGcMetrics, JvmThreadMetrics, ClassLoaderMetrics).

| Metric | Type | Tags | Meaning |
| --- | --- | --- | --- |
| jvm.memory.used | gauge | area (heap/nonheap), id (pool name) | Bytes used per pool |
| jvm.memory.committed | gauge | area, id | Bytes committed per pool |
| jvm.memory.max | gauge | area, id | Pool max |
| jvm.memory.usage.after.gc | gauge | area, id | Usage right after the last collection |
| jvm.buffer.memory.used | gauge | id (direct/mapped) | NIO buffer bytes |
| jvm.buffer.count | gauge | id | NIO buffer count |
| jvm.buffer.total.capacity | gauge | id | NIO buffer capacity |
| jvm.threads.live | gauge | | Current live thread count |
| jvm.threads.daemon | gauge | | Current daemon thread count |
| jvm.threads.peak | gauge | | Peak thread count since start |
| jvm.threads.started | counter | | Cumulative threads started |
| jvm.threads.states | gauge | state (runnable/blocked/waiting/…) | Threads per state |
| jvm.classes.loaded | gauge | | Currently-loaded classes |
| jvm.classes.unloaded | counter | | Cumulative unloaded classes |
| jvm.gc.pause | timer | action, cause | Stop-the-world pause times — watch max |
| jvm.gc.concurrent.phase.time | timer | action, cause | Concurrent-phase durations (G1/ZGC) |
| jvm.gc.memory.allocated | counter | | Bytes allocated in the young gen |
| jvm.gc.memory.promoted | counter | | Bytes promoted to old gen |
| jvm.gc.overhead | gauge | | Fraction of CPU spent in GC (0–1) |
| jvm.gc.live.data.size | gauge | | Live data after last collection |
| jvm.gc.max.data.size | gauge | | Max old-gen size |
| jvm.info | gauge | vendor, runtime, version | Constant 1.0; tags carry the real info |

Process and system

| Metric | Type | Tags | Meaning |
| --- | --- | --- | --- |
| process.cpu.usage | gauge | | CPU share consumed by this JVM (0–1) |
| process.cpu.time | gauge | | Cumulative CPU time (ns) |
| process.uptime | gauge | | ms since start |
| process.start.time | gauge | | Epoch start |
| process.files.open | gauge | | Open FDs |
| process.files.max | gauge | | FD ulimit |
| system.cpu.count | gauge | | Cores visible to the JVM |
| system.cpu.usage | gauge | | System-wide CPU (0–1) |
| system.load.average.1m | gauge | | 1-min load (Unix only) |
| disk.free | gauge | path | Free bytes on the mount that holds the JAR |
| disk.total | gauge | path | Total bytes |

HTTP server

| Metric | Type | Tags | Meaning |
| --- | --- | --- | --- |
| http.server.requests | timer | method, uri, status, outcome, exception | Inbound HTTP: count, total_time/total, max |
| http.server.requests.active | long_task_timer | method, uri | In-flight requests — active_tasks statistic |

uri is the Spring-templated path (/api/v1/environments/{envSlug}/apps/{appSlug}), not the raw URL — cardinality stays bounded.

Tomcat

| Metric | Type | Tags | Meaning |
| --- | --- | --- | --- |
| tomcat.sessions.active.current | gauge | | Currently active sessions |
| tomcat.sessions.active.max | gauge | | Max concurrent sessions observed |
| tomcat.sessions.alive.max | gauge | | Longest session lifetime (s) |
| tomcat.sessions.created | counter | | Cumulative session creates |
| tomcat.sessions.expired | counter | | Cumulative expirations |
| tomcat.sessions.rejected | counter | | Session creates refused |
| tomcat.threads.current | gauge | name | Connector thread count |
| tomcat.threads.busy | gauge | name | Connector threads currently serving a request |
| tomcat.threads.config.max | gauge | name | Configured max |

HikariCP (PostgreSQL pool)

| Metric | Type | Tags | Meaning |
| --- | --- | --- | --- |
| hikaricp.connections | gauge | pool | Total connections |
| hikaricp.connections.active | gauge | pool | In-use |
| hikaricp.connections.idle | gauge | pool | Idle |
| hikaricp.connections.pending | gauge | pool | Threads waiting for a connection |
| hikaricp.connections.min | gauge | pool | Configured min |
| hikaricp.connections.max | gauge | pool | Configured max |
| hikaricp.connections.creation | timer | pool | Time to open a new connection |
| hikaricp.connections.acquire | timer | pool | Time to acquire from the pool |
| hikaricp.connections.usage | timer | pool | Time a connection was in use |
| hikaricp.connections.timeout | counter | pool | Pool acquisition timeouts — any non-zero rate is a problem |

Pools are named. You'll see HikariPool-1 (PostgreSQL) and a separate pool for ClickHouse (clickHouseJdbcTemplate).

JDBC generic

| Metric | Type | Tags | Meaning |
| --- | --- | --- | --- |
| jdbc.connections.min | gauge | name | Same data as Hikari, surfaced generically |
| jdbc.connections.max | gauge | name | |
| jdbc.connections.active | gauge | name | |
| jdbc.connections.idle | gauge | name | |

Logging

| Metric | Type | Tags | Meaning |
| --- | --- | --- | --- |
| logback.events | counter | level (error/warn/info/debug/trace) | Log events emitted since start — {level=error} is a useful panel |

Spring Boot lifecycle

| Metric | Type | Tags | Meaning |
| --- | --- | --- | --- |
| application.started.time | timer | main.application.class | Cold-start duration |
| application.ready.time | timer | main.application.class | Time to ready |

Flyway

| Metric | Type | Tags | Meaning |
| --- | --- | --- | --- |
| flyway.migrations | gauge | | Number of migrations applied (current schema) |

Executor pools (if any @Async executors exist)

When a ThreadPoolTaskExecutor bean is registered and tagged, Micrometer adds:

| Metric | Type | Tags | Meaning |
| --- | --- | --- | --- |
| executor.active | gauge | name | Currently-running tasks |
| executor.queued | gauge | name | Queued tasks |
| executor.queue.remaining | gauge | name | Queue headroom |
| executor.pool.size | gauge | name | Current pool size |
| executor.pool.core | gauge | name | Core size |
| executor.pool.max | gauge | name | Max size |
| executor.completed | counter | name | Completed tasks |

Suggested dashboard panels

Below are 17 panels, each expressed as a single POST /api/v1/admin/server-metrics/query body. Tenant is implicit in the JWT — the server filters by tenant server-side. {from} and {to} are dashboard variables.

Row: server health (top of dashboard)

  1. Agents by state — stacked area.

    { "metric": "cameleer.agents.connected", "statistic": "value",
      "from": "{from}", "to": "{to}", "stepSeconds": 60,
      "groupByTags": ["state"], "aggregation": "avg", "mode": "raw" }
    
  2. Ingestion buffer depth by type — line chart.

    { "metric": "cameleer.ingestion.buffer.size", "statistic": "value",
      "from": "{from}", "to": "{to}", "stepSeconds": 60,
      "groupByTags": ["type"], "aggregation": "avg", "mode": "raw" }
    
  3. Ingestion drops per minute — bar chart.

    { "metric": "cameleer.ingestion.drops", "statistic": "count",
      "from": "{from}", "to": "{to}", "stepSeconds": 60,
      "groupByTags": ["reason"], "mode": "delta" }
    
  4. Auth failures per minute — same shape as drops, grouped by reason.

    { "metric": "cameleer.auth.failures", "statistic": "count",
      "from": "{from}", "to": "{to}", "stepSeconds": 60,
      "groupByTags": ["reason"], "mode": "delta" }
    

Row: JVM

  1. Heap used vs committed vs max — area chart (three overlay queries).

    { "metric": "jvm.memory.used", "statistic": "value",
      "from": "{from}", "to": "{to}", "stepSeconds": 60,
      "filterTags": { "area": "heap" }, "aggregation": "sum", "mode": "raw" }
    

    Repeat with "metric": "jvm.memory.committed" and "metric": "jvm.memory.max".

  2. CPU % — line.

    { "metric": "process.cpu.usage", "statistic": "value",
      "from": "{from}", "to": "{to}", "stepSeconds": 60, "aggregation": "avg", "mode": "raw" }
    

    Overlay with "metric": "system.cpu.usage".

  3. GC pause — max per cause.

    { "metric": "jvm.gc.pause", "statistic": "max",
      "from": "{from}", "to": "{to}", "stepSeconds": 60,
      "groupByTags": ["cause"], "aggregation": "max", "mode": "raw" }
    
  4. Thread count — three overlay lines: jvm.threads.live, jvm.threads.daemon, jvm.threads.peak each with statistic=value, aggregation=avg, mode=raw.

Row: HTTP + DB

  1. HTTP mean latency by URI — top-N URIs.

    { "metric": "http.server.requests", "statistic": "mean",
      "from": "{from}", "to": "{to}", "stepSeconds": 60,
      "groupByTags": ["uri"], "filterTags": { "outcome": "SUCCESS" },
      "aggregation": "avg", "mode": "raw" }
    

    For p99 proxy, repeat with "statistic": "max".

  2. HTTP error rate — two queries, divide client-side: total requests and 5xx requests.

    { "metric": "http.server.requests", "statistic": "count",
      "from": "{from}", "to": "{to}", "stepSeconds": 60,
      "mode": "delta", "aggregation": "sum" }
    

    Then for the 5xx series, add "filterTags": { "outcome": "SERVER_ERROR" } and divide.

  3. HikariCP pool saturation — overlay two queries.

    { "metric": "hikaricp.connections.active", "statistic": "value",
      "from": "{from}", "to": "{to}", "stepSeconds": 60,
      "groupByTags": ["pool"], "aggregation": "avg", "mode": "raw" }
    

    Overlay with "metric": "hikaricp.connections.pending".

  4. Hikari acquire timeouts per minute.

    { "metric": "hikaricp.connections.timeout", "statistic": "count",
      "from": "{from}", "to": "{to}", "stepSeconds": 60,
      "groupByTags": ["pool"], "mode": "delta" }
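The client-side division for panel 2 (error rate) can be sketched as follows; `error_rate` is a hypothetical helper over points shaped like the /query response, treating buckets absent from the 5xx series as zero errors:

```python
def error_rate(total_points, error_points):
    """Join the total and 5xx series on timestamp and divide.
    Buckets with no 5xx point count as zero errors; empty total
    buckets yield a 0.0 rate rather than dividing by zero."""
    errors = {p["t"]: p["v"] for p in error_points}
    return [
        {"t": p["t"], "v": (errors.get(p["t"], 0.0) / p["v"]) if p["v"] else 0.0}
        for p in total_points
    ]

total = [{"t": "00:00", "v": 200.0}, {"t": "00:01", "v": 0.0}]
errs = [{"t": "00:00", "v": 10.0}]
print(error_rate(total, errs))
# [{'t': '00:00', 'v': 0.05}, {'t': '00:01', 'v': 0.0}]
```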
    

Row: alerting (collapsible)

  1. Alerting instances by state — stacked.

    { "metric": "alerting_instances_total", "statistic": "value",
      "from": "{from}", "to": "{to}", "stepSeconds": 60,
      "groupByTags": ["state"], "aggregation": "avg", "mode": "raw" }
    
  2. Eval errors per minute by kind.

    { "metric": "alerting_eval_errors_total", "statistic": "count",
      "from": "{from}", "to": "{to}", "stepSeconds": 60,
      "groupByTags": ["kind"], "mode": "delta" }
    
  3. Webhook delivery — max per minute.

    { "metric": "alerting_webhook_delivery_duration_seconds", "statistic": "max",
      "from": "{from}", "to": "{to}", "stepSeconds": 60,
      "aggregation": "max", "mode": "raw" }
    

Row: deployments (runtime-enabled only)

  1. Deploy outcomes per hour.

    { "metric": "cameleer.deployments.outcome", "statistic": "count",
      "from": "{from}", "to": "{to}", "stepSeconds": 3600,
      "groupByTags": ["status"], "mode": "delta" }
    
  2. Deploy duration mean.

    { "metric": "cameleer.deployments.duration", "statistic": "mean",
      "from": "{from}", "to": "{to}", "stepSeconds": 300,
      "aggregation": "avg", "mode": "raw" }
    

    For p99 proxy, repeat with "statistic": "max".


Notes for the dashboard implementer

  • Use the REST API. The server handles tenant filtering, counter deltas, range bounds, and input validation. Direct ClickHouse is a fallback for the handful of cases the generic query can't express.
  • total_time vs total. SimpleMeterRegistry and PrometheusMeterRegistry disagree on the tag value for Timer cumulative duration. The server uses PrometheusMeterRegistry in production, so expect total_time. The derived statistic=mean handles both transparently.
  • Cardinality warning: http.server.requests tags include uri and status. The server templates URIs, but if someone adds an endpoint that embeds a high-cardinality path segment without @PathVariable, you'll see explosion here. The API caps responses at 500 series; you'll get a 400 if you blow past it.
  • The dashboard is read-only. There's no write path — only the server writes into server_metrics.
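When you fetch count and total_time as two raw series instead of using the derived statistic, the same per-bucket mean can be computed client-side (`derived_mean` is a hypothetical helper; it works the same whether the duration series was emitted as total_time or total):

```python
def derived_mean(count_points, total_points):
    """mean = total duration / count per bucket, matching the server's
    derived 'mean' statistic. Inputs are /query-shaped point lists;
    buckets with a zero or missing count are skipped."""
    counts = {p["t"]: p["v"] for p in count_points}
    return [
        {"t": p["t"], "v": p["v"] / counts[p["t"]]}
        for p in total_points if counts.get(p["t"])
    ]

counts = [{"t": "00:00", "v": 4.0}]
totals = [{"t": "00:00", "v": 2.0}]   # seconds of total_time in the bucket
print(derived_mean(counts, totals))   # [{'t': '00:00', 'v': 0.5}]
```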

Changelog

  • 2026-04-23 — initial write. Write-only backend.
  • 2026-04-23 — added generic REST API (/api/v1/admin/server-metrics/{catalog,instances,query}) so dashboards don't need direct ClickHouse access. All 17 suggested panels now expressed as single-endpoint queries.
  • 2026-04-24 — shipped the built-in /admin/server-metrics UI dashboard. Gated by infrastructureendpoints + ADMIN, identical visibility to /admin/{database,clickhouse}. Source: ui/src/pages/Admin/ServerMetricsAdminPage.tsx.
  • 2026-04-24 — dashboard now uses the global time-range control (useGlobalFilters) instead of a page-local picker. Bucket size auto-scales with the selected window (10 s → 1 h). Query hooks now take a ServerMetricsRange = { from: Date; to: Date } instead of a windowSeconds number so they work for any absolute or rolling range the TopBar supplies.