---
paths:
  - "cameleer-server-app/**/metrics/**"
  - "cameleer-server-app/**/ServerMetrics*"
  - "ui/src/pages/RuntimeTab/**"
  - "ui/src/pages/DashboardTab/**"
---

# Prometheus Metrics

The server exposes `/api/v1/prometheus` (unauthenticated, Prometheus text format). Spring Boot Actuator provides JVM, GC, thread pool, and `http.server.requests` metrics automatically. Business metrics are registered by the `ServerMetrics` component.

`ServerMetricsSnapshotScheduler` also snapshots the same `MeterRegistry` to ClickHouse every 60 s (see "Server self-metrics persistence" at the bottom of this file). As a result, historical server-health data survives restarts without an external Prometheus.

## Gauges (auto-polled)

| Metric | Tags | Source |
|--------|------|--------|
| `cameleer.agents.connected` | `state` (live, stale, dead, shutdown) | `AgentRegistryService.findByState()` |
| `cameleer.agents.sse.active` | — | `SseConnectionManager.getConnectionCount()` |
| `cameleer.ingestion.buffer.size` | `type` (execution, processor, log, metrics) | `WriteBuffer.size()` |
| `cameleer.ingestion.accumulator.pending` | — | `ChunkAccumulator.getPendingCount()` |

## Counters

| Metric | Tags | Instrumented in |
|--------|------|-----------------|
| `cameleer.ingestion.drops` | `reason` (buffer_full, no_agent, no_identity) | `LogIngestionController` |
| `cameleer.agents.transitions` | `transition` (went_stale, went_dead, recovered) | `AgentLifecycleMonitor` |
| `cameleer.deployments.outcome` | `status` (running, failed, degraded) | `DeploymentExecutor` |
| `cameleer.auth.failures` | `reason` (invalid_token, revoked, oidc_rejected) | `JwtAuthenticationFilter` |

## Timers

| Metric | Tags | Instrumented in |
|--------|------|-----------------|
| `cameleer.ingestion.flush.duration` | `type` (execution, processor, log) | `ExecutionFlushScheduler` |
| `cameleer.deployments.duration` | — | `DeploymentExecutor` |

## Agent container Prometheus labels (set by `PrometheusLabelBuilder` at deploy time)

| Runtime Type | `prometheus.path` | `prometheus.port` |
|---|---|---|
| `spring-boot` | `/actuator/prometheus` | `8081` |
| `quarkus` / `native` | `/q/metrics` | `9000` |
| `plain-java` | `/metrics` | `9464` |

Every container also gets `prometheus.scrape=true`. These labels enable auto-discovery through Prometheus `docker_sd_configs`.

## Agent Metric Names (Micrometer)

Agents send `MetricsSnapshot` records that use Micrometer-convention metric names. The server stores them generically in the ClickHouse column `agent_metrics.metric_name`. The UI references specific names in `AgentInstance.tsx` for JVM charts.

### JVM metrics (used by UI)

| Metric name | UI usage |
|---|---|
| `process.cpu.usage.value` | CPU % stat card + chart |
| `jvm.memory.used.value` | Heap MB stat card + chart (tags: `area=heap`) |
| `jvm.memory.max.value` | Heap max for % calculation (tags: `area=heap`) |
| `jvm.threads.live.value` | Thread count chart |
| `jvm.gc.pause.total_time` | GC time chart |

### Camel route metrics (stored, queried by dashboard)

| Metric name | Type | Tags |
|---|---|---|
| `camel.exchanges.succeeded.count` | counter | `routeId`, `camelContext` |
| `camel.exchanges.failed.count` | counter | `routeId`, `camelContext` |
| `camel.exchanges.total.count` | counter | `routeId`, `camelContext` |
| `camel.exchanges.failures.handled.count` | counter | `routeId`, `camelContext` |
| `camel.route.policy.count` | count | `routeId`, `camelContext` |
| `camel.route.policy.total_time` | total | `routeId`, `camelContext` |
| `camel.route.policy.max` | gauge | `routeId`, `camelContext` |
| `camel.routes.running.value` | gauge | — |

Mean processing time = `camel.route.policy.total_time / camel.route.policy.count`. Minimum processing time is not available because Micrometer does not track minimums.
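The following minimal Java sketch applies the mean-processing-time formula above across several routes. `PolicySample` and `meanProcessingTime` are hypothetical names and do not belong to the server or the UI. The units of `totalTime` are whatever the agent exported. If the stored values are cumulative since agent start, apply the same formula to the differences between two snapshots.

```java
import java.util.List;
import java.util.OptionalDouble;

public class RouteMeanTime {
    // Hypothetical holder for one route's camel.route.policy.count / total_time pair.
    record PolicySample(String routeId, double count, double totalTime) {}

    // Mean = sum(total_time) / sum(count). Summing before dividing weights each
    // route by its exchange volume instead of averaging the per-route means.
    static OptionalDouble meanProcessingTime(List<PolicySample> samples) {
        double count = 0;
        double total = 0;
        for (PolicySample s : samples) {
            count += s.count();
            total += s.totalTime();
        }
        // No exchanges means no defined mean (avoids dividing by zero).
        return count > 0 ? OptionalDouble.of(total / count) : OptionalDouble.empty();
    }

    public static void main(String[] args) {
        List<PolicySample> samples = List.of(
                new PolicySample("orders", 100, 5000),
                new PolicySample("billing", 10, 2000));
        System.out.println(meanProcessingTime(samples).getAsDouble());
    }
}
```

The two sample routes have per-route means of 50 and 200. Averaging those means gives 125. The volume-weighted result is 7000 / 110, about 63.6, which is the figure a dashboard aggregating several routes should normally report.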
### Cameleer agent metrics

| Metric name | Type | Tags |
|---|---|---|
| `cameleer.chunks.exported.count` | counter | `instanceId` |
| `cameleer.chunks.dropped.count` | counter | `instanceId`, `reason` |
| `cameleer.sse.reconnects.count` | counter | `instanceId` |
| `cameleer.taps.evaluated.count` | counter | `instanceId` |
| `cameleer.metrics.exported.count` | counter | `instanceId` |

## Server self-metrics persistence

Every 60 s, `ServerMetricsSnapshotScheduler` walks `MeterRegistry.getMeters()` and writes one row per Micrometer `Measurement` to the ClickHouse `server_metrics` table. The interval is configurable via `cameleer.server.self-metrics.interval-ms`. The full registry is captured. That includes the Spring Boot Actuator series (`jvm.*`, `process.*`, `http.server.requests`, `hikaricp.*`, `jdbc.*`, `tomcat.*`, `logback.events`, `system.*`) plus `cameleer.*` and `alerting_*`.

**Table** (`cameleer-server-app/src/main/resources/clickhouse/init.sql`):

```
server_metrics(tenant_id, collected_at, server_instance_id, metric_name, metric_type, statistic, metric_value, tags Map(String,String), server_received_at)
```

- `metric_type`: lowercase Micrometer `Meter.Type` (counter, gauge, timer, distribution_summary, long_task_timer, other).
- `statistic`: Micrometer `Statistic.getTagValueRepresentation()` (value, count, total, total_time, max, active, duration, unknown). Timers emit 3 rows per tick (count + total_time + max). Gauges and counters emit 1 row each (`statistic='value'` or `'count'`).
- No `environment` column: the server is env-agnostic.
- `tenant_id` is threaded from `cameleer.server.tenant.id` (one tenant per server).
- `ServerInstanceIdConfig` resolves `server_instance_id` once at boot, trying property → HOSTNAME → localhost → UUID in that order. It rotates across restarts so counter resets are unambiguous.
- TTL: 90 days (vs 365 for `agent_metrics`).
- Write-only in v1: there is no query endpoint or UI page. Inspect the table via the ClickHouse admin endpoint `/api/v1/admin/clickhouse/query` or with direct SQL.
- Toggle off entirely with `cameleer.server.self-metrics.enabled=false` (uses `@ConditionalOnProperty`).
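The following minimal Java sketch shows the `server_instance_id` fallback order listed above. `InstanceIdResolver` and `resolve` are hypothetical names, not the actual `ServerInstanceIdConfig`. The configured property is passed in as a plain string because its key is not documented here. "localhost" is interpreted as `InetAddress.getLocalHost().getHostName()`, which is an assumption.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.UUID;
import java.util.function.Supplier;

public class InstanceIdResolver {
    // Hypothetical resolver: configured property -> HOSTNAME env var -> local host name -> random UUID.
    static String resolve(String configuredId, String hostnameEnv, Supplier<String> localHostName) {
        if (isPresent(configuredId)) return configuredId.trim();
        if (isPresent(hostnameEnv)) return hostnameEnv.trim();
        try {
            String local = localHostName.get();
            if (isPresent(local)) return local.trim();
        } catch (RuntimeException e) {
            // Host name lookup failed; fall through to the UUID fallback.
        }
        return UUID.randomUUID().toString();
    }

    static boolean isPresent(String s) {
        return s != null && !s.isBlank();
    }

    // Assumed meaning of "localhost": the JVM's view of the local host name.
    static String localHostName() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Resolved once at boot, mirroring the documented behaviour.
        String id = resolve(null, System.getenv("HOSTNAME"), InstanceIdResolver::localHostName);
        System.out.println("server_instance_id = " + id);
    }
}
```

Injecting the host-name lookup as a `Supplier` keeps the fallback order testable without depending on the machine's network configuration.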