---
paths:
- "cameleer-server-app/**/metrics/**"
- "cameleer-server-app/**/ServerMetrics*"
- "ui/src/pages/RuntimeTab/**"
- "ui/src/pages/DashboardTab/**"
---
# Prometheus Metrics
The server exposes `/api/v1/prometheus` (unauthenticated, Prometheus text format). Spring Boot Actuator provides JVM, GC, thread pool, and `http.server.requests` metrics automatically. Business metrics are provided by the `ServerMetrics` component:
## Gauges (auto-polled)
| Metric | Tags | Source |
|--------|------|--------|
| `cameleer.agents.connected` | `state` (live, stale, dead, shutdown) | `AgentRegistryService.findByState()` |
| `cameleer.agents.sse.active` | — | `SseConnectionManager.getConnectionCount()` |
| `cameleer.ingestion.buffer.size` | `type` (execution, processor, log, metrics) | `WriteBuffer.size()` |
| `cameleer.ingestion.accumulator.pending` | — | `ChunkAccumulator.getPendingCount()` |
## Counters
| Metric | Tags | Instrumented in |
|--------|------|-----------------|
| `cameleer.ingestion.drops` | `reason` (buffer_full, no_agent, no_identity) | `LogIngestionController` |
| `cameleer.agents.transitions` | `transition` (went_stale, went_dead, recovered) | `AgentLifecycleMonitor` |
| `cameleer.deployments.outcome` | `status` (running, failed, degraded) | `DeploymentExecutor` |
| `cameleer.auth.failures` | `reason` (invalid_token, revoked, oidc_rejected) | `JwtAuthenticationFilter` |
## Timers
| Metric | Tags | Instrumented in |
|--------|------|-----------------|
| `cameleer.ingestion.flush.duration` | `type` (execution, processor, log) | `ExecutionFlushScheduler` |
| `cameleer.deployments.duration` | — | `DeploymentExecutor` |
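
Because the endpoint serves Prometheus text format, the dotted Micrometer names in the tables above appear under different names in scrape output and in PromQL. The sketch below approximates the usual Micrometer-to-Prometheus renaming: dots become underscores, counters gain `_total`, and timers gain `_seconds`. `PrometheusNameSketch` and its `Kind` enum are hypothetical helpers for illustration only, not part of the server, and the real naming convention covers more cases (base units, camelCase names, reserved characters).

```java
// Hypothetical helper: approximates how dotted Micrometer names
// typically appear in Prometheus scrape output. Simplified on purpose.
final class PrometheusNameSketch {

    enum Kind { GAUGE, COUNTER, TIMER }

    static String toPrometheusName(String micrometerName, Kind kind) {
        // Dots are not valid in Prometheus metric names; they become underscores.
        String name = micrometerName.replace('.', '_');
        switch (kind) {
            case COUNTER:
                // Counters are exposed with a _total suffix (added once).
                return name.endsWith("_total") ? name : name + "_total";
            case TIMER:
                // Timers are exposed in seconds (added once).
                return name.endsWith("_seconds") ? name : name + "_seconds";
            default:
                // Gauges keep the converted name as-is.
                return name;
        }
    }
}
```

Under these assumptions, `cameleer.ingestion.drops` would be queried as `cameleer_ingestion_drops_total`, and a timer such as `cameleer.ingestion.flush.duration` would typically expand into several series sharing the `cameleer_ingestion_flush_duration_seconds` prefix (for example `_count` and `_sum`). Check the actual scrape output before relying on a name.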
## Agent container Prometheus labels (set by `PrometheusLabelBuilder` at deploy time)
| Runtime Type | `prometheus.path` | `prometheus.port` |
|---|---|---|
| `spring-boot` | `/actuator/prometheus` | `8081` |
| `quarkus` / `native` | `/q/metrics` | `9000` |
| `plain-java` | `/metrics` | `9464` |
All containers also get `prometheus.scrape=true`. These labels enable Prometheus `docker_sd_configs` auto-discovery.
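
The mapping in the table can be sketched as a small lookup that returns the container labels for a runtime type. This is a minimal illustration, not the actual `PrometheusLabelBuilder`: the class name `PrometheusLabelSketch`, the method `labelsFor`, and the choice to reject unknown runtime types with an exception are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the real PrometheusLabelBuilder may differ in shape and behavior.
final class PrometheusLabelSketch {

    // Runtime type -> { prometheus.path, prometheus.port }, taken from the table above.
    private static final Map<String, String[]> ENDPOINTS = Map.of(
            "spring-boot", new String[] {"/actuator/prometheus", "8081"},
            "quarkus", new String[] {"/q/metrics", "9000"},
            "native", new String[] {"/q/metrics", "9000"},
            "plain-java", new String[] {"/metrics", "9464"});

    static Map<String, String> labelsFor(String runtimeType) {
        String[] endpoint = ENDPOINTS.get(runtimeType);
        if (endpoint == null) {
            // Assumption: unknown runtime types are rejected rather than given defaults.
            throw new IllegalArgumentException("Unknown runtime type: " + runtimeType);
        }
        // LinkedHashMap keeps a stable label order for readable output.
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("prometheus.scrape", "true");
        labels.put("prometheus.path", endpoint[0]);
        labels.put("prometheus.port", endpoint[1]);
        return labels;
    }
}
```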
## Agent Metric Names (Micrometer)
Agents send `MetricsSnapshot` records with Micrometer-convention metric names. The server stores them generically (ClickHouse `agent_metrics.metric_name`). The UI references specific names in `AgentInstance.tsx` for JVM charts.
### JVM metrics (used by UI)
| Metric name | UI usage |
|---|---|
| `process.cpu.usage.value` | CPU % stat card + chart |
| `jvm.memory.used.value` | Heap MB stat card + chart (tags: `area=heap`) |
| `jvm.memory.max.value` | Heap max for % calculation (tags: `area=heap`) |
| `jvm.threads.live.value` | Thread count chart |
| `jvm.gc.pause.total_time` | GC time chart |
### Camel route metrics (stored, queried by dashboard)
| Metric name | Type | Tags |
|---|---|---|
| `camel.exchanges.succeeded.count` | counter | `routeId`, `camelContext` |
| `camel.exchanges.failed.count` | counter | `routeId`, `camelContext` |
| `camel.exchanges.total.count` | counter | `routeId`, `camelContext` |
| `camel.exchanges.failures.handled.count` | counter | `routeId`, `camelContext` |
| `camel.route.policy.count` | count | `routeId`, `camelContext` |
| `camel.route.policy.total_time` | total | `routeId`, `camelContext` |
| `camel.route.policy.max` | gauge | `routeId`, `camelContext` |
| `camel.routes.running.value` | gauge | — |
Mean processing time = `camel.route.policy.total_time / camel.route.policy.count`. Min processing time is not available (Micrometer does not track minimums).
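
The mean calculation above can be sketched as follows. `RouteTimingSketch` is a hypothetical helper, not part of the server or UI; returning `NaN` when no exchanges were recorded is an assumption about how an empty route should be displayed.

```java
// Hypothetical helper illustrating the mean processing time formula.
final class RouteTimingSketch {

    /**
     * Mean processing time = total_time / count.
     * The result is in whatever unit total_time was recorded in.
     * Returns NaN when count is zero or negative, so callers can
     * show "no data" instead of dividing by zero.
     */
    static double meanProcessingTime(double totalTime, double count) {
        if (count <= 0) {
            return Double.NaN;
        }
        return totalTime / count;
    }
}
```

For example, a `camel.route.policy.total_time` of 1500 with a `camel.route.policy.count` of 30 gives a mean of 50 in the same unit as the total.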
### Cameleer agent metrics
| Metric name | Type | Tags |
|---|---|---|
| `cameleer.chunks.exported.count` | counter | `instanceId` |
| `cameleer.chunks.dropped.count` | counter | `instanceId`, `reason` |
| `cameleer.sse.reconnects.count` | counter | `instanceId` |
| `cameleer.taps.evaluated.count` | counter | `instanceId` |
| `cameleer.metrics.exported.count` | counter | `instanceId` |