docs: add persistent route catalog design spec

Routes with zero executions (sub-routes) vanish from the sidebar after server restart because the catalog is purely in-memory with a ClickHouse stats fallback that only covers executed routes. This spec describes a persistent route_catalog table in ClickHouse with lifecycle tracking (first_seen/last_seen) to reconstruct the sidebar without agent reconnection and support historical time-window queries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,118 @@
# Persistent Route Catalog

## Problem

The route catalog is assembled at query time from two ephemeral sources: the in-memory agent registry and ClickHouse `stats_1m_route` execution stats. Routes with zero executions (sub-routes) exist only in the agent registry. When the server restarts and no agent reconnects, these routes vanish from the sidebar. The ClickHouse stats fallback only covers routes that have at least one recorded execution.

This forces operators to restart agents after a server outage just to restore the sidebar, even though all historical execution data is still intact in ClickHouse.

## Solution

Add a `route_catalog` table in ClickHouse that persistently tracks which routes exist per application and environment, with `first_seen` and `last_seen` timestamps. This becomes a third source for the catalog endpoints, filling the gap between the in-memory registry and execution stats.

### Historical Lifecycle Support

When a new app version drops a route, the route's `last_seen` stops being updated. The sidebar query uses time-range overlap: a route is included when its `[first_seen, last_seen]` interval intersects the user's selected `[from, to]` window, that is, when `first_seen <= to` and `last_seen >= from`. Historical routes therefore remain visible when browsing past time windows, even if they no longer exist in the current app version.
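
The overlap rule can be sketched in plain Java as follows. This is a minimal illustration, not code from the spec: the class `LifecycleWindow` and its `overlaps` method are hypothetical names, and the sketch assumes both interval bounds are inclusive, matching the `<=` / `>=` comparisons in the read query.

```java
import java.time.Instant;

// Hypothetical helper (not part of the spec): mirrors the time-range overlap rule.
final class LifecycleWindow {
    private LifecycleWindow() {}

    /**
     * True when [firstSeen, lastSeen] and [from, to] share at least one instant.
     * Bounds are treated as inclusive, like first_seen <= to AND last_seen >= from.
     */
    static boolean overlaps(Instant firstSeen, Instant lastSeen, Instant from, Instant to) {
        return !firstSeen.isAfter(to) && !lastSeen.isBefore(from);
    }
}
```

A route last seen on 10 January would match a window starting on 5 January but not one starting on 11 January.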

## ClickHouse Table Schema

```sql
CREATE TABLE IF NOT EXISTS route_catalog (
    tenant_id      LowCardinality(String) DEFAULT 'default',
    environment    LowCardinality(String) DEFAULT 'default',
    application_id LowCardinality(String),
    route_id       LowCardinality(String),
    first_seen     DateTime64(3),
    last_seen      DateTime64(3)
)
ENGINE = ReplacingMergeTree(last_seen)
ORDER BY (tenant_id, environment, application_id, route_id);
```

- For rows sharing the same `ORDER BY` key, `ReplacingMergeTree(last_seen)` keeps the row with the highest `last_seen` on merge.
- No TTL: catalog entries live until explicitly dismissed.
- No partitioning: table stays small (one row per route per app per environment).
- Added to `init.sql`, idempotent via `IF NOT EXISTS`.

## Write Path

### Interface (`cameleer-server-core`)

```java
import java.time.Instant;
import java.util.Collection;
import java.util.List;

public interface RouteCatalogStore {
    // Records that the given routes were seen now for this application and environment.
    void upsert(String applicationId, String environment, Collection<String> routeIds);
    // Entries whose [first_seen, last_seen] overlaps [from, to], for one environment.
    List<RouteCatalogEntry> findByEnvironment(String environment, Instant from, Instant to);
    List<RouteCatalogEntry> findAll(Instant from, Instant to);
    // Removes the catalog entries of a dismissed application.
    void deleteByApplication(String applicationId);
}
```

`RouteCatalogEntry` is a record: `(applicationId, routeId, environment, firstSeen, lastSeen)`.

### Implementation (`cameleer-server-app`)

`ClickHouseRouteCatalogStore` follows the `ClickHouseDiagramStore` pattern:

- Maintains a `ConcurrentHashMap<String, Instant>` as `firstSeenCache`, keyed by `tenant_id + "\0" + environment + "\0" + application_id + "\0" + route_id`.
- Warm-loads the cache on startup from ClickHouse (`SELECT ... FROM route_catalog WHERE tenant_id = ?`).
- On `upsert()`: for each route ID, looks up `firstSeen` in the cache (falling back to `now` if absent), batch-inserts all rows with that `first_seen` and `last_seen = now`, then adds the new routes to the cache.
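
The cache-backed upsert described above might look roughly like the sketch below. It is an illustration under assumptions, not the actual `ClickHouseRouteCatalogStore`: the class `UpsertSketch`, the `Row` record, and the explicit `now` parameter are hypothetical, and the ClickHouse batch insert is replaced by a returned list so the example runs on its own. It also populates the cache before the insert (via `computeIfAbsent`), whereas the spec updates the cache after the insert.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the first_seen cache logic; no ClickHouse access here.
final class UpsertSketch {
    // One row as it would be written to route_catalog (assumed shape).
    record Row(String tenantId, String environment, String applicationId,
               String routeId, Instant firstSeen, Instant lastSeen) {}

    private final ConcurrentHashMap<String, Instant> firstSeenCache = new ConcurrentHashMap<>();
    private final String tenantId;

    UpsertSketch(String tenantId) {
        this.tenantId = tenantId;
    }

    // Cache key layout from the spec: the four key columns joined by NUL characters.
    static String cacheKey(String tenantId, String environment,
                           String applicationId, String routeId) {
        return tenantId + "\0" + environment + "\0" + applicationId + "\0" + routeId;
    }

    // Builds the batch that a real store would send to ClickHouse in one INSERT.
    List<Row> upsert(String applicationId, String environment,
                     Collection<String> routeIds, Instant now) {
        List<Row> batch = new ArrayList<>();
        for (String routeId : routeIds) {
            String key = cacheKey(tenantId, environment, applicationId, routeId);
            // Known routes keep their original first_seen; new routes start at `now`.
            Instant firstSeen = firstSeenCache.computeIfAbsent(key, k -> now);
            batch.add(new Row(tenantId, environment, applicationId, routeId, firstSeen, now));
        }
        return batch;
    }
}
```

Each call produces a full row per route, and `ReplacingMergeTree(last_seen)` later collapses repeated rows to the one with the newest `last_seen`, which is why carrying `first_seen` forward from the cache matters.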

### Write Triggers

Two existing code paths already have the route list available:

1. **`AgentRegistrationController.register()`** -- has the full `routeIds` list from the registration request.
2. **`AgentRegistrationController.heartbeat()`** -- has `routeStates.keySet()` as route IDs, including the auto-heal path after server restart.

Both call `routeCatalogStore.upsert(application, environment, routeIds)` after the existing registry logic. No new HTTP calls or scheduled jobs.

## Read Path

### Query

```sql
SELECT application_id, route_id, first_seen, last_seen
FROM route_catalog FINAL
WHERE tenant_id = ?
  AND first_seen <= ?   -- bound to the window end (to)
  AND last_seen >= ?    -- bound to the window start (from)
  AND environment = ?
```

`FINAL` forces deduplication at read time, so rows that have not been merged yet still collapse to one row per route. The table is small, so the cost should be negligible.

### Merge Logic

In both `CatalogController` and `RouteCatalogController`, after the existing ClickHouse stats merge, add the catalog as a third source:

```java
for (RouteCatalogEntry entry : catalogEntries) {
    routesByApp.computeIfAbsent(entry.applicationId(), k -> new LinkedHashSet<>())
               .add(entry.routeId());
}
```

Routes already known from the agent registry or stats are deduplicated by the `Set`.
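
To make the deduplication concrete, the following self-contained sketch merges three sources into one map. It is illustrative only: `CatalogMergeSketch`, `RouteRef`, and `merge` are hypothetical names, and the three input lists stand in for the agent registry, the execution stats, and the persistent catalog.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: merges route references from several sources per application.
final class CatalogMergeSketch {
    // Minimal stand-in for an (application, route) pair from any source.
    record RouteRef(String applicationId, String routeId) {}

    private CatalogMergeSketch() {}

    // Sources are applied in order; a LinkedHashSet keeps first-seen order and drops repeats.
    @SafeVarargs
    static Map<String, Set<String>> merge(List<RouteRef>... sources) {
        Map<String, Set<String>> routesByApp = new LinkedHashMap<>();
        for (List<RouteRef> source : sources) {
            for (RouteRef ref : source) {
                routesByApp.computeIfAbsent(ref.applicationId(), k -> new LinkedHashSet<>())
                           .add(ref.routeId());
            }
        }
        return routesByApp;
    }
}
```

Because the catalog is merged last, it only contributes routes the other two sources did not already report, such as sub-routes with zero executions after a restart.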

### Time Range

Both controllers already receive `from`/`to` query params (default: last 24h). These same bounds are passed to the catalog query, giving the lifecycle overlap behavior.

## Dismiss Path

`CatalogController.dismissApplication()` adds `route_catalog` to the existing `tablesWithAppId` array. The existing `ALTER TABLE ... DELETE WHERE tenant_id = ? AND application_id = ?` loop handles deletion. The `firstSeenCache` entries for the dismissed app are also cleared, so that a later `upsert()` for that app starts with a fresh `first_seen` instead of reusing the cached one.
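
Clearing the cache for one application is slightly awkward because the cache key puts `environment` before `application_id`, so a simple prefix match is not enough. A minimal sketch, assuming the NUL-separated key layout described in the write path (the class `CacheEvictionSketch` and the method `evictApplication` are hypothetical names):

```java
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: removes all cache entries of one application across environments.
final class CacheEvictionSketch {
    private CacheEvictionSketch() {}

    static void evictApplication(ConcurrentHashMap<String, Instant> firstSeenCache,
                                 String tenantId, String applicationId) {
        firstSeenCache.keySet().removeIf(key -> {
            // Key layout: tenant_id \0 environment \0 application_id \0 route_id.
            // Limit 4 keeps the route_id intact as the last part.
            String[] parts = key.split("\0", 4);
            return parts.length == 4
                    && parts[0].equals(tenantId)
                    && parts[2].equals(applicationId);
        });
    }
}
```

This mirrors the `DELETE` predicate, which filters on `tenant_id` and `application_id` only and therefore spans every environment of the dismissed app.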

## Files Changed

| File | Change |
|------|--------|
| `clickhouse/init.sql` | Add `CREATE TABLE IF NOT EXISTS route_catalog` |
| **New:** `core/.../storage/RouteCatalogStore.java` | Interface |
| **New:** `core/.../storage/RouteCatalogEntry.java` | Record |
| **New:** `app/.../storage/ClickHouseRouteCatalogStore.java` | Implementation with cache |
| `app/.../config/StorageBeanConfig.java` | Wire bean |
| `app/.../controller/AgentRegistrationController.java` | Call `upsert()` on register + heartbeat |
| `app/.../controller/CatalogController.java` | Inject store, query, merge, add to dismiss list |
| `app/.../controller/RouteCatalogController.java` | Same catalog merge |
| `.claude/rules/core-classes.md` | Add to storage section |
| `.claude/rules/app-classes.md` | Add to ClickHouse stores section |

No new dependencies. No PostgreSQL migration. No UI changes -- the sidebar already consumes routes from the catalog endpoints.