Routes with zero executions (sub-routes) vanish from the sidebar after server restart because the catalog is purely in-memory with a ClickHouse stats fallback that only covers executed routes. This spec describes a persistent route_catalog table in ClickHouse with lifecycle tracking (first_seen/last_seen) to reconstruct the sidebar without agent reconnection and support historical time-window queries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Persistent Route Catalog
Problem
The route catalog is assembled at query time from two ephemeral sources: the in-memory agent registry and ClickHouse stats_1m_route execution stats. Routes with zero executions (sub-routes) exist only in the agent registry. When the server restarts and no agent reconnects, these routes vanish from the sidebar. The ClickHouse stats fallback only covers routes that have at least one recorded execution.
This forces operators to restart agents after a server outage just to restore the sidebar, even though all historical execution data is still intact in ClickHouse.
Solution
Add a route_catalog table in ClickHouse that persistently tracks which routes exist per application and environment, with first_seen and last_seen timestamps. This becomes a third source for the catalog endpoints, filling the gap between the in-memory registry and execution stats.
Historical Lifecycle Support
When a new app version drops a route, the route's last_seen stops being updated. The sidebar query uses time-range overlap: any route whose [first_seen, last_seen] intersects the user's selected [from, to] window is included. This means historical routes remain visible when browsing past time windows, even if they no longer exist in the current app version.
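The overlap rule above can be written as a closed-interval test. The following is a minimal sketch; the `LifecycleOverlap` class and its `overlaps` method are hypothetical names used only for illustration and are not part of the spec.

```java
import java.time.Instant;

// Hypothetical helper, not part of the spec: closed-interval overlap test.
final class LifecycleOverlap {
    private LifecycleOverlap() {}

    /** True when [firstSeen, lastSeen] intersects [from, to]; shared endpoints count as overlap. */
    static boolean overlaps(Instant firstSeen, Instant lastSeen, Instant from, Instant to) {
        // Equivalent to: first_seen <= to AND last_seen >= from
        return !firstSeen.isAfter(to) && !lastSeen.isBefore(from);
    }
}
```

Under this rule, a route seen from January 1 to January 10 is included for a window of January 5 to January 6 and for a window starting on January 10, but not for a window starting on January 11.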
ClickHouse Table Schema
CREATE TABLE IF NOT EXISTS route_catalog (
    tenant_id      LowCardinality(String) DEFAULT 'default',
    environment    LowCardinality(String) DEFAULT 'default',
    application_id LowCardinality(String),
    route_id       LowCardinality(String),
    first_seen     DateTime64(3),
    last_seen      DateTime64(3)
)
ENGINE = ReplacingMergeTree(last_seen)
ORDER BY (tenant_id, environment, application_id, route_id);
- `ReplacingMergeTree(last_seen)` keeps the row with the highest `last_seen` on merge.
- No TTL: catalog entries live until explicitly dismissed.
- No partitioning: table stays small (one row per route per app per environment).
- Added to `init.sql`, idempotent via `IF NOT EXISTS`.
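To make the merge behavior concrete, here is a small in-memory model of what the engine is expected to do with repeated upserts of the same route. It is an illustration only: `ReplacingMergeModel`, `Row`, and the string key are hypothetical, and real merges happen asynchronously inside ClickHouse.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; names are hypothetical and this is not ClickHouse code.
final class ReplacingMergeModel {
    /** key stands in for the sorting key (tenant_id, environment, application_id, route_id). */
    record Row(String key, Instant firstSeen, Instant lastSeen) {}

    /** Collapses rows sharing a sorting key, keeping the one with the highest lastSeen. */
    static Map<String, Row> merge(List<Row> rows) {
        Map<String, Row> merged = new LinkedHashMap<>();
        for (Row row : rows) {
            // On equal versions the later-inserted row replaces the earlier one.
            merged.merge(row.key(), row,
                    (kept, next) -> next.lastSeen().isBefore(kept.lastSeen()) ? kept : next);
        }
        return merged;
    }
}
```

Because every inserted row for a route is meant to carry the same `first_seen`, the surviving row keeps the original `first_seen` together with the most recent `last_seen`.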
Write Path
Interface (cameleer-server-core)
public interface RouteCatalogStore {
    void upsert(String applicationId, String environment, Collection<String> routeIds);
    List<RouteCatalogEntry> findByEnvironment(String environment, Instant from, Instant to);
    List<RouteCatalogEntry> findAll(Instant from, Instant to);
    void deleteByApplication(String applicationId);
}
RouteCatalogEntry is a record: (applicationId, routeId, environment, firstSeen, lastSeen).
Implementation (cameleer-server-app)
ClickHouseRouteCatalogStore follows the ClickHouseDiagramStore pattern:
- Maintains a `ConcurrentHashMap<String, Instant>` as `firstSeenCache`, keyed by `tenant_id + "\0" + environment + "\0" + application_id + "\0" + route_id`.
- Warm-loaded on startup from ClickHouse (`SELECT ... FROM route_catalog WHERE tenant_id = ?`).
- On `upsert()`: for each route ID, look up `firstSeen` from cache (use `now` if absent), batch insert all rows with `first_seen` from cache and `last_seen = now`, update cache for new routes.
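The `upsert()` steps above might look roughly like the following sketch. It only builds the rows that would be batch-inserted and leaves out the ClickHouse write entirely; `UpsertSketch`, `Row`, `buildRows`, and the explicit `now` parameter are assumptions made for illustration, not the actual implementation.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: the Row type, the tenant field, and buildRows() are hypothetical.
final class UpsertSketch {
    record Row(String tenantId, String environment, String applicationId,
               String routeId, Instant firstSeen, Instant lastSeen) {}

    private final Map<String, Instant> firstSeenCache = new ConcurrentHashMap<>();
    private final String tenantId;

    UpsertSketch(String tenantId) {
        this.tenantId = tenantId;
    }

    /** Builds the rows one upsert() call would batch-insert and caches first_seen for new routes. */
    List<Row> buildRows(String applicationId, String environment,
                        Collection<String> routeIds, Instant now) {
        List<Row> rows = new ArrayList<>();
        for (String routeId : routeIds) {
            String key = String.join("\0", tenantId, environment, applicationId, routeId);
            // putIfAbsent returns the cached value, or null when this route is new.
            Instant cached = firstSeenCache.putIfAbsent(key, now);
            Instant firstSeen = cached != null ? cached : now;
            rows.add(new Row(tenantId, environment, applicationId, routeId, firstSeen, now));
        }
        return rows;
    }
}
```

Using `putIfAbsent` keeps the lookup and the cache update in a single atomic step, so two concurrent heartbeats for the same new route should agree on one `first_seen` value.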
Write Triggers
Two existing code paths already have the route list available:
- `AgentRegistrationController.register()` -- has the full `routeIds` list from the registration request.
- `AgentRegistrationController.heartbeat()` -- has `routeStates.keySet()` as route IDs, including the auto-heal path after server restart.
Both call routeCatalogStore.upsert(application, environment, routeIds) after the existing registry logic. No new HTTP calls or scheduled jobs.
Read Path
Query
SELECT application_id, route_id, first_seen, last_seen
FROM route_catalog FINAL
WHERE tenant_id = ? AND first_seen <= ? AND last_seen >= ?
AND environment = ?
FINAL forces dedup at read time. The table is small, so the cost is negligible.
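One detail that is easy to get wrong in this query is the placeholder order: the `first_seen <= ?` placeholder takes the upper bound (`to`) and the `last_seen >= ?` placeholder takes the lower bound (`from`). The sketch below spells that out. `CatalogQueryArgs` is a hypothetical helper, and passing `java.sql.Timestamp` values is an assumption about how the JDBC layer binds `DateTime64` parameters.

```java
import java.sql.Timestamp;
import java.time.Instant;

// Hypothetical helper: the spec gives the SQL but not the binding code.
final class CatalogQueryArgs {
    static final String SQL =
            "SELECT application_id, route_id, first_seen, last_seen "
          + "FROM route_catalog FINAL "
          + "WHERE tenant_id = ? AND first_seen <= ? AND last_seen >= ? "
          + "AND environment = ?";

    private CatalogQueryArgs() {}

    /** Returns the arguments in placeholder order: tenant, to, from, environment. */
    static Object[] args(String tenantId, String environment, Instant from, Instant to) {
        return new Object[] {
            tenantId,
            Timestamp.from(to),    // first_seen <= to
            Timestamp.from(from),  // last_seen  >= from
            environment
        };
    }
}
```

A `findAll` variant would presumably drop the trailing `environment` clause and its argument.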
Merge Logic
In both CatalogController and RouteCatalogController, after the existing ClickHouse stats merge, add the catalog as a third source:
for (RouteCatalogEntry entry : catalogEntries) {
    routesByApp.computeIfAbsent(entry.applicationId(), k -> new LinkedHashSet<>())
        .add(entry.routeId());
}
Routes already known from the agent registry or stats are deduplicated by the Set.
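As a self-contained illustration of the three-source merge, the sketch below feeds registry, stats, and catalog entries into the same `computeIfAbsent` pattern. `CatalogMergeSketch` and `AppRoute` are hypothetical stand-ins; the real controllers work with their own registry and stats types.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: AppRoute and the three input lists are hypothetical stand-ins.
final class CatalogMergeSketch {
    record AppRoute(String applicationId, String routeId) {}

    private CatalogMergeSketch() {}

    /** Merges the sources in order (registry, stats, catalog); the Set drops duplicates. */
    static Map<String, Set<String>> merge(List<AppRoute> registry,
                                          List<AppRoute> stats,
                                          List<AppRoute> catalog) {
        Map<String, Set<String>> routesByApp = new LinkedHashMap<>();
        for (List<AppRoute> source : List.of(registry, stats, catalog)) {
            for (AppRoute route : source) {
                routesByApp.computeIfAbsent(route.applicationId(), k -> new LinkedHashSet<>())
                           .add(route.routeId());
            }
        }
        return routesByApp;
    }
}
```

After a restart with an empty registry, a sub-route with zero executions would appear only in the catalog list and still end up in the merged set.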
Time Range
Both controllers already receive from/to query params (default: last 24h). These same bounds are passed to the catalog query, giving the lifecycle overlap behavior.
Dismiss Path
CatalogController.dismissApplication() adds route_catalog to the existing tablesWithAppId array. The existing ALTER TABLE ... DELETE WHERE tenant_id = ? AND application_id = ? loop handles deletion. The firstSeenCache is also cleared for the dismissed app.
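Clearing the cache for a dismissed app means removing every key whose application segment matches. A minimal sketch, assuming the `tenant \0 environment \0 application \0 route` key layout described for `firstSeenCache`; `CacheEviction` and `evictApplication` are hypothetical names.

```java
import java.time.Instant;
import java.util.Map;

// Sketch only: class and method names are hypothetical.
final class CacheEviction {
    private CacheEviction() {}

    /** Removes every cache entry whose key names the given tenant and application. */
    static void evictApplication(Map<String, Instant> firstSeenCache,
                                 String tenantId, String applicationId) {
        firstSeenCache.keySet().removeIf(key -> {
            // Key layout: tenant \0 environment \0 application \0 route.
            String[] parts = key.split("\0", -1);
            return parts.length == 4
                    && parts[0].equals(tenantId)
                    && parts[2].equals(applicationId);
        });
    }
}
```

Matching on the parsed segments rather than on a substring avoids evicting an unrelated app whose name merely contains the dismissed app's ID.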
Files Changed
| File | Change |
|---|---|
| `clickhouse/init.sql` | Add `CREATE TABLE IF NOT EXISTS route_catalog` |
| New: `core/.../storage/RouteCatalogStore.java` | Interface |
| New: `core/.../storage/RouteCatalogEntry.java` | Record |
| New: `app/.../storage/ClickHouseRouteCatalogStore.java` | Implementation with cache |
| `app/.../config/StorageBeanConfig.java` | Wire bean |
| `app/.../controller/AgentRegistrationController.java` | Call `upsert()` on register + heartbeat |
| `app/.../controller/CatalogController.java` | Inject store, query, merge, add to dismiss list |
| `app/.../controller/RouteCatalogController.java` | Same catalog merge |
| `.claude/rules/core-classes.md` | Add to storage section |
| `.claude/rules/app-classes.md` | Add to ClickHouse stores section |
No new dependencies. No PostgreSQL migration. No UI changes -- the sidebar already consumes routes from the catalog endpoints.