cameleer-server/docs/superpowers/specs/2026-04-16-persistent-route-catalog-design.md
hsiegeln 2542e430ac docs: add persistent route catalog design spec
Routes with zero executions (sub-routes) vanish from the sidebar after
server restart because the catalog is purely in-memory with a ClickHouse
stats fallback that only covers executed routes. This spec describes a
persistent route_catalog table in ClickHouse with lifecycle tracking
(first_seen/last_seen) to reconstruct the sidebar without agent
reconnection and support historical time-window queries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 18:39:49 +02:00


Persistent Route Catalog

Problem

The route catalog is assembled at query time from two ephemeral sources: the in-memory agent registry and ClickHouse stats_1m_route execution stats. Routes with zero executions (sub-routes) exist only in the agent registry. When the server restarts and no agent reconnects, these routes vanish from the sidebar. The ClickHouse stats fallback only covers routes that have at least one recorded execution.

This forces operators to restart agents after a server outage just to restore the sidebar, even though all historical execution data is still intact in ClickHouse.

Solution

Add a route_catalog table in ClickHouse that persistently tracks which routes exist per application and environment, with first_seen and last_seen timestamps. This becomes a third source for the catalog endpoints, filling the gap between the in-memory registry and execution stats.

Historical Lifecycle Support

When a new app version drops a route, the route's last_seen stops being updated. The sidebar query uses time-range overlap: any route whose [first_seen, last_seen] intersects the user's selected [from, to] window is included. This means historical routes remain visible when browsing past time windows, even if they no longer exist in the current app version.
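
The overlap rule reduces to a two-comparison predicate. The following is a minimal sketch, assuming closed intervals on both ends (consistent with the `<=` / `>=` comparisons in the read query); the class and method names are hypothetical and not part of the spec.

```java
import java.time.Instant;

final class LifecycleOverlap {
    private LifecycleOverlap() {}

    /** True when [firstSeen, lastSeen] and [from, to] share at least one instant. */
    static boolean overlaps(Instant firstSeen, Instant lastSeen, Instant from, Instant to) {
        // Same condition as: first_seen <= to AND last_seen >= from
        return !firstSeen.isAfter(to) && !lastSeen.isBefore(from);
    }
}
```

Under this rule a route seen from 10:00 to 12:00 is included for a window of 11:00 to 13:00 and for a window ending exactly at 10:00, but not for a window that starts at 12:30.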

ClickHouse Table Schema

CREATE TABLE IF NOT EXISTS route_catalog (
    tenant_id        LowCardinality(String) DEFAULT 'default',
    environment      LowCardinality(String) DEFAULT 'default',
    application_id   LowCardinality(String),
    route_id         LowCardinality(String),
    first_seen       DateTime64(3),
    last_seen        DateTime64(3)
)
ENGINE = ReplacingMergeTree(last_seen)
ORDER BY (tenant_id, environment, application_id, route_id);
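  • ReplacingMergeTree(last_seen) keeps, for each ORDER BY key, only the row with the highest last_seen when parts merge. Merges run in the background, so duplicate rows can exist until then; reads use FINAL to collapse them.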
  • No TTL: catalog entries live until explicitly dismissed.
  • No partitioning: table stays small (one row per route per app per environment).
  • Added to init.sql, idempotent via IF NOT EXISTS.

Write Path

Interface (cameleer-server-core)

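import java.time.Instant;
import java.util.Collection;
import java.util.List;

public interface RouteCatalogStore {
    // Records that the given routes were seen now for this application and environment.
    void upsert(String applicationId, String environment, Collection<String> routeIds);
    // Both finders return routes whose [first_seen, last_seen] overlaps [from, to].
    List<RouteCatalogEntry> findByEnvironment(String environment, Instant from, Instant to);
    List<RouteCatalogEntry> findAll(Instant from, Instant to);
    void deleteByApplication(String applicationId);
}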

RouteCatalogEntry is a record: (applicationId, routeId, environment, firstSeen, lastSeen).
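
As a sketch, the record could be declared as follows. The component order comes from the line above; the component types are an assumption (String for the identifiers, Instant for the timestamps, matching the types used in RouteCatalogStore).

```java
import java.time.Instant;

// Component order follows the spec; the component types are assumed.
record RouteCatalogEntry(
        String applicationId,
        String routeId,
        String environment,
        Instant firstSeen,
        Instant lastSeen) {}
```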

Implementation (cameleer-server-app)

ClickHouseRouteCatalogStore follows the ClickHouseDiagramStore pattern:

  • Maintains a ConcurrentHashMap<String, Instant> as firstSeenCache, keyed by tenant_id + "\0" + environment + "\0" + application_id + "\0" + route_id.
  • Warm-loaded on startup from ClickHouse (SELECT ... FROM route_catalog WHERE tenant_id = ?).
  • On upsert(): for each route ID, look up first_seen in the cache, falling back to now for a route that has not been seen before. Batch-insert one row per route with that first_seen and last_seen = now, then add the new routes to the cache.
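
The cache handling in upsert() can be sketched as below. This is an illustration under assumptions, not the actual ClickHouseRouteCatalogStore: the class name, the Row record, and the buildRows method are hypothetical, the ClickHouse batch insert is left out, and the cache is populated with computeIfAbsent before the insert rather than afterwards.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

final class FirstSeenCacheSketch {
    /** Hypothetical in-memory shape of one route_catalog row. */
    record Row(String tenantId, String environment, String applicationId,
               String routeId, Instant firstSeen, Instant lastSeen) {}

    private final ConcurrentHashMap<String, Instant> firstSeenCache = new ConcurrentHashMap<>();
    private final String tenantId;

    FirstSeenCacheSketch(String tenantId) {
        this.tenantId = tenantId;
    }

    /** Cache key layout from the spec: the four key columns joined by NUL. */
    static String cacheKey(String tenantId, String environment,
                           String applicationId, String routeId) {
        return tenantId + "\0" + environment + "\0" + applicationId + "\0" + routeId;
    }

    /** Builds the rows a batch insert would write for one upsert() call. */
    List<Row> buildRows(String applicationId, String environment,
                        Collection<String> routeIds, Instant now) {
        List<Row> rows = new ArrayList<>(routeIds.size());
        for (String routeId : routeIds) {
            String key = cacheKey(tenantId, environment, applicationId, routeId);
            // Known route: reuse the cached first_seen. New route: first_seen = now.
            Instant firstSeen = firstSeenCache.computeIfAbsent(key, k -> now);
            rows.add(new Row(tenantId, environment, applicationId, routeId, firstSeen, now));
        }
        return rows;
    }
}
```

Passing now in as a parameter keeps the sketch deterministic; a real store would more likely read a clock once per call so that every row in the batch shares the same last_seen.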

Write Triggers

Two existing code paths already have the route list available:

  1. AgentRegistrationController.register() -- has the full routeIds list from the registration request.
  2. AgentRegistrationController.heartbeat() -- has routeStates.keySet() as route IDs, including the auto-heal path after server restart.

Both call routeCatalogStore.upsert(application, environment, routeIds) after the existing registry logic. No new HTTP calls or scheduled jobs.

Read Path

Query

SELECT application_id, route_id, first_seen, last_seen
FROM route_catalog FINAL
WHERE tenant_id = ?
  AND first_seen <= ?   -- end of the selected window (to)
  AND last_seen >= ?    -- start of the selected window (from)
  AND environment = ?

FINAL forces dedup at read time. The table is small, so the cost is negligible.
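
To make the effect of FINAL concrete, the sketch below imitates in plain Java what ReplacingMergeTree(last_seen) does to rows that share a sorting key: only the row with the highest version column survives, and on a tie the most recently inserted row wins. It is a model for reasoning about the table, not ClickHouse code; the class, record, and method names are hypothetical, and the single key string stands in for the full ORDER BY tuple.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReplacingMergeSketch {
    /** One inserted row; key stands in for (tenant_id, environment, application_id, route_id). */
    record Row(String key, Instant firstSeen, Instant lastSeen) {}

    /** Collapses rows given in insertion order to one row per key, as FINAL would. */
    static Map<String, Row> collapse(List<Row> insertedInOrder) {
        Map<String, Row> latest = new LinkedHashMap<>();
        for (Row row : insertedInOrder) {
            // Keep the earlier row only if its last_seen is strictly greater.
            latest.merge(row.key(), row,
                    (kept, next) -> next.lastSeen().isBefore(kept.lastSeen()) ? kept : next);
        }
        return latest;
    }
}
```

Since only the newest row per key survives, every upsert has to carry the original first_seen forward, which is the role of firstSeenCache in the write path.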

Merge Logic

In both CatalogController and RouteCatalogController, after the existing ClickHouse stats merge, add the catalog as a third source:

for (RouteCatalogEntry entry : catalogEntries) {
    routesByApp.computeIfAbsent(entry.applicationId(), k -> new LinkedHashSet<>())
               .add(entry.routeId());
}

Routes already known from the agent registry or stats are deduplicated by the Set.
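
A self-contained version of the same loop shows the deduplication in action. The names CatalogMergeSketch, CatalogRoute, and mergeCatalog are hypothetical, and the sketch assumes routesByApp is a mutable map of application ID to route IDs, as the snippet above suggests.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CatalogMergeSketch {
    /** Hypothetical, reduced view of a catalog entry: only the fields the merge needs. */
    record CatalogRoute(String applicationId, String routeId) {}

    /** Adds catalog routes to the map already filled from the registry and stats. */
    static Map<String, Set<String>> mergeCatalog(Map<String, Set<String>> routesByApp,
                                                 List<CatalogRoute> catalogEntries) {
        for (CatalogRoute entry : catalogEntries) {
            // Set.add is a no-op for routes another source already contributed.
            routesByApp.computeIfAbsent(entry.applicationId(), k -> new LinkedHashSet<>())
                       .add(entry.routeId());
        }
        return routesByApp;
    }
}
```

If the map holds only an executed route for an application, merging catalog entries for that route and for a zero-execution sub-route leaves the executed route in place once and appends the sub-route after it.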

Time Range

Both controllers already receive from/to query params (default: last 24h). These same bounds are passed to the catalog query, giving the lifecycle overlap behavior.

Dismiss Path

CatalogController.dismissApplication() adds route_catalog to the existing tablesWithAppId array. The existing ALTER TABLE ... DELETE WHERE tenant_id = ? AND application_id = ? loop handles deletion. The firstSeenCache is also cleared for the dismissed app.
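
Clearing the cache for one application is not a plain prefix match, because application_id is the third component of the cache key, after tenant_id and environment. One possible approach, sketched under the assumption that the key layout is exactly the NUL-joined form described in the write path (class and method names are hypothetical):

```java
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;

final class FirstSeenCacheEviction {
    private FirstSeenCacheEviction() {}

    /**
     * Removes every cache entry of the given tenant and application, across all
     * environments. Returns true if at least one entry was removed.
     */
    static boolean evictApplication(ConcurrentHashMap<String, Instant> firstSeenCache,
                                    String tenantId, String applicationId) {
        return firstSeenCache.keySet().removeIf(key -> {
            // Key layout: tenant_id \0 environment \0 application_id \0 route_id
            String[] parts = key.split("\0", -1);
            return parts.length == 4
                    && parts[0].equals(tenantId)
                    && parts[2].equals(applicationId);
        });
    }
}
```

Without this step a dismissed application that later re-registers would get its old first_seen back from the cache instead of starting a fresh lifecycle.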

Files Changed

| File | Change |
|------|--------|
| clickhouse/init.sql | Add CREATE TABLE IF NOT EXISTS route_catalog |
| New: core/.../storage/RouteCatalogStore.java | Interface |
| New: core/.../storage/RouteCatalogEntry.java | Record |
| New: app/.../storage/ClickHouseRouteCatalogStore.java | Implementation with cache |
| app/.../config/StorageBeanConfig.java | Wire bean |
| app/.../controller/AgentRegistrationController.java | Call upsert() on register + heartbeat |
| app/.../controller/CatalogController.java | Inject store, query, merge, add to dismiss list |
| app/.../controller/RouteCatalogController.java | Same catalog merge |
| .claude/rules/core-classes.md | Add to storage section |
| .claude/rules/app-classes.md | Add to ClickHouse stores section |

No new dependencies. No PostgreSQL migration. No UI changes -- the sidebar already consumes routes from the catalog endpoints.