Files
cameleer-server/cameleer-server-app
hsiegeln fb54f9cbd2
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 2m5s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
fix(agent): revive DEAD agents on heartbeat (not just STALE)
Reproduction: pause a container long enough to cross both the stale
and dead thresholds, then unpause. The agent resumes sending heartbeats
but the server keeps it shown as DEAD. Only a full container restart
(which re-registers) fixes it.

Root cause: AgentRegistryService.heartbeat() only revived STALE → LIVE.
A DEAD agent's heartbeat updated lastHeartbeat but left state unchanged.
checkLifecycle() never downgrades DEAD either (no-op in that branch),
so the agent was permanently stuck in DEAD until a register() call.

Fix: extend the revival branch to also cover DEAD. Same process; a
heartbeat is proof of liveness regardless of the previous state.

Also: AgentLifecycleMonitor.mapTransitionEvent() now emits RECOVERED
for DEAD → LIVE, mirroring its behavior for STALE → LIVE, so the
lifecycle timeline captures the transition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 20:55:47 +02:00
..
2026-04-19 19:26:57 +02:00