mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-07-01 12:02:05 +00:00
The external-drain marker .drain_request.json is written under HERMES_HOME, which on Hermes Cloud is a persistent Fly volume (/opt/data). A begin-drain marker therefore SURVIVES the post-update machine restart. But the disruptive lifecycle actions a drain protects (auto-update / image migrate / env edit / profile change) all restart the machine — which is exactly the signal the drain is over. The freshly-restarted gateway re-read the orphaned marker on its startup reconcile and parked itself back in 'draining', refusing every new turn indefinitely (NS-570: ~52 min until manually cleared). Fix: stamp the marker with an identity of THIS container/VM instantiation (kernel boot_id + PID 1 start time, read from /proc) and treat a marker whose epoch differs from the current instantiation as absent. A deliberate restart → new PID 1 → new epoch → stale marker ignored → gateway boots 'running'. A marker written during the current instantiation (the live drain) still matches; an s6 respawn of just the gateway (PID 1/init unchanged) keeps the same epoch, so an in-flight drain is still honoured (D4a reversibility preserved). The staleness check is lenient and never fail-closed: a legacy marker with no epoch, a corrupt/contentless marker, or an environment with no /proc (epoch unavailable) all degrade to the original presence-only behaviour. NAS is untouched — it only ever POSTs begin/cancel-drain over HTTP; the marker file is purely gateway-internal IPC. The fix is entirely within gateway/drain_control.py; the watcher and the dashboard endpoint go through the same drain_requested()/write_drain_request() chokepoints and need no functional change. |
||
|---|---|---|
| .. | ||
| assets | ||
| builtin_hooks | ||
| platforms | ||
| relay | ||
| __init__.py | ||
| authz_mixin.py | ||
| channel_directory.py | ||
| code_skew.py | ||
| config.py | ||
| delivery.py | ||
| display_config.py | ||
| drain_control.py | ||
| hooks.py | ||
| kanban_watchers.py | ||
| memory_monitor.py | ||
| message_timestamps.py | ||
| mirror.py | ||
| pairing.py | ||
| platform_registry.py | ||
| response_filters.py | ||
| restart.py | ||
| rich_sent_store.py | ||
| run.py | ||
| runtime_footer.py | ||
| scale_to_zero.py | ||
| session.py | ||
| session_context.py | ||
| shutdown_forensics.py | ||
| slash_access.py | ||
| slash_commands.py | ||
| status.py | ||
| sticker_cache.py | ||
| stream_consumer.py | ||
| stream_dispatch.py | ||
| stream_events.py | ||
| whatsapp_identity.py | ||