Long-running gateway processes that survive 'hermes update' keep
pre-update modules cached in sys.modules. When new tool files on
disk then try to 'from hermes_cli.config import cfg_get' (added in
PR #17304), the import resolves against the stale module object
and raises ImportError — hitting users on Matrix, Telegram, Feishu,
and other platforms.
Two defenses:
1. Gateway self-check (gateway/run.py). On __init__, snapshot the
newest mtime across sentinel source files (hermes_cli/config.py,
run_agent.py, gateway/run.py, etc.). On every inbound message,
re-read those mtimes; if any is newer than boot time + 2s slack,
request a graceful restart via the normal drain path and return
a one-line ack to the user. Idempotent, works regardless of how
the update happened (hermes update, manual git pull, installer).
2. Post-restart survivor sweep ('hermes update'). After the existing
restart loop, sleep 3s, rescan for gateway PIDs we already tried
to kill, and SIGKILL any survivors. The detached profile watchers
and systemd then relaunch with fresh code instead of waiting out
the 120s watcher timeout.
Closes#17648.