mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-25 00:51:20 +00:00
fix: write update exit code before gateway restart (cgroup kill race) (#8288)
When /update runs via Telegram, hermes update --gateway is spawned inside
the gateway's systemd cgroup. The update process itself calls
systemctl restart hermes-gateway, which tears down the cgroup with
KillMode=mixed — SIGKILL to all remaining processes. The wrapping bash
shell is killed before it can execute the exit-code epilogue, so
.update_exit_code is never created. The new gateway's update watcher
then polls for 30 minutes and sends a spurious timeout message.
Fix: write .update_exit_code from Python inside cmd_update() immediately
after the git pull + pip install succeed ("Update complete!"), before
attempting the gateway restart. The shell epilogue still writes it too
(idempotent overwrite), but now the marker exists even when the process
is killed mid-restart.
This commit is contained in:
parent
b321330362
commit
56e3ee2440
2 changed files with 137 additions and 0 deletions
|
|
@ -3939,6 +3939,26 @@ def cmd_update(args):
|
|||
print()
|
||||
print("✓ Update complete!")
|
||||
|
||||
# Write exit code *before* the gateway restart attempt.
|
||||
# When running as ``hermes update --gateway`` (spawned by the gateway's
|
||||
# /update command), this process lives inside the gateway's systemd
|
||||
# cgroup. ``systemctl restart hermes-gateway`` kills everything in the
|
||||
# cgroup (KillMode=mixed → SIGKILL to remaining processes), including
|
||||
# us and the wrapping bash shell. The shell never reaches its
|
||||
# ``printf $status > .update_exit_code`` epilogue, so the exit-code
|
||||
# marker file is never created. The new gateway's update watcher then
|
||||
# polls for 30 minutes and sends a spurious timeout message.
|
||||
#
|
||||
# Writing the marker here — after git pull + pip install succeed but
|
||||
# before we attempt the restart — ensures the new gateway sees it
|
||||
# regardless of how we die.
|
||||
if gateway_mode:
|
||||
_exit_code_path = get_hermes_home() / ".update_exit_code"
|
||||
try:
|
||||
_exit_code_path.write_text("0")
|
||||
except OSError:
|
||||
pass
|
||||
|
||||
# Auto-restart ALL gateways after update.
|
||||
# The code update (git pull) is shared across all profiles, so every
|
||||
# running gateway needs restarting to pick up the new code.
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue