mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-25 00:51:20 +00:00
cmd_update no longer SIGKILLs in-flight agent runs, and users get 'still working' status every 3 min instead of 10. Two long-standing sources of '@user — agent gives up mid-task' reports on Telegram and other gateways. Drain-aware update: - New helper hermes_cli.gateway._graceful_restart_via_sigusr1(pid, drain_timeout) sends SIGUSR1 to the gateway and polls os.kill(pid, 0) until the process exits or the budget expires. - cmd_update's systemd loop now reads MainPID via 'systemctl show --property=MainPID --value' and tries the graceful path first. The gateway's existing SIGUSR1 handler -> request_restart(via_service= True) -> drain -> exit(75) is wired in gateway/run.py and is respawned by systemd's Restart=on-failure (and the explicit RestartForceExitStatus=75 on newer units). - Falls back to 'systemctl restart' when MainPID is unknown, the drain budget elapses, or the unit doesn't respawn after exit (older units missing Restart=on-failure). Old install behavior preserved. - Drain budget = max(restart_drain_timeout, 30s) + 15s margin so the drain loop in run_agent + final exit have room before fallback fires. Composes with #14728's tool-subprocess reaping. Notification interval: - agent.gateway_notify_interval default 600 -> 180. - HERMES_AGENT_NOTIFY_INTERVAL env-var fallback in gateway/run.py matched. - 9-minute weak-model spinning runs now ping at 3 min and 6 min instead of 27 seconds before completion, removing the 'is the bot dead?' reflex that drives gateway-restart cycles. Tests: - Two new tests in tests/hermes_cli/test_update_gateway_restart.py: one asserts SIGUSR1 is sent and 'systemctl restart' is NOT called when MainPID is known and the helper succeeds; one asserts the fallback fires when the helper returns False. - E2E: spawned detached bash processes confirm the helper returns True on SIGUSR1-handling exit (~0.5s) and False on SIGUSR1-ignoring processes (timeout). Verified non-existent PID and pid=0 edge cases. - 41/41 in test_update_gateway_restart.py (was 39, +2 new). - 154/154 in shutdown-related suites including #14728's new tests. Reported by @GeoffWellman and @ANT_1515 on X. |
||
|---|---|---|
| .. | ||
| __init__.py | ||
| auth.py | ||
| auth_commands.py | ||
| backup.py | ||
| banner.py | ||
| callbacks.py | ||
| claw.py | ||
| cli_output.py | ||
| clipboard.py | ||
| codex_models.py | ||
| colors.py | ||
| commands.py | ||
| completion.py | ||
| config.py | ||
| copilot_auth.py | ||
| cron.py | ||
| curses_ui.py | ||
| debug.py | ||
| default_soul.py | ||
| dingtalk_auth.py | ||
| doctor.py | ||
| dump.py | ||
| env_loader.py | ||
| gateway.py | ||
| hooks.py | ||
| logs.py | ||
| main.py | ||
| mcp_config.py | ||
| memory_setup.py | ||
| model_normalize.py | ||
| model_switch.py | ||
| models.py | ||
| nous_subscription.py | ||
| pairing.py | ||
| platforms.py | ||
| plugins.py | ||
| plugins_cmd.py | ||
| profiles.py | ||
| providers.py | ||
| runtime_provider.py | ||
| setup.py | ||
| skills_config.py | ||
| skills_hub.py | ||
| skin_engine.py | ||
| status.py | ||
| timeouts.py | ||
| tips.py | ||
| tools_config.py | ||
| uninstall.py | ||
| web_server.py | ||
| webhook.py | ||