mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-10 08:32:09 +00:00
* fix(gateway): auto-start after container restart via planned-stop marker
On Docker (s6-overlay), the gateway runs as a dynamically-registered s6
service. When the container stops/restarts/upgrades, s6 sends the gateway
a plain SIGTERM. The shutdown path (_stop_impl) ended with an
unconditional _update_runtime_status("stopped"), persisting
gateway_state=stopped to the volume. container_boot.py reads that on the
next boot and only auto-starts gateways whose last state was "running"
(_AUTOSTART_STATES) — so after a routine `docker compose up
--force-recreate` the gateway stays down and messaging channels silently
go dark, with no error surfaced (issue #42675).
The codebase already distinguishes intentional stops from unexpected
signals via the planned-stop marker (write_planned_stop_marker /
consume_planned_stop_marker_for_self): `hermes gateway stop`,
systemd/launchd ExecStop, and Ctrl+C write a marker before signalling,
so the handler classifies them as planned. An unmarked SIGTERM
(container/s6 restart, OOM, bare kill) is signal-initiated.
This wires that existing classification through to the state persist,
rather than adding unreliable signal-source inference:
- run.py: GatewayRunner._signal_initiated_shutdown, set in
shutdown_signal_handler's unmarked-signal branch. In _stop_impl, a
signal-initiated (non-restart) teardown now persists "running" instead
of "stopped" — preserving the operator's run-intent and overwriting the
mid-shutdown "draining" marker so _AUTOSTART_STATES matches on reboot.
Operator stops and restarts persist "stopped" as before.
- service_manager.py: S6ServiceManager.stop() now writes the planned-stop
marker for the supervised PID (read from s6-svstat) before `s6-svc -d`,
so an in-container `hermes gateway stop` is correctly classified as
intentional (parity with the systemd/launchd/host stop paths, which
already mark). Best-effort: a marker-write failure falls back to the
safe signal-initiated path.
Tests: shutdown persist-decision table (signal→running, operator→stopped,
restart→stopped), s6 stop marker write + svstat PID parse + failure
tolerance. The signal→running and s6-marker tests fail without the
respective source change. Verified end-to-end against a container built
from this branch: an unmarked SIGTERM to the live gateway leaves
gateway_state=running (shutdown-context log confirms signal path);
existing real container-restart suite still green.
* docs(docker): clarify gateway autostart distinguishes operator-stop from container-kill
The per-profile-supervision section described the autostart-across-restart
contract as "running gateways come back, stopped stay stopped" without
spelling out what records 'stopped'. That contract was the source of
#42675 confusion: users expected a restart to bring the gateway back and
it didn't. With the write-side fix, only an explicit `hermes gateway stop`
records 'stopped'; container/s6 restart SIGTERMs (incl. image upgrades and
unexpected exits) leave the state 'running' so the gateway auto-starts.
Make that distinction explicit in both the multi-profile and
per-profile-supervision sections.
* test(docker): real-restart autostart E2E for #42675
Adds test_live_gateway_autostarts_after_real_restart_without_manual_state_stamp:
a live s6-supervised gateway is killed by an actual `docker restart`
SIGTERM (no manual gateway_state stamp, no planned-stop marker) and must
auto-start on the next boot. Exercises the WRITE side of the fix that the
existing stamp-based tests bypass.
Verified to FAIL against an origin/main image (reconciler logs
prior_state=stopped action=registered — the #42675 bug) and PASS against
the fixed image (prior_state=running action=started).
|
||
|---|---|---|
| .. | ||
| acp | ||
| acp_adapter | ||
| agent | ||
| cli | ||
| cron | ||
| docker | ||
| e2e | ||
| fakes | ||
| fixtures/plugins/example-dashboard/dashboard | ||
| gateway | ||
| hermes_cli | ||
| hermes_state | ||
| honcho_plugin | ||
| integration | ||
| openviking_plugin | ||
| plugins | ||
| providers | ||
| run_agent | ||
| scripts | ||
| skills | ||
| stress | ||
| tools | ||
| tui_gateway | ||
| website | ||
| __init__.py | ||
| conftest.py | ||
| run_interrupt_test.py | ||
| test_account_usage.py | ||
| test_atomic_replace_symlinks.py | ||
| test_base_url_hostname.py | ||
| test_batch_runner_checkpoint.py | ||
| test_bitwarden_secrets.py | ||
| test_cli_file_drop.py | ||
| test_cli_manual_compress.py | ||
| test_cli_skin_integration.py | ||
| test_ctx_halving_fix.py | ||
| test_dashboard_sidecar_close_on_disconnect.py | ||
| test_desktop_mac_entitlements.py | ||
| test_docker_home_override_scripts.py | ||
| test_docker_stage2_browser_discovery.py | ||
| test_dockerfile_tini_compat_shim.py | ||
| test_empty_model_fallback.py | ||
| test_env_loader_secret_sources.py | ||
| test_evidence_store.py | ||
| test_gateway_streaming_nested_config.py | ||
| test_get_tool_definitions_cache_isolation.py | ||
| test_hermes_bootstrap.py | ||
| test_hermes_constants.py | ||
| test_hermes_home_profile_warning.py | ||
| test_hermes_logging.py | ||
| test_hermes_state.py | ||
| test_hermes_state_compression_locks.py | ||
| test_hermes_state_wal_fallback.py | ||
| test_honcho_client_concurrency.py | ||
| test_honcho_client_config.py | ||
| test_honcho_session_context.py | ||
| test_honcho_startup_fail_open.py | ||
| test_install_no_initial_commit.py | ||
| test_install_sh_browser_install.py | ||
| test_install_sh_pythonpath_sanitization.py | ||
| test_install_sh_root_fhs_uv_python_path.py | ||
| test_install_sh_setup_wizard_tty_probe.py | ||
| test_install_sh_symlink_stomp.py | ||
| test_install_sh_termux_network_prereqs.py | ||
| test_ipv4_preference.py | ||
| test_lazy_session_regressions.py | ||
| test_lint_config.py | ||
| test_live_system_guard_self_test.py | ||
| test_mcp_serve.py | ||
| test_mini_swe_runner.py | ||
| test_minimax_model_validation.py | ||
| test_minimax_oauth.py | ||
| test_minisweagent_path.py | ||
| test_model_picker_scroll.py | ||
| test_model_tools.py | ||
| test_model_tools_async_bridge.py | ||
| test_ollama_num_ctx.py | ||
| test_output_cap_parsing.py | ||
| test_package_json_lazy_deps.py | ||
| test_packaging_metadata.py | ||
| test_plugin_skills.py | ||
| test_plugin_utils.py | ||
| test_process_loop_event_loop_warning.py | ||
| test_project_metadata.py | ||
| test_retry_utils.py | ||
| test_run_tests_parallel.py | ||
| test_sanitize_tool_error.py | ||
| test_slash_worker_watchdog.py | ||
| test_sql_injection.py | ||
| test_state_db_malformed_repair.py | ||
| test_subprocess_home_isolation.py | ||
| test_termux_all_extra_compat.py | ||
| test_timezone.py | ||
| test_toolset_distributions.py | ||
| test_toolsets.py | ||
| test_trajectory_compressor.py | ||
| test_trajectory_compressor_async.py | ||
| test_transform_llm_output_hook.py | ||
| test_transform_tool_result_hook.py | ||
| test_tui_gateway_server.py | ||
| test_tui_gateway_ws.py | ||
| test_utils_truthy_values.py | ||
| test_web_server.py | ||
| test_wheel_locales_e2e.py | ||
| test_yuanbao_integration.py | ||
| test_yuanbao_markdown.py | ||
| test_yuanbao_pipeline.py | ||
| test_yuanbao_proto.py | ||
| test_yuanbao_shutdown.py | ||