mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-09 08:21:50 +00:00
#40909 added `CREATE_BREAKAWAY_FROM_JOB` to `windows_detach_flags()`, which fixed the headline bug (gateway dies after Desktop GUI update and never comes back). The flag's own docstring acknowledges that restrictive parent job objects can still refuse breakaway with `ERROR_ACCESS_DENIED`, surfacing as `OSError` on the `subprocess.Popen` call: "Callers in this codebase already wrap detached spawns in try/except OSError and fall back to a cmd.exe wrapper, so the breakaway-denied case degrades gracefully rather than crashing." That's true for `_spawn_detached` in `gateway_windows.py` (the `hermes gateway start` path), which has both the breakaway bit AND a retry-without-breakaway fallback. It's NOT true for the post-update watcher path in `launch_detached_profile_gateway_restart` (`hermes_cli/gateway.py`), which only has `except OSError: return False` and gives up entirely. If a user's shell/terminal/container wraps Hermes in a breakaway-denying job, the gateway-respawn watcher silently fails to launch instead of trying again without breakaway. This PR closes that gap and adds the regression tests that were missing from the original fix. ## Changes ### `hermes_cli/_subprocess_compat.py` Adds a sibling helper `windows_detach_flags_without_breakaway()` so callers can express the fallback symbolically (via the helper) rather than coding the magic `& ~0x01000000` mask at every site. Documented on `windows_detach_flags` and `windows_detach_flags_without_breakaway` with the recommended try/except pattern. ### `hermes_cli/gateway.py::launch_detached_profile_gateway_restart` Two changes, both aligned with the canonical pattern in `gateway_windows._spawn_detached`: 1. The outer watcher Popen now wraps in `try/except OSError`, and on failure retries with `windows_detach_flags_without_breakaway()` (POSIX never reaches this branch — `start_new_session=True` can't raise OSError). 2. The inlined respawn payload (the `python -c` watcher) also wraps its CreateProcess in try/except OSError and retries with `_flags & ~_CREATE_BREAKAWAY_FROM_JOB` on failure. This matters because the watcher's job-object inheritance is independent of the outer process's — even if the outer Popen succeeds with breakaway, the respawned gateway might inherit a job that doesn't. ### Regression tests in `tests/tools/test_windows_native_support.py` #40909 shipped the fix without any test that the breakaway bit is present (the existing `test_windows_detach_flags_has_expected_win32_bits` asserted only the three legacy bits). Four new tests close that: - `test_windows_detach_flags_includes_breakaway_from_job` — explicit assertion that the breakaway bit is in the default bundle, with the rationale spelled out in the docstring so a future maintainer staring at this test understands why removing it would resurrect the gateway-dies-after-GUI-update bug. - `test_windows_detach_flags_without_breakaway_drops_only_that_bit` — fallback payload keeps the other three detach bits intact. - `test_launch_detached_profile_gateway_restart_inlined_watcher_uses_breakaway` — static-text check on the stringified watcher payload. The inlined Python program isn't reachable via normal import-time inspection because it lives in a `textwrap.dedent("""...""")` literal that gets passed to a separate `python -c` interpreter. Asserting that both `_CREATE_BREAKAWAY_FROM_JOB` (symbolic) and `0x01000000` (hex literal) appear inside the dedent block is a sufficient regression guard against accidental refactors. - `test_launch_detached_profile_gateway_restart_outer_popen_has_access_denied_fallback` — static check that this PR's fallback retry is wired up symbolically. Without standing up a real Windows job object that refuses breakaway, we can't trigger the OSError in a unit test; the text guard catches the case where a future refactor removes the helper import or the `& ~_CREATE_BREAKAWAY_FROM_JOB` retry. Also extends `test_windows_detach_flags_has_expected_win32_bits` to include the breakaway bit assertion and updates `test_windows_flags_zero_on_posix` to cover the new helper. ## Tests Locally on Windows: 8/8 in the `-k "detach or breakaway or popen_kwargs or launch_detached or gateway_run_update or hermes_cli_gateway"` slice pass. Broader `tests/hermes_cli/test_gateway*.py + test_windows_native_support.py`: 172 passed, 10 failed. All 10 failures are pre-existing POSIX-only tests running on a Windows host (os.geteuid, SIGKILL fallback, is_linux fixture mismatches). Stashing this PR and re-running on bare post-#40909 main reproduces all 10 identically — none are regressions. POSIX paths unchanged: `windows_detach_flags()` and `windows_detach_flags_without_breakaway()` both return 0 off Windows, `windows_detach_popen_kwargs()` still yields `{"start_new_session": True}`. ## Out of scope - The other detached-spawn site in `hermes_cli/gateway.py` (around line 3068) also uses `windows_detach_popen_kwargs()` + `except OSError`. It deserves the same fallback treatment but the codepath is different enough (not the update-flow watcher) that it warrants a separate PR with its own scrutiny. - `gateway/run.py` has Windows branches with `windows_detach_popen_kwargs` too — same reasoning. ## Context Follow-up to #40909 (merged). I had a parallel PR (#40934, closed) that duplicated the core breakaway fix; the bits unique to that PR that #40909 didn't cover are the contents of this one. Closing #40934 and opening this slimmed-down version as the focused follow-up. |
||
|---|---|---|
| .. | ||
| dashboard_auth | ||
| proxy | ||
| __init__.py | ||
| _parser.py | ||
| _subprocess_compat.py | ||
| auth.py | ||
| auth_commands.py | ||
| azure_detect.py | ||
| backup.py | ||
| banner.py | ||
| browser_connect.py | ||
| build_info.py | ||
| bundles.py | ||
| callbacks.py | ||
| checkpoints.py | ||
| claw.py | ||
| cli_output.py | ||
| clipboard.py | ||
| codex_models.py | ||
| codex_runtime_plugin_migration.py | ||
| codex_runtime_switch.py | ||
| colors.py | ||
| commands.py | ||
| completion.py | ||
| config.py | ||
| container_boot.py | ||
| copilot_auth.py | ||
| cron.py | ||
| curator.py | ||
| curses_ui.py | ||
| dashboard_register.py | ||
| debug.py | ||
| default_soul.py | ||
| dep_ensure.py | ||
| dingtalk_auth.py | ||
| doctor.py | ||
| dump.py | ||
| env_loader.py | ||
| fallback_cmd.py | ||
| fallback_config.py | ||
| gateway.py | ||
| gateway_windows.py | ||
| goals.py | ||
| gui_uninstall.py | ||
| hooks.py | ||
| inventory.py | ||
| kanban.py | ||
| kanban_db.py | ||
| kanban_decompose.py | ||
| kanban_diagnostics.py | ||
| kanban_specify.py | ||
| kanban_swarm.py | ||
| logs.py | ||
| main.py | ||
| managed_uv.py | ||
| mcp_catalog.py | ||
| mcp_config.py | ||
| mcp_picker.py | ||
| mcp_startup.py | ||
| memory_setup.py | ||
| middleware.py | ||
| migrate.py | ||
| model_catalog.py | ||
| model_normalize.py | ||
| model_switch.py | ||
| models.py | ||
| nous_account.py | ||
| nous_subscription.py | ||
| oneshot.py | ||
| pairing.py | ||
| partial_compress.py | ||
| platforms.py | ||
| plugins.py | ||
| plugins_cmd.py | ||
| portal_cli.py | ||
| profile_describer.py | ||
| profile_distribution.py | ||
| profiles.py | ||
| prompt_size.py | ||
| providers.py | ||
| psutil_android.py | ||
| pt_input_extras.py | ||
| pty_bridge.py | ||
| relaunch.py | ||
| runtime_provider.py | ||
| secret_prompt.py | ||
| secrets_cli.py | ||
| security_advisories.py | ||
| security_audit.py | ||
| send_cmd.py | ||
| service_manager.py | ||
| session_recap.py | ||
| setup.py | ||
| skills_config.py | ||
| skills_hub.py | ||
| skin_engine.py | ||
| slack_cli.py | ||
| status.py | ||
| stdio.py | ||
| telegram_managed_bot.py | ||
| timeouts.py | ||
| tips.py | ||
| tools_config.py | ||
| uninstall.py | ||
| voice.py | ||
| web_server.py | ||
| webhook.py | ||
| xai_retirement.py | ||