hermes-agent/tests
Teknium fc086da8bd
fix(gateway,windows): reliability — JOB breakaway + status --deep probes + test-leak fix (#40909)
* fix(gateway,windows): reliability — supervisor task, JOB breakaway, status --deep

Three coordinated fixes for the Windows gateway reliability story:

1. CREATE_BREAKAWAY_FROM_JOB on every detached spawn

   The 'hermes update' triggered from the Electron Desktop GUI ran inside
   Electron's job object. Without breakaway, the post-update gateway
   watcher spawned by update — already DETACHED_PROCESS — was still
   reaped when Electron's job tore down, so the gateway never came back
   after a GUI-initiated update. Adds CREATE_BREAKAWAY_FROM_JOB (0x01000000)
   to:
     - hermes_cli/_subprocess_compat.py::windows_detach_flags() — used by
       every helper that calls windows_detach_popen_kwargs(), including
       launch_detached_profile_gateway_restart()
     - The watcher subprocess's own respawn snippet in
       hermes_cli/gateway.py (inlined flags so the watcher's child
       respawn also breaks away)

   _spawn_detached() in gateway_windows.py already had the flag; this
   change brings the rest of the codebase to parity.

2. Per-minute supervisor Scheduled Task — Windows equivalent of
   systemd Restart=always

   Introduces hermes_cli/gateway_supervisor.py and registers it as a
   second Scheduled Task ('Hermes_Gateway_Supervisor', SC MINUTE /MO 1,
   LIMITED rights) alongside the existing ONLOGON task. Every minute,
   the supervisor uses the same gateway.status.get_running_pid() probe
   as 'hermes gateway status' and, if no gateway is alive, calls
   gateway_windows._spawn_detached() (which now includes BREAKAWAY) to
   bring one back.

   Covers every crash mode, not just 'machine rebooted': taskkill,
   OOM, GUI update SIGTERM, parent job teardown. Cheap — one pythonw
   startup per minute when down, one PID-existence check per minute
   when up.

   Wired into both the schtasks-success and Startup-folder-fallback
   install paths via _install_supervisor_best_effort(), and removed in
   uninstall(). Best-effort: a failing supervisor install logs a
   warning but doesn't roll back the primary install.

3. 'hermes gateway status --deep' shows per-probe PASS/FAIL

   Replaces the existing terse '--deep' output (which only printed
   paths) with an actual diagnostic table:
     [1] PID file present
     [2] Lock file held by a live process
     [3] get_running_pid() result
     [4] _pid_exists(pid) — OS-level liveness
     [5] gateway_state.json (state + age)
     [6] Last lifecycle event from gateway-exit-diag.log

   When the high-level summary disagrees with reality, the user can
   see exactly which signal is lying.

Test-leak fix
-------------

tests/hermes_cli/test_gateway_wsl.py::TestGatewayCommandWSLMessages
monkey-patched is_linux/is_wsl/supports_systemd_services to simulate
WSL but did NOT stub is_windows(). On a Windows host, the dispatcher
in _gateway_command_inner takes the is_windows() branch BEFORE the
WSL guidance branch, so the test invoked gateway_windows.install()
for real. install() writes to %APPDATA%\...\Startup\Hermes_Gateway.cmd
— the REAL user Startup folder, never sandboxed by tmp_path — pointing
at the test's pytest-of-<user>/pytest-<N>/.../gateway-service/ wrapper.
When pytest tore down the tmp_path, every subsequent Windows login
flashed a cmd.exe window that failed to find the missing target.

Stubs is_windows=False on all four affected tests:
  test_install_wsl_no_systemd
  test_start_wsl_no_systemd
  test_status_wsl_running_manual
  test_status_wsl_not_running

Defense-in-depth: _build_startup_launcher() now prefixes the launcher
with 'if not exist <target> exit /b 0', so any future stale Startup
entry silently no-ops instead of flashing a console window.

Status enhancements
-------------------

- status() now reports supervisor task presence alongside the existing
  schtasks/Startup info, and nudges the user to reinstall if the
  supervisor isn't registered.
- Deep mode dumps both the supervisor task name + script path.

* fix(gateway,windows): drop the per-minute supervisor task — keep breakaway + deep probes

Earlier in this branch we added a per-minute schtasks-based supervisor to
respawn the gateway after crashes / GUI-update SIGTERMs. The implementation
flashed a brief console window on every firing, which stole window focus.
We tried several variants:

  - cmd.exe wrapper invoking pythonw  -> flashes (cmd.exe is console-subsystem)
  - schtasks /TR pointing at pythonw  -> flashes (uv venv launcher pythonw is
    actually subsystem=Console, not GUI; it respawns the real pythonw)
  - schtasks /TR pointing at base uv  -> still flashes (Task Scheduler-side
    conhost preallocation; documented Windows quirk)
  - XML registration with <Hidden>true>  -> still flashes (<Hidden> only hides
    the task in the Task Scheduler UI, not the spawned window)

Researched what leading projects do:

  - Ollama: GUI-subsystem tray exe + Startup-folder shortcut. No supervisor.
  - Tailscale: real Windows Service via SCM. Session 0, no console possible.
  - Syncthing: --no-console flag inside the binary + Startup folder.
  - openclaw: VBS Run(..., 0, False) wrapper. Suppresses the *window* but
    Super User Q971162 confirms focus-steal still occurs in some cases.

None of these use a per-minute polling scheduled task. The 'auto-restart on
crash' responsibility belongs INSIDE the daemon (Tailscale's in-process
recovery / Ollama's monitor+worker pair) OR is delegated to the Windows
Service Control Manager — not Task Scheduler.

So this commit drops the supervisor entirely. The CREATE_BREAKAWAY_FROM_JOB
fix in _subprocess_compat.py (from commit c1e5fa433) survives — that is the
*real* fix for problem #2 (GUI-update kills gateway): the post-update
watcher in launch_detached_profile_gateway_restart() now breaks out of
Electron's job object, so the gateway respawn watcher survives the GUI
quit and successfully respawns the gateway.

Surviving from c1e5fa433:
  * CREATE_BREAKAWAY_FROM_JOB in hermes_cli/_subprocess_compat.py (fixes #2)
  * Inlined breakaway flag in the watcher respawn snippet in gateway.py
  * hermes gateway status --deep PASS/FAIL probes (fixes #1 — visibility)
  * 'if not exist <target> exit /b 0' guard in _build_startup_launcher
    (fixes #3 — silent no-op for stale Startup entries)
  * tests/hermes_cli/test_gateway_wsl.py is_windows=False stubs (root cause
    of #3 — pytest WSL tests no longer leak Startup entries on Win hosts)

Removed in this commit:
  * hermes_cli/gateway_supervisor.py (entire file)
  * Supervisor section in hermes_cli/gateway_windows.py (~180 lines):
      get_supervisor_task_name, get_supervisor_script_path,
      _build_supervisor_cmd_script, _write_supervisor_script,
      _install_supervisor_task, is_supervisor_task_registered,
      _install_supervisor_best_effort
  * _install_supervisor_best_effort() calls in install() (3 spots)
  * supervisor cleanup block in uninstall()
  * supervisor display lines in status() / status(deep=True)

Future direction (out of scope for this PR): the right place for Windows
'Restart=always' semantics is a real Windows Service installed via
pywin32's win32serviceutil.ServiceFramework — session-0 isolation, SCM
auto-restart, no console window possible. That's a meaningful next-PR
project, not a band-aid.

Tests: 51 pass / 2 pre-existing failures in
tests/hermes_cli/test_gateway_{windows,wsl}.py (the 2 failures are
TestSupportsSystemdServicesWSL cases that fail on origin/main too —
unrelated to this PR).
2026-06-06 19:53:58 -07:00
..
acp fix(acp): replace direct db._lock/_conn access with public update_session_meta() 2026-06-04 17:54:59 -07:00
acp_adapter feat(azure-foundry): add Microsoft Entra ID auth 2026-05-18 10:14:38 -07:00
agent feat(credits): usage-aware credits — in-session notices, /usage view, dev readout (#40011) 2026-06-06 13:18:18 +05:30
cli Add /version slash command across CLI, gateway, TUI, and desktop. 2026-06-05 18:05:05 -07:00
cron fix(dashboard): populate cron delivery dropdown from configured platforms (#40218) 2026-06-05 20:23:54 -07:00
docker feat(dashboard): always enable embedded chat; remove dashboard --tui flag 2026-06-04 03:03:35 -07:00
e2e chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
fakes
fixtures/plugins/example-dashboard/dashboard feat(dashboard): nous-blue theme, bulk sessions, schedule picker (#37383) 2026-06-02 12:37:40 -04:00
gateway fix(qqbot): stop 100% CPU spin when WebSocket is closed but not None (#31193, #31771) (#40574) 2026-06-06 18:44:44 -07:00
hermes_cli fix(gateway,windows): reliability — JOB breakaway + status --deep probes + test-leak fix (#40909) 2026-06-06 19:53:58 -07:00
hermes_state feat(session_search): single-shape tool with discovery, scroll, browse — no LLM (#27590) 2026-05-17 23:28:45 -07:00
honcho_plugin test(honcho): de-flake prewarm smoke test's thread wait (#37614) 2026-06-02 17:00:04 -07:00
integration refactor(gateway): migrate Home Assistant adapter to bundled plugin 2026-06-06 11:46:24 -07:00
openviking_plugin fix(openviking): add missing /agent/{agent}/ segment to memory URI — fixes #36969 2026-06-04 17:40:33 -07:00
plugins fix(image_gen): use gpt-5.5 for Codex image host 2026-06-06 19:31:51 -07:00
providers chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
run_agent fix(agent): import SimpleNamespace for hook payload sanitization 2026-06-06 19:32:36 -07:00
scripts feat(acp-registry): switch to uvx distribution, drop npm launcher 2026-05-14 22:27:09 -07:00
skills fix(google-workspace): fall back to uv when venv has no pip (#39516) 2026-06-05 13:30:02 +10:00
stress chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
tools fix(config): preserve custom-provider models maps and metadata through v11->v12 migration (#40573) 2026-06-06 18:43:20 -07:00
tui_gateway fix(desktop+gateway): full multi-profile support over one global-remote dashboard (#39921) 2026-06-05 12:22:55 -05:00
website feat(skills): fix browse cap, add source links + copy buttons + category cleanup (#37143) 2026-06-01 19:52:28 -07:00
__init__.py
conftest.py fix: batch of small robustness/correctness fixes from @kyssta-exe 2026-06-01 19:51:03 -07:00
run_interrupt_test.py
test_account_usage.py
test_atomic_replace_symlinks.py
test_base_url_hostname.py
test_batch_runner_checkpoint.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_bitwarden_secrets.py fix(bitwarden): prevent zip-slip path traversal when extracting bws binary (#40569) 2026-06-06 18:33:44 -07:00
test_cli_file_drop.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_cli_manual_compress.py fix(tests): catch up six stale tests after compression/aux/kanban changes (#28465) 2026-05-18 21:43:59 -07:00
test_cli_skin_integration.py
test_ctx_halving_fix.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_desktop_mac_entitlements.py test(desktop): assert macOS device entitlements are inherited 2026-06-03 07:32:00 +07:00
test_docker_home_override_scripts.py feat(dashboard): always enable embedded chat; remove dashboard --tui flag 2026-06-04 03:03:35 -07:00
test_docker_stage2_browser_discovery.py fix(docker): discover Playwright headless_shell browser (#35717) 2026-06-01 16:06:44 +10:00
test_dockerfile_tini_compat_shim.py fix(docker): add /usr/bin/tini compatibility shim for legacy wrappers (#34192) (#34382) 2026-06-01 13:32:55 +10:00
test_empty_model_fallback.py test(models): guard Nous silent default against expensive-flagship escalation 2026-06-05 02:54:34 -07:00
test_env_loader_secret_sources.py fix(secrets): only apply external secrets once per HERMES_HOME per process (#32271) 2026-05-25 15:18:55 -07:00
test_evidence_store.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_gateway_streaming_nested_config.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_get_tool_definitions_cache_isolation.py
test_hermes_bootstrap.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_hermes_constants.py fix(constants): use windows native default hermes home 2026-06-03 19:37:29 -07:00
test_hermes_home_profile_warning.py
test_hermes_logging.py Add Hermes desktop app (#20059) 2026-05-31 17:46:56 -05:00
test_hermes_state.py fix(search): sanitize ":" in FTS5 queries so colon searches don't silently return empty 2026-06-06 09:32:55 -07:00
test_hermes_state_compression_locks.py fix(compression): prevent session-id fork from concurrent compressions (#34351) 2026-05-28 21:40:39 -07:00
test_hermes_state_wal_fallback.py fix(kanban): skip redundant WAL pragma on already-WAL connections 2026-05-27 14:31:55 -07:00
test_honcho_client_config.py fix(honcho): harden self-hosted setup paths 2026-05-29 22:29:48 -07:00
test_honcho_session_context.py fix(honcho): align user context peer perspective 2026-05-27 10:49:33 -07:00
test_honcho_startup_fail_open.py fix: make Honcho startup fail open 2026-06-01 20:13:42 -07:00
test_install_sh_browser_install.py fix(install): support non-sudo service-user installs on apt distros (#25814) 2026-05-14 09:05:31 -07:00
test_install_sh_pythonpath_sanitization.py
test_install_sh_root_fhs_uv_python_path.py test(install): harden uv-python-path regression test against future drift 2026-05-27 13:55:51 -07:00
test_install_sh_setup_wizard_tty_probe.py
test_install_sh_symlink_stomp.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_install_sh_termux_network_prereqs.py
test_ipv4_preference.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_lazy_session_regressions.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_lint_config.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_live_system_guard_self_test.py chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355) 2026-05-17 02:29:41 -07:00
test_mcp_serve.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_mini_swe_runner.py
test_minimax_model_validation.py
test_minimax_oauth.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_minisweagent_path.py
test_model_picker_scroll.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_model_tools.py feat(middleware): add adaptive execution intercepts 2026-06-03 11:22:06 -07:00
test_model_tools_async_bridge.py fix(web): run URL SSRF checks off the event loop in async paths 2026-06-04 18:04:47 -07:00
test_ollama_num_ctx.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_package_json_lazy_deps.py fix(update): make Camofox lazy-installed instead of eager (#27055) 2026-05-16 12:15:45 -07:00
test_packaging_metadata.py fix(packaging): ship locales/ i18n catalogs in wheel, sdist, and Nix (#38383) 2026-06-03 12:00:27 -07:00
test_plugin_skills.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_process_loop_event_loop_warning.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_project_metadata.py fix(deps): exclude dev tooling from all extra 2026-06-04 08:54:38 -07:00
test_retry_utils.py
test_run_tests_parallel.py test: use subprocesses for each test file (#29016) 2026-05-21 16:40:04 +05:30
test_sanitize_tool_error.py security: sanitize tool error strings before injecting into model context (#26823) 2026-05-16 00:57:39 -07:00
test_sql_injection.py
test_subprocess_home_isolation.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_termux_all_extra_compat.py
test_timezone.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_toolset_distributions.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_toolsets.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_trajectory_compressor.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_trajectory_compressor_async.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_transform_llm_output_hook.py
test_transform_tool_result_hook.py test: stub has_hook in transform_tool_result hook tests 2026-06-03 06:36:46 -07:00
test_tui_gateway_server.py test(tui): cover _terminal_task_cwd remote-backend branches 2026-06-06 18:40:43 -07:00
test_utils_truthy_values.py
test_wheel_locales_e2e.py fix(packaging): ship locales/ i18n catalogs in wheel, sdist, and Nix (#38383) 2026-06-03 12:00:27 -07:00
test_yuanbao_integration.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_yuanbao_markdown.py
test_yuanbao_pipeline.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_yuanbao_proto.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00