mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-12 03:42:08 +00:00
feat(windows): close remaining POSIX-only landmines — TUI crash, kanban waitpid, AF_UNIX sandbox, /bin/bash, npm .cmd shims, cwd tracking, detach flags
Second pass on native Windows support, driven by a systematic audit across
five areas: POSIX-only primitives (signal.SIGKILL/SIGHUP/SIGPIPE, os.WNOHANG,
os.setsid), path translation bugs (/c/Users → C:\Users), subprocess patterns
(npm.cmd batch shims, start_new_session no-op on Windows), subsystem health
(cron, gateway daemon, update flow), and module-level import guards.
Every change is platform-gated — POSIX (Linux/macOS) behaviour is preserved
bit-identical. Explicit "do no harm" test: test_posix_path_preserved_on_linux,
test_posix_noop, test_windows_detach_popen_kwargs_is_posix_equivalent_on_posix.
## New module
- hermes_cli/_subprocess_compat.py — shared helpers (resolve_node_command,
windows_detach_flags, windows_hide_flags, windows_detach_popen_kwargs).
All no-ops on non-Windows.
## CRITICAL fixes (would crash or silently break on Windows)
- tui_gateway/entry.py: SIGPIPE/SIGHUP referenced at module top level would
AttributeError on import on Windows, breaking `hermes --tui` entirely (it
spawns this module as a subprocess). Guard each signal.signal() call with
hasattr() and add SIGBREAK as Windows' SIGHUP equivalent.
- hermes_cli/kanban_db.py: os.waitpid(-1, os.WNOHANG) in dispatcher tick was
unguarded. os.WNOHANG doesn't exist on Windows. Gate the whole reap loop
behind `os.name != "nt"` — Windows has no zombies anyway.
- tools/code_execution_tool.py: AF_UNIX socket for execute_code RPC fails on
most Windows builds. Fall back to loopback TCP (AF_INET on 127.0.0.1:0
ephemeral port) when _IS_WINDOWS. HERMES_RPC_SOCKET env var now accepts
either a filesystem path (POSIX) or `tcp://127.0.0.1:<port>` (Windows).
Generated sandbox client parses both.
- cron/scheduler.py: `argv = ["/bin/bash", str(path)]` hardcoded. Use
shutil.which("bash") so Windows (Git Bash via MinGit) works, with a
readable error when bash is genuinely absent.
- 6 bare npm/npx spawn sites: tools_config.py x2, doctor.py, whatsapp.py
(npm install + node version probe), browser_tool.py x2. On Windows npm
is npm.cmd / npx is npx.cmd (batch shims); subprocess.Popen(["npm", ...])
fails with WinError 193. shutil.which(...) returns the absolute .cmd
path which CreateProcessW accepts because the extension routes through
cmd.exe /c. POSIX behaviour unchanged (shutil.which still returns the
same path subprocess would resolve itself).
## HIGH fixes (silent misbehaviour on Windows)
- tools/environments/local.py get_temp_dir: hardcoded /tmp returned on
Windows meant `_cwd_file = "/tmp/hermes-cwd-*.txt"`, which bash wrote
via MSYS2's virtual /tmp but native Python couldn't open. Result: cwd
tracking silently broken — `cd` in terminal tool did nothing. Windows
branch now returns `%HERMES_HOME%/cache/terminal` with forward slashes
(works in both bash and Python, guaranteed no spaces).
- tools/environments/local.py _make_run_env PATH injection: `/usr/bin not
in split(":")` heuristic mangles Windows PATH (";" separator). Gate
the injection behind `not _IS_WINDOWS`.
- hermes_cli/gateway.py launch_detached_profile_gateway_restart: outer
Popen + watcher-script Popen both used start_new_session=True, which
Windows silently ignores. Watcher stayed attached to CLI's console,
died when user closed terminal after `hermes update`, left gateway
stale. Now branches through windows_detach_popen_kwargs() helper
(CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS | CREATE_NO_WINDOW on
Windows, start_new_session=True on POSIX — identical to main).
## MEDIUM fixes
- gateway/run.py /restart and /update handlers: hardcoded bash/setsid
chain crashes on Windows when user triggers /update in-gateway. Now
has sys.platform=="win32" branch using sys.executable + a tiny
Python watcher with proper detach flags. POSIX path is unchanged.
- cli.py _git_repo_root: Git on Windows sometimes returns /c/Users/...
style paths that break subprocess.Popen(cwd=...) and Path().resolve().
Added _normalize_git_bash_path() helper that translates /c/Users,
/cygdrive/c, /mnt/c variants to native C:\Users form. POSIX no-op.
_git_repo_root() now routes every result through it.
- cli.py worktree .worktreeinclude: os.symlink on directories failed
hard on Windows (requires admin or Developer Mode). Falls back to
shutil.copytree with a warning log.
## Tests
- 29 new tests in tests/tools/test_windows_native_support.py covering:
subprocess_compat helpers, TUI entry signal guards, kanban waitpid
guard, code_execution TCP fallback source-level invariants, cron bash
resolution, npm/npx bare-spawn lint per-file, local env Windows temp
dir, PATH injection gating, git bash path normalization, symlink
fallback, gateway detached watcher flags.
- One existing test assertion adjusted in test_browser_homebrew_paths:
it compared captured Popen argv to the BARE `"npx"` literal; after the
shutil.which() change argv[0] is the absolute path. New assertion
checks the shape (two items, second is `agent-browser`) rather than
the exact first-item string. Behaviour unchanged; test was too strict.
All 56 tests pass on Linux (30 from previous commits + 26 new).
267 tests from the affected files/dirs (browser, code_exec, local_env,
process_registry, kanban_db, windows_compat) all pass — zero regressions.
tests/hermes_cli/ (3909 pass) and tests/gateway/ (5021 pass) unchanged;
all pre-existing test failures confirmed unrelated via `git stash` re-run.
## What's still deferred (LOW priority)
- Visible cmd-window flashes on short-lived console apps (~14 sites) —
cosmetic, needs a follow-up pass once we have user reports.
- agent/file_safety.py POSIX-only security deny patterns — separate
hardening task.
- tools/process_registry.py returning "/tmp" as fallback — theoretical;
reachable only when all env-var candidates fail.
This commit is contained in:
parent
b53bd12fe4
commit
e93bfc6c93
15 changed files with 955 additions and 70 deletions
124
gateway/run.py
124
gateway/run.py
|
|
@ -2880,6 +2880,48 @@ class GatewayRunner:
|
|||
return
|
||||
|
||||
current_pid = os.getpid()
|
||||
|
||||
# On Windows there's no bash/setsid chain — spawn a tiny Python
|
||||
# watcher directly via sys.executable instead. The watcher polls
|
||||
# current_pid, waits for our exit, then runs `hermes gateway
|
||||
# restart` with detach flags so the respawn survives the CLI
|
||||
# that triggered the /restart command closing its console.
|
||||
if sys.platform == "win32":
|
||||
import textwrap
|
||||
from hermes_cli._subprocess_compat import windows_detach_popen_kwargs
|
||||
|
||||
cmd_argv = [*hermes_cmd, "gateway", "restart"]
|
||||
watcher = textwrap.dedent(
|
||||
"""
|
||||
import os, subprocess, sys, time
|
||||
pid = int(sys.argv[1])
|
||||
cmd = sys.argv[2:]
|
||||
deadline = time.monotonic() + 120
|
||||
while time.monotonic() < deadline:
|
||||
try:
|
||||
os.kill(pid, 0)
|
||||
except (ProcessLookupError, PermissionError, OSError):
|
||||
break
|
||||
time.sleep(0.2)
|
||||
_CREATE_NEW_PROCESS_GROUP = 0x00000200
|
||||
_DETACHED_PROCESS = 0x00000008
|
||||
_CREATE_NO_WINDOW = 0x08000000
|
||||
subprocess.Popen(
|
||||
cmd,
|
||||
stdout=subprocess.DEVNULL,
|
||||
stderr=subprocess.DEVNULL,
|
||||
creationflags=_CREATE_NEW_PROCESS_GROUP | _DETACHED_PROCESS | _CREATE_NO_WINDOW,
|
||||
)
|
||||
"""
|
||||
).strip()
|
||||
subprocess.Popen(
|
||||
[sys.executable, "-c", watcher, str(current_pid), *cmd_argv],
|
||||
stdout=subprocess.DEVNULL,
|
||||
stderr=subprocess.DEVNULL,
|
||||
**windows_detach_popen_kwargs(),
|
||||
)
|
||||
return
|
||||
|
||||
cmd = " ".join(shlex.quote(part) for part in hermes_cmd)
|
||||
shell_cmd = (
|
||||
f"while kill -0 {current_pid} 2>/dev/null; do sleep 0.2; done; "
|
||||
|
|
@ -11464,30 +11506,78 @@ class GatewayRunner:
|
|||
# where systemd-run --user fails due to missing D-Bus session).
|
||||
# PYTHONUNBUFFERED ensures output is flushed line-by-line so the
|
||||
# gateway can stream it to the messenger in near-real-time.
|
||||
hermes_cmd_str = " ".join(shlex.quote(part) for part in hermes_cmd)
|
||||
update_cmd = (
|
||||
f"PYTHONUNBUFFERED=1 {hermes_cmd_str} update --gateway"
|
||||
f" > {shlex.quote(str(output_path))} 2>&1; "
|
||||
f"status=$?; printf '%s' \"$status\" > {shlex.quote(str(exit_code_path))}"
|
||||
)
|
||||
# Spawn `hermes update --gateway` detached so it survives gateway restart.
|
||||
# --gateway enables file-based IPC for interactive prompts (stash
|
||||
# restore, config migration) so the gateway can forward them to the
|
||||
# user instead of silently skipping them.
|
||||
# Use setsid for portable session detach (works under system services
|
||||
# where systemd-run --user fails due to missing D-Bus session).
|
||||
# PYTHONUNBUFFERED ensures output is flushed line-by-line so the
|
||||
# gateway can stream it to the messenger in near-real-time.
|
||||
#
|
||||
# Windows: no bash/setsid chain. Run `hermes update --gateway`
|
||||
# directly via sys.executable; redirect stdout/stderr to the same
|
||||
# output files via Popen file handles; write the exit code in a
|
||||
# follow-up write. A tiny Python watcher would be cleaner but
|
||||
# we're already inside gateway/run.py's update path which is async,
|
||||
# so the simplest correct thing is: launch an inline Python helper
|
||||
# that runs the command and writes both outputs.
|
||||
try:
|
||||
setsid_bin = shutil.which("setsid")
|
||||
if setsid_bin:
|
||||
# Preferred: setsid creates a new session, fully detached
|
||||
if sys.platform == "win32":
|
||||
import textwrap
|
||||
from hermes_cli._subprocess_compat import windows_detach_popen_kwargs
|
||||
|
||||
# hermes_cmd is a list of argv parts we can pass directly
|
||||
# (no shell-quoting needed).
|
||||
helper = textwrap.dedent(
|
||||
"""
|
||||
import os, subprocess, sys
|
||||
output_path = sys.argv[1]
|
||||
exit_code_path = sys.argv[2]
|
||||
cmd = sys.argv[3:]
|
||||
env = dict(os.environ)
|
||||
env["PYTHONUNBUFFERED"] = "1"
|
||||
with open(output_path, "wb") as f:
|
||||
proc = subprocess.Popen(cmd, stdout=f, stderr=subprocess.STDOUT, env=env)
|
||||
rc = proc.wait()
|
||||
with open(exit_code_path, "w") as f:
|
||||
f.write(str(rc))
|
||||
"""
|
||||
).strip()
|
||||
subprocess.Popen(
|
||||
[setsid_bin, "bash", "-c", update_cmd],
|
||||
[
|
||||
sys.executable, "-c", helper,
|
||||
str(output_path), str(exit_code_path),
|
||||
*hermes_cmd, "update", "--gateway",
|
||||
],
|
||||
stdout=subprocess.DEVNULL,
|
||||
stderr=subprocess.DEVNULL,
|
||||
start_new_session=True,
|
||||
**windows_detach_popen_kwargs(),
|
||||
)
|
||||
else:
|
||||
# Fallback: start_new_session=True calls os.setsid() in child
|
||||
subprocess.Popen(
|
||||
["bash", "-c", update_cmd],
|
||||
stdout=subprocess.DEVNULL,
|
||||
stderr=subprocess.DEVNULL,
|
||||
start_new_session=True,
|
||||
hermes_cmd_str = " ".join(shlex.quote(part) for part in hermes_cmd)
|
||||
update_cmd = (
|
||||
f"PYTHONUNBUFFERED=1 {hermes_cmd_str} update --gateway"
|
||||
f" > {shlex.quote(str(output_path))} 2>&1; "
|
||||
f"status=$?; printf '%s' \"$status\" > {shlex.quote(str(exit_code_path))}"
|
||||
)
|
||||
setsid_bin = shutil.which("setsid")
|
||||
if setsid_bin:
|
||||
# Preferred: setsid creates a new session, fully detached
|
||||
subprocess.Popen(
|
||||
[setsid_bin, "bash", "-c", update_cmd],
|
||||
stdout=subprocess.DEVNULL,
|
||||
stderr=subprocess.DEVNULL,
|
||||
start_new_session=True,
|
||||
)
|
||||
else:
|
||||
# Fallback: start_new_session=True calls os.setsid() in child
|
||||
subprocess.Popen(
|
||||
["bash", "-c", update_cmd],
|
||||
stdout=subprocess.DEVNULL,
|
||||
stderr=subprocess.DEVNULL,
|
||||
start_new_session=True,
|
||||
)
|
||||
except Exception as e:
|
||||
pending_path.unlink(missing_ok=True)
|
||||
exit_code_path.unlink(missing_ok=True)
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue