mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-10 03:22:05 +00:00
feat(windows): close remaining POSIX-only landmines — TUI crash, kanban waitpid, AF_UNIX sandbox, /bin/bash, npm .cmd shims, cwd tracking, detach flags
Second pass on native Windows support, driven by a systematic audit across
five areas: POSIX-only primitives (signal.SIGKILL/SIGHUP/SIGPIPE, os.WNOHANG,
os.setsid), path translation bugs (/c/Users → C:\Users), subprocess patterns
(npm.cmd batch shims, start_new_session no-op on Windows), subsystem health
(cron, gateway daemon, update flow), and module-level import guards.
Every change is platform-gated — POSIX (Linux/macOS) behaviour is preserved
bit-identical. Explicit "do no harm" test: test_posix_path_preserved_on_linux,
test_posix_noop, test_windows_detach_popen_kwargs_is_posix_equivalent_on_posix.
## New module
- hermes_cli/_subprocess_compat.py — shared helpers (resolve_node_command,
windows_detach_flags, windows_hide_flags, windows_detach_popen_kwargs).
All no-ops on non-Windows.
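
A minimal sketch of what a helper like resolve_node_command could do. The commit message only names the helper, so the signature and body below are assumptions, not the module's actual code:

```python
import shutil


def resolve_node_command(name: str) -> str:
    """Return an absolute path for npm/npx/node when one is on PATH.

    Hypothetical body. On Windows shutil.which() honours PATHEXT, so
    "npm" resolves to the npm.cmd batch shim; on POSIX it returns the
    same executable that Popen would have found by itself.
    """
    # Fall back to the bare name so that Popen raises a readable
    # FileNotFoundError when the tool is genuinely missing.
    return shutil.which(name) or name
```

A call site would then spawn `[resolve_node_command("npm"), "install"]` instead of `["npm", "install"]`.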
## CRITICAL fixes (would crash or silently break on Windows)
- tui_gateway/entry.py: SIGPIPE/SIGHUP referenced at module top level would
AttributeError on import on Windows, breaking `hermes --tui` entirely (it
spawns this module as a subprocess). Guard each signal.signal() call with
hasattr() and add SIGBREAK as Windows' SIGHUP equivalent.
- hermes_cli/kanban_db.py: os.waitpid(-1, os.WNOHANG) in dispatcher tick was
unguarded. os.WNOHANG doesn't exist on Windows. Gate the whole reap loop
behind `os.name != "nt"` — Windows has no zombies anyway.
- tools/code_execution_tool.py: AF_UNIX socket for execute_code RPC fails on
most Windows builds. Fall back to loopback TCP (AF_INET on 127.0.0.1:0
ephemeral port) when _IS_WINDOWS. HERMES_RPC_SOCKET env var now accepts
either a filesystem path (POSIX) or `tcp://127.0.0.1:<port>` (Windows).
Generated sandbox client parses both.
- cron/scheduler.py: `argv = ["/bin/bash", str(path)]` hardcoded. Use
shutil.which("bash") so Windows (Git Bash via MinGit) works, with a
readable error when bash is genuinely absent.
- 6 bare npm/npx spawn sites: tools_config.py x2, doctor.py, whatsapp.py
(npm install + node version probe), browser_tool.py x2. On Windows npm
is npm.cmd / npx is npx.cmd (batch shims); subprocess.Popen(["npm", ...])
fails with WinError 193. shutil.which(...) returns the absolute .cmd
path which CreateProcessW accepts because the extension routes through
cmd.exe /c. POSIX behaviour unchanged (shutil.which still returns the
same path subprocess would resolve itself).
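
The first two guards above might be sketched as follows. The function names are illustrative and do not come from tui_gateway/entry.py or hermes_cli/kanban_db.py; only the hasattr() check and the `os.name != "nt"` gate are taken from the description:

```python
import os
import signal


def available_signals(names=("SIGHUP", "SIGPIPE", "SIGBREAK")) -> list:
    """Return the subset of signal names this platform defines."""
    # SIGHUP and SIGPIPE exist only on POSIX; SIGBREAK only on Windows.
    return [name for name in names if hasattr(signal, name)]


def install_signal_handlers(handler) -> list:
    """Register handler for every available signal (main thread only)."""
    names = available_signals()
    for name in names:
        signal.signal(getattr(signal, name), handler)
    return names


def reap_children() -> int:
    """Reap exited children without blocking; no-op on Windows."""
    if os.name == "nt":
        return 0  # os.WNOHANG is not defined there
    reaped = 0
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped += 1
    return reaped
```

Looking signals up by name keeps the module importable on both platforms, which is the property the `hermes --tui` subprocess depends on.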
## HIGH fixes (silent misbehaviour on Windows)
- tools/environments/local.py get_temp_dir: hardcoded /tmp returned on
Windows meant `_cwd_file = "/tmp/hermes-cwd-*.txt"`, which bash wrote
via MSYS2's virtual /tmp but native Python couldn't open. Result: cwd
tracking silently broken — `cd` in terminal tool did nothing. Windows
branch now returns `%HERMES_HOME%/cache/terminal` with forward slashes
(works in both bash and Python, guaranteed no spaces).
- tools/environments/local.py _make_run_env PATH injection: `/usr/bin not
in split(":")` heuristic mangles Windows PATH (";" separator). Gate
the injection behind `not _IS_WINDOWS`.
- hermes_cli/gateway.py launch_detached_profile_gateway_restart: outer
Popen + watcher-script Popen both used start_new_session=True, which
Windows silently ignores. Watcher stayed attached to CLI's console,
died when user closed terminal after `hermes update`, left gateway
stale. Now branches through windows_detach_popen_kwargs() helper
(CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS | CREATE_NO_WINDOW on
Windows, start_new_session=True on POSIX — identical to main).
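
A rough sketch of the detach helper, assuming it returns keyword arguments for subprocess.Popen. The `platform` parameter is added here only so that both branches can be exercised on one machine; the real helper's signature is not shown in this commit message:

```python
import subprocess
import sys


def windows_detach_popen_kwargs(platform: str = sys.platform) -> dict:
    """Return Popen kwargs that detach a child from the parent's console."""
    if platform != "win32":
        # POSIX: a new session detaches the child from the terminal.
        return {"start_new_session": True}
    # The subprocess module defines these constants only on Windows, so
    # the getattr fallbacks carry the documented Win32 values to keep
    # this sketch importable elsewhere.
    flags = (
        getattr(subprocess, "CREATE_NEW_PROCESS_GROUP", 0x00000200)
        | getattr(subprocess, "DETACHED_PROCESS", 0x00000008)
        | getattr(subprocess, "CREATE_NO_WINDOW", 0x08000000)
    )
    return {"creationflags": flags}
```

A watcher would then be launched with `subprocess.Popen(argv, **windows_detach_popen_kwargs())`, so the call site is the same on every platform.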
## MEDIUM fixes
- gateway/run.py /restart and /update handlers: hardcoded bash/setsid
chain crashes on Windows when user triggers /update in-gateway. Now
has sys.platform=="win32" branch using sys.executable + a tiny
Python watcher with proper detach flags. POSIX path is unchanged.
- cli.py _git_repo_root: Git on Windows sometimes returns /c/Users/...
style paths that break subprocess.Popen(cwd=...) and Path().resolve().
Added _normalize_git_bash_path() helper that translates /c/Users,
/cygdrive/c, /mnt/c variants to native C:\Users form. POSIX no-op.
_git_repo_root() now routes every result through it.
- cli.py worktree .worktreeinclude: os.symlink on directories failed
hard on Windows (requires admin or Developer Mode). Falls back to
shutil.copytree with a warning log.
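
The path translation could look roughly like the sketch below. The regular expression and the `is_windows` parameter are assumptions made for illustration; the real _normalize_git_bash_path() may differ:

```python
import re

# Matches /c/..., /cygdrive/c/... and /mnt/c/... (single drive letter).
_DRIVE_PREFIX = re.compile(r"^/(?:cygdrive/|mnt/)?([A-Za-z])(?:/(.*))?$")


def normalize_git_bash_path(path: str, is_windows: bool) -> str:
    """Translate a Git Bash style path to native Windows form."""
    if not is_windows:
        return path  # POSIX no-op
    match = _DRIVE_PREFIX.match(path)
    if match is None:
        return path  # already native, or not a drive-style path
    drive = match.group(1).upper()
    rest = match.group(2) or ""
    return f"{drive}:\\" + rest.replace("/", "\\")
```

One limitation of this sketch: on Windows a genuine top-level directory with a one-letter name (for example `/a/b`) is also read as a drive path.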
## Tests
- 29 new tests in tests/tools/test_windows_native_support.py covering:
subprocess_compat helpers, TUI entry signal guards, kanban waitpid
guard, code_execution TCP fallback source-level invariants, cron bash
resolution, npm/npx bare-spawn lint per-file, local env Windows temp
dir, PATH injection gating, git bash path normalization, symlink
fallback, gateway detached watcher flags.
- One existing test assertion adjusted in test_browser_homebrew_paths:
it compared captured Popen argv to the BARE `"npx"` literal; after the
shutil.which() change argv[0] is the absolute path. New assertion
checks the shape (two items, second is `agent-browser`) rather than
the exact first-item string. Behaviour unchanged; test was too strict.
All 56 tests pass on Linux (30 from previous commits + 26 new).
267 tests from the affected files/dirs (browser, code_exec, local_env,
process_registry, kanban_db, windows_compat) all pass — zero regressions.
tests/hermes_cli/ (3909 pass) and tests/gateway/ (5021 pass) unchanged;
all pre-existing test failures confirmed unrelated via `git stash` re-run.
## What's still deferred (LOW priority)
- Visible cmd-window flashes on short-lived console apps (~14 sites) —
cosmetic, needs a follow-up pass once we have user reports.
- agent/file_safety.py POSIX-only security deny patterns — separate
hardening task.
- tools/process_registry.py returning "/tmp" as fallback — theoretical;
reachable only when all env-var candidates fail.
parent b53bd12fe4
commit e93bfc6c93
15 changed files with 955 additions and 70 deletions
@@ -708,7 +708,16 @@ def _run_chrome_fallback_command(
         )
         return {"success": False, "error": hint}
 
-    cmd_prefix = ["npx", "agent-browser"] if browser_cmd == "npx agent-browser" else [browser_cmd]
+    # On Windows npx is npx.cmd — use shutil.which so CreateProcessW can
+    # execute the batch shim. shutil.which honours PATHEXT on Windows and
+    # returns the plain executable on POSIX. If npx isn't on PATH (Termux,
+    # bare container), fall back to the bare name and let Popen raise with
+    # a readable "FileNotFoundError: 'npx'" rather than WinError 193.
+    if browser_cmd == "npx agent-browser":
+        _npx_bin = shutil.which("npx") or "npx"
+        cmd_prefix = [_npx_bin, "agent-browser"]
+    else:
+        cmd_prefix = [browser_cmd]
     base_args = cmd_prefix + ["--engine", "chrome", "--session", tmp_session, "--json"]
 
     task_socket_dir = os.path.join(_socket_safe_tmpdir(), f"agent-browser-{tmp_session}")
@@ -1768,7 +1777,12 @@ def _run_browser_command(
 
     # Keep concrete executable paths intact, even when they contain spaces.
     # Only the synthetic npx fallback needs to expand into multiple argv items.
-    cmd_prefix = ["npx", "agent-browser"] if browser_cmd == "npx agent-browser" else [browser_cmd]
+    # shutil.which resolves npx → npx.cmd on Windows; bare "npx" stays on POSIX.
+    if browser_cmd == "npx agent-browser":
+        _npx_bin = shutil.which("npx") or "npx"
+        cmd_prefix = [_npx_bin, "agent-browser"]
+    else:
+        cmd_prefix = [browser_cmd]
 
     cmd_parts = cmd_prefix + backend_args + [
         "--json",
@@ -235,10 +235,27 @@ _call_lock = threading.Lock()
 ''' + _COMMON_HELPERS + '''\
 
 def _connect():
+    """Connect to the parent's RPC server via the transport it picked.
+
+    HERMES_RPC_SOCKET can be either:
+      - a filesystem path (POSIX Unix domain socket — the default on
+        Linux and macOS)
+      - a string of the form ``tcp://127.0.0.1:<port>`` (Windows, where
+        AF_UNIX is unreliable — the parent falls back to loopback TCP)
+    """
     global _sock
     if _sock is None:
-        _sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
-        _sock.connect(os.environ["HERMES_RPC_SOCKET"])
+        endpoint = os.environ["HERMES_RPC_SOCKET"]
+        if endpoint.startswith("tcp://"):
+            # tcp://host:port (host is always 127.0.0.1 in practice — we
+            # only bind loopback server-side)
+            _host_port = endpoint[len("tcp://"):]
+            _host, _, _port = _host_port.rpartition(":")
+            _sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+            _sock.connect((_host or "127.0.0.1", int(_port)))
+        else:
+            _sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
+            _sock.connect(endpoint)
         _sock.settimeout(300)
     return _sock
 
@@ -988,8 +1005,22 @@ def execute_code(
     # Use /tmp on macOS to avoid the long /var/folders/... path that pushes
     # Unix domain socket paths past the 104-byte macOS AF_UNIX limit.
     # On Linux, tempfile.gettempdir() already returns /tmp.
+    #
+    # Windows: Python 3.9+ added partial AF_UNIX support but the file-backed
+    # variant is flaky across Windows builds (requires Windows 10 1803+,
+    # still fails under some configurations, and the socket file can't live
+    # on the same temp drive as the script). Fall back to loopback TCP —
+    # same ephemeral port, same 1-connection listen queue, same serialized
+    # request/response framing. The generated client reads the transport
+    # selector from HERMES_RPC_SOCKET (path vs. ``tcp://host:port``).
     _sock_tmpdir = "/tmp" if sys.platform == "darwin" else tempfile.gettempdir()
-    sock_path = os.path.join(_sock_tmpdir, f"hermes_rpc_{uuid.uuid4().hex}.sock")
+    _use_tcp_rpc = _IS_WINDOWS
+    if _use_tcp_rpc:
+        sock_path = None  # not used on Windows; TCP endpoint stored below
+        rpc_endpoint = None  # set after bind()
+    else:
+        sock_path = os.path.join(_sock_tmpdir, f"hermes_rpc_{uuid.uuid4().hex}.sock")
+        rpc_endpoint = sock_path
 
     tool_call_log: list = []
     tool_call_counter = [0]  # mutable so the RPC thread can increment
@@ -1008,10 +1039,24 @@ def execute_code(
         with open(os.path.join(tmpdir, "script.py"), "w") as f:
             f.write(code)
 
-        # --- Start UDS server ---
-        server_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
-        server_sock.bind(sock_path)
-        os.chmod(sock_path, 0o600)
+        # --- Start RPC server ---
+        # Two transports:
+        #   POSIX: AF_UNIX stream socket on sock_path, chmod 0600 for
+        #     owner-only access. Filesystem permissions gate the socket.
+        #   Windows: AF_INET stream socket on 127.0.0.1 with an ephemeral
+        #     port. No filesystem permission story, but loopback-only bind
+        #     means only the current user's processes (not remote) can
+        #     connect. HERMES_RPC_SOCKET is set to ``tcp://127.0.0.1:<port>``
+        #     which the generated client parses to pick AF_INET.
+        if _use_tcp_rpc:
+            server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+            server_sock.bind(("127.0.0.1", 0))  # ephemeral port
+            _host, _port = server_sock.getsockname()[:2]
+            rpc_endpoint = f"tcp://{_host}:{_port}"
+        else:
+            server_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
+            server_sock.bind(sock_path)
+            os.chmod(sock_path, 0o600)
         server_sock.listen(1)
 
         rpc_thread = threading.Thread(
@@ -1053,7 +1098,7 @@ def execute_code(
             # Allow vars with known safe prefixes.
             if any(k.startswith(p) for p in _SAFE_ENV_PREFIXES):
                 child_env[k] = v
-        child_env["HERMES_RPC_SOCKET"] = sock_path
+        child_env["HERMES_RPC_SOCKET"] = rpc_endpoint
         child_env["PYTHONDONTWRITEBYTECODE"] = "1"
         # Ensure the hermes-agent root is importable in the sandbox so
         # repo-root modules are available to child scripts. We also prepend
@@ -1302,7 +1347,10 @@ def execute_code(
         import shutil
         shutil.rmtree(tmpdir, ignore_errors=True)
         try:
-            os.unlink(sock_path)
+            # Only UDS has a filesystem socket to unlink; TCP sockets are
+            # freed by server_sock.close() above.
+            if sock_path:
+                os.unlink(sock_path)
         except OSError:
             pass  # already cleaned up or never created
 
@@ -9,6 +9,7 @@ import signal
 import subprocess
 import tempfile
 import time
+from pathlib import Path
 
 from tools.environments.base import BaseEnvironment, _pipe_stdin
 
@@ -249,7 +250,15 @@ def _make_run_env(env: dict) -> dict:
         elif k not in _HERMES_PROVIDER_ENV_BLOCKLIST or _is_passthrough(k):
             run_env[k] = v
     existing_path = run_env.get("PATH", "")
-    if "/usr/bin" not in existing_path.split(":"):
+    # The "/usr/bin not already present → inject sane POSIX path" heuristic
+    # only makes sense on POSIX. On Windows the PATH separator is ";"
+    # (the split(":") above turns a full Windows PATH into a single
+    # unrecognisable chunk, which then triggers prepending POSIX paths
+    # to a Windows PATH — completely wrong). Skip the injection entirely
+    # on Windows; the native PATH already points at whatever shell
+    # Hermes is driving via _find_bash (Git Bash), and Git Bash itself
+    # prepends its MSYS2 /usr/bin equivalent via the shell-init files.
+    if not _IS_WINDOWS and "/usr/bin" not in existing_path.split(":"):
         run_env["PATH"] = f"{existing_path}:{_SANE_PATH}" if existing_path else _SANE_PATH
 
     # Per-profile HOME isolation: redirect system tool configs (git, ssh, gh,
@@ -371,7 +380,29 @@ class LocalEnvironment(BaseEnvironment):
         Check the environment configured for this backend first so callers can
         override the temp root explicitly (for example via terminal.env or a
         custom TMPDIR), then fall back to the host process environment.
+
+        **Windows:** hardcoded ``/tmp`` is wrong in two ways — native Python
+        can't open the path, and the Windows default temp (``%TEMP%``) often
+        contains spaces (``C:\\Users\\Some Name\\AppData\\Local\\Temp``) that
+        break unquoted bash interpolations. Use a dedicated cache dir under
+        ``HERMES_HOME`` instead — single-word path, guaranteed to exist, same
+        string resolves in both Git Bash and native Python.
         """
+        if _IS_WINDOWS:
+            # Derive a Windows-safe temp dir under HERMES_HOME. Using
+            # forward slashes makes the same string work unchanged in bash
+            # command interpolations AND in Python ``open()`` — Windows
+            # accepts forward slashes in filesystem paths, and we control
+            # the path so we can guarantee no spaces.
+            try:
+                from hermes_constants import get_hermes_home
+                cache_dir = get_hermes_home() / "cache" / "terminal"
+            except Exception:
+                cache_dir = Path(tempfile.gettempdir()) / "hermes_terminal"
+            cache_dir.mkdir(parents=True, exist_ok=True)
+            # Force forward slashes so the same string serves both contexts.
+            return str(cache_dir).replace("\\", "/")
+
         for env_var in ("TMPDIR", "TMP", "TEMP"):
             candidate = self.env.get(env_var) or os.environ.get(env_var)
             if candidate and candidate.startswith("/"):