feat(windows): close remaining POSIX-only landmines — TUI crash, kanban waitpid, AF_UNIX sandbox, /bin/bash, npm .cmd shims, cwd tracking, detach flags

Second pass on native Windows support, driven by a systematic audit across
five areas: POSIX-only primitives (signal.SIGKILL/SIGHUP/SIGPIPE, os.WNOHANG,
os.setsid), path translation bugs (/c/Users → C:\Users), subprocess patterns
(npm.cmd batch shims, start_new_session no-op on Windows), subsystem health
(cron, gateway daemon, update flow), and module-level import guards.

Every change is platform-gated — POSIX (Linux/macOS) behaviour is preserved
bit-identical. Explicit "do no harm" test: test_posix_path_preserved_on_linux,
test_posix_noop, test_windows_detach_popen_kwargs_is_posix_equivalent_on_posix.

## New module

- hermes_cli/_subprocess_compat.py — shared helpers (resolve_node_command,
  windows_detach_flags, windows_hide_flags, windows_detach_popen_kwargs).
  All no-ops on non-Windows.

## CRITICAL fixes (would crash or silently break on Windows)

- tui_gateway/entry.py: SIGPIPE/SIGHUP referenced at module top level would
  AttributeError on import on Windows, breaking `hermes --tui` entirely (it
  spawns this module as a subprocess).  Guard each signal.signal() call with
  hasattr() and add SIGBREAK as Windows' SIGHUP equivalent.

- hermes_cli/kanban_db.py: os.waitpid(-1, os.WNOHANG) in dispatcher tick was
  unguarded.  os.WNOHANG doesn't exist on Windows.  Gate the whole reap loop
  behind `os.name != "nt"` — Windows has no zombies anyway.

- tools/code_execution_tool.py: AF_UNIX socket for execute_code RPC fails on
  most Windows builds.  Fall back to loopback TCP (AF_INET on 127.0.0.1:0
  ephemeral port) when _IS_WINDOWS.  HERMES_RPC_SOCKET env var now accepts
  either a filesystem path (POSIX) or `tcp://127.0.0.1:<port>` (Windows).
  Generated sandbox client parses both.

- cron/scheduler.py: `argv = ["/bin/bash", str(path)]` hardcoded.  Use
  shutil.which("bash") so Windows (Git Bash via MinGit) works, with a
  readable error when bash is genuinely absent.

- 6 bare npm/npx spawn sites: tools_config.py x2, doctor.py, whatsapp.py
  (npm install + node version probe), browser_tool.py x2.  On Windows npm
  is npm.cmd / npx is npx.cmd (batch shims); subprocess.Popen(["npm", ...])
  fails with WinError 193.  shutil.which(...) returns the absolute .cmd
  path which CreateProcessW accepts because the extension routes through
  cmd.exe /c.  POSIX behaviour unchanged (shutil.which still returns the
  same path subprocess would resolve itself).

## HIGH fixes (silent misbehaviour on Windows)

- tools/environments/local.py get_temp_dir: hardcoded /tmp returned on
  Windows meant `_cwd_file = "/tmp/hermes-cwd-*.txt"`, which bash wrote
  via MSYS2's virtual /tmp but native Python couldn't open.  Result: cwd
  tracking silently broken — `cd` in terminal tool did nothing.  Windows
  branch now returns `%HERMES_HOME%/cache/terminal` with forward slashes
  (works in both bash and Python, guaranteed no spaces).

- tools/environments/local.py _make_run_env PATH injection: `/usr/bin not
  in split(":")` heuristic mangles Windows PATH (";" separator).  Gate
  the injection behind `not _IS_WINDOWS`.

- hermes_cli/gateway.py launch_detached_profile_gateway_restart: outer
  Popen + watcher-script Popen both used start_new_session=True, which
  Windows silently ignores.  Watcher stayed attached to CLI's console,
  died when user closed terminal after `hermes update`, left gateway
  stale.  Now branches through windows_detach_popen_kwargs() helper
  (CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS | CREATE_NO_WINDOW on
  Windows, start_new_session=True on POSIX — identical to main).

## MEDIUM fixes

- gateway/run.py /restart and /update handlers: hardcoded bash/setsid
  chain crashes on Windows when user triggers /update in-gateway.  Now
  has sys.platform=="win32" branch using sys.executable + a tiny
  Python watcher with proper detach flags.  POSIX path is unchanged.

- cli.py _git_repo_root: Git on Windows sometimes returns /c/Users/...
  style paths that break subprocess.Popen(cwd=...) and Path().resolve().
  Added _normalize_git_bash_path() helper that translates /c/Users,
  /cygdrive/c, /mnt/c variants to native C:\Users form.  POSIX no-op.
  _git_repo_root() now routes every result through it.

- cli.py worktree .worktreeinclude: os.symlink on directories failed
  hard on Windows (requires admin or Developer Mode).  Falls back to
  shutil.copytree with a warning log.

## Tests

- 29 new tests in tests/tools/test_windows_native_support.py covering:
  subprocess_compat helpers, TUI entry signal guards, kanban waitpid
  guard, code_execution TCP fallback source-level invariants, cron bash
  resolution, npm/npx bare-spawn lint per-file, local env Windows temp
  dir, PATH injection gating, git bash path normalization, symlink
  fallback, gateway detached watcher flags.

- One existing test assertion adjusted in test_browser_homebrew_paths:
  it compared captured Popen argv to the BARE `"npx"` literal; after the
  shutil.which() change argv[0] is the absolute path.  New assertion
  checks the shape (two items, second is `agent-browser`) rather than
  the exact first-item string.  Behaviour unchanged; test was too strict.

All 56 tests pass on Linux (30 from previous commits + 26 new).
267 tests from the affected files/dirs (browser, code_exec, local_env,
process_registry, kanban_db, windows_compat) all pass — zero regressions.
tests/hermes_cli/ (3909 pass) and tests/gateway/ (5021 pass) unchanged;
all pre-existing test failures confirmed unrelated via `git stash` re-run.

## What's still deferred (LOW priority)

- Visible cmd-window flashes on short-lived console apps (~14 sites) —
  cosmetic, needs a follow-up pass once we have user reports.
- agent/file_safety.py POSIX-only security deny patterns — separate
  hardening task.
- tools/process_registry.py returning "/tmp" as fallback — theoretical;
  reachable only when all env-var candidates fail.
This commit is contained in:
Teknium 2026-05-07 17:29:31 -07:00
parent 1da89528e7
commit eeb723fff2
No known key found for this signature in database
15 changed files with 955 additions and 70 deletions

72
cli.py
View file

@ -728,8 +728,43 @@ def _run_cleanup():
_active_worktree: Optional[Dict[str, str]] = None _active_worktree: Optional[Dict[str, str]] = None
def _normalize_git_bash_path(p: Optional[str]) -> Optional[str]:
"""Translate a Git Bash-style path (``/c/Users/...``) to the native
Windows form (``C:\\Users\\...``) that Python's ``subprocess.Popen``
and ``pathlib.Path`` accept.
No-op on non-Windows and for paths that already look native. Git on
native Windows normally emits forward-slash Windows paths
(``C:/Users/...``) which both bash and Python handle, but certain
configurations (Git Bash shells, MSYS2, WSL-mounted repos) surface
``/c/...`` or ``/cygdrive/c/...`` variants.
"""
if not p:
return p
if sys.platform != "win32":
return p
import re as _re
# /c/Users/... or /C/Users/...
m = _re.match(r"^/([a-zA-Z])/(.*)$", p)
if m:
drive, rest = m.group(1), m.group(2)
return f"{drive.upper()}:\\{rest.replace('/', chr(92))}"
# /cygdrive/c/... or /mnt/c/...
m = _re.match(r"^/(?:cygdrive|mnt)/([a-zA-Z])/(.*)$", p)
if m:
drive, rest = m.group(1), m.group(2)
return f"{drive.upper()}:\\{rest.replace('/', chr(92))}"
return p
def _git_repo_root() -> Optional[str]: def _git_repo_root() -> Optional[str]:
"""Return the git repo root for CWD, or None if not in a repo.""" """Return the git repo root for CWD, or None if not in a repo.
Runs through :func:`_normalize_git_bash_path` so callers can pass
the result directly to ``Path``/``subprocess.Popen(cwd=...)`` on
Windows without hitting ``C:\\c\\Users\\...`` style resolution
mistakes.
"""
import subprocess import subprocess
try: try:
result = subprocess.run( result = subprocess.run(
@ -737,7 +772,7 @@ def _git_repo_root() -> Optional[str]:
capture_output=True, text=True, timeout=5, capture_output=True, text=True, timeout=5,
) )
if result.returncode == 0: if result.returncode == 0:
return result.stdout.strip() return _normalize_git_bash_path(result.stdout.strip())
except Exception: except Exception:
pass pass
return None return None
@ -832,10 +867,39 @@ def _setup_worktree(repo_root: str = None) -> Optional[Dict[str, str]]:
dst.parent.mkdir(parents=True, exist_ok=True) dst.parent.mkdir(parents=True, exist_ok=True)
shutil.copy2(str(src), str(dst)) shutil.copy2(str(src), str(dst))
elif src.is_dir(): elif src.is_dir():
# Symlink directories (faster, saves disk) # Symlink directories (faster, saves disk). On Windows,
# symlink creation requires Developer Mode or elevation,
# and fails with OSError otherwise — fall back to a
# recursive copy so the worktree is still usable. The
# copy is slower and uses disk, but it doesn't require
# admin and matches the Linux/macOS symlink outcome
# functionally.
if not dst.exists(): if not dst.exists():
dst.parent.mkdir(parents=True, exist_ok=True) dst.parent.mkdir(parents=True, exist_ok=True)
os.symlink(str(src_resolved), str(dst)) try:
os.symlink(str(src_resolved), str(dst))
except (OSError, NotImplementedError) as _sym_err:
if sys.platform == "win32":
logger.info(
".worktreeinclude: symlink failed (%s) — "
"falling back to copytree on Windows.",
_sym_err,
)
try:
shutil.copytree(
str(src_resolved),
str(dst),
symlinks=True,
dirs_exist_ok=False,
)
except Exception as _copy_err:
logger.warning(
".worktreeinclude: copy fallback "
"also failed for %s -> %s: %s",
src, dst, _copy_err,
)
else:
raise
except Exception as e: except Exception as e:
logger.debug("Error copying .worktreeinclude entries: %s", e) logger.debug("Error copying .worktreeinclude entries: %s", e)

View file

@ -14,6 +14,7 @@ import contextvars
import json import json
import logging import logging
import os import os
import shutil
import subprocess import subprocess
import sys import sys
@ -714,7 +715,21 @@ def _run_job_script(script_path: str) -> tuple[bool, str]:
# choice explicit here keeps the allowed surface small and auditable. # choice explicit here keeps the allowed surface small and auditable.
suffix = path.suffix.lower() suffix = path.suffix.lower()
if suffix in (".sh", ".bash"): if suffix in (".sh", ".bash"):
argv = ["/bin/bash", str(path)] # Resolve bash dynamically so Windows (Git Bash) and Linux/macOS
# all work. On native Windows without Git for Windows installed
# shutil.which returns None — fall back to a clear error rather
# than a FileNotFoundError with a confusing "[WinError 2]"
# traceback.
_bash = shutil.which("bash") or (
"/bin/bash" if os.path.isfile("/bin/bash") else None
)
if _bash is None:
return False, (
f"Cannot run .sh/.bash script {path.name!r}: bash not found on PATH. "
"On Windows, install Git for Windows (which ships Git Bash) "
"or rewrite the script as Python (.py)."
)
argv = [_bash, str(path)]
else: else:
argv = [sys.executable, str(path)] argv = [sys.executable, str(path)]

View file

@ -21,6 +21,7 @@ import logging
import os import os
import platform import platform
import re import re
import shutil
import signal import signal
import subprocess import subprocess
@ -177,10 +178,15 @@ def check_whatsapp_requirements() -> bool:
WhatsApp requires a Node.js bridge for most implementations. WhatsApp requires a Node.js bridge for most implementations.
""" """
# Check for Node.js # Check for Node.js. Resolve via shutil.which so we respect PATHEXT
# (node.exe vs node) and get a meaningful "not installed" signal
# instead of spawning a cmd flash on Windows.
_node = shutil.which("node")
if not _node:
return False
try: try:
result = subprocess.run( result = subprocess.run(
["node", "--version"], [_node, "--version"],
capture_output=True, capture_output=True,
text=True, text=True,
timeout=5 timeout=5
@ -464,9 +470,13 @@ class WhatsAppAdapter(BasePlatformAdapter):
bridge_dir = bridge_path.parent bridge_dir = bridge_path.parent
if not (bridge_dir / "node_modules").exists(): if not (bridge_dir / "node_modules").exists():
print(f"[{self.name}] Installing WhatsApp bridge dependencies...") print(f"[{self.name}] Installing WhatsApp bridge dependencies...")
# Resolve npm path so Windows can execute the .cmd shim.
# shutil.which honours PATHEXT; on POSIX it returns the
# plain executable path.
_npm_bin = shutil.which("npm") or "npm"
try: try:
install_result = subprocess.run( install_result = subprocess.run(
["npm", "install", "--silent"], [_npm_bin, "install", "--silent"],
cwd=str(bridge_dir), cwd=str(bridge_dir),
capture_output=True, capture_output=True,
text=True, text=True,

View file

@ -2784,6 +2784,48 @@ class GatewayRunner:
return return
current_pid = os.getpid() current_pid = os.getpid()
# On Windows there's no bash/setsid chain — spawn a tiny Python
# watcher directly via sys.executable instead. The watcher polls
# current_pid, waits for our exit, then runs `hermes gateway
# restart` with detach flags so the respawn survives the CLI
# that triggered the /restart command closing its console.
if sys.platform == "win32":
import textwrap
from hermes_cli._subprocess_compat import windows_detach_popen_kwargs
cmd_argv = [*hermes_cmd, "gateway", "restart"]
watcher = textwrap.dedent(
"""
import os, subprocess, sys, time
pid = int(sys.argv[1])
cmd = sys.argv[2:]
deadline = time.monotonic() + 120
while time.monotonic() < deadline:
try:
os.kill(pid, 0)
except (ProcessLookupError, PermissionError, OSError):
break
time.sleep(0.2)
_CREATE_NEW_PROCESS_GROUP = 0x00000200
_DETACHED_PROCESS = 0x00000008
_CREATE_NO_WINDOW = 0x08000000
subprocess.Popen(
cmd,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
creationflags=_CREATE_NEW_PROCESS_GROUP | _DETACHED_PROCESS | _CREATE_NO_WINDOW,
)
"""
).strip()
subprocess.Popen(
[sys.executable, "-c", watcher, str(current_pid), *cmd_argv],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
**windows_detach_popen_kwargs(),
)
return
cmd = " ".join(shlex.quote(part) for part in hermes_cmd) cmd = " ".join(shlex.quote(part) for part in hermes_cmd)
shell_cmd = ( shell_cmd = (
f"while kill -0 {current_pid} 2>/dev/null; do sleep 0.2; done; " f"while kill -0 {current_pid} 2>/dev/null; do sleep 0.2; done; "
@ -11305,30 +11347,78 @@ class GatewayRunner:
# where systemd-run --user fails due to missing D-Bus session). # where systemd-run --user fails due to missing D-Bus session).
# PYTHONUNBUFFERED ensures output is flushed line-by-line so the # PYTHONUNBUFFERED ensures output is flushed line-by-line so the
# gateway can stream it to the messenger in near-real-time. # gateway can stream it to the messenger in near-real-time.
hermes_cmd_str = " ".join(shlex.quote(part) for part in hermes_cmd) # Spawn `hermes update --gateway` detached so it survives gateway restart.
update_cmd = ( # --gateway enables file-based IPC for interactive prompts (stash
f"PYTHONUNBUFFERED=1 {hermes_cmd_str} update --gateway" # restore, config migration) so the gateway can forward them to the
f" > {shlex.quote(str(output_path))} 2>&1; " # user instead of silently skipping them.
f"status=$?; printf '%s' \"$status\" > {shlex.quote(str(exit_code_path))}" # Use setsid for portable session detach (works under system services
) # where systemd-run --user fails due to missing D-Bus session).
# PYTHONUNBUFFERED ensures output is flushed line-by-line so the
# gateway can stream it to the messenger in near-real-time.
#
# Windows: no bash/setsid chain. Run `hermes update --gateway`
# directly via sys.executable; redirect stdout/stderr to the same
# output files via Popen file handles; write the exit code in a
# follow-up write. A tiny Python watcher would be cleaner but
# we're already inside gateway/run.py's update path which is async,
# so the simplest correct thing is: launch an inline Python helper
# that runs the command and writes both outputs.
try: try:
setsid_bin = shutil.which("setsid") if sys.platform == "win32":
if setsid_bin: import textwrap
# Preferred: setsid creates a new session, fully detached from hermes_cli._subprocess_compat import windows_detach_popen_kwargs
# hermes_cmd is a list of argv parts we can pass directly
# (no shell-quoting needed).
helper = textwrap.dedent(
"""
import os, subprocess, sys
output_path = sys.argv[1]
exit_code_path = sys.argv[2]
cmd = sys.argv[3:]
env = dict(os.environ)
env["PYTHONUNBUFFERED"] = "1"
with open(output_path, "wb") as f:
proc = subprocess.Popen(cmd, stdout=f, stderr=subprocess.STDOUT, env=env)
rc = proc.wait()
with open(exit_code_path, "w") as f:
f.write(str(rc))
"""
).strip()
subprocess.Popen( subprocess.Popen(
[setsid_bin, "bash", "-c", update_cmd], [
sys.executable, "-c", helper,
str(output_path), str(exit_code_path),
*hermes_cmd, "update", "--gateway",
],
stdout=subprocess.DEVNULL, stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
start_new_session=True, **windows_detach_popen_kwargs(),
) )
else: else:
# Fallback: start_new_session=True calls os.setsid() in child hermes_cmd_str = " ".join(shlex.quote(part) for part in hermes_cmd)
subprocess.Popen( update_cmd = (
["bash", "-c", update_cmd], f"PYTHONUNBUFFERED=1 {hermes_cmd_str} update --gateway"
stdout=subprocess.DEVNULL, f" > {shlex.quote(str(output_path))} 2>&1; "
stderr=subprocess.DEVNULL, f"status=$?; printf '%s' \"$status\" > {shlex.quote(str(exit_code_path))}"
start_new_session=True,
) )
setsid_bin = shutil.which("setsid")
if setsid_bin:
# Preferred: setsid creates a new session, fully detached
subprocess.Popen(
[setsid_bin, "bash", "-c", update_cmd],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
start_new_session=True,
)
else:
# Fallback: start_new_session=True calls os.setsid() in child
subprocess.Popen(
["bash", "-c", update_cmd],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
start_new_session=True,
)
except Exception as e: except Exception as e:
pending_path.unlink(missing_ok=True) pending_path.unlink(missing_ok=True)
exit_code_path.unlink(missing_ok=True) exit_code_path.unlink(missing_ok=True)

View file

@ -0,0 +1,175 @@
"""Windows subprocess compatibility helpers.
Hermes is developed on Linux / macOS and tested natively on Windows too.
Several common subprocess patterns break silently-or-loudly on Windows:
* ``["npm", "install", ...]`` on Windows ``npm`` is ``npm.cmd``, a batch
shim. ``subprocess.Popen(["npm", ...])`` fails with WinError 193
("not a valid Win32 application") because CreateProcessW can't run a
``.cmd`` file without ``shell=True`` or PATHEXT resolution.
* ``start_new_session=True`` on POSIX, this maps to ``os.setsid()`` and
actually detaches the child. On Windows it's silently ignored; the
Windows equivalent is ``CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS``
creationflags, which Python only applies when you pass them explicitly.
* Console-window flashes every ``subprocess.Popen`` of a ``.exe`` on
Windows spawns a cmd window briefly unless ``CREATE_NO_WINDOW`` is
passed. Cosmetic but jarring for background daemons.
This module centralizes the platform-branching logic so the rest of the
codebase doesn't sprinkle ``if sys.platform == "win32":`` everywhere.
**All helpers are no-ops on non-Windows** calling them in Linux/macOS
code paths is safe by design. That's the "do no damage on POSIX"
guarantee.
"""
from __future__ import annotations
import os
import shutil
import subprocess
import sys
from typing import Optional, Sequence
__all__ = [
"IS_WINDOWS",
"resolve_node_command",
"windows_detach_flags",
"windows_hide_flags",
"windows_detach_popen_kwargs",
]
IS_WINDOWS = sys.platform == "win32"
# -----------------------------------------------------------------------------
# Node ecosystem launcher resolution
# -----------------------------------------------------------------------------
def resolve_node_command(name: str, argv: Sequence[str]) -> list[str]:
"""Resolve a Node-ecosystem command name to an absolute-path argv.
On Windows, commands like ``npm``, ``npx``, ``yarn``, ``pnpm``,
``playwright``, ``prettier`` ship as ``.cmd`` files (batch shims).
``subprocess.Popen(["npm", "install"])`` fails with WinError 193
because CreateProcessW doesn't execute batch files directly.
``shutil.which(name)`` *does* resolve ``.cmd`` via PATHEXT and returns
the fully-qualified path which CreateProcessW accepts because the
extension tells Windows to route through ``cmd.exe /c``.
On POSIX ``shutil.which`` also returns a fully-qualified path when
found. That's a small change from bare-name resolution (the OS does
its own PATH search) but functionally identical and has the side
benefit of making the argv reproducible in logs.
Behavior when the command is not on PATH:
- On Windows: return the bare name caller can still try with
``shell=True`` as a last resort, OR the subsequent Popen will
raise FileNotFoundError with a readable error we want to surface.
- On POSIX: same. Bare ``npm`` on a Linux box without npm installed
fails the same way it did before this function existed.
Args:
name: The command name to resolve (``npm``, ``npx``, ``node`` ).
argv: The remaining arguments. Must NOT include ``name`` itself
this function builds the full argv list.
Returns:
A list suitable for passing to subprocess.Popen/run/call.
"""
resolved = shutil.which(name)
if resolved:
return [resolved, *argv]
return [name, *argv]
# -----------------------------------------------------------------------------
# Detached / hidden process creation
# -----------------------------------------------------------------------------
# Win32 CreationFlags — defined here rather than imported from subprocess
# because CREATE_NO_WINDOW and DETACHED_PROCESS aren't guaranteed to be
# present on stdlib subprocess on older Pythons or non-Windows builds.
_CREATE_NEW_PROCESS_GROUP = 0x00000200
_DETACHED_PROCESS = 0x00000008
_CREATE_NO_WINDOW = 0x08000000
def windows_detach_flags() -> int:
"""Return Win32 creationflags that detach a child from the parent
console and process group. 0 on non-Windows.
Pair with ``start_new_session=False`` (default) when calling
subprocess.Popen on POSIX use ``start_new_session=True`` instead,
which maps to ``os.setsid()`` in the child.
Rationale:
- ``CREATE_NEW_PROCESS_GROUP`` child has its own process group so
Ctrl+C in the parent console doesn't propagate.
- ``DETACHED_PROCESS`` child has no console at all. Necessary for
background daemons (gateway watchers, update respawners) because
without it, closing the console kills the child.
- ``CREATE_NO_WINDOW`` suppress the brief cmd flash that would
otherwise appear when launching a console app. Redundant with
DETACHED_PROCESS but explicit for clarity.
"""
if not IS_WINDOWS:
return 0
return _CREATE_NEW_PROCESS_GROUP | _DETACHED_PROCESS | _CREATE_NO_WINDOW
def windows_hide_flags() -> int:
"""Return Win32 creationflags that merely hide the child's console
window without detaching the child. 0 on non-Windows.
Use for short-lived console apps spawned as part of a larger
operation (``taskkill``, ``where``, version probes) where we want no
flash but also want to collect stdout/exit code synchronously.
The key difference from :func:`windows_detach_flags`: NO
``DETACHED_PROCESS`` the child still inherits stdio handles so
``capture_output=True`` works. ``DETACHED_PROCESS`` would sever
stdio and break stdout capture.
"""
if not IS_WINDOWS:
return 0
return _CREATE_NO_WINDOW
def windows_detach_popen_kwargs() -> dict:
"""Return a dict of Popen kwargs that detach a child on Windows and
fall back to the POSIX equivalent (``start_new_session=True``) on
Linux/macOS.
Usage pattern:
.. code-block:: python
subprocess.Popen(
argv,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
stdin=subprocess.DEVNULL,
close_fds=True,
**windows_detach_popen_kwargs(),
)
This replaces the unsafe-on-Windows pattern:
.. code-block:: python
subprocess.Popen(..., start_new_session=True)
which silently fails to detach on Windows (the flag is accepted but
has no effect the child stays attached to the parent's console
and dies when the console closes).
"""
if IS_WINDOWS:
return {"creationflags": windows_detach_flags()}
return {"start_new_session": True}

View file

@ -1059,7 +1059,8 @@ def run_doctor(args):
check_warn("Node.js not found", "(optional, needed for browser tools)") check_warn("Node.js not found", "(optional, needed for browser tools)")
# npm audit for all Node.js packages # npm audit for all Node.js packages
if _safe_which("npm"): _npm_bin = _safe_which("npm")
if _npm_bin:
npm_dirs = [ npm_dirs = [
(PROJECT_ROOT, "Browser tools (agent-browser)"), (PROJECT_ROOT, "Browser tools (agent-browser)"),
(PROJECT_ROOT / "scripts" / "whatsapp-bridge", "WhatsApp bridge"), (PROJECT_ROOT / "scripts" / "whatsapp-bridge", "WhatsApp bridge"),
@ -1068,8 +1069,10 @@ def run_doctor(args):
if not (npm_dir / "node_modules").exists(): if not (npm_dir / "node_modules").exists():
continue continue
try: try:
# Use resolved absolute path so Windows can execute
# npm.cmd (CreateProcessW can't run bare .cmd names).
audit_result = subprocess.run( audit_result = subprocess.run(
["npm", "audit", "--json"], [_npm_bin, "audit", "--json"],
cwd=str(npm_dir), cwd=str(npm_dir),
capture_output=True, text=True, timeout=30, capture_output=True, text=True, timeout=30,
) )

View file

@ -445,6 +445,25 @@ def launch_detached_profile_gateway_restart(profile: str, old_pid: int) -> bool:
if old_pid <= 0: if old_pid <= 0:
return False return False
# The watcher is a tiny Python subprocess that polls the old PID and
# respawns the gateway once it's gone. Both legs of the chain need
# platform-appropriate detach semantics:
#
# POSIX — ``start_new_session=True`` (os.setsid in the child) detaches
# from the parent's process group so Ctrl+C in the CLI doesn't
# propagate and the watcher/gateway survive the CLI exiting.
#
# Windows — ``start_new_session`` is silently accepted but does NOT
# detach. The watcher stays attached to the CLI's console and dies
# when the user closes the terminal, leaving ``hermes update`` users
# with no running gateway until they re-invoke ``hermes gateway``
# manually. The Win32 equivalent is the ``CREATE_NEW_PROCESS_GROUP |
# DETACHED_PROCESS | CREATE_NO_WINDOW`` creationflags bundle.
#
# ``windows_detach_popen_kwargs()`` returns the right kwargs for the
# host platform and is a no-op on POSIX (just ``start_new_session=True``).
from hermes_cli._subprocess_compat import windows_detach_popen_kwargs
watcher = textwrap.dedent( watcher = textwrap.dedent(
""" """
import os import os
@ -466,21 +485,35 @@ def launch_detached_profile_gateway_restart(profile: str, old_pid: int) -> bool:
# Windows: gone PID raises OSError (WinError 87). # Windows: gone PID raises OSError (WinError 87).
break break
time.sleep(0.2) time.sleep(0.2)
subprocess.Popen(
cmd, # Platform-appropriate detach for the respawned gateway. On POSIX
stdout=subprocess.DEVNULL, # start_new_session=True maps to os.setsid; on Windows we need
stderr=subprocess.DEVNULL, # explicit creationflags because start_new_session is a no-op there.
start_new_session=True, _popen_kwargs = {
) "stdout": subprocess.DEVNULL,
"stderr": subprocess.DEVNULL,
}
if sys.platform == "win32":
_CREATE_NEW_PROCESS_GROUP = 0x00000200
_DETACHED_PROCESS = 0x00000008
_CREATE_NO_WINDOW = 0x08000000
_popen_kwargs["creationflags"] = (
_CREATE_NEW_PROCESS_GROUP | _DETACHED_PROCESS | _CREATE_NO_WINDOW
)
else:
_popen_kwargs["start_new_session"] = True
subprocess.Popen(cmd, **_popen_kwargs)
""" """
).strip() ).strip()
try: try:
# Same platform-aware detach for the watcher process itself — so
# closing the user's terminal doesn't kill the watcher.
subprocess.Popen( subprocess.Popen(
[sys.executable, "-c", watcher, str(old_pid), *_gateway_run_args_for_profile(profile)], [sys.executable, "-c", watcher, str(old_pid), *_gateway_run_args_for_profile(profile)],
stdout=subprocess.DEVNULL, stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
start_new_session=True, **windows_detach_popen_kwargs(),
) )
except OSError: except OSError:
return False return False

View file

@ -3519,17 +3519,24 @@ def dispatch_once(
# cleanly without calling ``kanban_complete`` / ``kanban_block`` # cleanly without calling ``kanban_complete`` / ``kanban_block``
# (protocol violation — auto-block) from a real crash (OOM killer, # (protocol violation — auto-block) from a real crash (OOM killer,
# SIGKILL, non-zero exit — existing counter behavior). # SIGKILL, non-zero exit — existing counter behavior).
try: #
while True: # Windows has no zombies / no os.WNOHANG — subprocess.Popen handles
try: # are freed when the Python object is garbage-collected or .wait() is
_pid, _status = os.waitpid(-1, os.WNOHANG) # called explicitly. The kanban dispatcher discards the Popen handle
except ChildProcessError: # after spawn (``_default_spawn`` → abandon), so on Windows there's
break # nothing to reap here — skip the whole block.
if _pid == 0: if os.name != "nt":
break try:
_record_worker_exit(_pid, _status) while True:
except Exception: try:
pass _pid, _status = os.waitpid(-1, os.WNOHANG)
except ChildProcessError:
break
if _pid == 0:
break
_record_worker_exit(_pid, _status)
except Exception:
pass
result = DispatchResult() result = DispatchResult()
result.reclaimed = release_stale_claims(conn) result.reclaimed = release_stale_claims(conn)

View file

@ -509,8 +509,12 @@ def _run_post_setup(post_setup_key: str):
if not node_modules.exists() and npm_bin: if not node_modules.exists() and npm_bin:
_print_info(" Installing Node.js dependencies for browser tools...") _print_info(" Installing Node.js dependencies for browser tools...")
import subprocess import subprocess
# Use the resolved npm_bin absolute path so subprocess.Popen can
# execute npm.cmd on Windows (CreateProcessW otherwise rejects
# batch shims). On POSIX npm_bin is the plain path — same
# behaviour as before.
result = subprocess.run( result = subprocess.run(
["npm", "install", "--silent"], [npm_bin, "install", "--silent"],
capture_output=True, text=True, cwd=str(PROJECT_ROOT) capture_output=True, text=True, cwd=str(PROJECT_ROOT)
) )
if result.returncode == 0: if result.returncode == 0:
@ -609,11 +613,13 @@ def _run_post_setup(post_setup_key: str):
elif post_setup_key == "camofox": elif post_setup_key == "camofox":
camofox_dir = PROJECT_ROOT / "node_modules" / "@askjo" / "camofox-browser" camofox_dir = PROJECT_ROOT / "node_modules" / "@askjo" / "camofox-browser"
if not camofox_dir.exists() and shutil.which("npm"): _npm_bin = shutil.which("npm")
if not camofox_dir.exists() and _npm_bin:
_print_info(" Installing Camofox browser server...") _print_info(" Installing Camofox browser server...")
import subprocess import subprocess
# Absolute npm path so .cmd shim executes on Windows.
result = subprocess.run( result = subprocess.run(
["npm", "install", "--silent"], [_npm_bin, "install", "--silent"],
capture_output=True, text=True, cwd=str(PROJECT_ROOT) capture_output=True, text=True, cwd=str(PROJECT_ROOT)
) )
if result.returncode == 0: if result.returncode == 0:

View file

@ -340,7 +340,15 @@ class TestRunBrowserCommandPathConstruction:
_run_browser_command("test-task", "navigate", ["https://example.com"]) _run_browser_command("test-task", "navigate", ["https://example.com"])
assert captured_cmd is not None assert captured_cmd is not None
assert captured_cmd[:2] == ["npx", "agent-browser"] # The prefix must split "npx agent-browser" into two argv items.
# On POSIX shutil.which("npx") returns the absolute path if npx is on
# PATH (which the test's patched PATH always contains when the system
# has it installed). The important invariant is that the second
# argv item is the package name "agent-browser", not a merged
# "npx agent-browser" string — that's what Popen needs.
assert len(captured_cmd) >= 2
assert captured_cmd[0].endswith("npx") or captured_cmd[0] == "npx"
assert captured_cmd[1] == "agent-browser"
assert captured_cmd[2:6] == [ assert captured_cmd[2:6] == [
"--session", "--session",
"test-session", "test-session",

View file

@ -445,3 +445,368 @@ class TestEntryPointsConfigureStdio:
f"{relpath} must call hermes_cli.stdio.configure_windows_stdio() " f"{relpath} must call hermes_cli.stdio.configure_windows_stdio() "
"early in startup so Windows consoles render Unicode without crashing" "early in startup so Windows consoles render Unicode without crashing"
) )
# ---------------------------------------------------------------------------
# _subprocess_compat shared helpers
# ---------------------------------------------------------------------------
class TestSubprocessCompatHelpers:
"""hermes_cli/_subprocess_compat.py POSIX + Windows behaviour."""
def test_is_windows_matches_sys_platform(self):
from hermes_cli import _subprocess_compat as sc
assert sc.IS_WINDOWS == (sys.platform == "win32")
def test_resolve_node_command_returns_absolute_on_posix(self):
"""On Linux, resolve_node_command('sh', ['-c','echo hi']) picks up /bin/sh."""
from hermes_cli._subprocess_compat import resolve_node_command
# We can't assert "npm is on PATH" portably; use `sh` which is
# guaranteed on POSIX. On Windows the test only confirms the
# no-crash fallback path.
argv = resolve_node_command("sh", ["-c", "echo hi"])
assert argv[1:] == ["-c", "echo hi"]
# First element is either an absolute path (sh found) or the bare
# name (fallback) — both are acceptable behaviours.
def test_resolve_node_command_fallback_when_absent(self):
from hermes_cli._subprocess_compat import resolve_node_command
argv = resolve_node_command(
"zzz-definitely-not-on-path-xyzzy", ["--help"]
)
# Must fall back to the bare name — NOT return None, NOT crash.
assert argv[0] == "zzz-definitely-not-on-path-xyzzy"
assert argv[1:] == ["--help"]
def test_windows_flags_zero_on_posix(self):
from hermes_cli._subprocess_compat import (
windows_detach_flags,
windows_hide_flags,
)
if sys.platform != "win32":
assert windows_detach_flags() == 0
assert windows_hide_flags() == 0
def test_windows_detach_popen_kwargs_is_posix_equivalent_on_posix(self):
from hermes_cli._subprocess_compat import windows_detach_popen_kwargs
kwargs = windows_detach_popen_kwargs()
if sys.platform != "win32":
# POSIX path MUST produce start_new_session=True, which maps to
# os.setsid() in the child — identical to the unchanged main
# branch behaviour. Do NOT break Linux/macOS here.
assert kwargs == {"start_new_session": True}
else:
# Windows path must include creationflags with all 3 bits set.
assert "creationflags" in kwargs
assert kwargs["creationflags"] != 0
# No start_new_session on Windows (silently no-op there).
assert "start_new_session" not in kwargs
def test_windows_detach_flags_has_expected_win32_bits(self, monkeypatch):
"""Simulate Windows to verify flag bundle."""
from hermes_cli import _subprocess_compat as sc
monkeypatch.setattr(sc, "IS_WINDOWS", True)
flags = sc.windows_detach_flags()
# CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS | CREATE_NO_WINDOW
assert flags & 0x00000200, "missing CREATE_NEW_PROCESS_GROUP"
assert flags & 0x00000008, "missing DETACHED_PROCESS"
assert flags & 0x08000000, "missing CREATE_NO_WINDOW"
# ---------------------------------------------------------------------------
# tui_gateway/entry.py signal installation survives absent POSIX signals
# ---------------------------------------------------------------------------
class TestTuiGatewayEntrySignalGuards:
"""Importing tui_gateway.entry must not crash when SIGPIPE/SIGHUP absent.
Linux has both signals, so this is mostly a source-level invariant check
(no bare ``signal.SIGPIPE`` at module level without a ``hasattr`` guard).
On Windows the import would have raised AttributeError before this fix.
"""
def test_source_guards_each_signal_installation(self):
root = Path(__file__).resolve().parents[2]
source = (root / "tui_gateway" / "entry.py").read_text(encoding="utf-8")
# Every signal.signal(...) at module scope must be preceded by a
# hasattr check. We look at the text: no bare "signal.signal("
# call should appear outside a function body without a guard.
# Simpler heuristic: all SIGPIPE / SIGHUP references outside the
# dict-building loop must be wrapped in hasattr.
assert 'hasattr(signal, "SIGPIPE")' in source
assert 'hasattr(signal, "SIGHUP")' in source
assert 'hasattr(signal, "SIGTERM")' in source
assert 'hasattr(signal, "SIGINT")' in source
def test_module_imports_cleanly(self):
"""Importing the module must not raise — verifies the guards work."""
# Drop any cached import so the module re-initialises
for mod in list(sys.modules):
if mod.startswith("tui_gateway"):
del sys.modules[mod]
import tui_gateway.entry # noqa: F401 # must not raise
# ---------------------------------------------------------------------------
# hermes_cli/kanban_db.py waitpid guard
# ---------------------------------------------------------------------------
class TestKanbanWaitpidWindowsGuard:
"""os.WNOHANG doesn't exist on Windows — the dispatcher tick reap loop
must be gated behind ``os.name != "nt"``."""
def test_source_gates_waitpid_loop(self):
root = Path(__file__).resolve().parents[2]
source = (root / "hermes_cli" / "kanban_db.py").read_text(encoding="utf-8")
# Find the waitpid call and confirm it's inside a POSIX gate.
idx = source.find("os.waitpid(-1, os.WNOHANG)")
assert idx > 0, "waitpid call must exist"
# Look backwards up to 400 chars for the gate.
preamble = source[max(0, idx - 400):idx]
assert 'os.name != "nt"' in preamble or "os.name != 'nt'" in preamble, (
"os.waitpid(-1, os.WNOHANG) must sit behind an os.name != 'nt' guard"
)
# ---------------------------------------------------------------------------
# code_execution_tool TCP loopback on Windows
# ---------------------------------------------------------------------------
class TestCodeExecutionTransportTcpFallback:
"""The RPC transport must fall back to TCP on Windows.
We can't easily execute the sandbox on Linux CI in Windows mode, but we
CAN assert that the generated client module supports both AF_UNIX and
AF_INET endpoints based on the HERMES_RPC_SOCKET format.
"""
def test_generated_client_handles_tcp_endpoint(self):
root = Path(__file__).resolve().parents[2]
source = (root / "tools" / "code_execution_tool.py").read_text(encoding="utf-8")
# _UDS_TRANSPORT_HEADER body must parse both transports.
assert 'endpoint.startswith("tcp://")' in source, (
"generated sandbox client must accept tcp:// endpoints for Windows"
)
assert "socket.AF_INET" in source, (
"generated sandbox client must be able to open AF_INET sockets"
)
def test_server_side_branches_on_use_tcp_rpc(self):
root = Path(__file__).resolve().parents[2]
source = (root / "tools" / "code_execution_tool.py").read_text(encoding="utf-8")
assert "_use_tcp_rpc = _IS_WINDOWS" in source
assert 'rpc_endpoint = f"tcp://{_host}:{_port}"' in source
# ---------------------------------------------------------------------------
# cron/scheduler.py /bin/bash dynamic resolution
# ---------------------------------------------------------------------------
class TestCronSchedulerBashResolution:
"""cron.scheduler must NOT hardcode /bin/bash — .sh scripts need a
dynamically-resolved bash so Windows (Git Bash) works."""
def test_source_uses_shutil_which_for_bash(self):
root = Path(__file__).resolve().parents[2]
source = (root / "cron" / "scheduler.py").read_text(encoding="utf-8")
# The old hardcoded path should be gone as the sole bash source.
# It may still appear as a POSIX fallback after shutil.which(), so
# we check for the shutil.which call near the .sh/.bash branch.
assert 'shutil.which("bash")' in source, (
"cron.scheduler must resolve bash dynamically via shutil.which"
)
def test_error_message_when_bash_missing(self):
root = Path(__file__).resolve().parents[2]
source = (root / "cron" / "scheduler.py").read_text(encoding="utf-8")
# The graceful-failure message must mention "bash not found" so
# Windows users without Git Bash see an actionable error instead
# of a WinError 2 traceback.
assert "bash not found" in source.lower()
# ---------------------------------------------------------------------------
# Node-ecosystem launcher resolution (npm / npx / node)
# ---------------------------------------------------------------------------
class TestNpmBareSpawnsResolved:
"""Every spawn site that launches ``npm``/``npx`` must resolve via
shutil.which / hermes_cli._subprocess_compat.resolve_node_command
so Windows can execute the .cmd batch shims."""
@pytest.mark.parametrize(
"relpath",
[
"hermes_cli/tools_config.py",
"hermes_cli/doctor.py",
"gateway/platforms/whatsapp.py",
"tools/browser_tool.py",
],
)
def test_no_bare_npm_or_npx_in_popen_argv(self, relpath):
"""Reject ``subprocess.run(["npm", ...])`` / ``["npx", ...]`` patterns.
Those fail on Windows with WinError 193. Callers must resolve
via shutil.which(...) and pass the absolute path (or fall back
to the bare name only as a last resort behind a variable).
"""
root = Path(__file__).resolve().parents[2]
source = (root / relpath).read_text(encoding="utf-8")
# The forbidden literal: a subprocess invocation that names npm
# or npx as a bare string inside an argv list.
forbidden_patterns = [
'["npm",',
'["npx",',
"['npm',",
"['npx',",
]
for pat in forbidden_patterns:
# Exception: strings inside error-message text or comments are fine.
# We only fail if the literal appears in an argv position, which
# we approximate by checking it isn't inside a print/log/comment.
# Find all occurrences and verify they're behind shutil.which.
idx = 0
while True:
idx = source.find(pat, idx)
if idx < 0:
break
# Look at the preceding 120 chars — if "shutil.which" appears
# there, or the pattern is inside a comment/string, it's fine.
context = source[max(0, idx - 120):idx]
if "#" in context.split("\n")[-1]:
idx += len(pat)
continue
# Argv forms that START with a bare npm/npx are the bug.
raise AssertionError(
f"{relpath}: bare {pat!r} still present at offset {idx}"
f"resolve via shutil.which(...) so Windows can execute .cmd shims"
)
# ---------------------------------------------------------------------------
# tools/environments/local.py Windows temp dir & PATH injection
# ---------------------------------------------------------------------------
class TestLocalEnvironmentWindowsTempDir:
"""LocalEnvironment.get_temp_dir must return a native Windows path on
Windows, NOT the POSIX ``/tmp`` literal (which Python can't open)."""
def test_posix_path_preserved_on_linux(self):
"""Linux/macOS behaviour MUST be unchanged — return / tmp or
tempfile.gettempdir()-derived POSIX path. This is the 'do no harm'
test regressions here break every Unix user's terminal tool."""
from tools.environments.local import LocalEnvironment
env = LocalEnvironment(cwd="/tmp", timeout=10, env={})
tmp_dir = env.get_temp_dir()
if sys.platform != "win32":
assert tmp_dir.startswith("/"), (
f"POSIX temp dir must start with '/'; got {tmp_dir!r}"
)
def test_source_has_windows_branch_using_hermes_home(self):
root = Path(__file__).resolve().parents[2]
source = (root / "tools" / "environments" / "local.py").read_text(encoding="utf-8")
assert "if _IS_WINDOWS:" in source
assert "get_hermes_home" in source
assert 'cache_dir = get_hermes_home() / "cache" / "terminal"' in source
class TestLocalEnvironmentPathInjectionGated:
"""The /usr/bin PATH injection in _make_run_env must be POSIX-only."""
def test_source_gates_path_injection(self):
root = Path(__file__).resolve().parents[2]
source = (root / "tools" / "environments" / "local.py").read_text(encoding="utf-8")
# The fix wraps the injection in `if not _IS_WINDOWS`.
assert 'not _IS_WINDOWS and "/usr/bin" not in existing_path.split(":")' in source
# ---------------------------------------------------------------------------
# cli.py git path normalization
# ---------------------------------------------------------------------------
class TestGitBashPathNormalization:
"""_normalize_git_bash_path should turn /c/Users/... into C:\\Users\\...
on Windows and leave paths unchanged on POSIX."""
def test_posix_noop(self):
"""Must NOT mutate paths on Linux/macOS."""
from cli import _normalize_git_bash_path
if sys.platform != "win32":
assert _normalize_git_bash_path("/home/teknium/foo") == "/home/teknium/foo"
assert _normalize_git_bash_path("/c/Users/foo") == "/c/Users/foo"
assert _normalize_git_bash_path("C:/Users/foo") == "C:/Users/foo"
assert _normalize_git_bash_path(None) is None
def test_empty_string_preserved(self):
from cli import _normalize_git_bash_path
assert _normalize_git_bash_path("") == ""
def test_windows_translation(self, monkeypatch):
"""Simulate Windows and verify /c/Users/... becomes C:\\Users\\..."""
import cli as cli_mod
monkeypatch.setattr(cli_mod.sys, "platform", "win32")
assert cli_mod._normalize_git_bash_path("/c/Users/foo") == r"C:\Users\foo"
assert cli_mod._normalize_git_bash_path("/C/Users/foo") == r"C:\Users\foo"
assert cli_mod._normalize_git_bash_path("/cygdrive/d/data") == r"D:\data"
assert cli_mod._normalize_git_bash_path("/mnt/c/Users") == r"C:\Users"
# Already-native path is preserved
assert cli_mod._normalize_git_bash_path(r"C:\Users\foo") == r"C:\Users\foo"
# Forward-slash Windows path is preserved (git on Windows often
# returns this form; it's valid for both bash and Python, so we
# don't need to translate).
assert cli_mod._normalize_git_bash_path("C:/Users/foo") == "C:/Users/foo"
class TestWorktreeSymlinkFallback:
""".worktreeinclude directory symlinks must fall back to copytree on
Windows (where symlink creation requires admin / Dev Mode)."""
def test_source_has_symlink_fallback(self):
root = Path(__file__).resolve().parents[2]
source = (root / "cli.py").read_text(encoding="utf-8")
# Look for the try/except that handles OSError around os.symlink
# with a shutil.copytree fallback.
assert "os.symlink(str(src_resolved), str(dst))" in source
assert "except (OSError, NotImplementedError)" in source
assert "shutil.copytree" in source
assert 'sys.platform == "win32"' in source
# ---------------------------------------------------------------------------
# Gateway detached watcher — Windows creationflags
# ---------------------------------------------------------------------------
class TestGatewayDetachedWatcherWindowsFlags:
"""launch_detached_profile_gateway_restart and the in-gateway update
launcher must use CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS on
Windows, not silent start_new_session=True."""
def test_hermes_cli_gateway_uses_compat_kwargs(self):
root = Path(__file__).resolve().parents[2]
source = (root / "hermes_cli" / "gateway.py").read_text(encoding="utf-8")
assert "windows_detach_popen_kwargs" in source, (
"hermes_cli/gateway.py must use the platform-aware detach helper"
)
# The legacy start_new_session=True on the outer Popen should be
# replaced by **windows_detach_popen_kwargs(). Inside the watcher
# STRING the old pattern is replaced by explicit creationflags.
assert "**windows_detach_popen_kwargs()" in source
def test_gateway_run_update_has_windows_branch(self):
root = Path(__file__).resolve().parents[2]
source = (root / "gateway" / "run.py").read_text(encoding="utf-8")
# Both the /restart and /update paths must have sys.platform=='win32' branches.
assert 'if sys.platform == "win32":' in source
# Windows branch uses windows_detach_popen_kwargs
assert "windows_detach_popen_kwargs" in source

View file

@ -708,7 +708,16 @@ def _run_chrome_fallback_command(
) )
return {"success": False, "error": hint} return {"success": False, "error": hint}
cmd_prefix = ["npx", "agent-browser"] if browser_cmd == "npx agent-browser" else [browser_cmd] # On Windows npx is npx.cmd — use shutil.which so CreateProcessW can
# execute the batch shim. shutil.which honours PATHEXT on Windows and
# returns the plain executable on POSIX. If npx isn't on PATH (Termux,
# bare container), fall back to the bare name and let Popen raise with
# a readable "FileNotFoundError: 'npx'" rather than WinError 193.
if browser_cmd == "npx agent-browser":
_npx_bin = shutil.which("npx") or "npx"
cmd_prefix = [_npx_bin, "agent-browser"]
else:
cmd_prefix = [browser_cmd]
base_args = cmd_prefix + ["--engine", "chrome", "--session", tmp_session, "--json"] base_args = cmd_prefix + ["--engine", "chrome", "--session", tmp_session, "--json"]
task_socket_dir = os.path.join(_socket_safe_tmpdir(), f"agent-browser-{tmp_session}") task_socket_dir = os.path.join(_socket_safe_tmpdir(), f"agent-browser-{tmp_session}")
@ -1768,7 +1777,12 @@ def _run_browser_command(
# Keep concrete executable paths intact, even when they contain spaces. # Keep concrete executable paths intact, even when they contain spaces.
# Only the synthetic npx fallback needs to expand into multiple argv items. # Only the synthetic npx fallback needs to expand into multiple argv items.
cmd_prefix = ["npx", "agent-browser"] if browser_cmd == "npx agent-browser" else [browser_cmd] # shutil.which resolves npx → npx.cmd on Windows; bare "npx" stays on POSIX.
if browser_cmd == "npx agent-browser":
_npx_bin = shutil.which("npx") or "npx"
cmd_prefix = [_npx_bin, "agent-browser"]
else:
cmd_prefix = [browser_cmd]
cmd_parts = cmd_prefix + backend_args + [ cmd_parts = cmd_prefix + backend_args + [
"--json", "--json",

View file

@ -235,10 +235,27 @@ _call_lock = threading.Lock()
''' + _COMMON_HELPERS + '''\ ''' + _COMMON_HELPERS + '''\
def _connect(): def _connect():
"""Connect to the parent's RPC server via the transport it picked.
HERMES_RPC_SOCKET can be either:
- a filesystem path (POSIX Unix domain socket the default on
Linux and macOS)
- a string of the form ``tcp://127.0.0.1:<port>`` (Windows, where
AF_UNIX is unreliable the parent falls back to loopback TCP)
"""
global _sock global _sock
if _sock is None: if _sock is None:
_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) endpoint = os.environ["HERMES_RPC_SOCKET"]
_sock.connect(os.environ["HERMES_RPC_SOCKET"]) if endpoint.startswith("tcp://"):
# tcp://host:port (host is always 127.0.0.1 in practice — we
# only bind loopback server-side)
_host_port = endpoint[len("tcp://"):]
_host, _, _port = _host_port.rpartition(":")
_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
_sock.connect((_host or "127.0.0.1", int(_port)))
else:
_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
_sock.connect(endpoint)
_sock.settimeout(300) _sock.settimeout(300)
return _sock return _sock
@ -988,8 +1005,22 @@ def execute_code(
# Use /tmp on macOS to avoid the long /var/folders/... path that pushes # Use /tmp on macOS to avoid the long /var/folders/... path that pushes
# Unix domain socket paths past the 104-byte macOS AF_UNIX limit. # Unix domain socket paths past the 104-byte macOS AF_UNIX limit.
# On Linux, tempfile.gettempdir() already returns /tmp. # On Linux, tempfile.gettempdir() already returns /tmp.
#
# Windows: Python 3.9+ added partial AF_UNIX support but the file-backed
# variant is flaky across Windows builds (requires Windows 10 1803+,
# still fails under some configurations, and the socket file can't live
# on the same temp drive as the script). Fall back to loopback TCP —
# same ephemeral port, same 1-connection listen queue, same serialized
# request/response framing. The generated client reads the transport
# selector from HERMES_RPC_SOCKET (path vs. ``tcp://host:port``).
_sock_tmpdir = "/tmp" if sys.platform == "darwin" else tempfile.gettempdir() _sock_tmpdir = "/tmp" if sys.platform == "darwin" else tempfile.gettempdir()
sock_path = os.path.join(_sock_tmpdir, f"hermes_rpc_{uuid.uuid4().hex}.sock") _use_tcp_rpc = _IS_WINDOWS
if _use_tcp_rpc:
sock_path = None # not used on Windows; TCP endpoint stored below
rpc_endpoint = None # set after bind()
else:
sock_path = os.path.join(_sock_tmpdir, f"hermes_rpc_{uuid.uuid4().hex}.sock")
rpc_endpoint = sock_path
tool_call_log: list = [] tool_call_log: list = []
tool_call_counter = [0] # mutable so the RPC thread can increment tool_call_counter = [0] # mutable so the RPC thread can increment
@ -1008,10 +1039,24 @@ def execute_code(
with open(os.path.join(tmpdir, "script.py"), "w") as f: with open(os.path.join(tmpdir, "script.py"), "w") as f:
f.write(code) f.write(code)
# --- Start UDS server --- # --- Start RPC server ---
server_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) # Two transports:
server_sock.bind(sock_path) # POSIX: AF_UNIX stream socket on sock_path, chmod 0600 for
os.chmod(sock_path, 0o600) # owner-only access. Filesystem permissions gate the socket.
# Windows: AF_INET stream socket on 127.0.0.1 with an ephemeral
# port. No filesystem permission story, but loopback-only bind
# means only the current user's processes (not remote) can
# connect. HERMES_RPC_SOCKET is set to ``tcp://127.0.0.1:<port>``
# which the generated client parses to pick AF_INET.
if _use_tcp_rpc:
server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind(("127.0.0.1", 0)) # ephemeral port
_host, _port = server_sock.getsockname()[:2]
rpc_endpoint = f"tcp://{_host}:{_port}"
else:
server_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server_sock.bind(sock_path)
os.chmod(sock_path, 0o600)
server_sock.listen(1) server_sock.listen(1)
rpc_thread = threading.Thread( rpc_thread = threading.Thread(
@ -1053,7 +1098,7 @@ def execute_code(
# Allow vars with known safe prefixes. # Allow vars with known safe prefixes.
if any(k.startswith(p) for p in _SAFE_ENV_PREFIXES): if any(k.startswith(p) for p in _SAFE_ENV_PREFIXES):
child_env[k] = v child_env[k] = v
child_env["HERMES_RPC_SOCKET"] = sock_path child_env["HERMES_RPC_SOCKET"] = rpc_endpoint
child_env["PYTHONDONTWRITEBYTECODE"] = "1" child_env["PYTHONDONTWRITEBYTECODE"] = "1"
# Ensure the hermes-agent root is importable in the sandbox so # Ensure the hermes-agent root is importable in the sandbox so
# repo-root modules are available to child scripts. We also prepend # repo-root modules are available to child scripts. We also prepend
@ -1302,7 +1347,10 @@ def execute_code(
import shutil import shutil
shutil.rmtree(tmpdir, ignore_errors=True) shutil.rmtree(tmpdir, ignore_errors=True)
try: try:
os.unlink(sock_path) # Only UDS has a filesystem socket to unlink; TCP sockets are
# freed by server_sock.close() above.
if sock_path:
os.unlink(sock_path)
except OSError: except OSError:
pass # already cleaned up or never created pass # already cleaned up or never created

View file

@ -9,6 +9,7 @@ import signal
import subprocess import subprocess
import tempfile import tempfile
import time import time
from pathlib import Path
from tools.environments.base import BaseEnvironment, _pipe_stdin from tools.environments.base import BaseEnvironment, _pipe_stdin
@ -249,7 +250,15 @@ def _make_run_env(env: dict) -> dict:
elif k not in _HERMES_PROVIDER_ENV_BLOCKLIST or _is_passthrough(k): elif k not in _HERMES_PROVIDER_ENV_BLOCKLIST or _is_passthrough(k):
run_env[k] = v run_env[k] = v
existing_path = run_env.get("PATH", "") existing_path = run_env.get("PATH", "")
if "/usr/bin" not in existing_path.split(":"): # The "/usr/bin not already present → inject sane POSIX path" heuristic
# only makes sense on POSIX. On Windows the PATH separator is ";"
# (the split(":") above turns a full Windows PATH into a single
# unrecognisable chunk, which then triggers prepending POSIX paths
# to a Windows PATH — completely wrong). Skip the injection entirely
# on Windows; the native PATH already points at whatever shell
# Hermes is driving via _find_bash (Git Bash), and Git Bash itself
# prepends its MSYS2 /usr/bin equivalent via the shell-init files.
if not _IS_WINDOWS and "/usr/bin" not in existing_path.split(":"):
run_env["PATH"] = f"{existing_path}:{_SANE_PATH}" if existing_path else _SANE_PATH run_env["PATH"] = f"{existing_path}:{_SANE_PATH}" if existing_path else _SANE_PATH
# Per-profile HOME isolation: redirect system tool configs (git, ssh, gh, # Per-profile HOME isolation: redirect system tool configs (git, ssh, gh,
@ -371,7 +380,29 @@ class LocalEnvironment(BaseEnvironment):
Check the environment configured for this backend first so callers can Check the environment configured for this backend first so callers can
override the temp root explicitly (for example via terminal.env or a override the temp root explicitly (for example via terminal.env or a
custom TMPDIR), then fall back to the host process environment. custom TMPDIR), then fall back to the host process environment.
**Windows:** hardcoded ``/tmp`` is wrong in two ways native Python
can't open the path, and the Windows default temp (``%TEMP%``) often
contains spaces (``C:\\Users\\Some Name\\AppData\\Local\\Temp``) that
break unquoted bash interpolations. Use a dedicated cache dir under
``HERMES_HOME`` instead single-word path, guaranteed to exist, same
string resolves in both Git Bash and native Python.
""" """
if _IS_WINDOWS:
# Derive a Windows-safe temp dir under HERMES_HOME. Using
# forward slashes makes the same string work unchanged in bash
# command interpolations AND in Python ``open()`` — Windows
# accepts forward slashes in filesystem paths, and we control
# the path so we can guarantee no spaces.
try:
from hermes_constants import get_hermes_home
cache_dir = get_hermes_home() / "cache" / "terminal"
except Exception:
cache_dir = Path(tempfile.gettempdir()) / "hermes_terminal"
cache_dir.mkdir(parents=True, exist_ok=True)
# Force forward slashes so the same string serves both contexts.
return str(cache_dir).replace("\\", "/")
for env_var in ("TMPDIR", "TMP", "TEMP"): for env_var in ("TMPDIR", "TMP", "TEMP"):
candidate = self.env.get(env_var) or os.environ.get(env_var) candidate = self.env.get(env_var) or os.environ.get(env_var)
if candidate and candidate.startswith("/"): if candidate and candidate.startswith("/"):

View file

@ -81,11 +81,14 @@ def _log_signal(signum: int, frame) -> None:
thread, and fall back to ``os._exit(0)`` so a wedged write/flush thread, and fall back to ``os._exit(0)`` so a wedged write/flush
can never strand the process. can never strand the process.
""" """
name = { # SIGPIPE and SIGHUP don't exist on Windows — build the lookup
signal.SIGPIPE: "SIGPIPE", # dict from attributes that actually exist on the current platform.
signal.SIGTERM: "SIGTERM", _signal_names: dict[int, str] = {}
signal.SIGHUP: "SIGHUP", for _attr in ("SIGPIPE", "SIGTERM", "SIGHUP", "SIGINT", "SIGBREAK"):
}.get(signum, f"signal {signum}") _sig = getattr(signal, _attr, None)
if _sig is not None:
_signal_names[int(_sig)] = _attr
name = _signal_names.get(signum, f"signal {signum}")
try: try:
os.makedirs(os.path.dirname(_CRASH_LOG), exist_ok=True) os.makedirs(os.path.dirname(_CRASH_LOG), exist_ok=True)
with open(_CRASH_LOG, "a", encoding="utf-8") as f: with open(_CRASH_LOG, "a", encoding="utf-8") as f:
@ -140,10 +143,23 @@ def _log_signal(signum: int, frame) -> None:
# sys.exit(0) + _log_exit), which keeps the gateway alive as long as # sys.exit(0) + _log_exit), which keeps the gateway alive as long as
# the main command pipe is still readable. Terminal signals still # the main command pipe is still readable. Terminal signals still
# route through _log_signal so kills and hangups are diagnosable. # route through _log_signal so kills and hangups are diagnosable.
signal.signal(signal.SIGPIPE, signal.SIG_IGN) #
signal.signal(signal.SIGTERM, _log_signal) # SIGPIPE and SIGHUP don't exist on Windows; guard each installation
signal.signal(signal.SIGHUP, _log_signal) # with hasattr so ``python -m tui_gateway.entry`` (spawned by
signal.signal(signal.SIGINT, signal.SIG_IGN) # ``hermes --tui``) imports cleanly there. SIGBREAK (Windows' Ctrl+Break)
# is installed when available as a weaker equivalent of SIGHUP.
if hasattr(signal, "SIGPIPE"):
signal.signal(signal.SIGPIPE, signal.SIG_IGN)
if hasattr(signal, "SIGTERM"):
signal.signal(signal.SIGTERM, _log_signal)
if hasattr(signal, "SIGHUP"):
signal.signal(signal.SIGHUP, _log_signal)
elif hasattr(signal, "SIGBREAK"):
# Windows-only: Ctrl+Break in a console window delivers SIGBREAK.
# Route it through the same handler so kills are diagnosable.
signal.signal(signal.SIGBREAK, _log_signal)
if hasattr(signal, "SIGINT"):
signal.signal(signal.SIGINT, signal.SIG_IGN)
def _log_exit(reason: str) -> None: def _log_exit(reason: str) -> None: