hermes-agent/hermes_cli/_subprocess_compat.py
brooklyn! 5db1430af9
fix(windows): stop terminal-window popups from background spawns (#53810)
* fix(windows): stop terminal-window popups from background spawns

Native-Windows desktop/gateway users saw cmd/conhost windows flash on
gateway restart, image paste, the dashboard Projects tree, voice notes,
and ~5 min after closing the app (detached cron). Two root causes:

- Console-subsystem exes (taskkill, schtasks, wmic, netstat, tasklist,
  agent-browser, git, ffmpeg, powershell, git-bash) spawned via raw
  subprocess allocate a fresh console when the launching process has
  none (pythonw desktop backend / detached gateway) - even with output
  captured.
- uv venv pythonw shims re-exec console python.exe, so Python children
  get a console regardless of how they're launched.

Fixes:
- Single hidden-spawn primitive (_subprocess_compat.run/.popen) that ORs
  CREATE_NO_WINDOW on Windows, no-op on POSIX. Route every Hermes-owned
  console-exe spawn through it.
- FreeConsole() catch-all in hermes_bootstrap: any Python child that
  exclusively owns an auto-allocated console detaches it at startup
  (GetConsoleProcessList()==1 gate leaves shared interactive consoles
  untouched).
- Replace PowerShell/wmic gateway PID scans with in-process psutil.
- Skip schtasks queries on non-interactive desktop restarts.
- Prefer native agent-browser .exe over .cmd shims.
- Guard test bans raw subprocess spawns of the Windows-only console
  tools repo-wide so the popup class can't regress.

* fix(windows): scope FreeConsole to background entry points; fix merge fallout

Console detach review (per #53810 feedback): GetConsoleProcessList()==1 can't
tell a uv pythonw->python phantom console apart from a user opening the
interactive CLI/TUI in its own fresh console (double-click, shortcut, ConPTY) —
both report a single attached process with a tty. Running FreeConsole() in the
import-time bootstrap therefore risked detaching a legitimately-interactive
terminal.

- Extract FreeConsole into explicit hermes_bootstrap.detach_orphan_console();
  remove it from apply_windows_utf8_bootstrap() (import side effect).
- Call it only from known background mains: gateway run, dashboard backend
  (start_server, what the desktop spawns), cron standalone, tui_gateway entry,
  slash worker. Interactive CLI/TUI never calls it.
- Behavior-contract tests: frees only when solo owner, leaves shared console,
  no-op without console / on POSIX, and asserts it's not an import side effect.

Merge fallout from origin/main (#53791):
- local.py: 3-way merge left a dangling **_popen_kwargs (NameError crashing
  every terminal init). _subprocess_compat.popen already hides the window, so
  drop it.
- discord adapter: merge stacked an undefined windows_hide_flags() onto the
  primitive call; drop the redundant arg.
- test_gateway: scan now goes psutil-first (zero spawn); rewrite the
  case-variant test to drive that production path.

* test(claw): mock _subprocess_compat.run seam for Windows process scan

claw.py's Windows tasklist/powershell scan routes through the hidden-spawn
primitive; the tests still patched claw_mod.subprocess, so on win32 the mock
was never hit and real spawns returned nothing. Patch the actual seam.
2026-06-27 14:02:24 -07:00

275 lines
11 KiB
Python

"""Windows subprocess compatibility helpers.
Hermes is developed on Linux / macOS and tested natively on Windows too.
Several common subprocess patterns break silently-or-loudly on Windows:
* ``["npm", "install", ...]`` — on Windows ``npm`` is ``npm.cmd``, a batch
shim. ``subprocess.Popen(["npm", ...])`` fails with WinError 193
("not a valid Win32 application") because CreateProcessW can't run a
``.cmd`` file without ``shell=True`` or PATHEXT resolution.
* ``start_new_session=True`` — on POSIX, this maps to ``os.setsid()`` and
actually detaches the child. On Windows it's silently ignored; the
Windows equivalent is ``CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS``
creationflags, which Python only applies when you pass them explicitly.
* Console-window flashes — every ``subprocess.Popen`` of a ``.exe`` on
Windows spawns a cmd window briefly unless ``CREATE_NO_WINDOW`` is
passed. Cosmetic but jarring for background daemons.
This module centralizes the platform-branching logic so the rest of the
codebase doesn't sprinkle ``if sys.platform == "win32":`` everywhere.
**All helpers are no-ops on non-Windows** — calling them in Linux/macOS
code paths is safe by design. That's the "do no damage on POSIX"
guarantee.
"""
from __future__ import annotations
import shutil
import subprocess
import sys
from typing import Sequence
__all__ = [
"IS_WINDOWS",
"resolve_node_command",
"run",
"popen",
"windows_detach_flags",
"windows_detach_flags_without_breakaway",
"windows_hide_flags",
"windows_detach_popen_kwargs",
]
IS_WINDOWS = sys.platform == "win32"
# -----------------------------------------------------------------------------
# Node ecosystem launcher resolution
# -----------------------------------------------------------------------------
def resolve_node_command(name: str, argv: Sequence[str]) -> list[str]:
"""Resolve a Node-ecosystem command name to an absolute-path argv.
On Windows, commands like ``npm``, ``npx``, ``yarn``, ``pnpm``,
``playwright``, ``prettier`` ship as ``.cmd`` files (batch shims).
``subprocess.Popen(["npm", "install"])`` fails with WinError 193
because CreateProcessW doesn't execute batch files directly.
``shutil.which(name)`` *does* resolve ``.cmd`` via PATHEXT and returns
the fully-qualified path — which CreateProcessW accepts because the
extension tells Windows to route through ``cmd.exe /c``.
On POSIX ``shutil.which`` also returns a fully-qualified path when
found. That's a small change from bare-name resolution (the OS does
its own PATH search) but functionally identical and has the side
benefit of making the argv reproducible in logs.
Behavior when the command is not on PATH:
- On Windows: return the bare name — caller can still try with
``shell=True`` as a last resort, OR the subsequent Popen will
raise FileNotFoundError with a readable error we want to surface.
- On POSIX: same. Bare ``npm`` on a Linux box without npm installed
fails the same way it did before this function existed.
Args:
name: The command name to resolve (``npm``, ``npx``, ``node`` …).
argv: The remaining arguments. Must NOT include ``name`` itself —
this function builds the full argv list.
Returns:
A list suitable for passing to subprocess.Popen/run/call.
"""
resolved = shutil.which(name)
if resolved:
return [resolved, *argv]
return [name, *argv]
# -----------------------------------------------------------------------------
# Detached / hidden process creation
# -----------------------------------------------------------------------------
# Win32 CreationFlags — defined here rather than imported from subprocess
# because CREATE_NO_WINDOW and DETACHED_PROCESS aren't guaranteed to be
# present on stdlib subprocess on older Pythons or non-Windows builds.
_CREATE_NEW_PROCESS_GROUP = 0x00000200
_DETACHED_PROCESS = 0x00000008
_CREATE_NO_WINDOW = 0x08000000
# Escape any Win32 job object the parent process belongs to. Without this,
# a detached child still inherits its parent's job object membership, and
# when that parent (Electron, Tauri, Windows Terminal, the Desktop GUI's
# bootstrap-installer) dies, the OS tears down the whole job — taking the
# "detached" child with it. Critical for the post-update gateway watcher:
# Electron spawns the Tauri updater inside its own job, the updater spawns
# the watcher subprocess; without BREAKAWAY the watcher dies the instant
# Electron exits, so the gateway never gets respawned after a `hermes
# update` triggered from the GUI. See fix/windows-gateway-reliability.
_CREATE_BREAKAWAY_FROM_JOB = 0x01000000
def windows_detach_flags() -> int:
"""Return Win32 creationflags that detach a child from the parent
console and process group. 0 on non-Windows.
Pair with ``start_new_session=False`` (default) when calling
subprocess.Popen — on POSIX use ``start_new_session=True`` instead,
which maps to ``os.setsid()`` in the child.
Rationale:
- ``CREATE_NEW_PROCESS_GROUP`` — child has its own process group so
Ctrl+C in the parent console doesn't propagate.
- ``DETACHED_PROCESS`` — child has no console at all. Necessary for
background daemons (gateway watchers, update respawners) because
without it, closing the console kills the child.
- ``CREATE_NO_WINDOW`` — suppress the brief cmd flash that would
otherwise appear when launching a console app. Redundant with
DETACHED_PROCESS but explicit for clarity.
- ``CREATE_BREAKAWAY_FROM_JOB`` — escape any job object the parent is
in. Electron (Desktop app) and Tauri (bootstrap installer) wrap
their children in job objects; without breakaway, those children
die when the parent process exits even if they were spawned with
DETACHED_PROCESS. This was the missing flag that made the
post-update gateway respawn watcher silently die alongside the
Tauri updater after the Electron Desktop's update flow finished.
If a process is in a job that disallows breakaway (rare —
JOB_OBJECT_LIMIT_BREAKAWAY_OK isn't set), CreateProcess returns
ERROR_ACCESS_DENIED. Python surfaces that as ``PermissionError``
on the ``subprocess.Popen`` call. Callers in this codebase already
wrap detached spawns in ``try/except OSError`` and fall back to a
cmd.exe wrapper, so the breakaway-denied case degrades gracefully
rather than crashing.
"""
if not IS_WINDOWS:
return 0
return (
_CREATE_NEW_PROCESS_GROUP
| _DETACHED_PROCESS
| _CREATE_NO_WINDOW
| _CREATE_BREAKAWAY_FROM_JOB
)
def windows_detach_flags_without_breakaway() -> int:
"""Same as :func:`windows_detach_flags` minus ``CREATE_BREAKAWAY_FROM_JOB``.
The docstring on :func:`windows_detach_flags` notes that a process in
a job which disallows breakaway (no ``JOB_OBJECT_LIMIT_BREAKAWAY_OK``)
will see ``ERROR_ACCESS_DENIED`` from CreateProcess, surfacing as
``OSError`` (``PermissionError``) on the ``subprocess.Popen`` call.
Callers that want to recover — by retrying without the breakaway
bit — can pair the two helpers symbolically rather than coding the
``& ~0x01000000`` magic at every site:
.. code-block:: python
try:
subprocess.Popen(argv, creationflags=windows_detach_flags(), …)
except OSError:
subprocess.Popen(
argv,
creationflags=windows_detach_flags_without_breakaway(),
…,
)
See ``gateway_windows.py::_spawn_detached`` for the canonical
implementation of this pattern. Returns 0 on non-Windows.
"""
if not IS_WINDOWS:
return 0
return _CREATE_NEW_PROCESS_GROUP | _DETACHED_PROCESS | _CREATE_NO_WINDOW
def windows_hide_flags() -> int:
"""Return Win32 creationflags that merely hide the child's console
window without detaching the child. 0 on non-Windows.
Use for short-lived console apps spawned as part of a larger
operation (``taskkill``, ``where``, version probes) where we want no
flash but also want to collect stdout/exit code synchronously.
The key difference from :func:`windows_detach_flags`: NO
``DETACHED_PROCESS`` — the child still inherits stdio handles so
``capture_output=True`` works. ``DETACHED_PROCESS`` would sever
stdio and break stdout capture.
"""
if not IS_WINDOWS:
return 0
return _CREATE_NO_WINDOW
# -----------------------------------------------------------------------------
# The single chokepoint for spawning a process without a console window.
# -----------------------------------------------------------------------------
def _no_window(kwargs: dict) -> dict:
"""OR ``CREATE_NO_WINDOW`` into ``creationflags`` on Windows (no-op on POSIX).
Merges rather than overwrites, so a caller that needs detach semantics can
pass ``creationflags=windows_detach_flags()`` and still go through here —
``CREATE_NO_WINDOW`` is already part of that bundle, so the OR is idempotent.
"""
if IS_WINDOWS:
kwargs["creationflags"] = kwargs.get("creationflags", 0) | _CREATE_NO_WINDOW
return kwargs
def run(cmd, **kwargs):
"""``subprocess.run`` that never flashes a console window on Windows.
This is the primitive every Hermes spawn of a *console-subsystem* program
(``taskkill``, ``schtasks``, ``agent-browser``, ``git-bash``, version
probes, …) must use. Routing through one function makes "no visible
terminal" structural instead of a per-call-site rule that gets forgotten —
which is exactly how cron-driven and future spawns leaked windows before.
Python child processes are additionally covered by the ``FreeConsole``
catch-all in :mod:`hermes_bootstrap`, but native exes can't run that, so the
spawn-time flag here is the only thing that helps them.
"""
return subprocess.run(cmd, **_no_window(kwargs))
def popen(cmd, **kwargs):
"""``subprocess.Popen`` counterpart of :func:`run` — see its docstring."""
return subprocess.Popen(cmd, **_no_window(kwargs))
def windows_detach_popen_kwargs() -> dict:
"""Return a dict of Popen kwargs that detach a child on Windows and
fall back to the POSIX equivalent (``start_new_session=True``) on
Linux/macOS.
Usage pattern:
.. code-block:: python
subprocess.Popen(
argv,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
stdin=subprocess.DEVNULL,
close_fds=True,
**windows_detach_popen_kwargs(),
)
This replaces the unsafe-on-Windows pattern:
.. code-block:: python
subprocess.Popen(..., start_new_session=True)
which silently fails to detach on Windows (the flag is accepted but
has no effect — the child stays attached to the parent's console
and dies when the console closes).
"""
if IS_WINDOWS:
return {"creationflags": windows_detach_flags()}
return {"start_new_session": True}