Mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-05-09 03:11:58 +00:00)
Native Windows (with Git for Windows installed) can now run the Hermes CLI and gateway end-to-end without crashing. `install.ps1` already existed and the Git Bash terminal backend was already wired up. This PR fills the remaining gaps, which were found in two ways: by auditing every Windows-unsafe primitive (`signal.SIGKILL`, `os.kill(pid, 0)` probes, bare `fcntl`/`termios` imports), and by comparing hermes against how Claude Code, OpenCode, Codex, and Cline handle native Windows.

## What changed

### UTF-8 stdio (new module)

- `hermes_cli/stdio.py`: a single `configure_windows_stdio()` entry point. It does three things:
  - flips the console code page to CP_UTF8 (65001);
  - reconfigures `sys.stdout`/`stderr`/`stdin` to UTF-8;
  - sets `PYTHONIOENCODING` + `PYTHONUTF8` for subprocesses.

  It is a no-op on non-Windows. Opt out via `HERMES_DISABLE_WINDOWS_UTF8=1`.
- Called early in `cli.py::main`, `hermes_cli/main.py::main`, and `gateway/run.py::main`. This keeps Unicode banners (box-drawing, geometric symbols, non-Latin chat text) from raising `UnicodeEncodeError` on cp1252 consoles.

### Crash sites fixed

- `hermes_cli/main.py:7970` (`hermes update` → stuck gateway sweep): raw `os.kill(pid, _signal.SIGKILL)` → `gateway.status.terminate_pid(pid, force=True)`, which routes through `taskkill /T /F` on Windows.
- `hermes_cli/profiles.py::_stop_gateway_process`: same fix. Also converted the SIGTERM path to `terminate_pid()` and widened the `OSError` catch on the intermediate `os.kill(pid, 0)` probe.
- `hermes_cli/kanban_db.py:2914, 3041`: raw `signal.SIGKILL` → `getattr(signal, "SIGKILL", signal.SIGTERM)` fallback. This matches the pattern already used in `gateway/status.py`.

### OSError widening on `os.kill(pid, 0)` probes

For a PID that no longer exists, Windows raises `OSError` (WinError 87) instead of `ProcessLookupError`.
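Below is a minimal sketch of a probe written this way. `pid_is_alive` is a hypothetical name, not the actual `_is_host_pid_alive` in `tools/process_registry.py`.

What matters is the order of the `except` clauses. `ProcessLookupError` and `PermissionError` are both subclasses of `OSError`, so they must be caught first; the bare `OSError` clause then handles the Windows case.

This sketch treats `PermissionError` as "alive", which is an assumption matching the usual POSIX reading of `EPERM`. One caveat: on Windows, signal `0` has the same value as `signal.CTRL_C_EVENT`. That makes the probe a best-effort check there rather than a pure existence test.

```python
import os


def pid_is_alive(pid: int) -> bool:
    """Best-effort liveness probe via os.kill(pid, 0) (hypothetical helper)."""
    if pid <= 0:
        # On POSIX, 0 and negative pids address process groups, not one process.
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        # POSIX: no such process (ESRCH).
        return False
    except PermissionError:
        # POSIX: the process exists but we are not allowed to signal it
        # (assumption: treat as alive).
        return True
    except OSError:
        # Windows: a gone PID surfaces as a plain OSError (e.g. WinError 87)
        # rather than ProcessLookupError, so the catch is widened to OSError.
        return False
    return True
```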
Widened the catch at:

- `gateway/run.py:15101` (`--replace` wait-for-exit loop; without this, the loop busy-spins the full 10s on every Windows gateway start)
- `hermes_cli/gateway.py:228, 460, 940`
- `hermes_cli/profiles.py:777`
- `tools/process_registry.py::_is_host_pid_alive`
- `tools/browser_tool.py:1170, 1206`

### Dashboard PTY graceful degradation

`hermes_cli/pty_bridge.py` depends on `fcntl`/`termios`/`ptyprocess`, none of which exist on native Windows. Previously, a Windows dashboard crashed on `import hermes_cli.web_server` because of a top-level import. Now:

- `hermes_cli/web_server.py` wraps the pty_bridge import in `try/except ImportError` and sets `_PTY_BRIDGE_AVAILABLE=False`.
- The `/api/pty` WebSocket handler returns a friendly "use WSL2 for this tab" message instead of exploding.
- Every other dashboard feature (sessions, jobs, metrics, config editor) runs natively on Windows.

### Dependency

- `pyproject.toml`: add `tzdata>=2023.3; sys_platform == 'win32'` so Python's `zoneinfo` works on Windows, which does not ship IANA tzdata with the OS. Credits @sprmn24 (PR #13182).

### Docs

- README.md: removed "Native Windows is not supported"; added a PowerShell one-liner and a Git-for-Windows prerequisite note.
- `website/docs/getting-started/installation.md`: new Windows section with a capability matrix. Everything is native except the dashboard `/chat` PTY tab, which is WSL2-only.
- `website/docs/user-guide/windows-wsl-quickstart.md`: reframed as "WSL2 as an alternative to native" rather than "the only way".
- `website/docs/developer-guide/contributing.md`: updated cross-platform guidance with the `signal.SIGKILL` / `OSError` rules we now enforce.
- `website/docs/user-guide/features/web-dashboard.md`: acknowledges that native Windows works for everything except the embedded PTY pane.
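The `configure_windows_stdio()` behavior listed under *UTF-8 stdio* can be sketched roughly as follows. This is a minimal sketch, not the code in `hermes_cli/stdio.py`. Three details are assumptions:

- the `platform` parameter, added so the Windows path can be exercised on Linux;
- the module-level `_CONFIGURED` flag used for idempotency;
- the use of `os.environ.setdefault`, which is how this sketch interprets "env-preservation".

```python
import os
import sys

_CONFIGURED = False  # hypothetical idempotency flag


def configure_windows_stdio(platform: str = sys.platform) -> bool:
    """Switch the console code page and std streams to UTF-8 on Windows.

    Returns False when it does nothing (non-Windows or opted out).
    Sketch only; the real hermes_cli/stdio.py may differ.
    """
    global _CONFIGURED
    if not platform.startswith("win"):
        return False
    if os.environ.get("HERMES_DISABLE_WINDOWS_UTF8") == "1":
        return False
    if _CONFIGURED:
        return True  # idempotent: repeated calls skip the work

    try:
        import ctypes

        # 65001 is CP_UTF8. ctypes.windll only exists on Windows, so this
        # raises AttributeError elsewhere and is skipped.
        ctypes.windll.kernel32.SetConsoleOutputCP(65001)
    except (ImportError, AttributeError, OSError):
        pass

    for stream in (sys.stdout, sys.stderr, sys.stdin):
        reconfigure = getattr(stream, "reconfigure", None)
        if reconfigure is None:
            continue  # stream is None or not a TextIOWrapper
        try:
            reconfigure(encoding="utf-8")
        except (ValueError, OSError):
            pass  # e.g. stdin already read from; keep its current encoding

    # Child processes inherit these; setdefault keeps user-provided values.
    os.environ.setdefault("PYTHONIOENCODING", "utf-8")
    os.environ.setdefault("PYTHONUTF8", "1")
    _CONFIGURED = True
    return True
```

As with the real entry points, a call like this belongs at the very top of each `main()`, before anything is printed.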
## Why this shape

This design comes from a survey of how other agent codebases (Claude Code, OpenCode, Codex, Cline) handle native Windows:

- All four treat Git Bash as the canonical shell on Windows, as hermes already does in `tools/environments/local.py::_find_bash()`.
- None of them force `SetConsoleOutputCP`, but they don't need to: Node and Rust write UTF-16 to the Win32 console API. Python does not get that for free, so we flip CP_UTF8 via ctypes.
- None of them ship PowerShell as the primary shell. Claude Code exposes PS as a secondary tool, which would be scope creep for this PR.
- All of them use `taskkill /T /F` for force-kill on Windows, which is exactly what `gateway.status.terminate_pid(force=True)` does.

## Non-goals (deliberate scope limits)

- No PowerShell-as-a-second-shell tool; that is worth designing separately.
- No terminal routing rewrite (#12317, #15461, #19800 cluster). That is the hardest design call and needs a separate doc.
- No wholesale `open()` → `open(..., encoding="utf-8")` sweep (Tianworld cluster). We will do it as a follow-up if users hit actual breakage; most modern code already specifies the encoding.

## Validation

- 28 new tests in `tests/tools/test_windows_native_support.py`, all platform-mocked, pass on Linux CI. They cover:
  - `configure_windows_stdio` idempotency, opt-out, and env-preservation
  - `terminate_pid` taskkill routing, failure → OSError, and FileNotFoundError fallback
  - `getattr(signal, "SIGKILL", …)` fallback shape
  - `_is_host_pid_alive` OSError widening (Windows gone-PID behavior)
  - Source-level checks that all entry points call `configure_windows_stdio`
  - pty_bridge import guard present in `web_server.py`
  - README no longer says "not supported"
- 12 pre-existing tests in `tests/tools/test_windows_compat.py` still pass.
- `tests/hermes_cli/` ran fully: 3909 passed, 9 failures, all confirmed pre-existing on main by stash-test.
- `tests/gateway/` ran fully: 5021 passed, 1 pre-existing failure.
- `tests/tools/test_process_registry.py` + `test_browser_*` pass.
- Manual smoke test: `import hermes_cli.stdio; import gateway.run; import hermes_cli.web_server` all import cleanly, with `_PTY_BRIDGE_AVAILABLE=True` on Linux (as expected).

## Files

- New: `hermes_cli/stdio.py`, `tests/tools/test_windows_native_support.py`
- Modified: `cli.py`, `gateway/run.py`, `hermes_cli/main.py`, `hermes_cli/profiles.py`, `hermes_cli/gateway.py`, `hermes_cli/kanban_db.py`, `hermes_cli/pty_bridge.py`, `hermes_cli/web_server.py`, `tools/browser_tool.py`, `tools/process_registry.py`, `pyproject.toml`, `README.md`, and 4 docs pages.

Credits to everyone whose prior PR work informed these fixes; see the co-author trailers. All of the PRs listed in `~/.hermes/plans/windows-support-prs.md` that fix `os.kill` / `signal.SIGKILL` / UTF-8 stdio / tzdata / README patterns found the same issues, and this PR consolidates them.

Co-authored-by: Philip D'Souza <9472774+PhilipAD@users.noreply.github.com>
Co-authored-by: Arecanon <42595053+ArecaNon@users.noreply.github.com>
Co-authored-by: XiaoXiao0221 <263113677+XiaoXiao0221@users.noreply.github.com>
Co-authored-by: Lars Hagen <1360677+lars-hagen@users.noreply.github.com>
Co-authored-by: Luan Dias <65574834+luandiasrj@users.noreply.github.com>
Co-authored-by: Ruzzgar <ruzzgarcn@gmail.com>
Co-authored-by: sprmn24 <oncuevtv@gmail.com>
Co-authored-by: adybag14-cyber <252811164+adybag14-cyber@users.noreply.github.com>
Co-authored-by: Prasanna28Devadiga <54196612+Prasanna28Devadiga@users.noreply.github.com>
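The force-kill routing described in this PR can be sketched as a single helper: `taskkill /T /F` on Windows, and `getattr(signal, "SIGKILL", signal.SIGTERM)` elsewhere.

`force_kill` is a hypothetical name, not the real `gateway.status.terminate_pid`. The `platform` parameter exists only so the Windows branch can be tested with a mocked `subprocess.run`. The `FileNotFoundError` fallback mentioned under Validation is deliberately not reproduced, since its exact behavior is not described here.

```python
import os
import signal
import subprocess
import sys


def force_kill(pid: int, platform: str = sys.platform) -> None:
    """Hypothetical force-kill helper in the spirit of terminate_pid(force=True)."""
    if platform.startswith("win"):
        # /T also kills child processes (the whole tree); /F forces termination.
        result = subprocess.run(
            ["taskkill", "/PID", str(pid), "/T", "/F"],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            # Surface taskkill failures as OSError, like a failed os.kill would.
            raise OSError(f"taskkill failed for PID {pid}: {result.stderr.strip()}")
        return
    # The signal module has no SIGKILL on Windows; fall back to SIGTERM there.
    os.kill(pid, getattr(signal, "SIGKILL", signal.SIGTERM))
```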
237 lines
8.5 KiB
Python
"""PTY bridge for `hermes dashboard` chat tab.

Wraps a child process behind a pseudo-terminal so its ANSI output can be
streamed to a browser-side terminal emulator (xterm.js) and typed
keystrokes can be fed back in. The only caller today is the
``/api/pty`` WebSocket endpoint in ``hermes_cli.web_server``.

Design constraints:

* **POSIX-only.** This module depends on ``fcntl``, ``termios``, and
  ``ptyprocess``, none of which exist on native Windows Python. Native
  Windows ConPTY is a different API (Windows 10 build 17763+) and would
  need a separate Windows implementation (``pywinpty``); that's tracked
  as a future enhancement. On native Windows, importing this module
  raises :class:`ImportError` and the dashboard's ``/chat`` tab shows a
  WSL-recommended banner instead of crashing. Every other feature in the
  dashboard (sessions, jobs, metrics, config editor) works natively.
* **Zero Node dependency on the server side.** We use :mod:`ptyprocess`,
  which is a pure-Python wrapper around the OS calls. The browser talks
  to the same ``hermes --tui`` binary it would launch from the CLI, so
  every TUI feature (slash popover, model picker, tool rows, markdown,
  skin engine, clarify/sudo/approval prompts) ships automatically.
* **Byte-safe I/O.** Reads and writes go through the PTY master fd
  directly. We avoid :class:`ptyprocess.PtyProcessUnicode` because
  streaming ANSI is inherently byte-oriented and UTF-8 boundaries may land
  mid-read.
"""

from __future__ import annotations

import errno
import fcntl
import os
import select
import signal
import struct
import sys
import termios
import time
from typing import Optional, Sequence

try:
    import ptyprocess  # type: ignore
    _PTY_AVAILABLE = not sys.platform.startswith("win")
except ImportError:  # pragma: no cover - dev env without ptyprocess
    ptyprocess = None  # type: ignore
    _PTY_AVAILABLE = False


__all__ = ["PtyBridge", "PtyUnavailableError"]


class PtyUnavailableError(RuntimeError):
    """Raised when a PTY cannot be created on this platform.

    Today this means native Windows (no ConPTY bindings) or a dev
    environment missing the ``ptyprocess`` dependency. The dashboard
    surfaces the message to the user as a chat-tab banner.
    """


class PtyBridge:
    """Thin wrapper around ``ptyprocess.PtyProcess`` for byte streaming.

    Not thread-safe. A single bridge is owned by the WebSocket handler
    that spawned it; the reader runs in an executor thread while writes
    happen on the event-loop thread. Both sides are OK because the
    kernel PTY is the actual synchronization point: we never call
    :mod:`ptyprocess` methods concurrently. We only call ``os.read`` and
    ``os.write`` on the master fd, which is safe.
    """

    def __init__(self, proc: "ptyprocess.PtyProcess"):  # type: ignore[name-defined]
        self._proc = proc
        self._fd: int = proc.fd
        self._closed = False

    # -- lifecycle --------------------------------------------------------

    @classmethod
    def is_available(cls) -> bool:
        """True if a PTY can be spawned on this platform."""
        return bool(_PTY_AVAILABLE)

    @classmethod
    def spawn(
        cls,
        argv: Sequence[str],
        *,
        cwd: Optional[str] = None,
        env: Optional[dict] = None,
        cols: int = 80,
        rows: int = 24,
    ) -> "PtyBridge":
        """Spawn ``argv`` behind a new PTY and return a bridge.

        Raises :class:`PtyUnavailableError` if the platform can't host a
        PTY. Raises :class:`FileNotFoundError` or :class:`OSError` for
        ordinary exec failures (missing binary, bad cwd, etc.).
        """
        if not _PTY_AVAILABLE:
            if sys.platform.startswith("win"):
                raise PtyUnavailableError(
                    "Pseudo-terminals are unavailable on native Windows. "
                    "The dashboard chat tab requires WSL2; the rest of "
                    "Hermes Agent runs natively."
                )
            if ptyprocess is None:
                raise PtyUnavailableError(
                    "The `ptyprocess` package is missing. "
                    "Install with: pip install ptyprocess "
                    "(or pip install -e '.[pty]')."
                )
            raise PtyUnavailableError("Pseudo-terminals are unavailable.")
        # PTY-hosted programs expect TERM to describe the terminal type.
        # CI often runs without TERM in the parent process, which makes
        # simple terminal probes like `tput cols` fail before winsize reads.
        # Preserve explicit caller overrides, but backfill a sensible default
        # when TERM is missing or blank.
        spawn_env = (os.environ.copy() if env is None else env.copy())
        if not spawn_env.get("TERM"):
            spawn_env["TERM"] = "xterm-256color"
        proc = ptyprocess.PtyProcess.spawn(  # type: ignore[union-attr]
            list(argv),
            cwd=cwd,
            env=spawn_env,
            dimensions=(rows, cols),
        )
        return cls(proc)

    @property
    def pid(self) -> int:
        return int(self._proc.pid)

    def is_alive(self) -> bool:
        if self._closed:
            return False
        try:
            return bool(self._proc.isalive())
        except Exception:
            return False

    # -- I/O --------------------------------------------------------------

    def read(self, timeout: float = 0.2) -> Optional[bytes]:
        """Read up to 64 KiB of raw bytes from the PTY master.

        Returns:
            * non-empty bytes: child output
            * empty bytes (``b""``): no data available within ``timeout``
            * None: child has exited and the master fd is at EOF (or reports ``EIO``)

        Never blocks longer than ``timeout`` seconds. Safe to call after
        :meth:`close`; returns ``None`` in that case.
        """
        if self._closed:
            return None
        try:
            readable, _, _ = select.select([self._fd], [], [], timeout)
        except (OSError, ValueError):
            return None
        if not readable:
            return b""
        try:
            data = os.read(self._fd, 65536)
        except OSError as exc:
            # EIO on Linux = slave side closed. EBADF = already closed.
            if exc.errno in (errno.EIO, errno.EBADF):
                return None
            raise
        if not data:
            return None
        return data

    def write(self, data: bytes) -> None:
        """Write raw bytes to the PTY master (i.e. the child's stdin)."""
        if self._closed or not data:
            return
        # os.write can return a short write under load; loop until drained.
        view = memoryview(data)
        while view:
            try:
                n = os.write(self._fd, view)
            except OSError as exc:
                if exc.errno in (errno.EIO, errno.EBADF, errno.EPIPE):
                    return
                raise
            if n <= 0:
                return
            view = view[n:]

    def resize(self, cols: int, rows: int) -> None:
        """Forward a terminal resize to the child via ``TIOCSWINSZ``."""
        if self._closed:
            return
        # struct winsize: rows, cols, xpixel, ypixel (all unsigned short)
        winsize = struct.pack("HHHH", max(1, rows), max(1, cols), 0, 0)
        try:
            fcntl.ioctl(self._fd, termios.TIOCSWINSZ, winsize)
        except OSError:
            pass

    # -- teardown ---------------------------------------------------------

    def close(self) -> None:
        """Terminate the child and close fds.

        Escalates SIGHUP → SIGTERM → SIGKILL, allowing up to 0.5s after
        each signal for the child to exit. Idempotent. Reaping the child
        is important so we don't leak zombies across the lifetime of the
        dashboard process.
        """
        if self._closed:
            return
        self._closed = True

        # SIGHUP is the conventional "your terminal went away" signal.
        # We escalate if the child ignores it.
        for sig in (signal.SIGHUP, signal.SIGTERM, signal.SIGKILL):
            if not self._proc.isalive():
                break
            try:
                self._proc.kill(sig)
            except Exception:
                pass
            deadline = time.monotonic() + 0.5
            while self._proc.isalive() and time.monotonic() < deadline:
                time.sleep(0.02)

        try:
            self._proc.close(force=True)
        except Exception:
            pass

    # Context-manager sugar: handy in tests and ad-hoc scripts.
    def __enter__(self) -> "PtyBridge":
        return self

    def __exit__(self, *_exc) -> None:
        self.close()
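The bridge's `read()` return contract and its `TIOCSWINSZ` resize can be exercised without `ptyprocess`. The standard-library `pty.openpty()` hands back a master/slave fd pair directly. This is a standalone, POSIX-only sketch, not part of `pty_bridge.py`; `set_winsize`, `get_winsize`, and `read_chunk` are hypothetical helper names.

```python
import errno
import fcntl
import os
import pty
import select
import struct
import termios
from typing import Optional, Tuple


def set_winsize(fd: int, cols: int, rows: int) -> None:
    # struct winsize: rows, cols, xpixel, ypixel (unsigned shorts), as in resize().
    winsize = struct.pack("HHHH", max(1, rows), max(1, cols), 0, 0)
    fcntl.ioctl(fd, termios.TIOCSWINSZ, winsize)


def get_winsize(fd: int) -> Tuple[int, int]:
    # TIOCGWINSZ fills the same 8-byte struct; return it as (cols, rows).
    raw = fcntl.ioctl(fd, termios.TIOCGWINSZ, b"\0" * 8)
    rows, cols, _, _ = struct.unpack("HHHH", raw)
    return cols, rows


def read_chunk(fd: int, timeout: float = 0.2) -> Optional[bytes]:
    """Same contract as PtyBridge.read: bytes, b"" on timeout, None at EOF/EIO."""
    readable, _, _ = select.select([fd], [], [], timeout)
    if not readable:
        return b""
    try:
        data = os.read(fd, 65536)
    except OSError as exc:
        # Linux reports EIO on the master once the slave side is closed.
        if exc.errno in (errno.EIO, errno.EBADF):
            return None
        raise
    return data or None
```

Resizing through the master and reading `TIOCGWINSZ` back from the slave should report the new size, which is what the child process sees after `PtyBridge.resize()`. Once the slave is closed, `read_chunk` on the master returns `None`.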