mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-12 03:42:08 +00:00
feat(cross-platform): psutil for PID/process management + Windows footgun checker
## Why
Hermes supports Linux, macOS, and native Windows, but the codebase grew up
POSIX-first and has accumulated patterns that silently break (or worse,
silently kill!) on Windows:
- `os.kill(pid, 0)` as a liveness probe — on Windows this maps to
CTRL_C_EVENT and broadcasts Ctrl+C to the target's entire console
process group (bpo-14484, open since 2012).
- `os.killpg` — doesn't exist on Windows at all (AttributeError).
- `os.setsid` / `os.getuid` / `os.geteuid` — same.
- `signal.SIGKILL` / `signal.SIGHUP` / `signal.SIGUSR1` — module-attr
errors at runtime on Windows.
- `open(path)` / `open(path, "r")` without explicit encoding= — inherits
the platform default, which is cp1252/mbcs on Windows (UTF-8 on POSIX),
causing mojibake round-tripping between hosts.
- `wmic` — removed from Windows 10 21H1+.
This commit does three things:
1. Makes `psutil` a core dependency and migrates critical callsites to it.
2. Adds a grep-based CI gate (`scripts/check-windows-footguns.py`) that
blocks new instances of any of the above patterns.
3. Fixes every existing instance in the codebase so the baseline is clean.
## What changed
### 1. psutil as a core dependency (pyproject.toml)
Added `psutil>=5.9.0,<8` to core deps. psutil is the canonical
cross-platform answer for "is this PID alive" and "kill this process
tree" — its `pid_exists()` uses `OpenProcess + GetExitCodeProcess` on
Windows (NOT a signal call), and its `Process.children(recursive=True)`
+ `.kill()` combo replaces `os.killpg()` portably.
### 2. `gateway/status.py::_pid_exists`
Rewrote to call `psutil.pid_exists()` first, falling back to the
hand-rolled ctypes `OpenProcess + WaitForSingleObject` dance on Windows
(and `os.kill(pid, 0)` on POSIX) only if psutil is somehow missing —
e.g. during the scaffold phase of a fresh install before pip finishes.
### 3. `os.killpg` migration to psutil (7 callsites, 5 files)
- `tools/code_execution_tool.py`
- `tools/process_registry.py`
- `tools/tts_tool.py`
- `tools/environments/local.py` (3 sites kept as-is, suppressed with
`# windows-footgun: ok` — the pgid semantics psutil can't replicate,
and the calls are already Windows-guarded at the outer branch)
- `gateway/platforms/whatsapp.py`
### 4. `scripts/check-windows-footguns.py` (NEW, 500 lines)
Grep-based checker with 11 rules covering every Windows cross-platform
footgun we've hit so far:
1. `os.kill(pid, 0)` — the silent killer
2. `os.setsid` without guard
3. `os.killpg` (recommends psutil)
4. `os.getuid` / `os.geteuid` / `os.getgid`
5. `os.fork`
6. `signal.SIGKILL`
7. `signal.SIGHUP/SIGUSR1/SIGUSR2/SIGALRM/SIGCHLD/SIGPIPE/SIGQUIT`
8. `subprocess` shebang script invocation
9. `wmic` without `shutil.which` guard
10. Hardcoded `~/Desktop` (OneDrive trap)
11. `asyncio.add_signal_handler` without try/except
12. `open()` without `encoding=` on text mode
Features:
- Triple-quoted-docstring aware (won't flag prose inside docstrings)
- Trailing-comment aware (won't flag mentions in `# os.kill(pid, 0)` comments)
- Guard-hint aware (skips lines with `hasattr(os, ...)`,
`shutil.which(...)`, `if platform.system() != 'Windows'`, etc.)
- Inline suppression with `# windows-footgun: ok — <reason>`
- `--list` to print all rules with fixes
- `--all` / `--diff <ref>` / staged-files (default) modes
- Scans 380 files in under 2 seconds
### 5. CI integration
A GitHub Actions workflow that runs the checker on every PR and push is
staged at `/tmp/hermes-stash/windows-footguns.yml` — not included in this
commit because the GH token on the push machine lacks `workflow` scope.
A maintainer with `workflow` permissions should add it as
`.github/workflows/windows-footguns.yml` in a follow-up. Content:
```yaml
name: Windows footgun check
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with: {python-version: "3.11"}
- run: python scripts/check-windows-footguns.py --all
```
### 6. CONTRIBUTING.md — "Cross-Platform Compatibility" expansion
Expanded from 5 to 16 rules, each with message, example, and fix.
Recommends psutil as the preferred API for PID / process-tree operations.
### 7. Baseline cleanup (91 → 0 findings)
- 14 `open()` sites → added `encoding='utf-8'` (internal logs/caches) or
`encoding='utf-8-sig'` (user-editable files that Notepad may BOM)
- 23 POSIX-only callsites in systemd helpers, pty_bridge, and plugin
tool subprocess management → annotated with
`# windows-footgun: ok — <reason>`
- 7 `os.killpg` sites → migrated to psutil (see §3 above)
## Verification
```
$ python scripts/check-windows-footguns.py --all
✓ No Windows footguns found (380 file(s) scanned).
$ python -c "from gateway.status import _pid_exists; import os
> print('self:', _pid_exists(os.getpid())); print('bogus:', _pid_exists(999999))"
self: True
bogus: False
```
Proof-of-repro that `os.kill(pid, 0)` was actually killing processes
before this fix — see commit `1cbe39914` and bpo-14484. This commit
removes the last hand-rolled ctypes path from the hot liveness-check
path and defers to the best-maintained cross-platform answer.
This commit is contained in:
parent
324567c936
commit
cc38282b04
28 changed files with 943 additions and 85 deletions
|
|
@ -893,7 +893,7 @@ def _file_lock(
|
|||
if msvcrt and (not lock_path.exists() or lock_path.stat().st_size == 0):
|
||||
lock_path.write_text(" ", encoding="utf-8")
|
||||
|
||||
with lock_path.open("r+" if msvcrt else "a+") as lock_file:
|
||||
with lock_path.open("r+" if msvcrt else "a+", encoding="utf-8") as lock_file:
|
||||
deadline = time.monotonic() + max(1.0, timeout_seconds)
|
||||
while True:
|
||||
try:
|
||||
|
|
|
|||
|
|
@ -4160,8 +4160,9 @@ def load_env() -> Dict[str, str]:
|
|||
|
||||
if env_path.exists():
|
||||
# On Windows, open() defaults to the system locale (cp1252) which can
|
||||
# fail on UTF-8 .env files. Use explicit UTF-8 only on Windows.
|
||||
open_kw = {"encoding": "utf-8", "errors": "replace"} if _IS_WINDOWS else {}
|
||||
# fail on UTF-8 .env files. Always use explicit UTF-8; tolerate BOM
|
||||
# via utf-8-sig since users may edit .env in Notepad which adds one.
|
||||
open_kw = {"encoding": "utf-8-sig", "errors": "replace"}
|
||||
with open(env_path, **open_kw) as f:
|
||||
raw_lines = f.readlines()
|
||||
# Sanitize before parsing: split concatenated lines & drop stale
|
||||
|
|
@ -4246,8 +4247,8 @@ def sanitize_env_file() -> int:
|
|||
if not env_path.exists():
|
||||
return 0
|
||||
|
||||
read_kw = {"encoding": "utf-8", "errors": "replace"} if _IS_WINDOWS else {}
|
||||
write_kw = {"encoding": "utf-8"} if _IS_WINDOWS else {}
|
||||
read_kw = {"encoding": "utf-8-sig", "errors": "replace"}
|
||||
write_kw = {"encoding": "utf-8"}
|
||||
|
||||
with open(env_path, **read_kw) as f:
|
||||
original_lines = f.readlines()
|
||||
|
|
@ -4336,8 +4337,8 @@ def save_env_value(key: str, value: str):
|
|||
|
||||
# On Windows, open() defaults to the system locale (cp1252) which can
|
||||
# cause OSError errno 22 on UTF-8 .env files.
|
||||
read_kw = {"encoding": "utf-8", "errors": "replace"} if _IS_WINDOWS else {}
|
||||
write_kw = {"encoding": "utf-8"} if _IS_WINDOWS else {}
|
||||
read_kw = {"encoding": "utf-8-sig", "errors": "replace"}
|
||||
write_kw = {"encoding": "utf-8"}
|
||||
|
||||
lines = []
|
||||
if env_path.exists():
|
||||
|
|
@ -4406,8 +4407,8 @@ def remove_env_value(key: str) -> bool:
|
|||
os.environ.pop(key, None)
|
||||
return False
|
||||
|
||||
read_kw = {"encoding": "utf-8", "errors": "replace"} if _IS_WINDOWS else {}
|
||||
write_kw = {"encoding": "utf-8"} if _IS_WINDOWS else {}
|
||||
read_kw = {"encoding": "utf-8-sig", "errors": "replace"}
|
||||
write_kw = {"encoding": "utf-8"}
|
||||
|
||||
with open(env_path, **read_kw) as f:
|
||||
lines = f.readlines()
|
||||
|
|
|
|||
|
|
@ -113,7 +113,7 @@ def _sanitize_env_file_if_needed(path: Path) -> None:
|
|||
except ImportError:
|
||||
return # early bootstrap — config module not available yet
|
||||
|
||||
read_kw = {"encoding": "utf-8", "errors": "replace"}
|
||||
read_kw = {"encoding": "utf-8-sig", "errors": "replace"}
|
||||
try:
|
||||
with open(path, **read_kw) as f:
|
||||
original = f.readlines()
|
||||
|
|
|
|||
|
|
@ -177,7 +177,7 @@ def _request_gateway_self_restart(pid: int) -> bool:
|
|||
if not _is_pid_ancestor_of_current_process(pid):
|
||||
return False
|
||||
try:
|
||||
os.kill(pid, signal.SIGUSR1)
|
||||
os.kill(pid, signal.SIGUSR1) # windows-footgun: ok — POSIX signal, guarded by hasattr(signal, 'SIGUSR1') above
|
||||
except (ProcessLookupError, PermissionError, OSError):
|
||||
return False
|
||||
return True
|
||||
|
|
@ -213,7 +213,7 @@ def _graceful_restart_via_sigusr1(pid: int, drain_timeout: float) -> bool:
|
|||
if pid <= 0:
|
||||
return False
|
||||
try:
|
||||
os.kill(pid, signal.SIGUSR1)
|
||||
os.kill(pid, signal.SIGUSR1) # windows-footgun: ok — POSIX signal, guarded by hasattr(signal, 'SIGUSR1') above
|
||||
except ProcessLookupError:
|
||||
# Already gone — nothing to drain.
|
||||
return True
|
||||
|
|
@ -1196,13 +1196,13 @@ class SystemScopeRequiresRootError(RuntimeError):
|
|||
|
||||
def _user_dbus_socket_path() -> Path:
|
||||
"""Return the expected per-user D-Bus socket path (regardless of existence)."""
|
||||
xdg = os.environ.get("XDG_RUNTIME_DIR") or f"/run/user/{os.getuid()}"
|
||||
xdg = os.environ.get("XDG_RUNTIME_DIR") or f"/run/user/{os.getuid()}" # windows-footgun: ok — POSIX systemd helper, never invoked on Windows
|
||||
return Path(xdg) / "bus"
|
||||
|
||||
|
||||
def _user_systemd_private_socket_path() -> Path:
|
||||
"""Return the per-user systemd private socket path (regardless of existence)."""
|
||||
xdg = os.environ.get("XDG_RUNTIME_DIR") or f"/run/user/{os.getuid()}"
|
||||
xdg = os.environ.get("XDG_RUNTIME_DIR") or f"/run/user/{os.getuid()}" # windows-footgun: ok — POSIX systemd helper, never invoked on Windows
|
||||
return Path(xdg) / "systemd" / "private"
|
||||
|
||||
|
||||
|
|
@ -1225,7 +1225,7 @@ def _ensure_user_systemd_env() -> None:
|
|||
We detect the standard socket path and set the vars so all subsequent
|
||||
subprocess calls inherit them.
|
||||
"""
|
||||
uid = os.getuid()
|
||||
uid = os.getuid() # windows-footgun: ok — POSIX systemd helper, never invoked on Windows
|
||||
if "XDG_RUNTIME_DIR" not in os.environ:
|
||||
runtime_dir = f"/run/user/{uid}"
|
||||
if Path(runtime_dir).exists():
|
||||
|
|
@ -1291,7 +1291,7 @@ def _preflight_user_systemd(*, auto_enable_linger: bool = True) -> None:
|
|||
username,
|
||||
reason="User systemd control sockets are missing even though linger is enabled.",
|
||||
fix_hint=(
|
||||
f" systemctl start user@{os.getuid()}.service\n"
|
||||
f" systemctl start user@{os.getuid()}.service\n" # windows-footgun: ok — POSIX systemd helper, never invoked on Windows
|
||||
" (may require sudo; try again after the command succeeds)"
|
||||
),
|
||||
)
|
||||
|
|
@ -1561,7 +1561,7 @@ def remove_legacy_hermes_units(
|
|||
|
||||
# System-scope removal (needs root)
|
||||
if system_units:
|
||||
if os.geteuid() != 0:
|
||||
if os.geteuid() != 0: # windows-footgun: ok — Linux systemd removal path, guarded by `if system == "Linux"` / systemd-only branch
|
||||
print()
|
||||
print_warning("System-scope legacy units require root to remove.")
|
||||
print_info(" Re-run with: sudo hermes gateway migrate-legacy")
|
||||
|
|
@ -1608,7 +1608,7 @@ def print_systemd_scope_conflict_warning() -> None:
|
|||
|
||||
|
||||
def _require_root_for_system_service(action: str) -> None:
|
||||
if os.geteuid() != 0:
|
||||
if os.geteuid() != 0: # windows-footgun: ok — POSIX systemd helper, never invoked on Windows
|
||||
raise SystemScopeRequiresRootError(
|
||||
f"System gateway {action} requires root. Re-run with sudo.",
|
||||
action,
|
||||
|
|
@ -1676,7 +1676,7 @@ def install_linux_gateway_from_setup(force: bool = False) -> tuple[str | None, b
|
|||
|
||||
if scope == "system":
|
||||
run_as_user = _default_system_service_user()
|
||||
if os.geteuid() != 0:
|
||||
if os.geteuid() != 0: # windows-footgun: ok — Linux systemd install wizard, never invoked on Windows
|
||||
print_warning(" System service install requires sudo, so Hermes can't create it from this user session.")
|
||||
if run_as_user:
|
||||
print_info(f" After setup, run: sudo hermes gateway install --system --run-as-user {run_as_user}")
|
||||
|
|
@ -1720,7 +1720,7 @@ def get_systemd_linger_status() -> tuple[bool | None, str]:
|
|||
if not username:
|
||||
try:
|
||||
import pwd
|
||||
username = pwd.getpwuid(os.getuid()).pw_name
|
||||
username = pwd.getpwuid(os.getuid()).pw_name # windows-footgun: ok — POSIX loginctl helper, never invoked on Windows
|
||||
except Exception:
|
||||
return None, "could not determine current user"
|
||||
|
||||
|
|
@ -1770,7 +1770,7 @@ def _launchd_user_home() -> Path:
|
|||
"""
|
||||
import pwd
|
||||
|
||||
return Path(pwd.getpwuid(os.getuid()).pw_dir)
|
||||
return Path(pwd.getpwuid(os.getuid()).pw_dir) # windows-footgun: ok — POSIX launchd (macOS) helper, never invoked on Windows
|
||||
|
||||
|
||||
def get_launchd_plist_path() -> Path:
|
||||
|
|
@ -2169,7 +2169,7 @@ def _system_scope_wizard_would_need_root(system: bool = False) -> bool:
|
|||
``SystemScopeRequiresRootError`` propagate out and leave the user
|
||||
staring at a bare shell.
|
||||
"""
|
||||
if os.geteuid() == 0:
|
||||
if os.geteuid() == 0: # windows-footgun: ok — systemd scope wizard decision, never invoked on Windows
|
||||
return False
|
||||
return _select_systemd_scope(system=system)
|
||||
|
||||
|
|
@ -2520,7 +2520,7 @@ def get_launchd_label() -> str:
|
|||
|
||||
|
||||
def _launchd_domain() -> str:
|
||||
return f"gui/{os.getuid()}"
|
||||
return f"gui/{os.getuid()}" # windows-footgun: ok — POSIX launchd (macOS) helper, never invoked on Windows
|
||||
|
||||
|
||||
def generate_launchd_plist() -> str:
|
||||
|
|
|
|||
|
|
@ -6838,7 +6838,7 @@ def _ensure_fhs_path_guard() -> None:
|
|||
if sys.platform != "linux":
|
||||
return
|
||||
try:
|
||||
if os.geteuid() != 0:
|
||||
if os.geteuid() != 0: # windows-footgun: ok — Linux FHS helper, guarded by sys.platform == "linux" above + AttributeError catch
|
||||
return
|
||||
except AttributeError:
|
||||
return
|
||||
|
|
|
|||
|
|
@ -213,7 +213,7 @@ class PtyBridge:
|
|||
|
||||
# SIGHUP is the conventional "your terminal went away" signal.
|
||||
# We escalate if the child ignores it.
|
||||
for sig in (signal.SIGHUP, signal.SIGTERM, signal.SIGKILL):
|
||||
for sig in (signal.SIGHUP, signal.SIGTERM, signal.SIGKILL): # windows-footgun: ok — POSIX-only module (imports fcntl/termios/ptyprocess at top)
|
||||
if not self._proc.isalive():
|
||||
break
|
||||
try:
|
||||
|
|
|
|||
|
|
@ -167,7 +167,7 @@ def uninstall_gateway_service():
|
|||
|
||||
scope = "system" if is_system else "user"
|
||||
try:
|
||||
if is_system and os.geteuid() != 0:
|
||||
if is_system and os.geteuid() != 0: # windows-footgun: ok — Linux systemd uninstall path, guarded by `if system == "Linux"` above
|
||||
log_warn(f"System gateway service exists at {unit_path} "
|
||||
f"but needs sudo to remove")
|
||||
continue
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue