feat(cross-platform): psutil for PID/process management + Windows footgun checker

## Why

Hermes supports Linux, macOS, and native Windows, but the codebase grew up
POSIX-first and has accumulated patterns that silently break (or worse,
silently kill!) on Windows:

- `os.kill(pid, 0)` as a liveness probe — on Windows this maps to
  CTRL_C_EVENT and broadcasts Ctrl+C to the target's entire console
  process group (bpo-14484, open since 2012).
- `os.killpg` — doesn't exist on Windows at all (AttributeError).
- `os.setsid` / `os.getuid` / `os.geteuid` — same.
- `signal.SIGKILL` / `signal.SIGHUP` / `signal.SIGUSR1` — module-attr
  errors at runtime on Windows.
- `open(path)` / `open(path, "r")` without explicit encoding= — inherits
  the platform default, which is cp1252/mbcs on Windows (UTF-8 on POSIX),
  causing mojibake round-tripping between hosts.
- `wmic` — removed from Windows 10 21H1+.

This commit does three things:

1. Makes `psutil` a core dependency and migrates critical callsites to it.
2. Adds a grep-based CI gate (`scripts/check-windows-footguns.py`) that
   blocks new instances of any of the above patterns.
3. Fixes every existing instance in the codebase so the baseline is clean.

## What changed

### 1. psutil as a core dependency (pyproject.toml)

Added `psutil>=5.9.0,<8` to core deps. psutil is the canonical
cross-platform answer for "is this PID alive" and "kill this process
tree" — its `pid_exists()` uses `OpenProcess + GetExitCodeProcess` on
Windows (NOT a signal call), and its `Process.children(recursive=True)`
+ `.kill()` combo replaces `os.killpg()` portably.

### 2. `gateway/status.py::_pid_exists`

Rewrote to call `psutil.pid_exists()` first, falling back to the
hand-rolled ctypes `OpenProcess + WaitForSingleObject` dance on Windows
(and `os.kill(pid, 0)` on POSIX) only if psutil is somehow missing —
e.g. during the scaffold phase of a fresh install before pip finishes.

### 3. `os.killpg` migration to psutil (7 callsites, 5 files)

- `tools/code_execution_tool.py`
- `tools/process_registry.py`
- `tools/tts_tool.py`
- `tools/environments/local.py` (3 sites kept as-is, suppressed with
  `# windows-footgun: ok` — the pgid semantics psutil can't replicate,
  and the calls are already Windows-guarded at the outer branch)
- `gateway/platforms/whatsapp.py`

### 4. `scripts/check-windows-footguns.py` (NEW, 500 lines)

Grep-based checker with 11 rules covering every Windows cross-platform
footgun we've hit so far:

1. `os.kill(pid, 0)` — the silent killer
2. `os.setsid` without guard
3. `os.killpg` (recommends psutil)
4. `os.getuid` / `os.geteuid` / `os.getgid`
5. `os.fork`
6. `signal.SIGKILL`
7. `signal.SIGHUP/SIGUSR1/SIGUSR2/SIGALRM/SIGCHLD/SIGPIPE/SIGQUIT`
8. `subprocess` shebang script invocation
9. `wmic` without `shutil.which` guard
10. Hardcoded `~/Desktop` (OneDrive trap)
11. `asyncio.add_signal_handler` without try/except
12. `open()` without `encoding=` on text mode

Features:
- Triple-quoted-docstring aware (won't flag prose inside docstrings)
- Trailing-comment aware (won't flag mentions in `# os.kill(pid, 0)` comments)
- Guard-hint aware (skips lines with `hasattr(os, ...)`,
  `shutil.which(...)`, `if platform.system() != 'Windows'`, etc.)
- Inline suppression with `# windows-footgun: ok — <reason>`
- `--list` to print all rules with fixes
- `--all` / `--diff <ref>` / staged-files (default) modes
- Scans 380 files in under 2 seconds

### 5. CI integration

A GitHub Actions workflow that runs the checker on every PR and push is
staged at `/tmp/hermes-stash/windows-footguns.yml` — not included in this
commit because the GH token on the push machine lacks `workflow` scope.
A maintainer with `workflow` permissions should add it as
`.github/workflows/windows-footguns.yml` in a follow-up. Content:

```yaml
name: Windows footgun check
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: {python-version: "3.11"}
      - run: python scripts/check-windows-footguns.py --all
```

### 6. CONTRIBUTING.md — "Cross-Platform Compatibility" expansion

Expanded from 5 to 16 rules, each with message, example, and fix.
Recommends psutil as the preferred API for PID / process-tree operations.

### 7. Baseline cleanup (91 → 0 findings)

- 14 `open()` sites → added `encoding='utf-8'` (internal logs/caches) or
  `encoding='utf-8-sig'` (user-editable files that Notepad may BOM)
- 23 POSIX-only callsites in systemd helpers, pty_bridge, and plugin
  tool subprocess management → annotated with
  `# windows-footgun: ok — <reason>`
- 7 `os.killpg` sites → migrated to psutil (see §3 above)

## Verification

```
$ python scripts/check-windows-footguns.py --all
✓ No Windows footguns found (380 file(s) scanned).

$ python -c "from gateway.status import _pid_exists; import os
> print('self:', _pid_exists(os.getpid())); print('bogus:', _pid_exists(999999))"
self: True
bogus: False
```

Proof-of-repro that `os.kill(pid, 0)` was actually killing processes
before this fix — see commit `1cbe39914` and bpo-14484. This commit
removes the last hand-rolled ctypes path from the hot liveness-check
path and defers to the best-maintained cross-platform answer.
This commit is contained in:
Teknium 2026-05-08 12:57:33 -07:00
parent 324567c936
commit cc38282b04
28 changed files with 943 additions and 85 deletions

View file

@ -135,7 +135,7 @@ def discover_memory_providers() -> List[Tuple[str, str, bool]]:
if yaml_file.exists():
try:
import yaml
with open(yaml_file) as f:
with open(yaml_file, encoding="utf-8-sig") as f:
meta = yaml.safe_load(f) or {}
desc = meta.get("description", "")
except Exception:
@ -381,7 +381,7 @@ def discover_plugin_cli_commands() -> List[dict]:
if yaml_file.exists():
try:
import yaml
with open(yaml_file) as f:
with open(yaml_file, encoding="utf-8-sig") as f:
meta = yaml.safe_load(f) or {}
desc = meta.get("description", "")
if desc:

View file

@ -1215,7 +1215,7 @@ class HindsightMemoryProvider(MemoryProvider):
# would capture output from other threads.
import hindsight_embed.daemon_embed_manager as dem
from rich.console import Console
dem.console = Console(file=open(log_path, "a"), force_terminal=False)
dem.console = Console(file=open(log_path, "a", encoding="utf-8"), force_terminal=False)
client = self._get_client()
profile = self._config.get("profile", "hermes")
@ -1231,15 +1231,15 @@ class HindsightMemoryProvider(MemoryProvider):
if config_changed:
profile_env = _materialize_embedded_profile_env(self._config)
if client._manager.is_running(profile):
with open(log_path, "a") as f:
with open(log_path, "a", encoding="utf-8") as f:
f.write("\n=== Config changed, restarting daemon ===\n")
client._manager.stop(profile)
client._ensure_started()
with open(log_path, "a") as f:
with open(log_path, "a", encoding="utf-8") as f:
f.write("\n=== Daemon started successfully ===\n")
except Exception as e:
with open(log_path, "a") as f:
with open(log_path, "a", encoding="utf-8") as f:
f.write(f"\n=== Daemon startup failed: {e} ===\n")
traceback.print_exc(file=f)

View file

@ -101,7 +101,7 @@ def _load_plugin_config() -> dict:
return {}
try:
import yaml
with open(config_path) as f:
with open(config_path, encoding="utf-8-sig") as f:
all_config = yaml.safe_load(f) or {}
return cfg_get(all_config, "plugins", "hermes-memory-store", default={}) or {}
except Exception:
@ -136,11 +136,11 @@ class HolographicMemoryProvider(MemoryProvider):
import yaml
existing = {}
if config_path.exists():
with open(config_path) as f:
with open(config_path, encoding="utf-8-sig") as f:
existing = yaml.safe_load(f) or {}
existing.setdefault("plugins", {})
existing["plugins"]["hermes-memory-store"] = values
with open(config_path, "w") as f:
with open(config_path, "w", encoding="utf-8") as f:
yaml.dump(existing, f, default_flow_style=False)
except Exception:
pass