hermes-agent/agent
Teknium 3dfb357001 feat(cross-platform): psutil for PID/process management + Windows footgun checker
## Why

Hermes supports Linux, macOS, and native Windows, but the codebase grew up
POSIX-first and has accumulated patterns that silently break (or worse,
silently kill!) on Windows:

- `os.kill(pid, 0)` as a liveness probe — on Windows this maps to
  CTRL_C_EVENT and broadcasts Ctrl+C to the target's entire console
  process group (bpo-14484, open since 2012).
- `os.killpg` — doesn't exist on Windows at all (AttributeError).
- `os.setsid` / `os.getuid` / `os.geteuid` — same.
- `signal.SIGKILL` / `signal.SIGHUP` / `signal.SIGUSR1` — module-attr
  errors at runtime on Windows.
- `open(path)` / `open(path, "r")` without explicit encoding= — inherits
  the platform default, which is cp1252/mbcs on Windows (UTF-8 on POSIX),
  causing mojibake round-tripping between hosts.
- `wmic` — removed from Windows 10 21H1+.

This commit does three things:

1. Makes `psutil` a core dependency and migrates critical callsites to it.
2. Adds a grep-based CI gate (`scripts/check-windows-footguns.py`) that
   blocks new instances of any of the above patterns.
3. Fixes every existing instance in the codebase so the baseline is clean.

## What changed

### 1. psutil as a core dependency (pyproject.toml)

Added `psutil>=5.9.0,<8` to core deps. psutil is the canonical
cross-platform answer for "is this PID alive" and "kill this process
tree" — its `pid_exists()` uses `OpenProcess + GetExitCodeProcess` on
Windows (NOT a signal call), and its `Process.children(recursive=True)`
+ `.kill()` combo replaces `os.killpg()` portably.

### 2. `gateway/status.py::_pid_exists`

Rewrote to call `psutil.pid_exists()` first, falling back to the
hand-rolled ctypes `OpenProcess + WaitForSingleObject` dance on Windows
(and `os.kill(pid, 0)` on POSIX) only if psutil is somehow missing —
e.g. during the scaffold phase of a fresh install before pip finishes.

### 3. `os.killpg` migration to psutil (7 callsites, 5 files)

- `tools/code_execution_tool.py`
- `tools/process_registry.py`
- `tools/tts_tool.py`
- `tools/environments/local.py` (3 sites kept as-is, suppressed with
  `# windows-footgun: ok` — the pgid semantics psutil can't replicate,
  and the calls are already Windows-guarded at the outer branch)
- `gateway/platforms/whatsapp.py`

### 4. `scripts/check-windows-footguns.py` (NEW, 500 lines)

Grep-based checker with 11 rules covering every Windows cross-platform
footgun we've hit so far:

1. `os.kill(pid, 0)` — the silent killer
2. `os.setsid` without guard
3. `os.killpg` (recommends psutil)
4. `os.getuid` / `os.geteuid` / `os.getgid`
5. `os.fork`
6. `signal.SIGKILL`
7. `signal.SIGHUP/SIGUSR1/SIGUSR2/SIGALRM/SIGCHLD/SIGPIPE/SIGQUIT`
8. `subprocess` shebang script invocation
9. `wmic` without `shutil.which` guard
10. Hardcoded `~/Desktop` (OneDrive trap)
11. `asyncio.add_signal_handler` without try/except
12. `open()` without `encoding=` on text mode

Features:
- Triple-quoted-docstring aware (won't flag prose inside docstrings)
- Trailing-comment aware (won't flag mentions in `# os.kill(pid, 0)` comments)
- Guard-hint aware (skips lines with `hasattr(os, ...)`,
  `shutil.which(...)`, `if platform.system() != 'Windows'`, etc.)
- Inline suppression with `# windows-footgun: ok — <reason>`
- `--list` to print all rules with fixes
- `--all` / `--diff <ref>` / staged-files (default) modes
- Scans 380 files in under 2 seconds

### 5. CI integration

A GitHub Actions workflow that runs the checker on every PR and push is
staged at `/tmp/hermes-stash/windows-footguns.yml` — not included in this
commit because the GH token on the push machine lacks `workflow` scope.
A maintainer with `workflow` permissions should add it as
`.github/workflows/windows-footguns.yml` in a follow-up. Content:

```yaml
name: Windows footgun check
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: {python-version: "3.11"}
      - run: python scripts/check-windows-footguns.py --all
```

### 6. CONTRIBUTING.md — "Cross-Platform Compatibility" expansion

Expanded from 5 to 16 rules, each with message, example, and fix.
Recommends psutil as the preferred API for PID / process-tree operations.

### 7. Baseline cleanup (91 → 0 findings)

- 14 `open()` sites → added `encoding='utf-8'` (internal logs/caches) or
  `encoding='utf-8-sig'` (user-editable files that Notepad may BOM)
- 23 POSIX-only callsites in systemd helpers, pty_bridge, and plugin
  tool subprocess management → annotated with
  `# windows-footgun: ok — <reason>`
- 7 `os.killpg` sites → migrated to psutil (see §3 above)

## Verification

```
$ python scripts/check-windows-footguns.py --all
✓ No Windows footguns found (380 file(s) scanned).

$ python -c "from gateway.status import _pid_exists; import os
> print('self:', _pid_exists(os.getpid())); print('bogus:', _pid_exists(999999))"
self: True
bogus: False
```

Proof-of-repro that `os.kill(pid, 0)` was actually killing processes
before this fix — see commit `1cbe39914` and bpo-14484. This commit
removes the last hand-rolled ctypes path from the hot liveness-check
path and defers to the best-maintained cross-platform answer.
2026-05-08 12:57:33 -07:00
..
transports feat: provider modules — ProviderProfile ABC, 33 providers, fetch_models, transport single-path 2026-05-05 13:40:01 -07:00
__init__.py Refactor Terminal and AIAgent cleanup 2026-02-21 22:31:43 -08:00
account_usage.py feat(account-usage): add per-provider account limits module 2026-04-21 01:56:35 -07:00
anthropic_adapter.py fix: avoid unsupported anthropic context beta by default 2026-05-07 05:43:20 -07:00
auxiliary_client.py fix(auxiliary): enforce Codex Responses stream timeout 2026-05-07 06:21:50 -07:00
bedrock_adapter.py fix(bedrock): preserve reasoningContent across converse normalization 2026-05-07 05:17:16 -07:00
codex_responses_adapter.py fix(agent): preserve Codex message items for replay 2026-04-25 18:22:06 -07:00
context_compressor.py fix(compressor): soften summary prompt for content filters 2026-05-07 06:42:32 -07:00
context_engine.py fix(compress): don't reach into ContextCompressor privates from /compress (#15039) 2026-04-24 02:55:43 -07:00
context_references.py fix(agent): fall back when rg is blocked for @folder references 2026-04-20 01:56:41 -07:00
copilot_acp_client.py feat(cross-platform): psutil for PID/process management + Windows footgun checker 2026-05-08 12:57:33 -07:00
credential_pool.py fix(auth): shorten credential 401 cooldown 2026-05-07 06:15:33 -07:00
credential_sources.py feat(minimax-oauth): full integration with peer OAuth providers 2026-04-29 09:53:42 -07:00
curator.py codebase: add encoding='utf-8' to all bare open() calls (PLW1514) 2026-05-07 19:24:45 -07:00
curator_backup.py fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671) (#18731) 2026-05-02 01:29:57 -07:00
display.py fix(cli): honor positive tool preview length 2026-05-07 05:26:28 -07:00
error_classifier.py fix(tool-schemas): reactive strip of pattern/format on llama.cpp grammar 400s 2026-05-05 04:25:18 -07:00
file_safety.py fix(security): apply file safety to copilot acp fs 2026-04-21 01:31:58 -07:00
gemini_cloudcode_adapter.py chore: remove unused imports and dead locals (ruff F401, F841) (#17010) 2026-04-28 06:46:45 -07:00
gemini_native_adapter.py fix(gemini): extract usageMetadata from streaming chunks for token tracking 2026-05-04 02:33:30 -07:00
gemini_schema.py chore: remove unused imports and dead locals (ruff F401, F841) (#17010) 2026-04-28 06:46:45 -07:00
google_code_assist.py chore: remove unused imports and dead locals (ruff F401, F841) (#17010) 2026-04-28 06:46:45 -07:00
google_oauth.py fix(google_oauth): close TOCTOU window when saving credentials 2026-05-04 03:16:19 -07:00
i18n.py feat(i18n): add Turkish (tr) locale 2026-05-05 17:29:12 -07:00
image_gen_provider.py feat(plugins): pluggable image_gen backends + OpenAI provider (#13799) 2026-04-21 21:30:10 -07:00
image_gen_registry.py feat(plugins): pluggable image_gen backends + OpenAI provider (#13799) 2026-04-21 21:30:10 -07:00
image_routing.py fix(image-routing): sniff magic bytes for image MIME, ignore misleading suffix 2026-05-07 05:58:11 -07:00
insights.py Merge branch 'main' into feat/dashboard-skill-analytics 2026-04-20 05:25:49 -07:00
lmstudio_reasoning.py feat(agent): add lmstudio integration 2026-04-28 12:27:36 -07:00
manual_compression_feedback.py fix(compression): include system prompt + tool schemas in token estimates (#18265) 2026-04-30 23:03:54 -07:00
memory_manager.py fix: salvage batch — compaction guidance, memory authority, cache eviction after compression 2026-05-05 22:33:45 -07:00
memory_provider.py docs(agent): remove stale BuiltinMemoryProvider references from memory module docstrings 2026-05-05 13:33:49 -07:00
model_metadata.py codebase: add encoding='utf-8' to all bare open() calls (PLW1514) 2026-05-07 19:24:45 -07:00
models_dev.py fix(models): prefer image modalities for vision routing 2026-05-07 05:54:12 -07:00
moonshot_schema.py fix(moonshot): also strip nullable/enum after anyOf collapse 2026-04-30 23:14:31 -07:00
nous_rate_guard.py codebase: add encoding='utf-8' to all bare open() calls (PLW1514) 2026-05-07 19:24:45 -07:00
onboarding.py docs(onboarding): lead OpenClaw residue banner with migrate, warn that cleanup breaks OpenClaw (#17507) 2026-04-29 08:08:36 -07:00
prompt_builder.py feat: enrich system-prompt environment hints with host + terminal-backend info 2026-05-08 05:07:40 -07:00
prompt_caching.py fix(prompt-caching): skip top-level cache_control on role:tool for OpenRouter 2026-03-21 16:54:43 -07:00
rate_limit_tracker.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
redact.py feat(security): enable secret redaction by default (#17691, #20785) (#21193) 2026-05-07 05:10:33 -07:00
retry_utils.py feat(agent): add jittered retry backoff 2026-04-08 00:41:36 -07:00
shell_hooks.py codebase: add encoding='utf-8' to all bare open() calls (PLW1514) 2026-05-07 19:24:45 -07:00
skill_commands.py fix(skills): rescan skill_commands cache when platform scope changes (#18739) 2026-05-02 01:36:53 -07:00
skill_preprocessing.py fix(skills): apply inline shell in skill_view 2026-04-24 15:15:07 -07:00
skill_utils.py fix(skills): exclude .archive from skill index walk 2026-04-30 04:59:22 -07:00
subdirectory_hints.py fix(agent): catch PermissionError in subdirectory hint discovery 2026-04-09 03:10:30 -07:00
think_scrubber.py fix(agent): stateful streaming scrubber for reasoning-block leaks (#17924) (#20184) 2026-05-05 04:33:38 -07:00
title_generator.py fix: improve telegram topic mode setup 2026-05-04 12:07:17 -07:00
tool_guardrails.py fix(guardrails): preserve display _detect_tool_failure semantics 2026-04-30 20:43:15 -07:00
trajectory.py Refactor Terminal and AIAgent cleanup 2026-02-21 22:31:43 -08:00
usage_pricing.py fix(analytics): prevent silent token loss and add Claude 4.5–4.7 pricing (#21455) 2026-05-07 13:24:31 -07:00