mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-13 03:52:00 +00:00
fix(windows): gateway status dedup + install.ps1 platform-SDK bootstrap
## Two residual Windows fixes that were hanging from earlier commits. ### 1. `hermes gateway status` reported 2 PIDs per gateway — TWO bugs compounded Diagnosed with psutil parent/child walk against live gateway PIDs: **Bug A (the real one): `_get_parent_pid` silently failed on Windows.** The helper shelled out to `ps -o ppid= -p <pid>`, which doesn't exist on Windows — `FileNotFoundError` → returns `None` → the ancestor walk terminated at `os.getpid()` alone. Consequence: the PID table scan in `_scan_gateway_pids` couldn't filter out `hermes gateway status`'s own launcher stub (a venv `pythonw.exe`/`python.exe` that matches the same `-m hermes_cli.main gateway` pattern as the gateway). Every status call saw "itself" as a second gateway. Fix: `_get_parent_pid` now calls `psutil.Process(pid).ppid()` first (psutil is a core dependency since3dfb35700) and falls back to `ps` only when `shutil.which("ps")` succeeds — matching the Windows-footgun checker's "always guard `ps` / `wmic` / etc. with `shutil.which`" rule. Before: `Gateway process running (PID: 21952, 46880)` — 46880 changing on every call (the status invocation's own launcher, which died by the time the next status call looked). After (5 consecutive calls): ``` ✓ Gateway process running (PID: 21952) ✓ Gateway process running (PID: 21952) ✓ Gateway process running (PID: 21952) ✓ Gateway process running (PID: 21952) ✓ Gateway process running (PID: 21952) ``` Ancestor walk on the fix: 14 PIDs (full chain through bash/explorer) instead of the broken 1-PID set. **Bug B (the cosmetic one): venv-launcher dedup.** Standard Windows CPython venv behaviour is that `<venv>/Scripts/pythonw.exe` is a ~5 MB launcher stub that spawns the base Python (`C:\\Program Files\\Python311 \\pythonw.exe`) with the same command line and waits. Our process scanner sees two PIDs for every gateway: launcher + interpreter, same cmdline. Bug A masked this by accidentally counting the status call AS one of them; with Bug A fixed, we see both the real launcher and real interpreter for the gateway process itself. Fix: `_filter_venv_launcher_stubs` at the tail of `_scan_gateway_pids` walks each matched PID's ppid via psutil. Any PID that's the PARENT of another matched PID is a launcher stub — drop it, keep the child. Scoped to Windows (`is_windows() and len(pids) > 1`) and no-ops when psutil isn't importable. Net effect: `gateway status` now reports one PID per gateway — the interpreter — matching POSIX behaviour and user expectations. ### 2. `install.ps1`: bootstrap pip + auto-install platform SDKs New `Install-PlatformSdks` function wired between `Invoke-SetupWizard` and `Start-GatewayIfConfigured`. Fixes two related issues on fresh Windows installs: 1. The tiered `uv pip install` cascade (introduced in87fca8342) correctly falls through when tier 1 `.[all]` fails on the RL git deps, but the fallback tiers can silently skip SDKs from `[messaging]` when there's a partial-resolve. Result: user sets `DISCORD_BOT_TOKEN` in `.env`, fires up gateway, hits "discord module not installed". 2. `uv` creates venvs WITHOUT pip by default, so the user's escape hatch (`pip install discord.py` in the venv) doesn't exist either. The new function: - Skips if `-NoVenv` (nothing to bootstrap into). - Scans `~/.hermes/.env` for messaging tokens (TELEGRAM_BOT_TOKEN, DISCORD_BOT_TOKEN, SLACK_BOT_TOKEN, SLACK_APP_TOKEN, WHATSAPP_ENABLED), filtering placeholder values. - For each token that's set, runs `python -c "import <sdk>"` to verify. - If any import fails: runs `python -m ensurepip --upgrade` to bootstrap pip into the venv (idempotent — no-ops if pip is already present), then `pip install <spec>` for each missing SDK with specs mirroring pyproject.toml's `[messaging]` extra to avoid version drift. The `$ErrorActionPreference = "SilentlyContinue"` spans are not cosmetic — PowerShell wraps native-stderr from a non-zero-exit subprocess as a `NativeCommandError` that prints even through `*> $null` / `2>$null`. Save + restore EAP over the import-probe and pip-install blocks keeps the output clean. Verified on this Windows 10 box: - Initial state: telegram+fastapi+psutil present, discord+slack_sdk missing (tier 1 `.[all]` had failed — `.tirith-install-failed` marker in `%LOCALAPPDATA%\\hermes`). - First run with discord+slack tokens in .env: detects both missing, ensurepip (skipped — pip was already bootstrapped earlier this session for telegram), installs `discord.py[voice]==2.7.1` + `PyNaCl` + `davey`, installs `slack-sdk==3.41.0`. All imports succeed on verify. - Second run: all three SDKs report OK, function no-ops. Pip spec strings mirror pyproject.toml's `[messaging]` extra verbatim so a bump to the extra picks up here automatically — no drift. ### Files - `hermes_cli/gateway.py`: `_get_parent_pid` rewritten (psutil-first); `_filter_venv_launcher_stubs` added; `_scan_gateway_pids` dedups launchers on Windows when it finds >1 match. - `scripts/install.ps1`: new `Install-PlatformSdks` function (~85 lines); wired into the main flow at line 1438. ### Verification - `venv/Scripts/python.exe scripts/check-windows-footguns.py --all` → `✓ No Windows footguns found (380 file(s) scanned).` - `ast.parse` passes on gateway.py. - `[System.Management.Automation.Language.Parser]::ParseFile` passes on install.ps1. - Live gateway (PID 21952, running since 12:33 today) survived 5x stress loop of `hermes gateway status` without dying.
This commit is contained in:
parent
cc38282b04
commit
0548facc50
2 changed files with 172 additions and 1 deletions
|
|
@ -131,9 +131,26 @@ def _get_service_pids() -> set:
|
|||
|
||||
|
||||
def _get_parent_pid(pid: int) -> int | None:
|
||||
"""Return the parent PID for ``pid``, or ``None`` when unavailable."""
|
||||
"""Return the parent PID for ``pid``, or ``None`` when unavailable.
|
||||
|
||||
Uses psutil (core dependency) which works on every platform. The
|
||||
older implementation shelled out to ``ps -o ppid= -p <pid>``, which
|
||||
silently fails on Windows (no ``ps``) so the ancestor walk terminated
|
||||
at self — the caller's dedup / exclude logic then couldn't distinguish
|
||||
"hermes CLI that invoked this scan" from "real gateway process".
|
||||
"""
|
||||
if pid <= 1:
|
||||
return None
|
||||
try:
|
||||
import psutil # type: ignore
|
||||
return psutil.Process(pid).ppid() or None
|
||||
except ImportError:
|
||||
pass
|
||||
except Exception:
|
||||
return None
|
||||
# Fallback: shell out to ps (POSIX only — bare ``ps`` doesn't exist on Windows).
|
||||
if not shutil.which("ps"):
|
||||
return None
|
||||
try:
|
||||
result = subprocess.run(
|
||||
["ps", "-o", "ppid=", "-p", str(pid)],
|
||||
|
|
@ -416,9 +433,53 @@ def _scan_gateway_pids(exclude_pids: set[int], all_profiles: bool = False) -> li
|
|||
except (OSError, subprocess.TimeoutExpired):
|
||||
return []
|
||||
|
||||
# Windows-specific: collapse venv launcher stubs. A venv-built
|
||||
# ``pythonw.exe`` in ``<venv>/Scripts/`` is a ~100 KB launcher exe
|
||||
# that spawns the base Python (e.g. ``C:\Program Files\Python311\
|
||||
# pythonw.exe``) with the same command line, preserving the venv's
|
||||
# ``pyvenv.cfg`` context. This is standard Windows CPython venv
|
||||
# behaviour — BUT it means every gateway run produces two pythonw
|
||||
# PIDs with identical command lines (one launcher stub, one actual
|
||||
# interpreter) which is confusing in ``gateway status`` output.
|
||||
# Filter the stub: if a PID in our result is the PARENT of another
|
||||
# PID in our result, and both are pythonw.exe, the parent is the
|
||||
# launcher stub — drop it, keep the child.
|
||||
if is_windows() and len(pids) > 1:
|
||||
pids = _filter_venv_launcher_stubs(pids)
|
||||
|
||||
return pids
|
||||
|
||||
|
||||
def _filter_venv_launcher_stubs(pids: list[int]) -> list[int]:
|
||||
"""Drop venv-launcher ``pythonw.exe`` stubs that are parents of the real
|
||||
interpreter process. See comment at the tail of ``_scan_gateway_pids``.
|
||||
|
||||
Uses ``psutil`` (core dependency). Safe on any platform; only invoked
|
||||
on Windows by the caller because the stub pattern is Windows-specific.
|
||||
"""
|
||||
try:
|
||||
import psutil # type: ignore
|
||||
except ImportError:
|
||||
return pids
|
||||
|
||||
pid_set = set(pids)
|
||||
# Collect each PID's parent so we can flag "child of another matched PID".
|
||||
parent_of: dict[int, int | None] = {}
|
||||
for pid in pids:
|
||||
try:
|
||||
parent_of[pid] = psutil.Process(pid).ppid()
|
||||
except (psutil.NoSuchProcess, psutil.AccessDenied):
|
||||
parent_of[pid] = None
|
||||
|
||||
# For each child whose parent is also in our set, drop the parent.
|
||||
drop: set[int] = set()
|
||||
for pid, ppid in parent_of.items():
|
||||
if ppid is not None and ppid in pid_set:
|
||||
drop.add(ppid)
|
||||
|
||||
return [p for p in pids if p not in drop]
|
||||
|
||||
|
||||
def find_gateway_pids(exclude_pids: set | None = None, all_profiles: bool = False) -> list:
|
||||
"""Find PIDs of running gateway processes.
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue