mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-10 03:22:05 +00:00
fix(windows): gateway status dedup + install.ps1 platform-SDK bootstrap
Two residual Windows fixes left over from earlier commits.

### 1. `hermes gateway status` reported 2 PIDs per gateway: TWO bugs compounded

Diagnosed with a psutil parent/child walk against live gateway PIDs.

**Bug A (the real one): `_get_parent_pid` silently failed on Windows.** The helper shelled out to `ps -o ppid= -p <pid>`, and `ps` doesn't exist on Windows: `FileNotFoundError` → returns `None` → the ancestor walk terminated at `os.getpid()` alone. Consequence: the PID table scan in `_scan_gateway_pids` couldn't filter out the launcher stub of `hermes gateway status` itself (a venv `pythonw.exe`/`python.exe` whose command line matches the same `-m hermes_cli.main gateway` pattern as the gateway). Every status call saw "itself" as a second gateway.

Fix: `_get_parent_pid` now calls `psutil.Process(pid).ppid()` first (psutil has been a core dependency since 3dfb35700). It falls back to `ps` only when `shutil.which("ps")` succeeds, which matches the Windows-footgun checker's rule: "always guard `ps` / `wmic` / etc. with `shutil.which`".

Before: `Gateway process running (PID: 21952, 46880)`, with 46880 changing on every call. That PID was the status invocation's own launcher, which had exited by the time the next status call looked.

After (5 consecutive calls):

```
✓ Gateway process running (PID: 21952)
✓ Gateway process running (PID: 21952)
✓ Gateway process running (PID: 21952)
✓ Gateway process running (PID: 21952)
✓ Gateway process running (PID: 21952)
```

With the fix, the ancestor walk yields 14 PIDs (the full chain up through bash/explorer) instead of the broken 1-PID set.

**Bug B (the cosmetic one): venv-launcher dedup.** In a standard Windows CPython venv, `<venv>/Scripts/pythonw.exe` is a small launcher stub. It spawns the base interpreter (`C:\Program Files\Python311\pythonw.exe`) with the same command line and waits for it. Our process scanner therefore sees two PIDs for every gateway: launcher + interpreter, with identical command lines.
Bug A masked this by accidentally counting the status call as one of the two PIDs. With Bug A fixed, we see both the real launcher and the real interpreter of the gateway process itself.

Fix: `_filter_venv_launcher_stubs`, called at the tail of `_scan_gateway_pids`, looks up each matched PID's parent via psutil. Any PID that is the PARENT of another matched PID is a launcher stub: drop it, keep the child. The filter is scoped to Windows (`is_windows() and len(pids) > 1`) and is a no-op when psutil isn't importable.

Net effect: `gateway status` now reports one PID per gateway (the interpreter), which matches POSIX behaviour and user expectations.

### 2. `install.ps1`: bootstrap pip + auto-install platform SDKs

The new `Install-PlatformSdks` function is wired between `Invoke-SetupWizard` and `Start-GatewayIfConfigured`. It fixes two related issues on fresh Windows installs:

1. The tiered `uv pip install` cascade (introduced in 87fca8342) correctly falls through when tier 1 `.[all]` fails on the RL git deps. However, the fallback tiers can silently skip SDKs from `[messaging]` when the resolve is only partial. Result: the user sets `DISCORD_BOT_TOKEN` in `.env`, starts the gateway, and hits "discord module not installed".
2. `uv` creates venvs WITHOUT pip by default, so the user's escape hatch (`pip install discord.py` in the venv) doesn't exist either.

The new function:

- Skips if `-NoVenv` is set (there is nothing to bootstrap into).
- Scans `~/.hermes/.env` for messaging tokens (`TELEGRAM_BOT_TOKEN`, `DISCORD_BOT_TOKEN`, `SLACK_BOT_TOKEN`, `SLACK_APP_TOKEN`, `WHATSAPP_ENABLED`) and filters out placeholder values.
- For each token that's set, runs `python -c "import <sdk>"` to verify that the matching SDK is importable.
- If any import fails, runs `python -m ensurepip --upgrade` to bootstrap pip into the venv (idempotent: a no-op if pip is already present). It then runs `pip install <spec>` for each missing SDK, using specs that mirror pyproject.toml's `[messaging]` extra to avoid version drift.
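The token scan described above can be sketched in Python. `find_needed_sdks` and `SDK_MAP` are hypothetical names that are not part of the commit; the table is copied from the PowerShell `$sdkMap`. A line counts only if it is anchored at `VAR=`, has a non-empty value, doesn't contain the `your-token-here` placeholder, and isn't commented out. PowerShell's `-match` is case-insensitive by default, so the sketch uses `re.IGNORECASE` to behave the same way.

```python
from __future__ import annotations

import re

# Hypothetical mirror of $sdkMap in Install-PlatformSdks:
# (env var, import name, pip spec), specs copied from the PowerShell source.
SDK_MAP = [
    ("TELEGRAM_BOT_TOKEN", "telegram", "python-telegram-bot[webhooks]>=22.6,<23"),
    ("DISCORD_BOT_TOKEN", "discord", "discord.py[voice]>=2.7.1,<3"),
    ("SLACK_BOT_TOKEN", "slack_sdk", "slack-sdk>=3.27.0,<4"),
    ("SLACK_APP_TOKEN", "slack_bolt", "slack-bolt>=1.18.0,<2"),
    ("WHATSAPP_ENABLED", "qrcode", "qrcode>=7.0,<8"),
]

PLACEHOLDER = "your-token-here"


def find_needed_sdks(env_text: str) -> list[tuple[str, str, str]]:
    """Return SDK_MAP entries whose env var is set to a non-placeholder value."""
    lines = env_text.splitlines()
    needed = []
    for var, import_name, spec in SDK_MAP:
        # Same test as the Where-Object filter: "^VAR=.+" (case-insensitive,
        # like PowerShell -match), no placeholder text, not a comment line.
        pattern = re.compile("^" + re.escape(var) + "=.+", re.IGNORECASE)
        if any(
            pattern.search(line)
            and PLACEHOLDER not in line.lower()
            and not re.match(r"^\s*#", line)
            for line in lines
        ):
            needed.append((var, import_name, spec))
    return needed
```

For example, an `.env` containing a placeholder Telegram token, a real Discord token, a commented-out Slack token and `WHATSAPP_ENABLED=true` yields only the Discord and WhatsApp entries. As in the original, any non-empty `WHATSAPP_ENABLED` value counts as "set".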
The `$ErrorActionPreference = "SilentlyContinue"` spans are not cosmetic. When a subprocess exits non-zero, PowerShell wraps its native stderr as a `NativeCommandError`, which prints even through `*> $null` / `2>$null`. Saving and restoring EAP around the import-probe and pip-install blocks keeps the output clean.

Verified on this Windows 10 box:

- Initial state: telegram+fastapi+psutil present, discord+slack_sdk missing. Tier 1 `.[all]` had failed, leaving the `.tirith-install-failed` marker in `%LOCALAPPDATA%\hermes`.
- First run with discord+slack tokens in .env: detects both as missing. ensurepip is skipped because pip had already been bootstrapped earlier this session for telegram. Installs `discord.py[voice]==2.7.1` + `PyNaCl` + `davey`, and `slack-sdk==3.41.0`. All imports succeed on verify.
- Second run: all three SDKs report OK, and the function no-ops.

Pip spec strings mirror pyproject.toml's `[messaging]` extra verbatim. They are hardcoded in `install.ps1`, though, so a future bump to the extra has to be copied here by hand to avoid drift.

### Files

- `hermes_cli/gateway.py`: `_get_parent_pid` rewritten (psutil-first); `_filter_venv_launcher_stubs` added; `_scan_gateway_pids` dedups launchers on Windows when it finds >1 match.
- `scripts/install.ps1`: new `Install-PlatformSdks` function (~110 lines), wired into `Main` right after `Invoke-SetupWizard`.

### Verification

- `venv/Scripts/python.exe scripts/check-windows-footguns.py --all` → `✓ No Windows footguns found (380 file(s) scanned).`
- `ast.parse` passes on gateway.py.
- `[System.Management.Automation.Language.Parser]::ParseFile` passes on install.ps1.
- The live gateway (PID 21952, running since 12:33 today) survived a 5x stress loop of `hermes gateway status` without dying.
Parent: cc38282b04
Commit: 0548facc50
2 changed files with 172 additions and 1 deletion
hermes_cli/gateway.py

@@ -131,9 +131,26 @@ def _get_service_pids() -> set:
 
 
 def _get_parent_pid(pid: int) -> int | None:
-    """Return the parent PID for ``pid``, or ``None`` when unavailable."""
+    """Return the parent PID for ``pid``, or ``None`` when unavailable.
+
+    Uses psutil (core dependency) which works on every platform. The
+    older implementation shelled out to ``ps -o ppid= -p <pid>``, which
+    silently fails on Windows (no ``ps``) so the ancestor walk terminated
+    at self — the caller's dedup / exclude logic then couldn't distinguish
+    "hermes CLI that invoked this scan" from "real gateway process".
+    """
     if pid <= 1:
         return None
+    try:
+        import psutil  # type: ignore
+        return psutil.Process(pid).ppid() or None
+    except ImportError:
+        pass
+    except Exception:
+        return None
+    # Fallback: shell out to ps (POSIX only — bare ``ps`` doesn't exist on Windows).
+    if not shutil.which("ps"):
+        return None
     try:
         result = subprocess.run(
             ["ps", "-o", "ppid=", "-p", str(pid)],
@@ -416,9 +433,53 @@ def _scan_gateway_pids(exclude_pids: set[int], all_profiles: bool = False) -> li
     except (OSError, subprocess.TimeoutExpired):
         return []
 
+    # Windows-specific: collapse venv launcher stubs. A venv-built
+    # ``pythonw.exe`` in ``<venv>/Scripts/`` is a ~100 KB launcher exe
+    # that spawns the base Python (e.g. ``C:\Program Files\Python311\
+    # pythonw.exe``) with the same command line, preserving the venv's
+    # ``pyvenv.cfg`` context. This is standard Windows CPython venv
+    # behaviour — BUT it means every gateway run produces two pythonw
+    # PIDs with identical command lines (one launcher stub, one actual
+    # interpreter) which is confusing in ``gateway status`` output.
+    # Filter the stub: if a PID in our result is the PARENT of another
+    # PID in our result, and both are pythonw.exe, the parent is the
+    # launcher stub — drop it, keep the child.
+    if is_windows() and len(pids) > 1:
+        pids = _filter_venv_launcher_stubs(pids)
+
     return pids
 
 
+def _filter_venv_launcher_stubs(pids: list[int]) -> list[int]:
+    """Drop venv-launcher ``pythonw.exe`` stubs that are parents of the real
+    interpreter process. See comment at the tail of ``_scan_gateway_pids``.
+
+    Uses ``psutil`` (core dependency). Safe on any platform; only invoked
+    on Windows by the caller because the stub pattern is Windows-specific.
+    """
+    try:
+        import psutil  # type: ignore
+    except ImportError:
+        return pids
+
+    pid_set = set(pids)
+    # Collect each PID's parent so we can flag "child of another matched PID".
+    parent_of: dict[int, int | None] = {}
+    for pid in pids:
+        try:
+            parent_of[pid] = psutil.Process(pid).ppid()
+        except (psutil.NoSuchProcess, psutil.AccessDenied):
+            parent_of[pid] = None
+
+    # For each child whose parent is also in our set, drop the parent.
+    drop: set[int] = set()
+    for pid, ppid in parent_of.items():
+        if ppid is not None and ppid in pid_set:
+            drop.add(ppid)
+
+    return [p for p in pids if p not in drop]
+
+
 def find_gateway_pids(exclude_pids: set | None = None, all_profiles: bool = False) -> list:
     """Find PIDs of running gateway processes.
 
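The stub-filtering rule is easy to check in isolation. The sketch below is not the committed function. `drop_launcher_parents` is a hypothetical variant that takes an explicit `{pid: ppid}` mapping, which the real code builds with `psutil.Process(pid).ppid()`, so it runs without live processes.

```python
from __future__ import annotations


def drop_launcher_parents(pids: list[int], parent_of: dict[int, int | None]) -> list[int]:
    """Drop every PID that is the parent of another PID in ``pids``.

    Same rule as _filter_venv_launcher_stubs: a venv launcher stub and the
    interpreter it spawned share a command line, so both match the scan;
    the parent is the stub and the child is the real interpreter.
    """
    pid_set = set(pids)
    drop: set[int] = set()
    for pid in pids:
        ppid = parent_of.get(pid)  # None when the PID vanished or access was denied
        if ppid is not None and ppid in pid_set:
            drop.add(ppid)
    # Preserve the original scan order for the survivors.
    return [p for p in pids if p not in drop]
```

For example, given launcher 46000 → interpreter 46010, plus an unrelated gateway 52000 whose parent was not matched, only 46010 and 52000 survive. The rule is purely structural: any matched PID with a matched child is treated as a stub. That is one reason the commit applies it only on Windows, where the launcher pattern is expected.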
scripts/install.ps1

@@ -1161,6 +1161,115 @@ function Install-NodeDeps {
     }
 }
 
+function Install-PlatformSdks {
+    # Ensure messaging-platform SDKs matching tokens the user added to
+    # ~/.hermes/.env are importable. Two problems this solves:
+    #
+    #   1. The tiered `uv pip install` cascade above can fall through to a
+    #      lower tier when the first fails (common when RL git deps choke),
+    #      which silently skips some messaging SDKs from [messaging].
+    #   2. `uv` creates the venv without pip. If a messaging SDK ends up
+    #      missing, the user can't `pip install python-telegram-bot` to
+    #      recover — pip simply isn't in their venv.
+    #
+    # Strategy: bootstrap pip via `python -m ensurepip` (idempotent), then
+    # for each token set in .env, verify the matching SDK imports. If not,
+    # run one targeted `pip install` as last-chance recovery. Keeps fresh
+    # Windows installs from hitting silent "python-telegram-bot not installed"
+    # at runtime.
+    if ($NoVenv) {
+        Write-Info "Skipping platform-SDK verification (-NoVenv: no venv to bootstrap)"
+        return
+    }
+
+    $pythonExe = "$InstallDir\venv\Scripts\python.exe"
+    if (-not (Test-Path $pythonExe)) {
+        Write-Warn "Skipping platform-SDK verification: $pythonExe not found"
+        return
+    }
+
+    $envPath = "$HermesHome\.env"
+    if (-not (Test-Path $envPath)) { return }
+    $envLines = Get-Content $envPath -ErrorAction SilentlyContinue
+
+    # Map: env var set in .env -> (import name, pip spec matching [messaging] extra).
+    # Specs mirror pyproject.toml to avoid version drift.
+    $sdkMap = @(
+        @{ Var = "TELEGRAM_BOT_TOKEN"; Import = "telegram"; Spec = "python-telegram-bot[webhooks]>=22.6,<23" },
+        @{ Var = "DISCORD_BOT_TOKEN"; Import = "discord"; Spec = "discord.py[voice]>=2.7.1,<3" },
+        @{ Var = "SLACK_BOT_TOKEN"; Import = "slack_sdk"; Spec = "slack-sdk>=3.27.0,<4" },
+        @{ Var = "SLACK_APP_TOKEN"; Import = "slack_bolt";Spec = "slack-bolt>=1.18.0,<2" },
+        @{ Var = "WHATSAPP_ENABLED"; Import = "qrcode"; Spec = "qrcode>=7.0,<8" }
+    )
+
+    # Which tokens are actually set (not placeholder)?
+    $needed = @()
+    foreach ($sdk in $sdkMap) {
+        $match = $envLines | Where-Object {
+            $_ -match ("^" + [regex]::Escape($sdk.Var) + "=.+") `
+                -and $_ -notmatch "your-token-here" `
+                -and $_ -notmatch "^\s*#"
+        }
+        if ($match) { $needed += $sdk }
+    }
+    if ($needed.Count -eq 0) { return }
+
+    Write-Host ""
+    Write-Info "Verifying platform SDKs for tokens found in $envPath ..."
+
+    # Verify each SDK's import without triggering side-effect imports.
+    # Quirk: PowerShell wraps non-zero-exit native stderr as a
+    # NativeCommandError that prints even with `2>$null` / `*> $null`
+    # unless we set $ErrorActionPreference to SilentlyContinue for the
+    # span. Save + restore rather than nuking globally.
+    $prevEAP = $ErrorActionPreference
+    $ErrorActionPreference = "SilentlyContinue"
+    try {
+        $missing = @()
+        foreach ($sdk in $needed) {
+            & $pythonExe -c "import $($sdk.Import)" 2>&1 | Out-Null
+            if ($LASTEXITCODE -ne 0) {
+                $missing += $sdk
+                Write-Warn " $($sdk.Import) NOT importable (needed for $($sdk.Var))"
+            } else {
+                Write-Success " $($sdk.Import) OK"
+            }
+        }
+    } finally {
+        $ErrorActionPreference = $prevEAP
+    }
+    if ($missing.Count -eq 0) { return }
+
+    # Bootstrap pip into the venv if it isn't there. `uv` creates venvs
+    # without pip; ensurepip is the stdlib-blessed way to add it.
+    $prevEAP = $ErrorActionPreference
+    $ErrorActionPreference = "SilentlyContinue"
+    try {
+        & $pythonExe -m pip --version 2>&1 | Out-Null
+        if ($LASTEXITCODE -ne 0) {
+            Write-Info "Bootstrapping pip into venv (uv doesn't ship pip)..."
+            & $pythonExe -m ensurepip --upgrade 2>&1 | Out-Null
+            if ($LASTEXITCODE -ne 0) {
+                Write-Warn "ensurepip failed — can't auto-install missing SDKs."
+                Write-Info "Manual recovery: $UvCmd pip install `"$($missing[0].Spec)`""
+                return
+            }
+        }
+
+        foreach ($sdk in $missing) {
+            Write-Info " Installing $($sdk.Spec) ..."
+            & $pythonExe -m pip install $sdk.Spec 2>&1 | ForEach-Object { Write-Host " $_" }
+            if ($LASTEXITCODE -eq 0) {
+                Write-Success " Installed $($sdk.Import)"
+            } else {
+                Write-Warn " Failed to install $($sdk.Spec). Recover manually: $pythonExe -m pip install `"$($sdk.Spec)`""
+            }
+        }
+    } finally {
+        $ErrorActionPreference = $prevEAP
+    }
+}
+
 function Invoke-SetupWizard {
     if ($SkipSetup) {
         Write-Info "Skipping setup wizard (-SkipSetup)"
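For readers more at home in Python, here is a rough equivalent of the function's control flow. It probes imports first, touches pip only when something is missing (bootstrapping it with `ensurepip` when `pip --version` fails), then installs each missing SDK. The actual order differs slightly from the function's header comment, which mentions ensurepip first. This is a sketch: `ensure_sdks`, the injectable `run` callback and the status strings are hypothetical. They exist so the flow can be exercised without touching a real venv.

```python
from __future__ import annotations

import subprocess
import sys
from typing import Callable, Sequence

# Runner: takes an argv list, returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def default_runner(argv: Sequence[str]) -> int:
    """Run a command quietly and return its exit code (like `2>&1 | Out-Null`)."""
    return subprocess.run(list(argv), stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode


def ensure_sdks(needed: list[tuple[str, str]], python: str = sys.executable,
                run: Runner = default_runner) -> dict[str, str]:
    """Probe imports, bootstrap pip only if needed, then pip-install what's missing.

    ``needed`` holds (import_name, pip_spec) pairs; returns {import_name: status}.
    """
    status: dict[str, str] = {}
    missing = []
    for import_name, spec in needed:
        # Probe in a child interpreter so a broken SDK can't affect this process.
        if run([python, "-c", f"import {import_name}"]) == 0:
            status[import_name] = "ok"
        else:
            missing.append((import_name, spec))
    if not missing:
        return status  # nothing to do; pip is never touched

    # uv-created venvs may lack pip; ensurepip is the stdlib bootstrapper.
    if run([python, "-m", "pip", "--version"]) != 0:
        if run([python, "-m", "ensurepip", "--upgrade"]) != 0:
            for import_name, _ in missing:
                status[import_name] = "no-pip"
            return status

    for import_name, spec in missing:
        rc = run([python, "-m", "pip", "install", spec])
        status[import_name] = "installed" if rc == 0 else "failed"
    return status
```

Passing a fake `run` that records argv lists makes it easy to confirm, for example, that `ensurepip` is invoked only when an import probe fails and `pip --version` also fails.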
@@ -1343,6 +1452,7 @@ function Main {
     Set-PathVariable
     Copy-ConfigTemplates
     Invoke-SetupWizard
+    Install-PlatformSdks
     Start-GatewayIfConfigured
 
     Write-Completion