Mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-05-18 04:41:56 +00:00)
fix(gateway): consult lock record argv when cmdline unreadable in scoped-lock stale check
PR #24500 introduced stale-lock detection that calls `_looks_like_gateway_process` to confirm that a running PID has not been reused by an unrelated process. On Windows neither `/proc` nor `ps` is available, so `_read_process_cmdline` always returns `None` and `_looks_like_gateway_process` always returns `False`. As a result, every valid Windows gateway lock was marked stale and immediately evicted.

Fix: after `_looks_like_gateway_process` returns `False`, call `_read_process_cmdline` directly. If the result is non-`None`, the live cmdline was readable and confirms the PID is foreign, so the lock is stale. If it is `None` (cmdline unreadable, e.g. Windows without `ps`), fall back to `_record_looks_like_gateway`, which validates the stored `argv` that the gateway wrote into the lock file at startup. Both oracles must say "not a gateway" before the lock is evicted, the same two-oracle pattern already used in `get_running_pid` (line 941).

Adds a regression test that simulates a Windows host where `_looks_like_gateway_process` returns `False` for every PID and `_read_process_cmdline` returns `None`, confirming that the lock is kept when the record's argv identifies it as a gateway process.
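The decision described above can be sketched in isolation. This is a minimal illustration, not the project's code: `lock_is_stale` and `record_looks_like_gateway` are hypothetical stand-ins, and the argv heuristic (looking for the word "gateway") is an assumption made only so the example runs on its own.

```python
from typing import Optional


def record_looks_like_gateway(record: dict) -> bool:
    """Hypothetical stand-in for the record check.

    Assumption: a record counts as a gateway if any stored argv
    element contains the word "gateway". The real heuristic may differ.
    """
    argv = record.get("argv") or []
    return any("gateway" in str(part) for part in argv)


def lock_is_stale(
    record: dict,
    live_cmdline: Optional[str],
    live_looks_like_gateway: bool,
) -> bool:
    """Two-oracle staleness decision when start_time is unavailable."""
    if live_looks_like_gateway:
        # The live process was positively identified as a gateway.
        return False
    if live_cmdline is not None:
        # The cmdline was readable and did not match: the PID was reused.
        return True
    # The cmdline was unreadable (e.g. Windows without ps): trust the
    # argv that the gateway stored in the lock record at startup.
    return not record_looks_like_gateway(record)


# Simulated Windows host: cmdline unreadable, record argv names a gateway.
windows_record = {"argv": ["python", "-m", "hermes", "gateway", "run"]}
print(lock_is_stale(windows_record, None, False))

# Readable cmdline belonging to an unrelated process: the lock is stale.
print(lock_is_stale(windows_record, "notepad.exe", False))
```

The first call mirrors the regression scenario: with neither `/proc` nor `ps`, the stored argv is the only identity signal left, so the lock is kept. The second call shows that a readable but non-matching cmdline still evicts the lock, regardless of what the record says.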
parent 24e2151cd6
commit f9559c39c4
2 changed files with 42 additions and 3 deletions
@@ -613,15 +613,20 @@ def acquire_scoped_lock(scope: str, identity: str, metadata: Optional[dict[str,
             stale = True
         # When start_time comparison is unavailable (macOS / Windows
         # have no /proc, so both sides are None), fall back to
-        # checking the live process command line. If the PID was
-        # reused by an unrelated process the lock is stale.
+        # checking the live process command line. When cmdline is
+        # also unreadable (Windows has no ps), consult the lock
+        # record's own argv — the gateway writes it at startup and
+        # it's the only identity signal on platforms without ps.
+        # Both oracles must indicate "not a gateway" to mark stale.
         if (
             not stale
             and existing.get("start_time") is None
             and current_start is None
             and not _looks_like_gateway_process(existing_pid)
         ):
-            stale = True
+            live_cmdline = _read_process_cmdline(existing_pid)
+            if live_cmdline is not None or not _record_looks_like_gateway(existing):
+                stale = True
         # Check if process is stopped (Ctrl+Z / SIGTSTP) — stopped
         # processes still appear alive to _pid_exists but are not
         # actually running. Treat them as stale so --replace works.