fix(gateway): consult lock record argv when cmdline unreadable in scoped-lock stale check

PR #24500 introduced stale-lock detection that calls
`_looks_like_gateway_process` to confirm a running PID is not an
unrelated process that reused the slot.  On Windows neither `/proc`
nor `ps` is available, so `_read_process_cmdline` always returns
`None` and `_looks_like_gateway_process` always returns `False` —
causing every valid Windows gateway lock to be marked stale and
immediately evicted.

Fix: after `_looks_like_gateway_process` returns `False`, call
`_read_process_cmdline` directly.  If the result is non-`None`, the
live cmdline was readable and did not match a gateway, so the PID is
foreign and the lock is stale.
If it is `None` (cmdline unreadable, e.g. Windows without ps), fall
back to `_record_looks_like_gateway` which validates the stored
`argv` the gateway wrote into the lock file at startup.  Both
oracles must say "not a gateway" before the lock is evicted — the
same two-oracle pattern already used in `get_running_pid` (line 941).
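
A minimal sketch of the decision described above, written as a pure
function so the branches are easy to follow.  The function name
`lock_is_stale` and the injected callables are hypothetical stand-ins
for illustration; the real check lives inline in `acquire_scoped_lock`
and only runs when neither side has a `start_time`.

```python
from typing import Callable, Optional


def lock_is_stale(
    pid: int,
    record: dict,
    looks_like_gateway_process: Callable[[int], bool],
    read_process_cmdline: Callable[[int], Optional[str]],
    record_looks_like_gateway: Callable[[dict], bool],
) -> bool:
    """Decide staleness when no start_time is available (hypothetical helper)."""
    if looks_like_gateway_process(pid):
        # The live cmdline confirms a gateway: keep the lock.
        return False
    if read_process_cmdline(pid) is not None:
        # The cmdline was readable and is not a gateway: the PID was reused.
        return True
    # The cmdline is unreadable: fall back to the argv stored in the lock record.
    return not record_looks_like_gateway(record)
```

With the cmdline unreadable, the stored argv alone decides the
outcome, which is the case the original check got wrong on Windows.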

Adds a regression test that simulates a Windows host where
`_looks_like_gateway_process` returns `False` for every PID and
`_read_process_cmdline` returns `None`, confirming the lock is kept
when the record's argv identifies it as a gateway process.
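
The shape of such a test can be sketched as follows.  This is an
illustration only, not the test added by this commit: `status` is a
hypothetical stand-in namespace for the gateway module, `is_stale`
re-implements the check in simplified form, and the argv matching rule
is an assumption.

```python
from types import SimpleNamespace
from unittest import mock

# Hypothetical stand-in for the module that owns the three helpers.
status = SimpleNamespace(
    _looks_like_gateway_process=lambda pid: True,
    _read_process_cmdline=lambda pid: "python -m gateway",
    # Assumed rule: the record is a gateway if "gateway" appears in its argv.
    _record_looks_like_gateway=lambda record: "gateway"
    in " ".join(record.get("argv", [])),
)


def is_stale(pid: int, record: dict) -> bool:
    """Simplified version of the stale check for the no-start_time path."""
    if status._looks_like_gateway_process(pid):
        return False
    if status._read_process_cmdline(pid) is not None:
        return True
    return not status._record_looks_like_gateway(record)


def simulate_windows_check(record: dict) -> bool:
    """Run the check with both live-process helpers patched as on Windows."""
    with mock.patch.object(
        status, "_looks_like_gateway_process", return_value=False
    ), mock.patch.object(status, "_read_process_cmdline", return_value=None):
        return is_stale(4321, record)
```

A gateway record should survive the simulated Windows check, while a
record whose argv names an unrelated program should still be evicted.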
EloquentBrush 2026-05-13 02:15:57 +03:00 committed by Teknium
parent 24e2151cd6
commit f9559c39c4
2 changed files with 42 additions and 3 deletions

@@ -613,15 +613,20 @@ def acquire_scoped_lock(scope: str, identity: str, metadata: Optional[dict[str,
     stale = True
 # When start_time comparison is unavailable (macOS / Windows
 # have no /proc, so both sides are None), fall back to
-# checking the live process command line. If the PID was
-# reused by an unrelated process the lock is stale.
+# checking the live process command line. When cmdline is
+# also unreadable (Windows has no ps), consult the lock
+# record's own argv — the gateway writes it at startup and
+# it's the only identity signal on platforms without ps.
+# Both oracles must indicate "not a gateway" to mark stale.
 if (
     not stale
     and existing.get("start_time") is None
     and current_start is None
     and not _looks_like_gateway_process(existing_pid)
 ):
-    stale = True
+    live_cmdline = _read_process_cmdline(existing_pid)
+    if live_cmdline is not None or not _record_looks_like_gateway(existing):
+        stale = True
 # Check if process is stopped (Ctrl+Z / SIGTSTP) — stopped
 # processes still appear alive to _pid_exists but are not
 # actually running. Treat them as stale so --replace works.