hermes-agent/tools/environments
valentt 7bd1f8a2d1 fix(mcp): re-validate PGID ownership before killing orphaned process groups
The stdio-MCP orphan reaper (`_kill_orphaned_mcp_children`) and the local
environment's `_kill_process` both signalled a process *group* via
`os.killpg(pgid, sig)` using a PGID captured at spawn time. Once the original
child exits and is reaped, the kernel can recycle that PID/PGID onto an
unrelated process group, so a later sweep would `killpg` a stranger.

Observed in the wild: a long-running gateway signalled a recycled PGID that had
landed on a desktop browser's session leader, SIGTERM-ing Firefox at irregular
intervals (confirmed with the kernel `signal:signal_generate` tracepoint:
`comm=firefox` killed by the hermes gateway via a recycled group id).

Fix: record each leader's start time (`/proc/<pid>/stat` field 22, via the
existing `gateway.status.get_process_start_time`) alongside the PGID, and
re-check it before signalling. A PGID whose leader's start time no longer
matches — or whose leader is gone — is never signalled. When no start time is
available (e.g. macOS has no /proc) the guard degrades to the prior best-effort
behaviour, so non-Linux platforms are unaffected.
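
A minimal sketch of the guard described above, not the code in this commit. The
names `read_start_time` and `safe_killpg` are hypothetical stand-ins (the real
change uses `gateway.status.get_process_start_time`, whose signature is not
shown here), and the injectable `get_start_time` / `killpg` parameters exist
only to make the sketch testable without sending real signals.

```python
import os
import signal
from typing import Callable, Optional


def read_start_time(pid: int) -> Optional[int]:
    """Return field 22 (starttime) of /proc/<pid>/stat, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/stat", "rb") as fh:
            data = fh.read().decode("ascii", "replace")
    except OSError:
        # No /proc (e.g. macOS) or the process no longer exists.
        return None
    # comm (field 2) may contain spaces or ')', so split after the LAST ')'.
    _, _, rest = data.rpartition(")")
    fields = rest.split()
    # `rest` begins at field 3 (state), so field 22 sits at index 19.
    try:
        return int(fields[19])
    except (IndexError, ValueError):
        return None


def safe_killpg(
    pgid: int,
    recorded_start: Optional[int],
    sig: int = signal.SIGTERM,
    get_start_time: Callable[[int], Optional[int]] = read_start_time,
    killpg: Callable[[int, int], None] = os.killpg,
) -> bool:
    """Signal the group only if its leader still has the recorded start time.

    Returns True if a signal was sent, False if the group was skipped.
    """
    if recorded_start is not None:
        # A mismatch means the PID/PGID was recycled; None means the leader
        # is gone. Either way, do not signal.
        if get_start_time(pgid) != recorded_start:
            return False
    # recorded_start is None: no start time was captured at spawn, so fall
    # back to the prior best-effort behaviour.
    try:
        killpg(pgid, sig)
    except (ProcessLookupError, PermissionError):
        return False
    return True
```

A caller would capture `read_start_time(proc.pid)` immediately after spawning
and pass it back in at sweep time. A small window remains between the check and
the `killpg` call, so this narrows the race rather than eliminating it.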

Adds regression tests covering both the recycled-PID (skip) and matching-PID
(signal) paths.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 21:37:25 +02:00
__init__.py remove Vercel AI Gateway and Vercel Sandbox (#33067) 2026-05-27 00:43:32 -07:00
base.py fix(tools): don't compound-rewrite spawn_via_env background wrappers 2026-06-01 00:05:10 +05:30
daytona.py fix(daytona): migrate legacy-sandbox lookup to cursor-based list() (#24587) 2026-05-12 16:31:46 -07:00
docker.py fix(docker): support s6 /init images in terminal sandbox (#34628) (#34635) 2026-06-01 13:46:04 +10:00
file_sync.py fix: guard yaml.safe_load, flock unlock, TOCTOU races, and atomic writes 2026-05-19 00:12:41 -07:00
local.py fix(mcp): re-validate PGID ownership before killing orphaned process groups 2026-06-09 21:37:25 +02:00
managed_modal.py feat(environments): unified spawn-per-call execution layer 2026-04-08 17:23:15 -07:00
modal.py fix(async): close unscheduled coroutines in all threadsafe bridges (#26584) 2026-05-15 14:00:01 -07:00
modal_utils.py fix(tools): don't compound-rewrite spawn_via_env background wrappers 2026-06-01 00:05:10 +05:30
singularity.py feat(environments): unified spawn-per-call execution layer 2026-04-08 17:23:15 -07:00
ssh.py fix(ssh): keep bulk sync extraction scoped to .hermes 2026-05-21 19:17:51 -07:00