hermes-agent/gateway
kshitij 7379f17556
fix(gateway): only fire planned-stop watcher for self-targeting markers + fix Windows consume (#34749)
* fix(gateway): only fire planned-stop watcher for markers targeting self

Salvaged from #34599 — rebased onto current main.

The planned-stop watcher now only fires shutdown for a marker that targets
the current process, instead of any marker that exists on disk. Fixes the
Windows crash loop (#34597) where a stale marker from a previous Gateway
instance kills a freshly booted Gateway ~400ms after start with a false
"Received UNKNOWN — initiating shutdown".

Co-authored-by: Bartok9 <danielrpike9@gmail.com>

* fix(gateway): match planned-stop/takeover markers by PID alone when start_time is unavailable

Follow-up to the #34599 salvage. The watcher's non-destructive probe
(planned_stop_marker_targets_self) already falls back to PID equality when
a process start_time is unavailable, but the authoritative consume it gates
(_consume_pid_marker_for_self) still required a non-None start_time match.

_get_process_start_time reads /proc/<pid>/stat and returns None on macOS and
native Windows — the only platform the planned-stop watcher exists for. So on
Windows the probe would fire the shutdown handler (PID matches) but the
handler's consume_planned_stop_marker_for_self() would return False, and a
legitimate 'hermes gateway stop' was still misclassified as an unexpected
UNKNOWN exit (exit 1) and revived by the service manager — a residual half of
the #34597 crash loop on the legitimate-stop path.

Align the consume with the probe: when both start_times are known they must
match (PID-reuse guard preserved on Linux); when either is unavailable, fall
back to PID equality alone, bounded by the existing short marker TTL. This
also fixes the parallel --replace takeover consume on Windows, which shares
the same helper.

Adds regression tests for the Windows (None start_time) path, the foreign-PID
rejection under that fallback, and confirmation the start_time-mismatch guard
still rejects when both are known.

---------

Co-authored-by: Bartok9 <danielrpike9@gmail.com>
2026-05-29 17:36:58 +00:00
..
assets fix: improve telegram topic mode setup 2026-05-04 12:07:17 -07:00
builtin_hooks remove: BOOT.md built-in hook (#17093) 2026-04-28 09:50:27 -07:00
platforms fix(gateway): trust adapter-owned access policy over env default-deny (#34515) 2026-05-29 04:22:41 -07:00
__init__.py docs(gateway): mention Weixin in gateway help and docstrings 2026-05-12 17:08:51 -07:00
channel_directory.py refactor(ntfy): convert built-in adapter to platform plugin 2026-05-23 16:13:01 -07:00
config.py refactor(gateway): migrate Mattermost adapter to bundled plugin 2026-05-24 18:05:33 -07:00
delivery.py refactor: simplify Telegram DM topic refresh 2026-05-25 14:54:02 -07:00
display_config.py fix(gateway): keep Telegram heartbeat + interim commentary on; edit heartbeat in place (#33187) 2026-05-27 05:21:53 -07:00
hooks.py fix(plugins): register dynamically-loaded modules in sys.modules before exec 2026-04-29 23:34:35 -07:00
memory_monitor.py Port from cline/cline#10343: periodic gateway memory logging (#27102) 2026-05-16 12:55:23 -07:00
mirror.py refactor(gateway): drop _append_to_jsonl from mirror 2026-05-20 13:00:57 -07:00
pairing.py fix(gateway): preserve WhatsApp pairing approvals across JID/LID alias flips 2026-05-23 01:46:34 -07:00
platform_registry.py refactor(plugins): add apply_yaml_config_fn registry hook 2026-05-13 22:20:30 -07:00
restart.py fix(gateway): address restart review feedback 2026-04-10 21:18:34 -07:00
run.py fix(gateway): only fire planned-stop watcher for self-targeting markers + fix Windows consume (#34749) 2026-05-29 17:36:58 +00:00
runtime_footer.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
session.py fix(gateway): separate observed Telegram group context 2026-05-23 01:33:42 -07:00
session_context.py fix(cli): synchronize HERMES_SESSION_ID across environment and contextvar during session switches 2026-05-23 17:46:55 -07:00
shutdown_forensics.py chore: ruff auto-fixes — collapsible-else-if, if-stmt-min-max, dict.fromkeys (#23926) 2026-05-11 11:03:29 -07:00
slash_access.py feat(gateway): per-platform admin/user split for slash commands (salvage of #4443) (#23373) 2026-05-10 12:33:54 -07:00
status.py fix(gateway): only fire planned-stop watcher for self-targeting markers + fix Windows consume (#34749) 2026-05-29 17:36:58 +00:00
sticker_cache.py fix: guard yaml.safe_load, flock unlock, TOCTOU races, and atomic writes 2026-05-19 00:12:41 -07:00
stream_consumer.py fix(stream-consumer): only set _final_content_delivered when final response confirmed delivered 2026-05-28 03:15:19 -07:00
whatsapp_identity.py fix(whatsapp_identity): pin identifier regex to ASCII, clarify it's defense-in-depth 2026-04-26 20:48:31 -07:00