hermes-agent/gateway
teknium1 ccf526964a fix(gateway): bound adapter teardown awaits on the stop path (#14128)
The main stop loop in _stop_impl() awaited adapter.cancel_background_tasks()
and adapter.disconnect() with no timeout, for both the primary and the
secondary-profile (multiplex) adapter maps. A half-dead platform — a wedged
Feishu/Lark WebSocket thread blocked on network I/O is the reported case —
makes one of those awaits block forever, so the process never exits. systemd
then SIGKILLs it after TimeoutStopSec, skipping atexit PID-file cleanup, and
the next start dies with 'PID file race lost' and enters a restart loop.

The per-adapter timeout infra already existed on main
(_adapter_disconnect_timeout_secs / HERMES_GATEWAY_ADAPTER_DISCONNECT_TIMEOUT,
default 5s) but was only wired into _safe_adapter_disconnect, which the
teardown path never calls.

Add _bounded_adapter_teardown(): wraps BOTH cancel_background_tasks() and
disconnect() in the existing timeout budget, logs and forces forward progress
on timeout, and never raises. Both teardown loops now route through it, so the
stop sequence always completes regardless of any adapter's internal behavior
and PID-file cleanup runs.

Original report + fix direction by @happy5318 (#14128, #14130); this widens it
to cover cancel_background_tasks(), the multiplex loop, and the config knob.

Co-authored-by: happy5318 <happy5318@users.noreply.github.com>
2026-06-27 19:05:04 -07:00
..
assets fix: improve telegram topic mode setup 2026-05-04 12:07:17 -07:00
builtin_hooks remove: BOOT.md built-in hook (#17093) 2026-04-28 09:50:27 -07:00
platforms revert(windows): roll back terminal-popup PRs #53791 #53810 #53829 (#53853) 2026-06-27 15:59:00 -07:00
relay feat(relay): multi-platform-per-agent — list identity, provision-loop, N-hello, per-frame egress (Phase 1.5) (#52830) 2026-06-26 17:32:46 +10:00
__init__.py docs(gateway): mention Weixin in gateway help and docstrings 2026-05-12 17:08:51 -07:00
authz_mixin.py fix(relay): authorize relay-delivered events by delivery, not source.platform (#52306) 2026-06-25 14:21:09 +10:00
channel_directory.py docs(sessions): clarify sessions.json is the gateway routing index, not the session list (#51726) 2026-06-23 23:56:36 -07:00
code_skew.py fix(gateway): refuse model switch on stale checkout to avoid env_float ImportError 2026-06-24 04:16:54 +05:30
config.py Address email pairing review feedback 2026-06-21 22:43:57 -07:00
delivery.py fix(delivery): drop env-var knob, flag all chunking adapters 2026-06-22 05:41:22 -07:00
display_config.py feat(discord): render reasoning as -# subtext via display.reasoning_style (#51168) 2026-06-23 10:44:02 -07:00
drain_control.py fix(gateway): stamp drain marker with instantiation epoch so a durable-volume restart clears it (NS-570) 2026-06-26 18:59:41 +05:30
hooks.py feat(hooks): expose thread_id and chat_type in agent:start/end context (#41672) 2026-06-07 19:16:36 -07:00
kanban_watchers.py fix(kanban): honor kanban.auto_decompose toggle live, without a gateway restart (#50358) 2026-06-21 12:43:44 -07:00
memory_monitor.py Port from cline/cline#10343: periodic gateway memory logging (#27102) 2026-05-16 12:55:23 -07:00
message_timestamps.py feat(gateway): inject stable human-readable message timestamps 2026-06-16 15:49:59 -07:00
mirror.py fix(cron): mirror continuable cron as a labelled user turn (alternation-safe) 2026-06-24 20:27:05 -07:00
pairing.py fix(gateway): preserve WhatsApp pairing approvals across JID/LID alias flips 2026-05-23 01:46:34 -07:00
platform_registry.py refactor(plugins): add apply_yaml_config_fn registry hook 2026-05-13 22:20:30 -07:00
response_filters.py fix(gateway): suppress exact silence tokens without mutating history 2026-06-14 03:25:08 -07:00
restart.py fix(gateway): exit 78 (EX_CONFIG) on fatal startup errors, s6 finish script stops restart loop 2026-06-24 16:34:51 +10:00
rich_sent_store.py fix(telegram): resolve replies to rich (sendRichMessage) messages 2026-06-16 13:04:20 -07:00
run.py fix(gateway): bound adapter teardown awaits on the stop path (#14128) 2026-06-27 19:05:04 -07:00
runtime_footer.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
scale_to_zero.py feat(gateway): scale-to-zero idle detection + dormant-quiesce (Phase 0) 2026-06-24 18:47:18 -07:00
session.py fix(gateway): dedupe user turns on transient failure (#47237) 2026-06-26 00:11:17 +05:30
session_context.py fix(api-server): stop silently promising async delivery on stateless HTTP path (#50319) 2026-06-21 12:15:14 -07:00
shutdown_forensics.py chore: ruff auto-fixes — collapsible-else-if, if-stmt-min-max, dict.fromkeys (#23926) 2026-05-11 11:03:29 -07:00
slash_access.py feat(gateway): per-platform admin/user split for slash commands (salvage of #4443) (#23373) 2026-05-10 12:33:54 -07:00
slash_commands.py fix(gateway): offload /model switch off the event loop (#53603) 2026-06-27 04:36:22 -07:00
status.py revert(windows): roll back terminal-popup PRs #53791 #53810 #53829 (#53853) 2026-06-27 15:59:00 -07:00
sticker_cache.py fix: guard yaml.safe_load, flock unlock, TOCTOU races, and atomic writes 2026-05-19 00:12:41 -07:00
stream_consumer.py fix(gateway): respect adapter decline of fresh-final to prevent double delivery 2026-06-21 13:55:50 -07:00
stream_dispatch.py feat(gateway): structured stream-event protocol + Telegram draft formatting parity (#37250) 2026-06-02 00:33:50 -07:00
stream_events.py feat(gateway): structured stream-event protocol + Telegram draft formatting parity (#37250) 2026-06-02 00:33:50 -07:00
whatsapp_identity.py fix(whatsapp): normalize bare phone targets to JIDs before bridge send 2026-06-21 13:32:22 -07:00