hermes-agent/gateway
Teknium 3a6351454b
fix(gateway): close pending-drain and late-arrival races in base adapter (#12371)
Two related race conditions in gateway/platforms/base.py that could
produce duplicate agent runs or silently drop messages. Neither is
specific to any one platform — all adapters inherit this logic.

R5 (HIGH) — duplicate agent spawn on turn chain
  In _process_message_background, the pending-drain path deleted
  _active_sessions[session_key] before awaiting typing_task.cancel()
  and then recursively awaiting _process_message_background for the
  queued event. During the typing_task await, a fresh inbound message
  M3 could pass the Level-1 guard (entry now missing), set its own
  Event, and spawn a second _process_message_background for the same
  session_key — two agents running simultaneously, duplicate responses,
  duplicate tool calls.

  Fix: keep the _active_sessions entry populated and only clear() the
  Event. The guard stays live, so any concurrent inbound message takes
  the busy-handler path (queue + interrupt) as intended.

R6 (MED-HIGH) — message dropped during finally cleanup
  The finally block has two await points (typing_task, stop_typing)
  before it deletes _active_sessions. A message arriving in that
  window passes the guard (entry still live), lands in
  _pending_messages via the busy-handler — and then the unconditional
  del removes the guard with that message still queued. Nothing
  drains it; the user never gets a reply.

  Fix: before deleting _active_sessions in finally, pop any late
  pending_messages entry and spawn a drain task for it. Only delete
  _active_sessions when no pending is waiting.

Tests: tests/gateway/test_pending_drain_race.py — three regression
cases. Validated: without the fix, two of the three fail exactly
where the races manifest (duplicate-spawn guard loses identity,
late-arrival 'LATE' message not in processed list).
2026-04-18 19:32:26 -07:00
..
builtin_hooks refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
platforms fix(gateway): close pending-drain and late-arrival races in base adapter (#12371) 2026-04-18 19:32:26 -07:00
__init__.py Enhance CLI with multi-platform messaging integration and configuration management 2026-02-02 19:01:51 -08:00
channel_directory.py feat(discord): support forum channels 2026-04-17 20:25:48 -07:00
config.py fix(qqbot): add back-compat for env var rename; drop qrcode core dep 2026-04-17 15:31:14 -07:00
delivery.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
display_config.py fix(gateway): fix regression causing display.streaming to override root streaming key 2026-04-14 10:52:23 -07:00
hooks.py feat: built-in boot-md hook — run BOOT.md on gateway startup (#3733) 2026-03-29 10:19:54 -07:00
mirror.py chore: remove ~100 unused imports across 55 files (#3016) 2026-03-25 15:02:03 -07:00
pairing.py fix: multiple platform adaptors concurrency 2026-04-06 16:49:54 -07:00
restart.py fix(gateway): address restart review feedback 2026-04-10 21:18:34 -07:00
run.py fix(gateway): close adapter resources when connect() fails or raises (#12339) 2026-04-18 18:53:31 -07:00
session.py fix(gateway): auto-resume sessions after drain-timeout restart (#11852) (#12301) 2026-04-18 17:32:17 -07:00
session_context.py fix: prevent stale os.environ leak after clear_session_vars (#10304) (#10527) 2026-04-15 14:27:17 -07:00
status.py fix(gateway): detect legacy hermes.service + mark --replace SIGTERM as planned (#11909) 2026-04-17 19:27:58 -07:00
sticker_cache.py chore: remove ~100 unused imports across 55 files (#3016) 2026-03-25 15:02:03 -07:00
stream_consumer.py feat(dingtalk): AI Cards streaming, emoji reactions, and media handling 2026-04-17 19:26:53 -07:00