hermes-agent/gateway
dyxushuai d72985b7ce fix(gateway): serialize reset command handoff and heal stale session locks
Closes the adapter-side half of the split-brain described in issue #11016
where _active_sessions stays live but nothing is processing, trapping the
chat in repeated 'Interrupting current task...' while /stop reports no
active task.

Changes on BasePlatformAdapter:
- Add _session_tasks: Dict[str, asyncio.Task] mapping session -> owner task
  so session-terminating commands can cancel the right task and old task
  finally blocks can't clobber a newer task's guard.
- Add _release_session_guard(guard=...) that only releases if the guard
  Event still matches, preventing races where /stop or /new swaps in a
  temporary guard while the old task unwinds.
- Add _session_task_is_stale() and _heal_stale_session_lock() for
  on-entry self-heal: when handle_message() sees an _active_sessions
  entry whose RECORDED owner task is done/cancelled, clear it and fall
  through to normal dispatch.  No owner task recorded = not stale (some
  tests install guards directly and shouldn't be auto-healed).
- Add cancel_session_processing() as the explicit adapter-side cancel
  API so /stop/ /new/ /reset can cleanly tear down in-flight work.
- Route /stop, /new, /reset through _dispatch_active_session_command():
    1. install a temporary command guard so follow-ups stay queued
    2. let the runner process the command
    3. cancel the old adapter task AFTER the runner response is ready
    4. release the command guard and drain the latest pending follow-up
- _start_session_processing() replaces the inline create_task + guard
  setup in handle_message() so guard + owner-task entry land atomically.
- cancel_background_tasks() also clears _session_tasks.

Combined, this means:
- /stop / /new / /reset actually cancel stuck work instead of leaving
  adapter state desynced from runner state.
- A dead session lock self-heals on the next inbound message rather than
  persisting until gateway restart.
- Follow-up messages after /new are processed in order, after the reset
  command's runner response lands.

Refs #11016
2026-04-23 05:15:52 -07:00
..
builtin_hooks refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
platforms fix(gateway): serialize reset command handoff and heal stale session locks 2026-04-23 05:15:52 -07:00
__init__.py Enhance CLI with multi-platform messaging integration and configuration management 2026-02-02 19:01:51 -08:00
channel_directory.py feat(discord): support forum channels 2026-04-17 20:25:48 -07:00
config.py feat(gateway/slack): add SLACK_REACTIONS env toggle for reaction lifecycle 2026-04-22 08:49:24 -07:00
delivery.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
display_config.py fix(gateway): fix regression causing display.streaming to override root streaming key 2026-04-14 10:52:23 -07:00
hooks.py feat(gateway): expose plugin slash commands natively on all platforms + decision-capable command hook 2026-04-22 16:23:21 -07:00
mirror.py chore: remove ~100 unused imports across 55 files (#3016) 2026-03-25 15:02:03 -07:00
pairing.py fix: multiple platform adaptors concurrency 2026-04-06 16:49:54 -07:00
restart.py fix(gateway): address restart review feedback 2026-04-10 21:18:34 -07:00
run.py fix(model_switch): group custom_providers by endpoint in /model picker (#9210) 2026-04-23 03:10:30 -07:00
session.py fix(feishu): correct identity model docs and prefer tenant-scoped user_id 2026-04-22 18:06:22 -07:00
session_context.py fix(cron): run due jobs in parallel to prevent serial tick starvation (#13021) 2026-04-20 11:53:07 -07:00
status.py fix(status): catch OSError in os.kill(pid, 0) for Windows compatibility 2026-04-23 03:05:49 -07:00
sticker_cache.py chore: remove ~100 unused imports across 55 files (#3016) 2026-03-25 15:02:03 -07:00
stream_consumer.py fix(gateway): strip cursor from frozen message on empty fallback continuation (#7183) 2026-04-19 01:51:12 -07:00