hermes-agent

mirrors/hermes-agent

Fork 0

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-25 00:51:20 +00:00

Commit graph

Author	SHA1	Message	Date
Teknium	5651a73331	fix(gateway): guard-match the finally-block _active_sessions delete Before this, _process_message_background's finally did an unconditional 'del self._active_sessions[session_key]' — even if a /stop/ /new command had already swapped in its own command_guard via _dispatch_active_session_command and cancelled us. The old task's unwind would clobber the newer guard, opening a race for follow-ups. Replace with _release_session_guard(session_key, guard=interrupt_event) so the delete only fires when the guard we captured is still the one installed. The sibling _session_tasks pop already had equivalent ownership matching via asyncio.current_task() identity; this closes the asymmetry. Adds two direct regressions in test_session_split_brain_11016: - stale guard reference must not clobber a newer guard by identity - guard=None default still releases unconditionally (for callers that don't have a captured guard to match against) Refs #11016	2026-04-23 05:15:52 -07:00
Teknium	ec02d905c9	test(gateway): regressions for issue #11016 split-brain session locks Covers all three layers of the salvaged fix: 1. Adapter-side cancellation: /stop, /new, /reset cancel the in-flight adapter task, release the guard, and let follow-up messages through; /new keeps the guard installed until the runner response lands, then drains the queued follow-up in order. 2. Adapter-side self-heal: a split-brain guard (done owner task, lock still live) is healed on the next inbound message and the user gets a reply instead of being trapped in infinite busy acks. A guard with no recorded owner task is NOT auto-healed (protects fixtures that install guards directly). 3. Runner-side generation guard: stale async runs whose generation was bumped by /stop or /new cannot clear a newer run's _running_agents slot on the way out. 11 tests, all green. Refs #11016	2026-04-23 05:15:52 -07:00

Author

SHA1

Message

Date

Teknium

5651a73331

fix(gateway): guard-match the finally-block _active_sessions delete

Before this, _process_message_background's finally did an unconditional
'del self._active_sessions[session_key]' — even if a /stop/ /new
command had already swapped in its own command_guard via
_dispatch_active_session_command and cancelled us.  The old task's
unwind would clobber the newer guard, opening a race for follow-ups.

Replace with _release_session_guard(session_key, guard=interrupt_event)
so the delete only fires when the guard we captured is still the one
installed.  The sibling _session_tasks pop already had equivalent
ownership matching via asyncio.current_task() identity; this closes the
asymmetry.

Adds two direct regressions in test_session_split_brain_11016:
- stale guard reference must not clobber a newer guard by identity
- guard=None default still releases unconditionally (for callers that
  don't have a captured guard to match against)

Refs #11016

2026-04-23 05:15:52 -07:00

Teknium

ec02d905c9

test(gateway): regressions for issue #11016 split-brain session locks

Covers all three layers of the salvaged fix:

1. Adapter-side cancellation: /stop, /new, /reset cancel the in-flight
   adapter task, release the guard, and let follow-up messages through;
   /new keeps the guard installed until the runner response lands, then
   drains the queued follow-up in order.

2. Adapter-side self-heal: a split-brain guard (done owner task, lock
   still live) is healed on the next inbound message and the user gets
   a reply instead of being trapped in infinite busy acks.  A guard
   with no recorded owner task is NOT auto-healed (protects fixtures
   that install guards directly).

3. Runner-side generation guard: stale async runs whose generation was
   bumped by /stop or /new cannot clear a newer run's _running_agents
   slot on the way out.

11 tests, all green.

Refs #11016

2026-04-23 05:15:52 -07:00

2 commits