mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-25 00:51:20 +00:00
When a user hits /new or /resume before the previous session finishes initializing, session.close runs while the previous session.create's _build thread is still constructing the agent. session.close pops _sessions[sid] and closes whatever slash_worker it finds (None at that point — _build hasn't installed it yet), then returns. _build keeps running in the background, installs the slash_worker subprocess and registers an approval-notify callback on a session dict that's now unreachable via _sessions. The subprocess leaks until process exit; the notify callback lingers in the global registry. Fix: _build now tracks what it allocates (worker, notify_registered) and checks in its finally block whether _sessions[sid] still points to the session it's building for. If not, the build was orphaned by a racing close, so clean up the subprocess and unregister the notify ourselves. tui_gateway/server.py: - _build reads _sessions.get(sid) safely (returns early if already gone) - tracks allocated worker + notify registration - finally checks orphan status and cleans up Tests (tests/test_tui_gateway_server.py): 2 new cases. - test_session_create_close_race_does_not_orphan_worker: slow _make_agent, close mid-build, verify worker.close() and unregister_gateway_notify both fire from the build thread's cleanup path. - test_session_create_no_race_keeps_worker_alive: regression guard — happy path does NOT over-eagerly clean up a live worker. Validated: against the unpatched code, the race test fails with 'orphan worker was not cleaned up — closed_workers=[]'. Live E2E against the live Python environment confirmed the cleanup fires exactly when the race happens. |
||
|---|---|---|
| .. | ||
| __init__.py | ||
| entry.py | ||
| render.py | ||
| server.py | ||
| slash_worker.py | ||