hermes-agent/web/src
firefly ae94ed1728
fix(tui-gateway): reap leaked slash_worker sessions on disconnect + active_list liveness (re-scoped onto current main)
Salvaged from #35626 (banditburai) and re-scoped after maintainers landed the
parent-death watchdog (slash_worker.py) and PTY process-group teardown
(pty_bridge.py) directly on main. Those pieces are intentionally NOT included
here — this carries only what is still missing:

- C1 disconnect reap: ws.py's `finally` previously only re-pointed the dead
  transport at stdio. `_close_sessions_for_transport` now reaps
  `close_on_disconnect` sessions and schedules the grace-reap for the rest,
  offloaded via `asyncio.to_thread` so the blocking worker.close() + DB write
  never stalls the uvicorn loop.
- C2 create/close orphan race: `_attach_worker` stores the worker iff
  `_sessions.get(sid) is session` under the lock (else closes it), applied at
  every spawn site incl. the post-turn `_restart_slash_worker`.
- Single idempotent teardown funnel: session.close, WS disconnect, the
  generous-TTL idle reaper, shutdown, and the WS grace-reap all reach
  `_close_session_by_id` → `_teardown_session`; `_finalized`/`_closed` flags
  make concurrent/double teardown a no-op. `_sessions_lock` upgraded to RLock.
- uvicorn `ws_ping_interval/timeout=20s` so a half-open socket (reverse-proxy
  524) becomes a `WebSocketDisconnect` and the C1 path runs.

Plus three review-driven hardening fixes (mine):

- `session.active_list` now skips `_finalized` sessions so the footer
  "N sessions" count reflects attachable sessions instead of only ever
  growing until restart (#38950). Keys on `_finalized` only, NOT the stdio
  sentinel, so a standalone `hermes --tui` session stays visible.
- `_schedule_ws_orphan_reap._reap` pops via `_close_session_by_id`
  (under `_sessions_lock`) instead of `_sessions.pop` under the unrelated
  `_session_resume_lock` (#39591); the resume_lock now only guards the orphan
  re-check against `session.resume`.
- Float env knobs (`HERMES_SLASH_WATCHDOG_*`, `HERMES_TUI_SESSION_TTL_S`)
  parse with a fallback helper so a malformed value can't crash the worker at
  import.

Fixes #32377
Fixes #38950
Addresses #22855

Co-authored-by: banditburai <123342691+banditburai@users.noreply.github.com>
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-06-08 10:02:05 -07:00
components fix(tui-gateway): reap leaked slash_worker sessions on disconnect + active_list liveness (re-scoped onto current main) 2026-06-08 10:02:05 -07:00
contexts fix(dashboard): surface Docker update guidance instead of generic failure (#34347) (#37085) 2026-06-02 10:36:10 +10:00
hooks Merge remote-tracking branch 'origin/main' into refactor/use-ds-primitives 2026-05-28 14:20:49 -04:00
i18n feat(dashboard): change UI font from the theme picker, independent of theme (#41145) 2026-06-07 03:39:01 -07:00
lib feat(dashboard): change UI font from the theme picker, independent of theme (#41145) 2026-06-07 03:39:01 -07:00
pages feat(dashboard): full tool backend configuration in the GUI (#40418) 2026-06-06 07:45:36 -07:00
plugins fix(dashboard): sanction plugin WS/upload auth via SDK helpers (gated mode) 2026-06-03 16:59:36 -07:00
themes feat(dashboard): change UI font from the theme picker, independent of theme (#41145) 2026-06-07 03:39:01 -07:00
App.tsx feat(dashboard): Channels page — set up every gateway messaging channel from the browser (#37211) 2026-06-01 23:41:35 -07:00
index.css feat(dashboard): nous-blue theme, bulk sessions, schedule picker (#37383) 2026-06-02 12:37:40 -04:00
main.tsx fix(dashboard): remove country flags from language picker (#29997) 2026-05-21 13:10:52 -07:00