hermes-agent/apps/desktop/src
Austin Pickett 016bce1a09
fix(desktop): recover stranded session windows when resume fails (#47655)
* fix(desktop): recover stranded session windows when resume fails

Opening a session in a new window (or any routed resume) could latch the
thread loader on "session" forever — the reported "stays stuck loading,
even after a nap" bug. Two compounding causes:

1. use-session-actions.resumeSession's catch ran the REST transcript
   fallback OUTSIDE its own try. When session.resume rejected AND the
   fallback also threw (the common case on a wedged/unreachable backend),
   the throw skipped setMessages and left activeSessionId null with an
   empty transcript — exactly the state the loader gates on
   (messagesEmpty && !activeSessionId), with no terminal/error state.

2. use-route-resume's self-heal could never re-fire: resumeSession sets
   selectedStoredSessionIdRef synchronously at entry (before failing), so
   stuckOnRoutedSession stays false, and on an already-open idle window
   neither pathnameChanged nor gatewayBecameOpen fire again. The window
   never retried — naps, focus, nothing recovered it.

Fix:
- Wrap the REST fallback in its own try so a fallback failure can't strand
  the loader.
- Add $resumeFailedSessionId: armed on terminal resume failure, cleared at
  the next resume's entry (and left clear on success).
- use-route-resume gains a bounded backoff auto-retry (4 attempts, 1s→8s)
  that re-resumes while the routed session matches the failure flag, with a
  fire-time liveness recheck so a recovered session isn't double-resumed.

Regression tests cover: fallback-wrap arming the flag without throwing,
flag cleared on success, retry fires on backoff, no retry for a
non-routed/recovered session, and the retry cap.

* feat(desktop): show error + manual Retry when resume retries exhaust

When a stranded session window's bounded auto-retry gives up (gateway
resume RPC + REST fallback fail through all MAX_RESUME_RETRIES attempts),
the loader latched forever. Add a $resumeExhaustedSessionId atom armed at
the give-up point so the chat view swaps the perpetual spinner for an
explicit error state + manual Retry button. Retry / reconnect / reselect
clears the latch and resets the auto-retry counter for a fresh cycle; a
route-change away from the stranded session also clears it.

Distinct from $resumeFailedSessionId (armed during the backoff window) so
the error UI only appears once auto-recovery has actually given up, not
mid-retry. Adds i18n strings across en/ja/zh/zh-hant and 3 tests covering
latch-arms-on-exhaustion, stays-clear-while-retries-remain, and
clears-on-route-change.

* fix(desktop): address review on stranded-resume recovery layer

Follow-up to review on #47655 (PR head 253bfc0e3). Four issues on the
recovery layer:

1. (blocking) Arm $resumeFailedSessionId only when the transcript is still
   empty after the REST fallback ($messages.get().length === 0), matching the
   atom's documented contract and the loader's messagesEmpty gate. Previously
   armed on any resume-RPC reject regardless of fallback outcome, so a window
   that recovered its history via REST still auto-retried and, on exhaustion,
   blanked the visible transcript behind the error overlay.

2. Reset the bounded-retry attempt counter on the $resumeExhaustedSessionId
   armed->cleared edge so a manual Retry / reconnect / reselect on the SAME
   stranded session gets a fresh backoff cycle, not a single one-shot attempt
   that immediately re-arms the error. (Keyed on the exhausted latch rather
   than the resumeFailedSessionId null->value transition the review suggested:
   the auto-retry loop itself toggles resumeFailedSessionId every cycle, so
   keying the reset there would defeat the MAX_RESUME_RETRIES cap. Only
   resumeSession clears the exhausted latch, making its clear edge the
   unambiguous manual-retry signal.)

3. Advance retryAttemptRef only when the timer actually dispatches a resume,
   not at schedule time. Prevents unrelated dep changes during the 1s-8s
   backoff window (transient gatewayState flip, non-stable resumeSession) from
   burning attempts and hitting MAX with fewer than 4 real resume attempts.

4. Drop unrelated blank-line-only insertions in store/session.ts and
   use-session-actions.ts to keep the diff tight.

Tests: +3 (RPC-fails-REST-succeeds-no-arm; manual-retry-fresh-cycle;
no-attempts-burned-on-dep-churn). All 19 resume tests + full session-hook
suite (65) pass; tsc --noEmit clean.

---------

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-06-17 17:33:53 -04:00
..
app fix(desktop): recover stranded session windows when resume fails (#47655) 2026-06-17 17:33:53 -04:00
components feat(desktop): add dismiss control to chat error banners (#47985) 2026-06-17 16:46:43 -04:00
fonts fix(desktop): crisp terminal text via opaque xterm canvas 2026-06-12 19:36:30 -05:00
hooks fix(desktop): keep generated images in the tool slot, not inline 2026-06-13 02:42:15 -05:00
i18n fix(desktop): recover stranded session windows when resume fails (#47655) 2026-06-17 17:33:53 -04:00
lib perf(desktop): keep oversized messages from freezing the chat 2026-06-17 08:25:52 -05:00
store fix(desktop): recover stranded session windows when resume fails (#47655) 2026-06-17 17:33:53 -04:00
themes feat(desktop): composer status stack, live subagent windows, editable prompts (#44630) 2026-06-12 08:30:06 -05:00
types feat(desktop): disconnect external (CLI-managed) providers 2026-06-16 00:08:21 -05:00
global.d.ts feat(desktop): open new sessions in compact windows 2026-06-15 20:59:57 -05:00
hermes.test.ts fix(desktop): route profile session reads 2026-06-11 18:09:24 -05:00
hermes.ts fix: add provider account removal 2026-06-13 21:47:13 -07:00
main.tsx feat(desktop): window translucency slider in Appearance settings (#45086) 2026-06-12 12:02:38 -05:00
styles.css Merge branch 'main' into bb/desktop-stick-to-bottom 2026-06-13 02:52:03 -05:00
vite-env.d.ts Add Hermes desktop app (#20059) 2026-05-31 17:46:56 -05:00