Long TUI sessions were crashing Node via V8 fatal-OOM once transcripts +
reasoning blobs crossed the default 1.5–4GB heap cap. This adds defense
in depth: a bigger heap, leak-proofing the RPC hot path, bounded
diagnostic buffers, automatic heap dumps at high-water marks, and
graceful signal / uncaught handlers.
## Changes
### Heap budget
- hermes_cli/main.py: `_launch_tui` now injects `NODE_OPTIONS=
--max-old-space-size=8192 --expose-gc` (appended — does not clobber
user-supplied NODE_OPTIONS). Covers both `node dist/entry.js` and
`tsx src/entry.tsx` launch paths.
- ui-tui/src/entry.tsx: shebang rewritten to
`#!/usr/bin/env -S node --max-old-space-size=8192 --expose-gc` as a
fallback when the binary is invoked directly.
### GatewayClient (ui-tui/src/gatewayClient.ts)
- `setMaxListeners(0)` — silences spurious warnings from React hook
subscribers.
- `logs` and `bufferedEvents` replaced with fixed-capacity
CircularBuffer — O(1) push, no splice(0, …) copies under load.
- RPC timeout refactor: `setTimeout(this.onTimeout.bind(this), …, id)`
replaces the inline arrow closure that captured `method`/`params`/
`resolve`/`reject` for the full 120 s request timeout. Each Pending
record now stores its own timeout handle, `.unref()`'d so stuck
timers never keep the event loop alive, and `rejectPending()` clears
them (previously leaked the timer itself).
### Memory diagnostics (new)
- ui-tui/src/lib/memory.ts: `performHeapDump()` +
`captureMemoryDiagnostics()`. Writes heap snapshot + JSON diag
sidecar to `~/.hermes/heapdumps/` (override via
`HERMES_HEAPDUMP_DIR`). Diagnostics are written first so we still get
useful data if the snapshot crashes on very large heaps.
Captures: detached V8 contexts (closure-leak signal), active
handles/requests (`process._getActiveHandles/_getActiveRequests`),
Linux `/proc/self/fd` count + `/proc/self/smaps_rollup`, heap growth
rate (MB/hr), and auto-classifies likely leak sources.
- ui-tui/src/lib/memoryMonitor.ts: 10 s interval polling heapUsed. At
1.5 GB writes an auto heap dump (trigger=`auto-high`); at 2.5 GB
writes a final dump and exits 137 before V8 fatal-OOMs so the user
can restart cleanly. Handle is `.unref()`'d so it never holds the
process open.
### Graceful exit (new)
- ui-tui/src/lib/gracefulExit.ts: SIGINT/SIGTERM/SIGHUP run registered
cleanups through a 4 s failsafe `setTimeout` that hard-exits if
cleanup hangs.
`uncaughtException` / `unhandledRejection` are logged to stderr
instead of crashing — a transient TUI render error should not kill
an in-flight agent turn.
### Slash commands (new)
- ui-tui/src/app/slash/commands/debug.ts:
- `/heapdump` — manual snapshot + diagnostics.
- `/mem` — live heap / rss / external / array-buffer / uptime panel.
- Registered in `ui-tui/src/app/slash/registry.ts`.
### Utility (new)
- ui-tui/src/lib/circularBuffer.ts: small fixed-capacity ring buffer
with `push` / `tail(n)` / `drain()` / `clear()`. Replaces the ad-hoc
`array.splice(0, len - MAX)` pattern.
## Validation
- tsc `--noEmit` clean
- `vitest run`: 15 files, 102 tests passing
- eslint clean on all touched/new files
- build produces executable `dist/entry.js` with preserved shebang
- smoke-tested: `HERMES_HEAPDUMP_DIR=… performHeapDump('manual')`
writes both a valid `.heapsnapshot` and a `.diagnostics.json`
containing detached-contexts, active-handles, smaps_rollup.
## Env knobs
- `HERMES_HEAPDUMP_DIR` — override snapshot output dir
- `HERMES_HEAPDUMP_ON_START=1` — dump once at boot
- existing `NODE_OPTIONS` is respected and appended, not replaced
Make the Ink TUI match macOS keyboard expectations: Command handles copy and common editor/session shortcuts, while Control remains reserved for interrupt/cancel flows. Update the visible hotkey help to show platform-appropriate labels.
Models that emit reasoning inline as <think>/<reasoning>/<thinking>/<thought>/
<REASONING_SCRATCHPAD> tags in the content field (rather than a separate API
reasoning channel) had the raw tags + inner content shown twice: once as body
text with literal <think> markers, and again in the thinking panel when the
reasoning field was populated.
Port v1's tag set to lib/reasoning.ts with a splitReasoning(text) helper that
returns { reasoning, text }. Applied in three spots:
- scheduleStreaming: strips tags from the live streaming view so the user
never sees <think> mid-turn.
- flushStreamingSegment: when a tool interrupts assistant output mid-turn,
the saved segment is the stripped text; extracted reasoning promotes to
reasoningText if the API channel hasn't already populated it.
- recordMessageComplete: final message text is split, extracted reasoning
merges with any existing reasoning (API channel wins on conflicts so we
don't double-count when both are present).
Additional TUI fixes discovered in the same audit:
1. /plan slash command was silently lost — process_command() queues the
plan skill invocation onto _pending_input which nobody reads in the
slash worker subprocess. Now intercepted in slash.exec and routed
through command.dispatch with a new 'send' dispatch type.
Same interception added for /retry, /queue, /steer as safety nets
(these already have correct TUI-local handlers in core.ts, but the
server-side guard prevents regressions if the local handler is
bypassed).
2. Tool results were stripping ANSI escape codes — the messageLine
component used stripAnsi() + plain <Text> for tool role messages,
losing all color/styling from terminal, search_files, etc. Now
uses <Ansi> component (already imported) when ANSI is detected.
3. Terminal tab title now shows model + busy status via useTerminalTitle
hook from @hermes/ink (was never used). Users can identify Hermes
tabs and see at a glance whether the agent is busy or ready.
4. Added 'send' variant to CommandDispatchResponse type + asCommandDispatch
parser + createSlashHandler handler for commands that need to inject
a message into the conversation (plan, queue fallback, steer fallback).
Adds a minimal hand-rolled highlighter for ts/js/jsx/tsx, py, sh/bash, go, rust,
json, yaml, sql. Recognizes whole-line comments, single/double/backtick strings,
numbers, and per-language keyword sets. Unknown langs fall through to the current
plain rendering; the existing diff-specific colorization is preserved.
Closes the §8 "Markdown syntax highlighting is missing (only diff gets colored)"
finding from the TUI v2 audit without pulling in a highlighter library.
- hermes-ink: export `withInkSuspended()` + `useExternalProcess()` that
pause/resume Ink around an arbitrary external process (built on the
existing enterAlternateScreen/exitAlternateScreen plumbing)
- tui: `launchHermesCommand(args)` spawns the `hermes` binary with
inherited stdio, with `HERMES_BIN` override for non-standard launches
- tui: `/model` and `/setup` slash commands invoke the CLI wizards
in-place, then re-preflight `setup.status` and auto-start a session on
success — no more exit-and-relaunch to finish first-run setup
- setup panel now advertises those slashes instead of only pointing
users back at the shell
Hoist turn state from a 286-line hook into $turnState atom + turnController
singleton. createGatewayEventHandler becomes a typed dispatch over the
controller; its ctx shrinks from 30 fields to 5. Event-handler refs and 16
threaded actions are gone.
Fold three createSlash*Handler factories into a data-driven SlashCommand[]
registry under slash/commands/{core,session,ops}.ts. Aliases are data;
findSlashCommand does name+alias lookup. Shared guarded/guardedErr combinator
in slash/guarded.ts.
Split constants.ts + app/helpers.ts into config/ (timing/limits/env),
content/ (faces/placeholders/hotkeys/verbs/charms/fortunes), domain/ (roles/
details/messages/paths/slash/viewport/usage), protocol/ (interpolation/paste).
Type every RPC response in gatewayTypes.ts (26 new interfaces); drop all
`(r: any)` across slash + main app.
Shrink useMainApp from 1216 -> 646 lines by extracting useSessionLifecycle,
useSubmission, useConfigSync. Add <Fg> themed primitive and strip ~50
`as any` color casts.
Tests: 50 passing. Build + type-check clean.
Resumed sessions showed raw JSON tool output in content boxes instead
of the compact trail lines seen during live use. The root cause was
two separate rendering paths with no shared code.
Extract buildToolTrailLine() into lib/text.ts as the single source
of truth for formatting tool trail lines. Both the live tool.complete
handler and toTranscriptMessages now call it.
Server-side, reconstruct tool name and args from the assistant
message's tool_calls field (tool_name column is unpopulated) and
pass them through _tool_ctx/build_tool_preview — the same path
the live tool.start callback uses.