#19194's fix added process.exit(0) to die()/dieWithCode() with a comment
relying on a process.on('exit') handler in entry.tsx that resets terminal
modes — but that handler was never installed. So /quit, Ctrl+C, Ctrl+D and
every process.exit() path left DEC mouse tracking (?1000/1002/1003/1006)
armed in the parent shell. The terminal then kept emitting mouse reports
into stdin — read as keystrokes by the shell or a freshly relaunched TUI —
surfacing as ...;...M garbage in the input box.
Install the missing handler. 'exit' fires once on real termination and runs
synchronous code only; resetTerminalModes() writes via writeSync, so the
disable sequence lands before the process is gone.
Fixes#28419
Automatic heap dumps from the TUI memory monitor could write multi-GiB
.heapsnapshot files on every threshold cross, growing ~/.hermes/heapdumps
to tens of GiB. Add four layered safeguards:
- Gate auto-high/auto-critical snapshots behind HERMES_AUTO_HEAPDUMP=1;
manual dumps remain unchanged.
- Always write the lightweight diagnostics JSON sidecar so users still
get an actionable artifact when the snapshot is suppressed.
- Cap total bytes in the dump dir (HERMES_HEAPDUMP_MAX_BYTES, default
2 GiB), evicting oldest first, retaining the newest.
- Add a cooldown between auto dumps (HERMES_AUTO_HEAPDUMP_COOLDOWN_MS,
default 10 min) so an oscillating heap can't re-trigger.
Closes#21767
A heavy --tui session (browser snapshots, large tool outputs) silently
OOM-killed the Node parent within minutes — closing the gateway child's
stdin, which the user saw only as a bare "gateway exited" / stdin EOF.
CLI was immune. Root cause: each completed tool's verbose trail line
embedded up to 16KB of result_text, persisted in transcript Msg.tools[]
for the whole session and rendered EXPANDED by default, so an Ink
render-node tree was built for every one of up to 800 messages at once.
That tree blew past Node's heap at a few hundred MB — far below the 2.5GB
memory-monitor exit threshold, so the death was never even attributed.
- text.ts: persisted verbose tool-trail blocks now cap to a small preview
(VERBOSE_TRAIL_MAX_CHARS=800/12 lines), not the 16KB live-render budget.
Retained trail strings drop ~17x (12.2MB -> 0.7MB at 800 msgs); the live
streaming tail still uses the larger LIVE_RENDER budget.
- tui_gateway/server.py: lower the gateway-side verbose text cap to match
(1KB/16 lines) so we stop shipping output the TUI no longer renders.
- memoryMonitor.ts: derive critical/high thresholds from the real V8 heap
ceiling (~88%/70%) instead of the hardcoded 2.5GB that killed the process
at 31% of an 8GB ceiling; add a one-shot onWarn early-warning on fast
sub-threshold heap growth so the next such death is diagnosable, not silent.
- entry.tsx: wire onWarn to a crash-log breadcrumb + stderr line.
Full tool output is unchanged in the agent context and SQLite session — this
is display/transport only, no behavior or context change.
Fixes#34095. Related #27282.
Tests: ui-tui text + new memoryMonitor suites (33 pass), python verbose-cap
guard (5 pass); full ui-tui suite shows no new failures vs pristine main.
E2E repro confirms the retention drop.
* fix(tui): persist gateway lifecycle breadcrumbs to crash log
A backend SIGTERM (`=== SIGTERM received ===` in tui_gateway_crash.log) is
always a parent action — `gw.kill()` (graceful-exit on a signal to Node, or an
explicit /quit) or `start()` replacing a live child. #31051 added parent-side
lifecycle breadcrumbs but left them in an in-memory CircularBuffer that dies
with the process, so SIGTERM crash reports arrive with no parent context and no
way to tell a signal-driven kill from a memory-critical `process.exit(137)`
(which closes the child's stdin → clean EOF, not SIGTERM).
Persist the death-explaining breadcrumbs (spawn / transport-exit / child-exit /
replace-live-child / kill-reason / startup-timeout) plus the graceful-exit
signal name and the memory-critical exit into the same crash log the Python
side writes, so they interleave by timestamp next to the child's panic entry —
making these recurring reports diagnosable.
Gated off under VITEST so unit tests stay hermetic.
* feat(tui): auto-recover the session when the gateway dies unexpectedly
When a still-owned gateway child dies while the TUI is alive (a crash, OOM
process.exit, or a SIGTERM/SIGHUP forwarded to it), the app currently nulls the
session and drops to an inert "gateway exited" state — the user loses a long
session and has to restart + re-run everything. That single behavior is most of
the "TUI doesn't survive heavy work" complaint, independent of what does the
killing.
The 'exit' event only reaches this handler on an *unexpected* death: a user
/quit calls process.exit before it fires, and a replaced child is identity-
skipped in GatewayClient. So on exit we now respawn the gateway and resume the
session that was live (history is persisted in SQLite) via a one-shot
recoverSidRef the next gateway.ready consults before forging a new session. The
in-flight reply is lost (it died with the process) but the session survives.
Bounded to GATEWAY_RECOVERY_LIMIT (3) attempts per GATEWAY_RECOVERY_WINDOW_MS
(60s) so a gateway that crash-loops on startup can't spawn-storm; past the
budget we fall back to the inert state.
* fix(tui): sanitize newlines + soften SIGTERM-cause claim in parentLog
Address PR review:
- recordParentLifecycle collapses embedded \r\n so a multi-line value (e.g. an
error message) stays a single breadcrumb and can't masquerade as a separate
entry or as the child's panic output sharing the crash log.
- Reword the header: a backend SIGTERM is *usually* a parent action but can come
straight from an external supervisor (s6, cgroup OOM, stray kill); the
presence/absence of a [tui-parent] line before the child's panic is precisely
what disambiguates the two.
* fix(tui): clear sid during recovery + extract/test the recovery budget
Address PR review:
- Null `sid` immediately in the gateway exit handler. While the gateway is down
(busy=false) the old sid would otherwise let sid-guarded effects (the 1.5s
session.active_list poll, queue drain) fire RPCs at a dead/respawning gateway.
recoverSidRef carries the session forward; resumeById restores sid on ready.
- Extract the respawn budget into a pure evalRecovery() (gatewayRecovery.ts) and
unit-test the bound: allows GATEWAY_RECOVERY_LIMIT within the window, blocks
past it, and prunes attempts older than the window so recovery re-arms.
* fix(tui): cap parent-log breadcrumb length (PR review)
Truncate a single persisted breadcrumb to 4096 chars (matching GatewayClient's
in-memory log-line cap) so a pathological value — e.g. a giant error string —
can't bloat the shared crash log or add noticeable blocking on the synchronous
append during a failure path. Covered by a test.
* fix(tui): keep "recovering session…" status visible during resume (PR review)
resumeById() synchronously sets status to 'resuming…' on entry, so the
recovery branch now applies its 'recovering session…' label *after* calling
resumeById — the distinct label sticks for the duration of the resume RPC
(which later flips to 'ready') instead of being immediately clobbered. Test
updated to assert the ordering.
* fix(tui): keep recovery budget alive across a startup crash-loop (PR review)
deadSid was read from getUiState().sid, which the first exit nulls — so if the
respawned gateway crash-looped before gateway.ready (resumeById never restored
sid), later exits saw null and abandoned the session after a single attempt,
defeating the bounded retry budget.
Lift the whole decision into a pure planGatewayRecovery() that falls back to the
pending recoverSidRef target when the live sid is already cleared, and unit-test
the crash-loop sequence (keeps retrying the same session up to the limit, then
falls back to inert). Supersedes evalRecovery.
* chore(tui): drop non-null assertion + clarify breadcrumb cap comment (PR review)
- Recovery branch guards on `recoverSidRef && recoverSid` so the ref write needs
no `!` assertion (avoids a future unsafe refactor).
- Reword the parentLog cap comment: it slices the value to 4096 chars and
appends a short truncation marker (so the written line is slightly longer),
rather than implying a strict 4096-byte limit.
* chore(tui): soften "absence ⇒ external signal" + "any in-flight reply" (PR review)
- parentLog header: a missing [tui-parent] line only *suggests* an external
signal (the logger is best-effort: VITEST-disabled, failed append swallowed),
not a definitive conclusion.
- Recovery notice says "any in-flight reply was lost" since the gateway can also
exit while idle.
Some hosts (notably WSL) report a junk window size such as 131072 columns
by 1 row. Both the Ink fork and our components only guard against
0/null/undefined/NaN (stdout.columns || 80), so a positive-but-absurd
width sails through into createScreen(width*height), allocating tens to
hundreds of MB per frame and tripping the TUI memory monitor's hard exit.
Add clampStdoutDimensions(), installed in entry.tsx before ink.render: it
patches process.stdout.columns/rows with clamping getters (cols 1-2000,
rows 1-1000; out-of-range -> 80x24). One install point fixes the renderer,
its resize handler, and every component read. Live resizes still propagate
through the original descriptor, just clamped.
* fix(tui): log parent gateway lifecycle exits
Add parent-side breadcrumbs for TUI gateway shutdown and transport exits so future backend EOF/SIGTERM reports identify the parent action that caused them.
* chore(tui): retrigger lifecycle logging checks
Retry transient GitHub checkout failures on the lifecycle logging PR.
Adds a Termux runtime detection helper and gates three TUI defaults on it:
- Skip the startup scrollback clear on Termux so users can review/copy
earlier output after reopening the app. Desktop keeps the existing
\x1b[2J\x1b[H\x1b[3J slate (AlternateScreen takes over there anyway).
- Default INLINE_MODE on under Termux: primary-buffer rendering makes
long-thread review and copy/paste much less fragile when users
background/foreground the app. Override with HERMES_TUI_INLINE=0/1.
- Default mouse tracking off under Termux so touch selection isn't
intercepted by terminal mouse protocols. Explicit override via
HERMES_TUI_MOUSE_TRACKING=0/1; legacy HERMES_TUI_DISABLE_MOUSE still
works on desktop.
Detection is purely env-based (TERMUX_VERSION or PREFIX path) with an
explicit opt-out HERMES_TUI_TERMUX_MODE=0 for debugging. Non-Termux
platforms keep every existing default.
Co-authored-by: adybag14-cyber <252811164+adybag14-cyber@users.noreply.github.com>
* tui: make URLs clickable + hover-highlight in any terminal
Problem
-------
URLs printed by `hermes --tui` were not clickable in basic macOS Terminal.app.
Cmd+click did nothing, the cursor didn't change shape — like nothing was
detected — even though arrow buttons and other Box onClick handlers worked
fine.
Root cause
----------
Two layers of dead plumbing:
1. `<Link>` only emitted the underlying `<ink-link>` (which carries the
hyperlink metadata into the screen buffer) when `supportsHyperlinks()`
said yes. On Apple_Terminal that's false, so the per-cell hyperlink
field stayed empty, so `Ink.getHyperlinkAt()` had nothing to return on
click. The visible underline was just decorative.
2. `Ink.openHyperlink()` calls `this.onHyperlinkClick?.(url)`, but
`onHyperlinkClick` was never assigned anywhere in the codebase. The
click pipeline (`App.tsx → onOpenHyperlink → Ink.openHyperlink`) ran
but bailed silently on the optional chain.
Bonus discovery: even when wired up, there was no hover affordance —
terminal apps can't change the system mouse cursor, so users had no
visual signal that a cell was clickable. Arrow buttons in the chrome
worked because they had explicit `<Box onClick>` styling; inline link
URLs didn't.
Fix
---
- `Link.tsx`: always emit `<ink-link>` regardless of terminal capability.
The renderer's `wrapWithOsc8Link` already gates the actual OSC 8 escape
on `supportsHyperlinks()` further down — so terminals that don't
understand OSC 8 still don't see the escape, but the screen-buffer
metadata (which the click dispatcher reads) is now populated everywhere.
- `ink.tsx + root.ts`: add `onHyperlinkClick?: (url: string) => void` to
`Options` / `RenderOptions`, wire it to the existing `Ink.onHyperlinkClick`
field in the constructor.
- `src/lib/openExternalUrl.ts`: small platform-aware opener using
`child_process.spawn` with arg-array (no shell) — http(s) only, rejects
`file:`, `javascript:`, `data:`, etc., so a hostile model can't trigger
arbitrary local handlers via `<Link url="file:///...">`. Detached + stdio
ignore so closing the TUI doesn't kill the browser and Chrome stderr
doesn't leak into the alt screen.
- `entry.tsx`: pass `onHyperlinkClick: openExternalUrl` to `ink.render`.
- `hyperlinkHover.ts` + Ink hover wiring: track the URL under the pointer
in `Ink.hoveredHyperlink`, update it from `dispatchHover`, and inverse-
highlight every cell of the matching link in the render-pass overlay
(same pattern as `applySearchHighlight`). This is the cursor-hover
affordance for clickable links — terminals don't expose cursor shape,
so we light up the link itself.
- `types/hermes-ink.d.ts`: add `onHyperlinkClick` to the `RenderOptions`
shim so consumers (`entry.tsx`) type-check against the new option.
Tests
-----
- `src/lib/openExternalUrl.test.ts` (15 cases): http(s) accepted; file/js/
data/mailto/ftp/ssh rejected; macOS open(1), Windows cmd.exe start with
empty title slot, Linux xdg-open dispatch; shell-metacharacter URLs
pass through unmolested as a single argv element; synchronous spawn
failure returns false.
Verified empirically in Apple Terminal 455.1 (macOS 15.7.3): clicking a
URL opens in default browser, hovering inverts the link cells, and
moving away clears the highlight. Full TUI suite: 713 passing, 0
type errors.
Reverts
-------
The earlier attempt that version-gated Apple_Terminal in
`supports-hyperlinks.ts` was based on a wrong assumption — Terminal.app
silently strips OSC 8 sequences but does not render them as clickable
hyperlinks. Reverted to the original allowlist.
* tui: address Copilot review — explorer.exe on win32 + comment fixes
- openExternalUrl: switch win32 from `cmd.exe /c start` to `explorer.exe`.
cmd.exe's `start` builtin reparses the URL through cmd's tokenizer, so
`&`, `|`, `^`, `<`, `>` either split the command or get reinterpreted —
breaking both the protocol-allowlist safety story AND plain http(s) URLs
with `&` in query strings. `explorer.exe <url>` invokes the registered
protocol handler directly with no shell.
- openExternalUrl.test.ts: rename the win32 test to reflect the new
contract and add two regression tests — one with `&|^<>` metachars,
one with the common analytics-URL `&` query-param pattern — both pinned
to single-argv-element delivery via explorer.exe.
- Link.tsx: fix misleading comment. OSC 8 escapes are emitted
unconditionally by the renderer (`wrapWithOsc8Link` in
render-node-to-output.ts, `oscLink` in log-update.ts). Non-supporting
terminals silently strip the sequence, which is why hover/click
affordance has to come from the in-process overlay rather than the
terminal's own link rendering.
Verified: 715/715 tests pass, type-check + build clean.
* tui: address Copilot review #2 — async spawn errors + hover scope + docs
1. openExternalUrl: attach a no-op `'error'` listener on the spawned
child BEFORE unref(). spawn() returns a ChildProcess synchronously
even when the binary is missing (ENOENT on xdg-open / explorer.exe),
unreachable, or otherwise unusable; the failure surfaces later as
an 'error' event. An unhandled 'error' on an EventEmitter crashes
Node, which would tear down the whole TUI. The listener is a
deliberate no-op — we already returned `true` synchronously and the
user just doesn't see the browser pop.
2. openExternalUrl.test.ts: add a regression test using a real
EventEmitter to simulate the async-error path. Pins both the
listener-attached contract and the "doesn't throw on emit" behavior.
Was 17/17, now 18/18.
3. ink.tsx dispatchHover: bypass `getHyperlinkAt()` and read
`cellAt(...).hyperlink` directly. `getHyperlinkAt` falls back to
`findPlainTextUrlAt` for cells without an OSC 8 hyperlink, but the
render-pass overlay (`applyHyperlinkHoverHighlight`) only matches on
`cell.hyperlink === hoveredUrl` — so plain-text URLs would burn
re-renders without ever producing the highlight. Hover is now a
strictly 1:1 fit for what the overlay can paint. Plain-text URLs
still get the click action via the existing dispatch path.
4. root.ts + ink.tsx doc comments: replace the misleading "typically
`open` / `xdg-open` / `start` shell" wording with the actual safe
recipe — argv-array spawn into `open` / `xdg-open` / `explorer.exe`,
with an explicit warning that `cmd.exe /c start` reparses the URL
through cmd's tokenizer and is unsafe + breaks `&`-query URLs.
Verified: 716/716 tests pass, type-check + build clean.
* tui: address Copilot review #3 — hover damage, alt-screen cleanup, opener allowlist
1. ink.tsx onRender: stop folding steady-state hover into hlActive.
hlActive forces a full-screen damage diff so previous-frame inverted
cells get re-emitted when the highlight set changes. The transition
IS the trigger — enter / leave / change-to-other-link. While the
pointer just sits on a link the painted cells don't change and the
per-cell diff handles the no-op. Folding the steady state in would
burn a full-screen diff on every frame. Added a
lastRenderedHoveredHyperlink tracker and gate the hlActive bump on
`hovered !== lastRendered`.
2. ink.tsx setAltScreenActive: clear hoveredHyperlink (and the tracker)
when toggling alt-screen state. Hover dispatch is alt-screen-gated,
so once we leave there's no path to clear it. Without this, remounting
<AlternateScreen> would paint a phantom hover from the previous
session until the next mouse-move arrived.
3. openExternalUrl.ts openCommand: allowlist linux + the BSD family for
xdg-open and return null for everything else (aix, sunos, cygwin,
haiku, etc.). Previously the default-fallback always returned
xdg-open, which made the caller's `if (!command) return false` dead
and yielded a misleading `true` on platforms that probably don't
have xdg-open. New tests cover the null path AND the
openExternalUrl-returns-false-without-spawning behavior.
Verified: 718/718 tests pass, type-check + build clean.
* tui: address Copilot review #4 — doc comment accuracy
1. openExternalUrl return-value doc: now lists all three false paths
(URL rejected / no opener for platform / synchronous spawn throw)
plus a note that async 'error' events still return true because the
spawn was attempted.
2. ink.tsx onHyperlinkClick field doc: clarifies the callback receives
either an OSC 8 hyperlink OR a plain-text URL detected by
findPlainTextUrlAt — App.tsx routes both into the same callback.
3. hyperlinkHover applyHyperlinkHoverHighlight doc: drops the misleading
'caller forces full-frame damage' promise. Caller decides; for hover
the current caller only forces full damage on transitions.
No behavior change. 718/718 tests pass.
* tui: address Copilot review #5 — lint fixes
1. ink.tsx: reorder `./hyperlinkHover.js` import before `./screen.js` to
satisfy perfectionist/sort-imports.
2. Link.tsx: drop unused `fallback` parameter destructuring + the
trailing `void (null as ...)` dead-statement (would trip
no-unused-expressions). Kept `fallback?: ReactNode` on the Props
interface as a documented compat shim so existing call sites still
compile, with a comment explaining why it's no longer wired up.
3. openExternalUrl.test.ts: replace `typeof import('node:child_process').spawn`
inline annotations (forbidden by @typescript-eslint/consistent-type-imports)
with a `SpawnLike` type alias backed by a real `import type { spawn as SpawnFn }`.
No behavior change. 718/718 tests pass, type-check clean, lint clean on
all modified files.
When TUI exits, tmux captures some TUI output into its scrollback buffer.
On restart, stale scrollback content appears at the top of screen before
AlternateScreen takes over.
Add ANSI escape sequences at startup:
- ESC[2J clear visible screen
- ESC[H cursor home
- ESC[3J clear scrollback buffer
Reset sticky mouse/focus/paste terminal modes before the TUI starts and during graceful shutdown paths so stale tab state from prior crashes cannot poison the next session.
This PR groups the TUI fixes that restore macOS Terminal usability and clean up the theme/composer regressions:
- copy transcript selections on macOS drag-release so Terminal.app users can copy while mouse tracking is enabled
- copy composer selections on macOS drag-release; composer selection is internal to TextInput and does not use the global Ink selection bus
- keep IDE Cmd+C forwarding setup macOS-only, and make keybinding conflict checks respect simple when-clause overlap/negation
- force truecolor before chalk initializes (unless NO_COLOR / FORCE_COLOR / HERMES_TUI_TRUECOLOR opt-outs apply) so the default banner keeps its gold/amber/bronze gradient in Terminal.app
- move TUI surfaces onto semantic theme tokens and preserve skin prompt symbols as bare tokens with renderer-owned spacing
- render focused placeholders as dim hint text in TTY mode instead of inverse/selected-looking synthetic cursor text
Adds a corner-overlay FPS readout gated on HERMES_TUI_FPS, fed by
ink's onFrame callback (so it's the REAL render rate, not a timer).
Displays fps, last-frame duration, and total frame count, colored by
threshold (green ≥50, yellow ≥30, red below).
Implementation:
* lib/fpsStore.ts — nanostore atom updated from a trackFrame()
sink. Ring buffer of last 30 frame timestamps; fps = 29/elapsed.
trackFrame is undefined when SHOW_FPS is off so ink's onFrame
short-circuits at the optional chain.
* components/fpsOverlay.tsx — tiny <Text> subscriber; returns null
when SHOW_FPS is off (React skips the subtree entirely).
* entry.tsx — composes onFrame from logFrameEvent (dev-perf) and
trackFrame (fps) so both flags can coexist. When both are off,
onFrame is undefined and ink never attaches the handler.
* appLayout.tsx — mounts the overlay as a flex-shrink=0 right-
aligned Box below the composer, conditional on SHOW_FPS.
Usage:
HERMES_TUI_FPS=1 hermes --tui
# bottom right: " 62.3fps · 0.8ms · #1234" (green/yellow/red)
Intended as a user-facing diagnostic during the scroll-perf tuning
pass — watch the counter drop while holding PageUp to see where
frames go silent, without having to run scripts/profile-tui.py in a
side terminal.
126 files post-compile with React Compiler; 352 tests still pass.
Extends HERMES_DEV_PERF to capture the complete render pipeline, not
just React commits. Adds scripts/profile-tui.py to drive repeatable
hold-PageUp stress tests against a real long session.
perfPane.tsx:
Wires ink's onFrame callback (already plumbed through the fork) into
the same perf.log as the React.Profiler samples. Captures per-phase
timing (yoga calculateLayout, renderNodeToOutput, screen diff, patch
optimize, stdout write) plus yoga counters (visited/measured/cache-
Hits/live) and patch counts per frame. Events are tagged
{src: 'react'|'frame'} so jq can split them. logFrameEvent is
undefined when HERMES_DEV_PERF is unset, so ink doesn't even attach
the callback.
entry.tsx:
Passes logFrameEvent into render().
types/hermes-ink.d.ts:
Declares FrameEvent + onFrame on RenderOptions so the ui-tui side
type-checks against the plumbed-through ink option.
scripts/profile-tui.py:
New harness. Launches the built TUI under a PTY with the longest
session in state.db resumed, holds PageUp/PageDown/etc at a
configurable Hz for N seconds, then parses perf.log and prints
per-phase p50/p95/p99/max plus yoga-counter summaries. Zero deps
beyond stdlib. Exit 2 if nothing was captured (wiring broken).
Initial findings (1106-msg session, 6s PageUp hold at 30Hz):
- Steady state: 10 fps; renderer phase p99=63ms, write p99=0.2ms
- 4/107 heavy frames (>=16ms), all dominated by renderNodeToOutput
- One pathological 97ms frame with yoga measuring 70,415 text cells
and Yoga visiting 225k nodes — the cold-unmeasured-region hit
- Ink's scroll fast-path (DECSTBM blit from prevScreen) is
disqualified because our spacer-based virtual history doesn't
keep heightDelta in sync with scroll.delta, so every PageUp step
falls through to a full 2000-4800 patch re-render instead of ~40
- entry.tsx no longer writes bootBanner() to the main screen before the
alt-screen enters. The <Banner> renders inside the alt screen via the
seeded intro row, so nothing is lost — just the flash that preceded it.
Fixes the torn first frame reported on Alacritty (blitz row 5 #17) and
shaves the 'starting agent' hang perception (row 5 #1) since the UI
paints straight into the steady-state view
- AlternateScreen prefixes ERASE_SCROLLBACK (\x1b[3J) to its entry so
strict emulators start from a pristine grid; named constants replace
the inline sequences for clarity
- bootBanner.ts deleted — dead code
KISS/DRY sweep — drops ~90 LOC with no behavior change.
- circularBuffer: drop unused pushAll/toArray/size; fold toArray into drain
- gracefulExit: inline Cleanup type + failsafe const; signal→code as a
record instead of nested ternary; drop dead .catch on Promise.allSettled;
drop unused forceExit
- memory: inline heapDumpRoot() + writeSnapshot() (single-use); collapse
the two fd/smaps try/catch blocks behind one `swallow` helper; build
potentialLeaks functionally (array+filter) instead of imperative
push-chain; UNITS at file bottom
- memoryMonitor: inline DEFAULTS; drop unused onSnapshot; collapse
dumpedHigh/dumpedCritical bools to a single Set; single callback
dispatch line instead of duplicated if-chains
- entry.tsx: factor `dumpNotice` formatter (used twice by onHigh +
onCritical)
- useMainApp resize debounce: drop redundant `if (timer)` guards
(clearTimeout(undefined) is a no-op); init as undefined not null
- useVirtualHistory: trim wall-of-text comment to one-line intent; hoist
`const n = items.length`; split comma-declared lets; remove the
`;[start, end] = frozenRange` destructure in favor of direct Math.min
clamps; hoist `hi` init in upperBound for consistency
Validation: tsc clean (both configs), eslint clean on touched files,
vitest 102/102, build produces shebang-preserved dist/entry.js,
performHeapDump smoke-test still writes valid snapshot + diagnostics.
Long TUI sessions were crashing Node via V8 fatal-OOM once transcripts +
reasoning blobs crossed the default 1.5–4GB heap cap. This adds defense
in depth: a bigger heap, leak-proofing the RPC hot path, bounded
diagnostic buffers, automatic heap dumps at high-water marks, and
graceful signal / uncaught handlers.
## Changes
### Heap budget
- hermes_cli/main.py: `_launch_tui` now injects `NODE_OPTIONS=
--max-old-space-size=8192 --expose-gc` (appended — does not clobber
user-supplied NODE_OPTIONS). Covers both `node dist/entry.js` and
`tsx src/entry.tsx` launch paths.
- ui-tui/src/entry.tsx: shebang rewritten to
`#!/usr/bin/env -S node --max-old-space-size=8192 --expose-gc` as a
fallback when the binary is invoked directly.
### GatewayClient (ui-tui/src/gatewayClient.ts)
- `setMaxListeners(0)` — silences spurious warnings from React hook
subscribers.
- `logs` and `bufferedEvents` replaced with fixed-capacity
CircularBuffer — O(1) push, no splice(0, …) copies under load.
- RPC timeout refactor: `setTimeout(this.onTimeout.bind(this), …, id)`
replaces the inline arrow closure that captured `method`/`params`/
`resolve`/`reject` for the full 120 s request timeout. Each Pending
record now stores its own timeout handle, `.unref()`'d so stuck
timers never keep the event loop alive, and `rejectPending()` clears
them (previously leaked the timer itself).
### Memory diagnostics (new)
- ui-tui/src/lib/memory.ts: `performHeapDump()` +
`captureMemoryDiagnostics()`. Writes heap snapshot + JSON diag
sidecar to `~/.hermes/heapdumps/` (override via
`HERMES_HEAPDUMP_DIR`). Diagnostics are written first so we still get
useful data if the snapshot crashes on very large heaps.
Captures: detached V8 contexts (closure-leak signal), active
handles/requests (`process._getActiveHandles/_getActiveRequests`),
Linux `/proc/self/fd` count + `/proc/self/smaps_rollup`, heap growth
rate (MB/hr), and auto-classifies likely leak sources.
- ui-tui/src/lib/memoryMonitor.ts: 10 s interval polling heapUsed. At
1.5 GB writes an auto heap dump (trigger=`auto-high`); at 2.5 GB
writes a final dump and exits 137 before V8 fatal-OOMs so the user
can restart cleanly. Handle is `.unref()`'d so it never holds the
process open.
### Graceful exit (new)
- ui-tui/src/lib/gracefulExit.ts: SIGINT/SIGTERM/SIGHUP run registered
cleanups through a 4 s failsafe `setTimeout` that hard-exits if
cleanup hangs.
`uncaughtException` / `unhandledRejection` are logged to stderr
instead of crashing — a transient TUI render error should not kill
an in-flight agent turn.
### Slash commands (new)
- ui-tui/src/app/slash/commands/debug.ts:
- `/heapdump` — manual snapshot + diagnostics.
- `/mem` — live heap / rss / external / array-buffer / uptime panel.
- Registered in `ui-tui/src/app/slash/registry.ts`.
### Utility (new)
- ui-tui/src/lib/circularBuffer.ts: small fixed-capacity ring buffer
with `push` / `tail(n)` / `drain()` / `clear()`. Replaces the ad-hoc
`array.splice(0, len - MAX)` pattern.
## Validation
- tsc `--noEmit` clean
- `vitest run`: 15 files, 102 tests passing
- eslint clean on all touched/new files
- build produces executable `dist/entry.js` with preserved shebang
- smoke-tested: `HERMES_HEAPDUMP_DIR=… performHeapDump('manual')`
writes both a valid `.heapsnapshot` and a `.diagnostics.json`
containing detached-contexts, active-handles, smaps_rollup.
## Env knobs
- `HERMES_HEAPDUMP_DIR` — override snapshot output dir
- `HERMES_HEAPDUMP_ON_START=1` — dump once at boot
- existing `NODE_OPTIONS` is respected and appended, not replaced
Dynamic-importing @hermes/ink + App costs ~170ms on cold start — during
that window the terminal was blank. Now `entry.tsx` writes a raw-ANSI
banner to stdout immediately after the TTY check, using hardcoded
DEFAULT_THEME colors. Ink's `<AlternateScreen>` wipes the normal-screen
buffer when it mounts, so the boot banner is replaced seamlessly by the
real React render a moment later — no double-banner, no flash.
T=2ms banner visible (vs. ~170ms before)
T=~170ms React + Ink mounts
T=~200ms alt screen takes over, Banner component repaints
Palette drift between `bootBanner.ts` and the live theme is harmless —
the live render overrides after ~200ms. Narrow terminals (cols < 98)
fall back to the one-line "⚕ NOUS HERMES" marker.
Before: entry.tsx imports @hermes/ink (394KB bundle) + App + GatewayClient
in declaration order, then calls `gw.start()` at ~T=220ms. Python fork +
server.py import starts then.
After: only `GatewayClient` is statically imported (5ms, node builtins
only). `gw.start()` fires at ~T=5ms. @hermes/ink + App load in parallel
via `Promise.all(import(...))`. Python gets ~215ms of free runway to do
its own module import before node even finishes loading.
Net: session.info arrives ~150ms earlier in cold start. First React frame
timing is unchanged (still ~240ms — still gated by ink+app imports).
Removed a previously-tried warm-thread in server.py that pre-imported
`run_agent` in the background. Measured variance showed occasional
5-10s outliers (GIL thrashing); median gain was <100ms. Not worth the
non-determinism.
- Switch tsconfig to nodenext module resolution for Node 22 (used by
installer script)
- Add shebang to entry.tsx, preserved into index.js
- Add HERMES_ROOT env var fallback for repo root resolution