hermes-agent/hermes_cli
Brooklyn Nicholson 0785aec444 fix(tui): harden against Node V8 OOM + GatewayClient memory leaks
Long TUI sessions were crashing Node via V8 fatal-OOM once transcripts +
reasoning blobs crossed the default 1.5–4GB heap cap. This adds defense
in depth: a bigger heap, leak-proofing the RPC hot path, bounded
diagnostic buffers, automatic heap dumps at high-water marks, and
graceful signal / uncaught handlers.

## Changes

### Heap budget
- hermes_cli/main.py: `_launch_tui` now injects `NODE_OPTIONS=
  --max-old-space-size=8192 --expose-gc` (appended — does not clobber
  user-supplied NODE_OPTIONS). Covers both `node dist/entry.js` and
  `tsx src/entry.tsx` launch paths.
- ui-tui/src/entry.tsx: shebang rewritten to
  `#!/usr/bin/env -S node --max-old-space-size=8192 --expose-gc` as a
  fallback when the binary is invoked directly.

### GatewayClient (ui-tui/src/gatewayClient.ts)
- `setMaxListeners(0)` — silences spurious warnings from React hook
  subscribers.
- `logs` and `bufferedEvents` replaced with fixed-capacity
  CircularBuffer — O(1) push, no splice(0, …) copies under load.
- RPC timeout refactor: `setTimeout(this.onTimeout.bind(this), …, id)`
  replaces the inline arrow closure that captured `method`/`params`/
  `resolve`/`reject` for the full 120 s request timeout. Each Pending
  record now stores its own timeout handle, `.unref()`'d so stuck
  timers never keep the event loop alive, and `rejectPending()` clears
  them (previously leaked the timer itself).

### Memory diagnostics (new)
- ui-tui/src/lib/memory.ts: `performHeapDump()` +
  `captureMemoryDiagnostics()`. Writes heap snapshot + JSON diag
  sidecar to `~/.hermes/heapdumps/` (override via
  `HERMES_HEAPDUMP_DIR`). Diagnostics are written first so we still get
  useful data if the snapshot crashes on very large heaps.
  Captures: detached V8 contexts (closure-leak signal), active
  handles/requests (`process._getActiveHandles/_getActiveRequests`),
  Linux `/proc/self/fd` count + `/proc/self/smaps_rollup`, heap growth
  rate (MB/hr), and auto-classifies likely leak sources.
- ui-tui/src/lib/memoryMonitor.ts: 10 s interval polling heapUsed. At
  1.5 GB writes an auto heap dump (trigger=`auto-high`); at 2.5 GB
  writes a final dump and exits 137 before V8 fatal-OOMs so the user
  can restart cleanly. Handle is `.unref()`'d so it never holds the
  process open.

### Graceful exit (new)
- ui-tui/src/lib/gracefulExit.ts: SIGINT/SIGTERM/SIGHUP run registered
  cleanups through a 4 s failsafe `setTimeout` that hard-exits if
  cleanup hangs.
  `uncaughtException` / `unhandledRejection` are logged to stderr
  instead of crashing — a transient TUI render error should not kill
  an in-flight agent turn.

### Slash commands (new)
- ui-tui/src/app/slash/commands/debug.ts:
  - `/heapdump` — manual snapshot + diagnostics.
  - `/mem` — live heap / rss / external / array-buffer / uptime panel.
- Registered in `ui-tui/src/app/slash/registry.ts`.

### Utility (new)
- ui-tui/src/lib/circularBuffer.ts: small fixed-capacity ring buffer
  with `push` / `tail(n)` / `drain()` / `clear()`. Replaces the ad-hoc
  `array.splice(0, len - MAX)` pattern.

## Validation

- tsc `--noEmit` clean
- `vitest run`: 15 files, 102 tests passing
- eslint clean on all touched/new files
- build produces executable `dist/entry.js` with preserved shebang
- smoke-tested: `HERMES_HEAPDUMP_DIR=… performHeapDump('manual')`
  writes both a valid `.heapsnapshot` and a `.diagnostics.json`
  containing detached-contexts, active-handles, smaps_rollup.

## Env knobs
- `HERMES_HEAPDUMP_DIR` — override snapshot output dir
- `HERMES_HEAPDUMP_ON_START=1` — dump once at boot
- existing `NODE_OPTIONS` is respected and appended, not replaced
2026-04-20 18:58:44 -05:00
..
__init__.py chore: release v0.10.0 (2026.4.16) (#11209) 2026-04-16 12:53:06 -07:00
auth.py fix(auth): use ssl.SSLContext for CA bundle instead of deprecated string path (#12706) 2026-04-19 22:44:35 -07:00
auth_commands.py fix(auth): restore --label for hermes auth add nous --type oauth 2026-04-17 19:13:40 -07:00
backup.py fix(backup): handle files with pre-1980 timestamps 2026-04-20 00:47:40 -07:00
banner.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
callbacks.py fix: ESC cancels secret/sudo prompts, clearer skip messaging (#9902) 2026-04-14 16:11:37 -07:00
claw.py fix: unify OpenClaw detection, add isatty guard, fix print_warning import 2026-04-12 16:40:37 -07:00
cli_output.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
clipboard.py feat: fix img pasting in new ink plus newline after tools 2026-04-11 13:14:32 -05:00
codex_models.py fix: remove codex spark model support 2026-04-20 04:51:44 -07:00
colors.py feat: respect NO_COLOR env var and TERM=dumb (#4079) 2026-03-30 17:07:21 -07:00
commands.py fix: publish plugin slash commands in Telegram menu 2026-04-20 05:11:39 -07:00
completion.py fix: preserve profile name completion in dynamic shell completion 2026-04-14 10:45:42 -07:00
config.py fix(cron): run due jobs in parallel to prevent serial tick starvation (#13021) 2026-04-20 11:53:07 -07:00
copilot_auth.py fix(copilot): resolve GHE token poisoning when GITHUB_TOKEN is set 2026-04-13 05:12:36 -07:00
cron.py feat(cron): track delivery failures in job status (#6042) 2026-04-07 22:49:01 -07:00
curses_ui.py feat: ungate Tool Gateway — subscription-based access with per-tool opt-in 2026-04-16 12:36:49 -07:00
debug.py fix: two process leaks (agent-browser daemons, paste.rs sleepers) (#11843) 2026-04-17 18:46:30 -07:00
default_soul.py fix: reset default SOUL.md to baseline identity text (#3159) 2026-03-26 01:34:27 -07:00
dingtalk_auth.py test(dingtalk): cover QR device-flow auth + OpenClaw branding disclosure 2026-04-17 05:08:07 -07:00
doctor.py fix(doctor): update config validation for current auth.py API 2026-04-20 02:41:25 -07:00
dump.py refactor: remove smart_model_routing feature (#12732) 2026-04-19 18:12:55 -07:00
env_loader.py fix: detect and strip non-ASCII characters from API keys (#6843) 2026-04-14 20:20:31 -07:00
gateway.py fix(gateway): detect legacy hermes.service + mark --replace SIGTERM as planned (#11909) 2026-04-17 19:27:58 -07:00
logs.py feat: component-separated logging with session context and filtering (#7991) 2026-04-11 17:23:36 -07:00
main.py fix(tui): harden against Node V8 OOM + GatewayClient memory leaks 2026-04-20 18:58:44 -05:00
mcp_config.py fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383) 2026-04-16 21:57:10 -07:00
memory_setup.py fix(memory): discover user-installed memory providers from $HERMES_HOME/plugins/ (#10529) 2026-04-15 14:25:40 -07:00
model_normalize.py fix(copilot): normalize vendor-prefixed and dash-notation model IDs (#6879) (#11561) 2026-04-17 04:19:36 -07:00
model_switch.py fix(model_switch): register custom: slug in seen_slugs for Section 3 providers 2026-04-20 12:21:54 -07:00
models.py feat: add moonshotai/Kimi-K2.6 to HuggingFace provider models (#13169) 2026-04-20 12:49:16 -07:00
nous_subscription.py feat: ungate Tool Gateway — subscription-based access with per-tool opt-in 2026-04-16 12:36:49 -07:00
pairing.py chore: fix 154 f-strings, simplify getattr/URL patterns, remove dead code (#3119) 2026-03-25 19:47:58 -07:00
platforms.py feat(gateway): unify QQBot branding, add PLATFORM_HINTS, fix streaming, restore missing setup functions 2026-04-14 00:11:49 -07:00
plugins.py fix: publish plugin slash commands in Telegram menu 2026-04-20 05:11:39 -07:00
plugins_cmd.py feat(plugins): make all plugins opt-in by default 2026-04-20 04:46:45 -07:00
profiles.py fix(gateway): fix discrepancies in gateway status 2026-04-17 18:58:29 -07:00
providers.py docs(providers): drop stale 'TODO: Phase 4' from get_provider docstring (#12902) 2026-04-20 01:41:27 -07:00
runtime_provider.py fix(anthropic): complete third-party Anthropic-compatible provider support (#12846) 2026-04-19 22:43:09 -07:00
setup.py Update env vars for openclaw migration 2026-04-20 14:56:04 -07:00
skills_config.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
skills_hub.py Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor 2026-04-17 08:59:33 -05:00
skin_engine.py Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor 2026-04-16 10:47:41 -05:00
status.py fix(gateway): fix discrepancies in gateway status 2026-04-17 18:58:29 -07:00
timeouts.py fix(config): add stale timeout settings 2026-04-20 00:52:50 -07:00
tips.py chore(release): add sjz-ks to AUTHOR_MAP 2026-04-20 03:04:06 -07:00
tools_config.py test(dingtalk): cover get_connected_platforms + null platform_toolsets 2026-04-17 06:26:18 -07:00
uninstall.py feat(uninstall): offer to remove named profiles when uninstalling from default 2026-04-18 19:18:13 -07:00
web_server.py Merge branch 'main' into feat/dashboard-skill-analytics 2026-04-20 05:25:49 -07:00
webhook.py feat(webhook): direct delivery mode for zero-LLM push notifications (#12473) 2026-04-19 05:18:19 -07:00