hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-25 00:51:20 +00:00

Author	SHA1	Message	Date
WideLee	cf55c738e7	refactor(qqbot): migrate qr onboard flow to sync + consolidate into onboard.py - Replace async create_bind_task/poll_bind_result with synchronous httpx.Client equivalents, eliminating manual event loop management - Move _render_qr and full qr_register() entry-point into onboard.py, mirroring the Feishu onboarding pattern - Remove _qqbot_render_qr and _qqbot_qr_flow from gateway.py (~90 lines); call site becomes a single qr_register() import - Fix potential segfault: previous code called loop.close() in the EXPIRED branch and again in the finally block (double-close crashed under uvloop)	2026-04-22 05:50:21 -07:00
Teknium	b8663813b6	feat(state): auto-prune old sessions + VACUUM state.db at startup (#13861 ) * feat(state): auto-prune old sessions + VACUUM state.db at startup state.db accumulates every session, message, and FTS5 index entry forever. A heavy user (gateway + cron) reported 384MB with 982 sessions / 68K messages causing slowdown; manual 'hermes sessions prune --older-than 7' + VACUUM brought it to 43MB. The prune command and VACUUM are not wired to run automatically anywhere — sessions grew unbounded until users noticed. Changes: - hermes_state.py: new state_meta key/value table, vacuum() method, and maybe_auto_prune_and_vacuum() — idempotent via last-run timestamp in state_meta so it only actually executes once per min_interval_hours across all Hermes processes for a given HERMES_HOME. Never raises. - hermes_cli/config.py: new 'sessions:' block in DEFAULT_CONFIG (auto_prune=True, retention_days=90, vacuum_after_prune=True, min_interval_hours=24). Added to _KNOWN_ROOT_KEYS. - cli.py: call maintenance once at HermesCLI init (shared helper _run_state_db_auto_maintenance reads config and delegates to DB). - gateway/run.py: call maintenance once at GatewayRunner init. - Docs: user-guide/sessions.md rewrites 'Automatic Cleanup' section. Why VACUUM matters: SQLite does NOT shrink the file on DELETE — freed pages get reused on next INSERT. Without VACUUM, a delete-heavy DB stays bloated forever. VACUUM only runs when the prune actually removed rows, so tight DBs don't pay the I/O cost. Tests: 10 new tests in tests/test_hermes_state.py covering state_meta, vacuum, idempotency, interval skipping, VACUUM-only-when-needed, corrupt-marker recovery. All 246 existing state/config/gateway tests still pass. Verified E2E with real imports + isolated HERMES_HOME: DEFAULT_CONFIG exposes the new block, load_config() returns it for fresh installs, first call prunes+vacuums, second call within min_interval_hours skips, and the state_meta marker persists across connection close/reopen. * sessions.auto_prune defaults to false (opt-in) Session history powers session_search recall across past conversations, so silently pruning on startup could surprise users. Ship the machinery disabled and let users opt in when they notice state.db is hurting performance. - DEFAULT_CONFIG.sessions.auto_prune: True → False - Call-site fallbacks in cli.py and gateway/run.py match the new default (so unmigrated configs still see off) - Docs: flip 'Enable in config.yaml' framing + tip explains the tradeoff	2026-04-22 05:21:49 -07:00
keifergu	8bcd77a9c2	feat(wecom): add QR scan flow and interactive setup wizard for bot credentials	2026-04-22 05:15:32 -07:00
hengm3467	c6b1ef4e58	feat: add Step Plan provider support (salvage #6005 ) Adds a first-class 'stepfun' API-key provider surfaced as Step Plan: - Support Step Plan setup for both International and China regions - Discover Step Plan models live from /step_plan/v1/models, with a small coding-focused fallback catalog when discovery is unavailable - Thread StepFun through provider metadata, setup persistence, status and doctor output, auxiliary routing, and model normalization - Add tests for provider resolution, model validation, metadata mapping, and StepFun region/model persistence Based on #6005 by @hengm3467. Co-authored-by: hengm3467 <100685635+hengm3467@users.noreply.github.com>	2026-04-22 02:59:58 -07:00
Teknium	ff9752410a	feat(plugins): pluggable image_gen backends + OpenAI provider (#13799 ) * feat(plugins): pluggable image_gen backends + OpenAI provider Adds a ImageGenProvider ABC so image generation backends register as bundled plugins under `plugins/image_gen/<name>/`. The plugin scanner gains three primitives to make this work generically: - `kind:` manifest field (`standalone` \| `backend` \| `exclusive`). Bundled `kind: backend` plugins auto-load — no `plugins.enabled` incantation. User-installed backends stay opt-in. - Path-derived keys: `plugins/image_gen/openai/` gets key `image_gen/openai`, so a future `tts/openai` cannot collide. - Depth-2 recursion into category namespaces (parent dirs without a `plugin.yaml` of their own). Includes `OpenAIImageGenProvider` as the first consumer (gpt-image-1.5 default, plus gpt-image-1, gpt-image-1-mini, DALL-E 3/2). Base64 responses save to `$HERMES_HOME/cache/images/`; URL responses pass through. FAL stays in-tree for this PR — a follow-up ports it into `plugins/image_gen/fal/` so the in-tree `image_generation_tool.py` slims down. The dispatch shim in `_handle_image_generate` only fires when `image_gen.provider` is explicitly set to a non-FAL value, so existing FAL setups are untouched. - 41 unit tests (scanner recursion, kind parsing, gate logic, registry, OpenAI payload shapes) - E2E smoke verified: bundled plugin autoloads, registers, and `_handle_image_generate` routes to OpenAI when configured * fix(image_gen/openai): don't send response_format to gpt-image-* The live API rejects it: 'Unknown parameter: response_format' (verified 2026-04-21 with gpt-image-1.5). gpt-image-* models return b64_json unconditionally, so the parameter was both unnecessary and actively broken. * feat(image_gen/openai): gpt-image-2 only, drop legacy catalog gpt-image-2 is the latest/best OpenAI image model (released 2026-04-21) and there's no reason to expose the older gpt-image-1.5 / gpt-image-1 / dall-e-3 / dall-e-2 alongside it — slower, lower quality, or awkward (dall-e-2 squares only). Trim the catalog down to a single model. Live-verified end-to-end: landscape 1536x1024 render of a Moog-style synth matches prompt exactly, 2.4MB PNG saved to cache. * feat(image_gen/openai): expose gpt-image-2 as three quality tiers Users pick speed/fidelity via the normal model picker instead of a hidden quality knob. All three tier IDs resolve to the single underlying gpt-image-2 API model with a different quality parameter: gpt-image-2-low ~15s fast iteration gpt-image-2-medium ~40s default gpt-image-2-high ~2min highest fidelity Live-measured on OpenAI's API today: 15.4s / 40.8s / 116.9s for the same 1024x1024 prompt. Config: image_gen.openai.model: gpt-image-2-high # or image_gen.model: gpt-image-2-low # or env var for scripts/tests OPENAI_IMAGE_MODEL=gpt-image-2-medium Live-verified end-to-end with the low tier: 18.8s landscape render of a golden retriever in wildflowers, vision-confirmed exact match. * feat(tools_config): plugin image_gen providers inject themselves into picker 'hermes tools' → Image Generation now shows plugin-registered backends alongside Nous Subscription and FAL.ai without tools_config.py needing to know about them. OpenAI appears as a third option today; future backends appear automatically as they're added. Mechanism: - ImageGenProvider gains an optional get_setup_schema() hook (name, badge, tag, env_vars). Default derived from display_name. - tools_config._plugin_image_gen_providers() pulls the schemas from every registered non-FAL plugin provider. - _visible_providers() appends those rows when rendering the Image Generation category. - _configure_provider() handles the new image_gen_plugin_name marker: writes image_gen.provider and routes to the plugin's list_models() catalog for the model picker. - _toolset_needs_configuration_prompt('image_gen') stops demanding a FAL key when any plugin provider reports is_available(). FAL is skipped in the plugin path because it already has hardcoded TOOL_CATEGORIES rows — when it gets ported to a plugin in a follow-up PR the hardcoded rows go away and it surfaces through the same path as OpenAI. Verified live: picker shows Nous Subscription / FAL.ai / OpenAI. Picking OpenAI prompts for OPENAI_API_KEY, then shows the gpt-image-2-low/medium/high model picker sourced from the plugin. 397 tests pass across plugins/, tools_config, registry, and picker. * fix(image_gen): close final gaps for plugin-backend parity with FAL Two small places that still hardcoded FAL: - hermes_cli/setup.py status line: an OpenAI-only setup showed 'Image Generation: missing FAL_KEY'. Now probes plugin providers and reports '(OpenAI)' when one is_available() — or falls back to 'missing FAL_KEY or OPENAI_API_KEY' if nothing is configured. - image_generate tool schema description: said 'using FAL.ai, default FLUX 2 Klein 9B'. Rewrote provider-neutral — 'backend and model are user-configured' — and notes the 'image' field can be a URL or an absolute path, which the gateway delivers either way via extract_local_files().	2026-04-21 21:30:10 -07:00
Teknium	d1acf17773	feat(models): add minimax/minimax-m2.5:free to OpenRouter catalog (#13836 ) Surfaces the free variant alongside the paid minimax-m2.5 entry in both the OPENROUTER_MODELS fallback snapshot and the nous/openrouter provider model list.	2026-04-21 21:27:40 -07:00
Teknium	7b79e0f4c9	chore(models): drop 3 models from nous portal recommended list (#13822 ) Remove nvidia/nemotron-3-super-120b-a12b:free, arcee-ai/trinity-large-preview:free, and openrouter/elephant-alpha from _PROVIDER_MODELS['nous']. The paid nemotron and arcee-thinking variants remain.	2026-04-21 21:10:20 -07:00
emozilla	29693f9d8e	feat(aux): use Portal /api/nous/recommended-models for auxiliary models Wire the auxiliary client (compaction, vision, session search, web extract) to the Nous Portal's curated recommended-models endpoint when running on Nous Portal, with a TTL-cached fetch that mirrors how we pull /models for pricing. hermes_cli/models.py - fetch_nous_recommended_models(portal_base_url, force_refresh=False) 10-minute TTL cache, keyed per portal URL (staging vs prod don't collide). Public endpoint, no auth required. Returns {} on any failure so callers always get a dict. - get_nous_recommended_aux_model(vision, free_tier=None, ...) Tier-aware pick from the payload: - Paid tier → paidRecommended{Vision,Compaction}Model, falling back to freeRecommended* when the paid field is null (common during staged rollouts of new paid models). - Free tier → freeRecommended* only, never leaks paid models. When free_tier is None, auto-detects via the existing check_nous_free_tier() helper (already cached 3 min against /api/oauth/account). Detection errors default to paid so we never silently downgrade a paying user. agent/auxiliary_client.py — _try_nous() - Replaces the hardcoded xiaomi/mimo free-tier branch with a single call to get_nous_recommended_aux_model(vision=vision). - Falls back to _NOUS_MODEL (google/gemini-3-flash-preview) when the Portal is unreachable or returns a null recommendation. - The Portal is now the source of truth for aux model selection; the xiaomi allowlist we used to carry is effectively dead. Tests (15 new) - tests/hermes_cli/test_models.py::TestNousRecommendedModels Fetch caching, per-portal keying, network failure, force_refresh; paid-prefers-paid, paid-falls-to-free, free-never-leaks-paid, auto-detect, detection-error → paid default, null/blank modelName handling. - tests/agent/test_auxiliary_client.py::TestNousAuxiliaryRefresh _try_nous honors Portal recommendation for text + vision, falls back to google/gemini-3-flash-preview on None or exception. Behavior won't visibly change today — both tier recommendations currently point at google/gemini-3-flash-preview — but the moment the Portal ships a better paid recommendation, subscribers pick it up within 10 minutes without a Hermes release.	2026-04-21 20:35:16 -07:00
emozilla	c22f4a76de	remove Nous Portal free-model allowlist Drop _NOUS_ALLOWED_FREE_MODELS + filter_nous_free_models and its two call sites. Whatever Nous Portal prices as free now shows up in the picker as-is — no local allowlist gatekeeping. Free-tier partitioning (paid vs free in the menu) still runs via partition_nous_models_by_tier.	2026-04-21 20:35:16 -07:00
Teknium	b2ba351380	fix(kimi): reconcile sk-kimi- routing with Anthropic SDK URL semantics Follow-ups after salvaging xiaoqiang243's kimi-for-coding patches: - KIMI_CODE_BASE_URL: drop trailing /v1 (was /coding/v1). The /coding endpoint speaks Anthropic Messages, and the Anthropic SDK appends /v1/messages internally. /coding/v1 + SDK suffix produced /coding/v1/v1/messages (a 404). /coding + SDK suffix now yields /coding/v1/messages correctly. - kimi-coding ProviderConfig: keep legacy default api.moonshot.ai/v1 so non-sk-kimi- moonshot keys still authenticate. sk-kimi- keys are already redirected to api.kimi.com/coding via _resolve_kimi_base_url. - doctor.py: update Kimi UA to claude-code/0.1.0 (was KimiCLI/1.30.0) and rewrite /coding base URLs to /coding/v1 for the /models health check (Anthropic surface has no /models). - test_kimi_env_vars: accept KIMI_CODING_API_KEY as a secondary env var. E2E verified: sk-kimi-<key> → https://api.kimi.com/coding/v1/messages (Anthropic) sk-<legacy> → https://api.moonshot.ai/v1/chat/completions (OpenAI) UA: claude-code/0.1.0, x-api-key: <sk-kimi-*>	2026-04-21 19:48:39 -07:00
王强	6caf8bd994	fix: Enhance Kimi Coding API mode detection and User-Agent	2026-04-21 19:48:39 -07:00
王强	2a026eb762	fix: Update Kimi Coding API endpoint and User-Agent	2026-04-21 19:48:39 -07:00
王强	bad5471409	fix(kimi-coding): add KIMI_CODING_API_KEY fallback + api_mode detection for /coding endpoint	2026-04-21 19:48:39 -07:00
王强	fd403854b9	fix: auto-detect anthropic_messages mode for Kimi /coding/v1 endpoints	2026-04-21 19:48:39 -07:00
Teknium	8f167e8791	fix(tts): use per-provider input-character caps instead of global 4000 (#13743 ) A single global MAX_TEXT_LENGTH = 4000 truncated every TTS provider at 4000 chars, causing long inputs to be silently chopped even though the underlying APIs allow much more: - OpenAI: 4096 - xAI: 15000 - MiniMax: 10000 - ElevenLabs: 5000 / 10000 / 30000 / 40000 (model-aware) - Gemini: ~5000 - Edge: ~5000 The schema description also told the model 'Keep under 4000 characters', which encouraged the agent to self-chunk long briefs into multiple TTS calls (producing 3 separate audio files instead of one). New behavior: - PROVIDER_MAX_TEXT_LENGTH table + ELEVENLABS_MODEL_MAX_TEXT_LENGTH encode the documented per-provider limits. - _resolve_max_text_length(provider, cfg) resolves: 1. tts.<provider>.max_text_length user override 2. ElevenLabs model_id lookup 3. provider default 4. 4000 fallback - text_to_speech_tool() and stream_tts_to_speaker() both call the resolver; old MAX_TEXT_LENGTH alias kept for back-compat. - Schema description no longer hardcodes 4000. Tests: 27 new unit + E2E tests; all 53 existing TTS tests and 253 voice-command/voice-cli tests still pass.	2026-04-21 17:49:39 -07:00
brooklyn!	3e198f37c9	Merge pull request #13641 from NousResearch/bb/tui-at-folder-filter fix(tui): @folder: / @file: completions respect the explicit prefix	2026-04-21 16:33:30 -05:00
pefontana	48ecb98f8a	feat(delegate): orchestrator role and configurable spawn depth (default flat) Adds role='leaf'\|'orchestrator' to delegate_task. With max_spawn_depth>=2, an orchestrator child retains the 'delegation' toolset and can spawn its own workers; leaf children cannot delegate further (identical to today). Default posture is flat — max_spawn_depth=1 means a depth-0 parent's children land at the depth-1 floor and orchestrator role silently degrades to leaf. Users opt into nested delegation by raising max_spawn_depth to 2 or 3 in config.yaml. Also threads acp_command/acp_args through the main agent loop's delegate dispatch (previously silently dropped in the schema) via a new _dispatch_delegate_task helper, and adds a DelegateEvent enum with legacy-string back-compat for gateway/ACP/CLI progress consumers. Config (hermes_cli/config.py defaults): delegation.max_concurrent_children: 3 # floor-only, no upper cap delegation.max_spawn_depth: 1 # 1=flat (default), 2-3 unlock nested delegation.orchestrator_enabled: true # global kill switch Salvaged from @pefontana's PR #11215. Overrides vs. the original PR: concurrency stays at 3 (PR bumped to 5 + cap 8 — we keep the floor only, no hard ceiling); max_spawn_depth defaults to 1 (PR defaulted to 2 which silently enabled one level of orchestration for every user). Co-authored-by: pefontana <fontana.pedro93@gmail.com>	2026-04-21 14:23:45 -07:00
Brooklyn Nicholson	9d9db1e910	fix(tui): @folder: only yields directories, @file: only yields files Reported during TUI v2 blitz testing: typing `@folder:` in the composer pulled up .dockerignore, .env, .gitignore, and every other file in the cwd alongside the actual directories. The completion loop yielded every entry regardless of the explicit prefix and auto-rewrote each completion to @file: vs @folder: based on is_dir — defeating the user's choice. Also fixed a pre-existing adjacent bug: a bare `@file:` or `@folder:` (no path) used expanded=="." as both search_dir AND match_prefix, filtering the list to dotfiles only. When expanded is empty or ".", search in cwd with no prefix filter. - want_dir = prefix == "@folder:" drives an explicit is_dir filter - preserve the typed prefix in completion text instead of rewriting - three regression tests cover: folder-only, file-only, and the bare- prefix case where completions keep the `@folder:` prefix	2026-04-21 14:31:48 -05:00
Austin Pickett	b2111a2b45	Merge pull request #13526 from NousResearch/feat/dashboard-action-buttons feat: add buttons to update hermes and restart gateway	2026-04-21 08:40:26 -07:00
Teknium	244ae6db15	fix(web_server,whatsapp-bridge): validate Host header against bound interface (#13530 ) DNS rebinding attack: a victim browser that has the dashboard (or the WhatsApp bridge) open could be tricked into fetching from an attacker-controlled hostname that TTL-flips to 127.0.0.1. Same-origin and CORS checks don't help — the browser now treats the attacker origin as same-origin with the local service. Validating the Host header at the app layer rejects any request whose Host isn't one we bound for. Changes: hermes_cli/web_server.py: - New host_header_middleware runs before auth_middleware. Reads app.state.bound_host (set by start_server) and rejects requests whose Host header doesn't match the bound interface with HTTP 400. - Loopback binds accept localhost / 127.0.0.1 / ::1. Non-loopback binds require exact match. 0.0.0.0 binds skip the check (explicit --insecure opt-in; no app-layer defence possible). - IPv6 bracket notation parsed correctly: [::1] and [::1]:9119 both accepted. scripts/whatsapp-bridge/bridge.js: - Express middleware rejects non-loopback Host headers. Bridge already binds 127.0.0.1-only, this adds the complementary app-layer check for DNS rebinding defence. Tests: 8 new in tests/hermes_cli/test_web_server_host_header.py covering loopback/non-loopback/zero-zero binds, IPv6 brackets, case insensitivity, and end-to-end middleware rejection via TestClient. Reported in GHSA-ppp5-vxwm-4cf7 by @bupt-Yy-young. Hardening — not CVE per SECURITY.md §3. The dashboard's main trust boundary is the loopback bind + session token; DNS rebinding defeats the bind assumption but not the token (since the rebinding browser still sees a first-party fetch to 127.0.0.1 with the token-gated API). Host-header validation adds the missing belt-and-braces layer.	2026-04-21 06:26:35 -07:00
Teknium	7fc1e91811	security(runtime_provider): close OLLAMA_API_KEY substring-leak sweep miss (#13522 ) Two call sites still used a raw substring check to identify ollama.com: hermes_cli/runtime_provider.py:496: _is_ollama_url = "ollama.com" in base_url.lower() run_agent.py:6127: if fb_base_url_hint and "ollama.com" in fb_base_url_hint.lower() ... Same bug class as GHSA-xf8p-v2cg-h7h5 (OpenRouter substring leak), which was fixed in commit `dbb7e00e` via base_url_host_matches() across the codebase. The earlier sweep missed these two Ollama sites. Self-discovered during April 2026 security-advisory triage; filed as GHSA-76xc-57q6-vm5m. Impact is narrow — requires a user with OLLAMA_API_KEY configured AND a custom base_url whose path or look-alike host contains 'ollama.com'. Users on default provider flows are unaffected. Filed as a draft advisory to use the private-fork flow; not CVE-worthy on its own. Fix is mechanical: replace substring check with base_url_host_matches at both sites. Same helper the rest of the codebase uses. Tests: 67 -> 71 passing. 7 new host-matcher cases in tests/test_base_url_hostname.py (path injection, lookalike host, localtest.me subdomain, ollama.ai TLD confusion, localhost, genuine ollama.com, api.ollama.com subdomain) + 4 call-site tests in tests/hermes_cli/test_runtime_provider_resolution.py verifying OLLAMA_API_KEY is selected only when base_url actually targets ollama.com. Fixes GHSA-76xc-57q6-vm5m	2026-04-21 06:06:16 -07:00
Austin Pickett	fc21c14206	feat: add buttons to update hermes and restart gateway	2026-04-21 09:01:23 -04:00
Teknium	3f72b2fe15	fix(/model): accept provider switches when /models is unreachable Gateway /model <name> --provider opencode-go (or any provider whose /models endpoint is down, 404s, or doesn't exist) silently failed. validate_requested_model returned accepted=False whenever fetch_api_models returned None, switch_model returned success=False, and the gateway never wrote _session_model_overrides — so the switch appeared to succeed in the error message flow but the next turn kept calling the old provider. The validator already had static-catalog fallbacks for MiniMax and Codex (providers without a /models endpoint). Extended the same pattern as the terminal fallback: when the live probe fails, consult provider_model_ids() for the curated catalog. Known models → accepted+recognized. Close typos → auto-corrected. Unknown models → soft-accepted with a 'Not in curated catalog' warning. Providers with no catalog at all → soft-accepted with a generic 'Note:' warning, finally honoring the in-code comment ('Accept and persist, but warn') that had been lying since it was written. Tests: 7 new tests in test_opencode_go_validation_fallback.py covering the catalog lookup, case-insensitive match, auto-correct, unknown-with-suggestion, unknown-without-suggestion, and no-catalog paths. TestValidateApiFallback in test_model_validation.py updated — its four 'rejected_when_api_down' tests were encoding exactly the bug being fixed.	2026-04-21 05:19:43 -07:00
Teknium	c6974043ef	refactor(acp): validate method_id against advertised provider in authenticate() (#13468 ) * feat(models): hide OpenRouter models that don't advertise tool support Port from Kilo-Org/kilocode#9068. hermes-agent is tool-calling-first — every provider path assumes the model can invoke tools. Models whose OpenRouter supported_parameters doesn't include 'tools' (e.g. image-only or completion-only models) cannot be driven by the agent loop and fail at the first tool call. Filter them out of fetch_openrouter_models() so they never appear in the model picker (`hermes model`, setup wizard, /model slash command). Permissive when the field is missing — OpenRouter-compatible gateways (Nous Portal, private mirrors, older snapshots) don't always populate supported_parameters. Treat missing as 'unknown → allow' rather than silently emptying the picker on those gateways. Only hide models whose supported_parameters is an explicit list that omits tools. Tests cover: tools present → kept, tools absent → dropped, field missing → kept, malformed non-list → kept, non-dict item → kept, empty list → dropped. * refactor(acp): validate method_id against advertised provider in authenticate() Previously authenticate() accepted any method_id whenever the server had provider credentials configured. This was not a vulnerability under the personal-assistant trust model (ACP is stdio-only, local-trust — anything that can reach the transport is already code-execution-equivalent to the user), but it was sloppy API hygiene: the advertised auth_methods list from initialize() was effectively ignored. Now authenticate() only returns AuthenticateResponse when method_id matches the currently-advertised provider (case-insensitive). Mismatched or missing method_id returns None, consistent with the no-credentials case. Raised by xeloxa via GHSA-g5pf-8w9m-h72x. Declined as a CVE (ACP transport is stdio, local-trust model), but the correctness fix is worth having on its own.	2026-04-21 03:39:55 -07:00
Teknium	2e722ee29a	fix(fal): extend whitespace-only FAL_KEY handling to all call sites Follow-up to PR #2504. The original fix covered the two direct FAL_KEY checks in image_generation_tool but left four other call sites intact, including the managed-gateway gate where a whitespace-only FAL_KEY falsely claimed 'user has direct FAL' and skipped the Nous managed gateway fallback entirely. Introduce fal_key_is_configured() in tools/tool_backend_helpers.py as a single source of truth (consults os.environ, falls back to .env for CLI-setup paths) and route every FAL_KEY presence check through it: - tools/image_generation_tool.py : _resolve_managed_fal_gateway, image_generate_tool's upfront check, check_fal_api_key - hermes_cli/nous_subscription.py : direct_fal detection, selected toolset gating, tools_ready map - hermes_cli/tools_config.py : image_gen needs-setup check Verified by extending tests/tools/test_image_generation_env.py and by E2E exercising whitespace + managed-gateway composition directly.	2026-04-21 02:04:21 -07:00
Teknium	4fea1769d2	feat(opencode-go): add Kimi K2.6 and Qwen3.5/3.6 Plus to curated catalog (#13429 ) OpenCode Go's published model list (opencode.ai/docs/go) includes kimi-k2.6, qwen3.5-plus, and qwen3.6-plus, but Hermes' curated lists didn't carry them. When the live /models probe fails during `hermes model`, users fell back to the stale curated list and had to type newer models via 'Enter custom model name'. Adds kimi-k2.6 (now first in the Go list), qwen3.6-plus, and qwen3.5-plus to both the model picker (hermes_cli/models.py) and setup defaults (hermes_cli/setup.py). All routed through the existing opencode-go chat_completions path — no api_mode changes needed.	2026-04-21 01:56:55 -07:00
Teknium	2c69b3eca8	fix(auth): unify credential source removal — every source sticks (#13427 ) Every credential source Hermes reads from now behaves identically on `hermes auth remove`: the pool entry stays gone across fresh load_pool() calls, even when the underlying external state (env var, OAuth file, auth.json block, config entry) is still present. Before this, auth_remove_command was a 110-line if/elif with five special cases, and three more sources (qwen-cli, copilot, custom config) had no removal handler at all — their pool entries silently resurrected on the next invocation. Even the handled cases diverged: codex suppressed, anthropic deleted-without-suppressing, nous cleared without suppressing. Each new provider added a new gap. What's new: agent/credential_sources.py — RemovalStep registry, one entry per source (env, claude_code, hermes_pkce, nous device_code, codex device_code, qwen-cli, copilot gh_cli + env vars, custom config). auth_remove_command dispatches uniformly via find_removal_step(). Changes elsewhere: agent/credential_pool.py — every upsert in _seed_from_env, _seed_from_singletons, and _seed_custom_pool now gates on is_source_suppressed(provider, source) via a shared helper. hermes_cli/auth_commands.py — auth_remove_command reduced to 25 lines of dispatch; auth_add_command now clears ALL suppressions for the provider on re-add (was env:* only). Copilot is special: the same token is seeded twice (gh_cli via _seed_from_singletons + env:<VAR> via _seed_from_env), so removing one entry without suppressing the other variants lets the duplicate resurrect. The copilot RemovalStep suppresses gh_cli + all three env variants (COPILOT_GITHUB_TOKEN, GH_TOKEN, GITHUB_TOKEN) at once. Tests: 11 new unit tests + 4059 existing pass. 12 E2E scenarios cover every source in isolated HERMES_HOME with simulated fresh processes.	2026-04-21 01:52:49 -07:00
Teknium	b341b19fff	fix(auth): hermes auth remove sticks for shell-exported env vars (#13418 ) Removing an env-seeded credential only cleared ~/.hermes/.env and the current process's os.environ, leaving shell-exported vars (shell profile, systemd EnvironmentFile, launchd plist) to resurrect the entry on the next load_pool() call. This matched the pre-#11485 codex behaviour. Now we suppress env:<VAR> in auth.json on remove, gate _seed_from_env() behind is_source_suppressed(), clear env:* suppressions on auth add, and print a diagnostic pointing at the shell when the var lives there. Applies to every env:* seeded credential (xai, deepseek, moonshot, zai, nvidia, openrouter, anthropic, etc.), not just xai. Reported by @teknium1 from community user 'Artificial Brain' — couldn't remove their xAI key via hermes auth remove.	2026-04-21 01:34:50 -07:00
Teknium	2d7ff9c5bd	feat(tts): complete KittenTTS integration (tools/setup/docs/tests) Builds on @AxDSan's PR #2109 to finish the KittenTTS wiring so the provider behaves like every other TTS backend end to end. - tools/tts_tool.py: `_check_kittentts_available()` helper and wire into `check_tts_requirements()`; extend Opus-conversion list to include kittentts (WAV → Opus for Telegram voice bubbles); point the missing-package error at `hermes setup tts`. - hermes_cli/tools_config.py: add KittenTTS entry to the "Text-to-Speech" toolset picker, with a `kittentts` post_setup hook that auto-installs the wheel + soundfile via pip. - hermes_cli/setup.py: `_install_kittentts_deps()`, new choice + install flow in `_setup_tts_provider()`, provider_labels entry, and status row in the `hermes setup` summary. - website/docs/user-guide/features/tts.md: add KittenTTS to the provider table, config example, ffmpeg note, and the zero-config voice-bubble tip. - tests/tools/test_tts_kittentts.py: 10 unit tests covering generation, model caching, config passthrough, ffmpeg conversion, availability detection, and the missing-package dispatcher branch. E2E verified against the real `kittentts` wheel: - WAV direct output (pcm_s16le, 24kHz mono) - MP3 conversion via ffmpeg (from WAV) - Telegram flow (provider in Opus-conversion list) produces `codec_name=opus`, 48kHz mono, `voice_compatible=True`, and the `[[audio_as_voice]]` marker - check_tts_requirements() returns True when kittentts is installed	2026-04-21 01:28:32 -07:00
alt-glitch	c312e8ecf5	fix(update): keep get_hermes_home late-bound in _install_hangup_protection Follow-up to the redundant-imports sweep. _install_hangup_protection used to import get_hermes_home locally; the sweep hoisted it to the module-level binding already present at line 164. test_non_fatal_if_log_setup_fails monkeypatches hermes_cli.config.get_hermes_home to raise, which only works when the function late-binds its lookup. The hoisted version captures the reference at import time and bypasses the monkeypatch. Restore the local import (with a distinct local alias) so the test seam works and the stdio-untouched-on-setup-failure invariant is actually exercised.	2026-04-21 00:50:58 -07:00
alt-glitch	28b3f49aaa	refactor: remove remaining redundant local imports (comprehensive sweep) Full AST-based scan of all .py files to find every case where a module or name is imported locally inside a function body but is already available at module level. This is the second pass — the first commit handled the known cases from the lint report; this one catches everything else. Files changed (19): cli.py — 16 removals: time as _time/_t/_tmod (×10), re / re as _re (×2), os as _os, sys, partial os from combo import, from model_tools import get_tool_definitions gateway/run.py — 8 removals: MessageEvent as _ME / MessageType as _MT (×3), os as _os2, MessageEvent+MessageType (×2), Platform, BasePlatformAdapter as _BaseAdapter run_agent.py — 6 removals: get_hermes_home as _ghh, partial (contextlib, os as _os), cleanup_vm, cleanup_browser, set_interrupt as _sif (×2), partial get_toolset_for_tool hermes_cli/main.py — 4 removals: get_hermes_home, time as _time, logging as _log, shutil hermes_cli/config.py — 1 removal: get_hermes_home as _ghome hermes_cli/runtime_provider.py — 1 removal: load_config as _load_bedrock_config hermes_cli/setup.py — 2 removals: importlib.util (×2) hermes_cli/nous_subscription.py — 1 removal: from hermes_cli.config import load_config hermes_cli/tools_config.py — 1 removal: from hermes_cli.config import load_config, save_config cron/scheduler.py — 3 removals: concurrent.futures, json as _json, from hermes_cli.config import load_config batch_runner.py — 1 removal: list_distributions as get_all_dists (kept print_distribution_info, not at top level) tools/send_message_tool.py — 2 removals: import os (×2) tools/skills_tool.py — 1 removal: logging as _logging tools/browser_camofox.py — 1 removal: from hermes_cli.config import load_config tools/image_generation_tool.py — 1 removal: import fal_client environments/tool_context.py — 1 removal: concurrent.futures gateway/platforms/bluebubbles.py — 1 removal: httpx as _httpx gateway/platforms/whatsapp.py — 1 removal: import asyncio tui_gateway/server.py — 2 removals: from datetime import datetime, import time All alias references (_time, _t, _tmod, _re, _os, _os2, _json, _ghh, _ghome, _sif, _ME, _MT, _BaseAdapter, _load_bedrock_config, _httpx, _logging, _log, get_all_dists) updated to use the top-level names.	2026-04-21 00:50:58 -07:00
alt-glitch	1010e5fa3c	refactor: remove redundant local imports already available at module level Sweep ~74 redundant local imports across 21 files where the same module was already imported at the top level. Also includes type fixes and lint cleanups on the same branch.	2026-04-21 00:50:58 -07:00
Teknium	328223576b	feat(skills+terminal): make bundled skill scripts runnable out of the box (#13384 ) * feat(skills): inject absolute skill dir and expand ${HERMES_SKILL_DIR} templates When a skill loads, the activation message now exposes the absolute skill directory and substitutes ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} tokens in the SKILL.md body, so skills with bundled scripts can instruct the agent to run them by absolute path without an extra skill_view round-trip. Also adds opt-in inline-shell expansion: !`cmd` snippets in SKILL.md are pre-executed (with the skill directory as CWD) and their stdout is inlined into the message before the agent reads it. Off by default — enable via skills.inline_shell in config.yaml — because any snippet runs on the host without approval. Changes: - agent/skill_commands.py: template substitution, inline-shell expansion, absolute skill-dir header, supporting-files list now shows both relative and absolute forms. - hermes_cli/config.py: new skills.template_vars, skills.inline_shell, skills.inline_shell_timeout knobs. - tests/agent/test_skill_commands.py: coverage for header, both template tokens (present and missing session id), template_vars disable, inline-shell default-off, enabled, CWD, and timeout. - website/docs/developer-guide/creating-skills.md: documents the template tokens, the absolute-path header, and the opt-in inline shell with its security caveat. Validation: tests/agent/ 1591 passed (includes 9 new tests). E2E: loaded a real skill in an isolated HERMES_HOME; confirmed ${HERMES_SKILL_DIR} resolves to the absolute path, ${HERMES_SESSION_ID} resolves to the passed task_id, !`date` runs when opt-in is set, and stays literal when it isn't. * feat(terminal): source ~/.bashrc (and user-listed init files) into session snapshot bash login shells don't source ~/.bashrc, so tools that install themselves there — nvm, asdf, pyenv, cargo, custom PATH exports — stay invisible to the environment snapshot Hermes builds once per session. Under systemd or any context with a minimal parent env, that surfaces as 'node: command not found' in the terminal tool even though the binary is reachable from every interactive shell on the machine. Changes: - tools/environments/local.py: before the login-shell snapshot bootstrap runs, prepend guarded 'source <file>' lines for each resolved init file. Missing files are skipped, each source is wrapped with a '[ -r ... ] && . ... \|\| true' guard so a broken rc can't abort the bootstrap. - hermes_cli/config.py: new terminal.shell_init_files (explicit list, supports ~ and ${VAR}) and terminal.auto_source_bashrc (default on) knobs. When shell_init_files is set it takes precedence; when it's empty and auto_source_bashrc is on, ~/.bashrc gets auto-sourced. - tests/tools/test_local_shell_init.py: 10 tests covering the resolver (auto-bashrc, missing file, explicit override, ~/${VAR} expansion, opt-out) and the prelude builder (quoting, guarded sourcing), plus a real-LocalEnvironment snapshot test that confirms exports in the init file land in subsequent commands' environment. - website/docs/reference/faq.md: documents the fix in Troubleshooting, including the zsh-user pattern of sourcing ~/.zshrc or nvm.sh directly via shell_init_files. Validation: 10/10 new tests pass; tests/tools/test_local_*.py 40/40 pass; tests/agent/ 1591/1591 pass; tests/hermes_cli/test_config.py 50/50 pass. E2E in an isolated HERMES_HOME: confirmed that a fake ~/.bashrc setting a marker var and PATH addition shows up in a real LocalEnvironment().execute() call, that auto_source_bashrc=false suppresses it, that an explicit shell_init_files entry wins over the auto default, and that a missing bashrc is silently skipped.	2026-04-21 00:39:19 -07:00
helix4u	b48ea41d27	feat(voice): add cli beep toggle	2026-04-21 00:29:29 -07:00
Teknium	9c0fc0b4e8	fix(whatsapp): remove shadowing shutil import in cmd_whatsapp (#13364 ) The re-pair branch had a redundant 'import shutil' inside cmd_whatsapp, which made shutil a function-local throughout the whole scope. The earlier 'shutil.which("npm")' call at the dependency-install step then crashed with UnboundLocalError before control ever reached the local import. shutil is already imported at module level (line 48), so the local import was dead code anyway. Drop it.	2026-04-21 00:12:44 -07:00
Teknium	b6b5acfc8e	fix(whatsapp): remove 120s timeout on bridge npm install (#13339 ) The WhatsApp bridge depends on @whiskeysockets/baileys pulled directly from a GitHub commit tarball, which on slower connections or when GitHub is sluggish routinely exceeds 120s. The hardcoded timeout surfaced as a raw TimeoutExpired traceback during 'hermes whatsapp' setup. Switch to the same pattern used by the TUI npm install at line ~945: no timeout, --no-fund/--no-audit/--progress=false to keep output clean, stderr captured and tailed on failure. Also resolve npm via shutil.which so missing Node.js gives a clean error instead of FileNotFoundError, and handle Ctrl+C cleanly. Co-authored-by: teknium1 <teknium@nousresearch.com>	2026-04-20 22:22:05 -07:00
Teknium	b4edf9e6be	refactor(ai-gateway): single source of truth for model catalog (#13304 ) Delete the stale literal `_PROVIDER_MODELS["ai-gateway"]` (gpt-5, gemini-2.5-pro, claude-4.5 — outdated the moment PR #13223 landed with its curated `AI_GATEWAY_MODELS` snapshot) and derive it from `AI_GATEWAY_MODELS` instead, so the picker tuples and the bare-id fallback catalog stay in sync automatically. Also fixes `get_default_model_for_provider('ai-gateway')` to return kimi-k2.6 (the curated recommendation) instead of claude-opus-4.6.	2026-04-20 22:21:21 -07:00
Teknium	dbb7e00e7e	fix: sweep remaining provider-URL substring checks across codebase Completes the hostname-hardening sweep — every substring check against a provider host in live-routing code is now hostname-based. This closes the same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen, ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI, Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI, and Anthropic. New helper: - utils.base_url_host_matches(base_url, domain) — safe counterpart to 'domain in base_url'. Accepts hostname equality and subdomain matches; rejects path segments, host suffixes, and prefix collisions. Call sites converted (real-code only; tests, optional-skills, red-teaming scripts untouched): run_agent.py (10 sites): - AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check) - header cascade for openrouter / copilot / kimi / qwen / chatgpt - interleaved-thinking trigger (openrouter + claude) - _is_openrouter_url(), _is_qwen_portal() - is_native_anthropic check - github-models-vs-copilot detection (3 sites) - reasoning-capable route gate (nousresearch, vercel, github) - codex-backend detection in API kwargs build - fallback api_mode Bedrock detection agent/auxiliary_client.py (7 sites): - extra-headers cascades in 4 distinct client-construction paths (resolve custom, resolve auto, OpenRouter-fallback-to-custom, _async_client_from_sync, resolve_provider_client explicit-custom, resolve_auto_with_codex) - _is_openrouter_client() base_url sniff agent/usage_pricing.py: - resolve_billing_route openrouter branch agent/model_metadata.py: - _is_openrouter_base_url(), Bedrock context-length lookup hermes_cli/providers.py: - determine_api_mode Bedrock heuristic hermes_cli/runtime_provider.py: - _is_openrouter_url flag for API-key preference (issues #420, #560) hermes_cli/doctor.py: - Kimi User-Agent header for /models probes tools/delegate_tool.py: - subagent Codex endpoint detection trajectory_compressor.py: - _detect_provider() cascade (8 providers: openrouter, nous, codex, zai, kimi-coding, arcee, minimax-cn, minimax) cli.py, gateway/run.py: - /model-switch cache-enabled hint (openrouter + claude) Bedrock detection tightened from 'bedrock-runtime in url' to 'hostname starts with bedrock-runtime. AND host is under amazonaws.com'. ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'. Tests: - tests/test_base_url_hostname.py extended with a base_url_host_matches suite (exact match, subdomain, path-segment rejection, host-suffix rejection, host-prefix rejection, empty-input, case-insensitivity, trailing dot). Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock, gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback, fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution, delegate, credential_pool, context_compressor, plus the 4 hostname test modules). 26-assertion E2E call-site verification across 6 modules passes.	2026-04-20 22:14:29 -07:00
Teknium	cecf84daf7	fix: extend hostname-match provider detection across remaining call sites Aslaaen's fix in the original PR covered _detect_api_mode_for_url and the two openai/xai sites in run_agent.py. This finishes the sweep: the same substring-match false-positive class (e.g. https://api.openai.com.evil/v1, https://proxy/api.openai.com/v1, https://api.anthropic.com.example/v1) existed in eight more call sites, and the hostname helper was duplicated in two modules. - utils: add shared base_url_hostname() (single source of truth). - hermes_cli/runtime_provider, run_agent: drop local duplicates, import from utils. Reuse the cached AIAgent._base_url_hostname attribute everywhere it's already populated. - agent/auxiliary_client: switch codex-wrap auto-detect, max_completion_tokens gate (auxiliary_max_tokens_param), and custom-endpoint max_tokens kwarg selection to hostname equality. - run_agent: native-anthropic check in the Claude-style model branch and in the AIAgent init provider-auto-detect branch. - agent/model_metadata: Anthropic /v1/models context-length lookup. - hermes_cli/providers.determine_api_mode: anthropic / openai URL heuristics for custom/unknown providers (the /anthropic path-suffix convention for third-party gateways is preserved). - tools/delegate_tool: anthropic detection for delegated subagent runtimes. - hermes_cli/setup, hermes_cli/tools_config: setup-wizard vision-endpoint native-OpenAI detection (paired with deduping the repeated check into a single is_native_openai boolean per branch). Tests: - tests/test_base_url_hostname.py covers the helper directly (path-containing-host, host-suffix, trailing dot, port, case). - tests/hermes_cli/test_determine_api_mode_hostname.py adds the same regression class for determine_api_mode, plus a test that the /anthropic third-party gateway convention still wins. Also: add asslaenn5@gmail.com → Aslaaen to scripts/release.py AUTHOR_MAP.	2026-04-20 22:14:29 -07:00
Aslaaen	5356797f1b	fix: restrict provider URL detection to exact hostname matches	2026-04-20 22:14:29 -07:00
Teknium	fdd0ecaf13	fix(env_loader): warn when non-ASCII stripped from credential env vars (#13300 ) Load-time sanitizer silently removed non-ASCII codepoints from any env var ending in _API_KEY / _TOKEN / _SECRET / _KEY, turning copy-paste artifacts (Unicode lookalikes, ZWSP, NBSP) into opaque provider-side API_KEY_INVALID errors. Warn once per key to stderr with the offending codepoints (U+XXXX) and guidance to re-copy from the provider dashboard.	2026-04-20 22:14:03 -07:00
jerilynzheng	f81c0394d0	fix: correct AI_GATEWAY_MODELS slugs to match Vercel's catalog The original list was copied from OpenRouter conventions and didn't match what Vercel actually hosts. Verified against the live /v1/models endpoint (266 models): - qwen/qwen3.6-plus → alibaba/qwen3.6-plus (Vercel hosts Qwen under alibaba/) - z-ai/glm-5.1 → zai/glm-5.1 (no hyphen) - x-ai/grok-4.20 → xai/grok-4.20-reasoning (no hyphen, picks reasoning variant) - google/gemini-3-flash-preview → google/gemini-3-flash (no -preview suffix) - moonshotai/kimi-k2.5 → moonshotai/kimi-k2.6 (newest available)	2026-04-20 21:02:28 -07:00
jerilynzheng	29f57ec954	feat: use Vercel's deep-link for ai-gateway API key creation prompt Vercel provides a d?to= redirect URL that routes users through their team picker to the AI Gateway API keys management page. Using this specific URL lands users directly on the "Create key" page instead of the generic AI Gateway dashboard.	2026-04-20 21:02:28 -07:00
jerilynzheng	5bb2d11b07	feat: auto-promote free Moonshot models to top of ai-gateway picker When the live Vercel AI Gateway catalog exposes a Moonshot model with zero input AND output pricing, it's promoted to position #1 as the recommended default — even if the exact ID isn't in the curated AI_GATEWAY_MODELS list. This enables dynamic discovery of new free Moonshot variants without requiring a PR to update curation. Paid Moonshot models are unaffected; falls back to the normal curated recommended tag when no free Moonshot is live.	2026-04-20 21:02:28 -07:00
jerilynzheng	ac26a460f9	feat: promote ai-gateway in provider picker ordering Moves Vercel AI Gateway from the bottom of the list to near the top, adjacent to other multi-model aggregators. The existing bottom position was a result of the list growing by appending new providers over time — the new position makes it more discoverable.	2026-04-20 21:02:28 -07:00
jerilynzheng	7004374404	feat: curated picker with live pricing for ai-gateway provider - Curated AI_GATEWAY_MODELS list in hermes_cli/models.py (OSS first, kimi-k2.5 as recommended default). - fetch_ai_gateway_models() filters the curated list against the live /v1/models catalog; falls back to the snapshot on network failure. - fetch_ai_gateway_pricing() translates Vercel's input/output field names to the prompt/completion shape the shared picker expects; carries input_cache_read / input_cache_write through unchanged. - get_pricing_for_provider() now handles ai-gateway. - _model_flow_ai_gateway() provides a guided URL prompt when no key is set and a pricing-column picker; routes ai-gateway to it instead of the generic api-key flow.	2026-04-20 21:02:28 -07:00
Peter Fontana	3988c3c245	feat: shell hooks — wire shell scripts as Hermes hook callbacks Users can declare shell scripts in config.yaml under a hooks: block that fire on plugin-hook events (pre_tool_call, post_tool_call, pre_llm_call, subagent_stop, etc). Scripts receive JSON on stdin, can return JSON on stdout to block tool calls or inject context pre-LLM. Key design: - Registers closures on existing PluginManager._hooks dict — zero changes to invoke_hook() call sites - subprocess.run(shell=False) via shlex.split — no shell injection - First-use consent per (event, command) pair, persisted to allowlist JSON - Bypass via --accept-hooks, HERMES_ACCEPT_HOOKS=1, or hooks_auto_accept - hermes hooks list/test/revoke/doctor CLI subcommands - Adds subagent_stop hook event fired after delegate_task children exit - Claude Code compatible response shapes accepted Cherry-picked from PR #13143 by @pefontana.	2026-04-20 20:53:51 -07:00
mavrickdeveloper	1fdf9a730c	fix(tools): keep default-off toolsets disabled	2026-04-20 20:52:50 -07:00
Brooklyn Nicholson	e1ce7c6b1f	fix(tui): address PR #13231 review comments Six small fixes, all valid review feedback: - gatewayClient: onTimeout is now a class-field arrow so setTimeout gets a stable reference — no per-request bind allocation (the whole point of the original refactor). - memory: growth rate was lifetime average of rss/uptime, which reports phantom growth for stable processes. Now computed as delta since a module-load baseline (STARTED_AT). Sanity-checked: 0.00 MB/hr at steady-state, non-zero after an allocation. - hermes_cli: NODE_OPTIONS merge is now token-aware — respects a user-supplied --max-old-space-size (don't downgrade a deliberate 16GB setting) and avoids duplicating --expose-gc. - useVirtualHistory: if items shrink past the frozen range's start mid-freeze (/clear, compaction), drop the freeze and fall through to the normal range calc instead of collapsing to an empty mount. - circularBuffer: throw on non-positive capacity instead of silently producing NaN indices. - debug slash help: /heapdump mentions HERMES_HEAPDUMP_DIR override instead of hardcoding the default path. Validation: tsc clean, eslint clean, vitest 102/102, growth-rate smoke test confirms baseline=0 → post-alloc>0.	2026-04-20 19:09:09 -05:00
Brooklyn Nicholson	0785aec444	fix(tui): harden against Node V8 OOM + GatewayClient memory leaks Long TUI sessions were crashing Node via V8 fatal-OOM once transcripts + reasoning blobs crossed the default 1.5–4GB heap cap. This adds defense in depth: a bigger heap, leak-proofing the RPC hot path, bounded diagnostic buffers, automatic heap dumps at high-water marks, and graceful signal / uncaught handlers. ## Changes ### Heap budget - hermes_cli/main.py: `_launch_tui` now injects `NODE_OPTIONS= --max-old-space-size=8192 --expose-gc` (appended — does not clobber user-supplied NODE_OPTIONS). Covers both `node dist/entry.js` and `tsx src/entry.tsx` launch paths. - ui-tui/src/entry.tsx: shebang rewritten to `#!/usr/bin/env -S node --max-old-space-size=8192 --expose-gc` as a fallback when the binary is invoked directly. ### GatewayClient (ui-tui/src/gatewayClient.ts) - `setMaxListeners(0)` — silences spurious warnings from React hook subscribers. - `logs` and `bufferedEvents` replaced with fixed-capacity CircularBuffer — O(1) push, no splice(0, …) copies under load. - RPC timeout refactor: `setTimeout(this.onTimeout.bind(this), …, id)` replaces the inline arrow closure that captured `method`/`params`/ `resolve`/`reject` for the full 120 s request timeout. Each Pending record now stores its own timeout handle, `.unref()`'d so stuck timers never keep the event loop alive, and `rejectPending()` clears them (previously leaked the timer itself). ### Memory diagnostics (new) - ui-tui/src/lib/memory.ts: `performHeapDump()` + `captureMemoryDiagnostics()`. Writes heap snapshot + JSON diag sidecar to `~/.hermes/heapdumps/` (override via `HERMES_HEAPDUMP_DIR`). Diagnostics are written first so we still get useful data if the snapshot crashes on very large heaps. Captures: detached V8 contexts (closure-leak signal), active handles/requests (`process._getActiveHandles/_getActiveRequests`), Linux `/proc/self/fd` count + `/proc/self/smaps_rollup`, heap growth rate (MB/hr), and auto-classifies likely leak sources. - ui-tui/src/lib/memoryMonitor.ts: 10 s interval polling heapUsed. At 1.5 GB writes an auto heap dump (trigger=`auto-high`); at 2.5 GB writes a final dump and exits 137 before V8 fatal-OOMs so the user can restart cleanly. Handle is `.unref()`'d so it never holds the process open. ### Graceful exit (new) - ui-tui/src/lib/gracefulExit.ts: SIGINT/SIGTERM/SIGHUP run registered cleanups through a 4 s failsafe `setTimeout` that hard-exits if cleanup hangs. `uncaughtException` / `unhandledRejection` are logged to stderr instead of crashing — a transient TUI render error should not kill an in-flight agent turn. ### Slash commands (new) - ui-tui/src/app/slash/commands/debug.ts: - `/heapdump` — manual snapshot + diagnostics. - `/mem` — live heap / rss / external / array-buffer / uptime panel. - Registered in `ui-tui/src/app/slash/registry.ts`. ### Utility (new) - ui-tui/src/lib/circularBuffer.ts: small fixed-capacity ring buffer with `push` / `tail(n)` / `drain()` / `clear()`. Replaces the ad-hoc `array.splice(0, len - MAX)` pattern. ## Validation - tsc `--noEmit` clean - `vitest run`: 15 files, 102 tests passing - eslint clean on all touched/new files - build produces executable `dist/entry.js` with preserved shebang - smoke-tested: `HERMES_HEAPDUMP_DIR=… performHeapDump('manual')` writes both a valid `.heapsnapshot` and a `.diagnostics.json` containing detached-contexts, active-handles, smaps_rollup. ## Env knobs - `HERMES_HEAPDUMP_DIR` — override snapshot output dir - `HERMES_HEAPDUMP_ON_START=1` — dump once at boot - existing `NODE_OPTIONS` is respected and appended, not replaced	2026-04-20 18:58:44 -05:00

1 2 3 4 5 ...

1268 commits