hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-25 00:51:20 +00:00

Author	SHA1	Message	Date
Teknium	4350668ae4	fix(transcription): fall back to CPU when CUDA runtime libs are missing faster-whisper's device="auto" picks CUDA when ctranslate2's wheel ships CUDA shared libs, even on hosts without the NVIDIA runtime (libcublas.so.12 / libcudnn*). On those hosts the model often loads fine but transcribe() fails at first dlopen, and the broken model stays cached in the module-global — every subsequent voice message in the gateway process fails identically until restart. - Add _load_local_whisper_model() wrapper: try auto, catch missing-lib errors, retry on device=cpu compute_type=int8. - Wrap transcribe() with the same fallback: evict cached model, reload on CPU, retry once. Required because the dlopen failure only surfaces at first kernel launch, not at model construction. - Narrow marker list (libcublas, libcudnn, libcudart, 'cannot be loaded', 'no kernel image is available', 'no CUDA-capable device', driver mismatch). Deliberately excludes 'CUDA out of memory' and similar — those are real runtime failures that should surface, not be silently retried on CPU. - Tests for load-time fallback, runtime fallback (with cached-model eviction verified), and the OOM non-fallback path. Reported via Telegram voice-message dumps on WSL2 hosts where libcublas isn't installed by default.	2026-04-24 02:50:14 -07:00
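A minimal sketch of the load-time and run-time fallback described in this commit, under stated assumptions: `TranscriberCache`, `looks_like_missing_cuda_libs`, and the `loader(device, compute_type)` callable are hypothetical names used only for illustration, and the marker list is paraphrased from the message rather than copied from the repository.

```python
from typing import Any, Callable, Optional

# Assumed marker list, paraphrased from the commit message. Out-of-memory
# text is deliberately absent so that real runtime failures still surface.
_CUDA_MISSING_MARKERS = (
    "libcublas",
    "libcudnn",
    "libcudart",
    "cannot be loaded",
    "no kernel image is available",
    "no cuda-capable device",
)


def looks_like_missing_cuda_libs(exc: BaseException) -> bool:
    """Return True only for missing-library style errors."""
    text = str(exc).lower()
    return any(marker in text for marker in _CUDA_MISSING_MARKERS)


class TranscriberCache:
    """Caches one model; evicts it and reloads on CPU if CUDA libs are missing."""

    def __init__(self, loader: Callable[[str, str], Any]) -> None:
        self._loader = loader  # hypothetical: loader(device, compute_type) -> model
        self._model: Optional[Any] = None

    def _get(self) -> Any:
        if self._model is None:
            try:
                self._model = self._loader("auto", "default")
            except Exception as exc:
                if not looks_like_missing_cuda_libs(exc):
                    raise
                self._model = self._loader("cpu", "int8")
        return self._model

    def transcribe(self, audio: Any) -> Any:
        try:
            return self._get().transcribe(audio)
        except Exception as exc:
            if not looks_like_missing_cuda_libs(exc):
                raise
            # The failure only shows up at first kernel launch, so the broken
            # cached model is replaced before retrying exactly once.
            self._model = self._loader("cpu", "int8")
            return self._model.transcribe(audio)
```

Keeping the marker check separate from the retry logic is what lets an error such as "CUDA out of memory" propagate unchanged instead of being silently retried on CPU.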
Teknium	34c3e67109	fix: sanitize tool schemas for llama.cpp backends; restore MCP in TUI (#15032 ) Local llama.cpp servers (e.g. ggml-org/llama.cpp:full-cuda) fail the entire request with HTTP 400 'Unable to generate parser for this template. ... Unrecognized schema: "object"' when any tool schema contains shapes its json-schema-to-grammar converter can't handle: * 'type': 'object' without 'properties' * bare string schema values ('additionalProperties: "object"') * 'type': ['X', 'null'] arrays (nullable form) Cloud providers accept these silently, so they ship from external MCP servers (Atlassian, GCloud, Datadog) and from a couple of our own tools. Changes - tools/schema_sanitizer.py: walks the finalized tool list right before it leaves get_tool_definitions() and repairs the hostile shapes in a deep copy. No-op on well-formed schemas. Recurses into properties, items, additionalProperties, anyOf/oneOf/allOf, and $defs. - model_tools.get_tool_definitions(): invoke the sanitizer as the last step so all paths (built-in, MCP, plugin, dynamically-rebuilt) get covered uniformly. - tools/browser_cdp_tool.py, tools/mcp_tool.py: fix our own bare-object schemas so sanitization isn't load-bearing for in-repo tools. - tui_gateway/server.py: _load_enabled_toolsets() was passing include_default_mcp_servers=False at runtime. That's the config-editing variant (see PR #3252) — it silently drops every default MCP server from the TUI's enabled_toolsets, which is why the TUI didn't hit the llama.cpp crash (no MCP tools sent at all). Switch to True so TUI matches CLI behavior. Tests tests/tools/test_schema_sanitizer.py (17 tests) covers the individual failure modes, well-formed pass-through, deep-copy isolation, and required-field pruning. E2E: loaded the default 'hermes-cli' toolset with MCP discovery and confirmed all 27 resolved tool schemas pass a llama.cpp-compatibility walk (no 'object' node missing 'properties', no bare-string schema values).	2026-04-24 02:44:46 -07:00
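The three hostile shapes listed in this commit can be illustrated with a small recursive walk. This is only a sketch: `sanitize_schema` is a hypothetical name, and the specific repairs chosen here (adding an empty `properties`, wrapping a bare string as `{"type": ...}`, collapsing a nullable array to its first non-null type) are assumptions, since the log does not say how `tools/schema_sanitizer.py` rewrites each case.

```python
import copy
from typing import Any

_SUBSCHEMA_MAPS = ("properties", "$defs")
_SUBSCHEMA_LISTS = ("anyOf", "oneOf", "allOf")


def _fix(node: Any) -> Any:
    if isinstance(node, str):
        # Bare string schema value, e.g. additionalProperties: "object".
        return _fix({"type": node})
    if not isinstance(node, dict):
        return node
    declared = node.get("type")
    if isinstance(declared, list):
        # Nullable form ['X', 'null']: keep the first non-null type (assumption).
        non_null = [t for t in declared if t != "null"]
        node["type"] = non_null[0] if non_null else "string"
    if node.get("type") == "object" and not isinstance(node.get("properties"), dict):
        node["properties"] = {}
    for key in _SUBSCHEMA_MAPS:
        if isinstance(node.get(key), dict):
            node[key] = {name: _fix(sub) for name, sub in node[key].items()}
    for key in ("items", "additionalProperties"):
        if isinstance(node.get(key), (dict, str)):
            node[key] = _fix(node[key])
    for key in _SUBSCHEMA_LISTS:
        if isinstance(node.get(key), list):
            node[key] = [_fix(sub) for sub in node[key]]
    return node


def sanitize_schema(schema: dict) -> dict:
    """Repair a deep copy so the caller's schema is never mutated."""
    return _fix(copy.deepcopy(schema))
```

A well-formed schema passes through unchanged, which matches the "no-op on well-formed schemas" property the commit describes.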
brooklyn!	5dda4cab41	Merge pull request #14968 from NousResearch/bb/tui-section-visibility feat(tui): per-section visibility for the details accordion	2026-04-24 03:02:26 -05:00
Brooklyn Nicholson	6604e94c75	fix(tui): gate messageLine on content-bearing sections, not all sections Round-2 Copilot review on #14968 caught two leftover spots that didn't fully respect per-section overrides: - messageLine.tsx (trail branch): the previous fix gated on `SECTION_NAMES.some(...)`, which stayed true whenever any section was visible. With `thinking: 'expanded'` as the new built-in default, that meant `display.sections.tools: hidden` left an empty wrapper Box alive for trail messages. Now gates on the actual content-bearing sections for a trail message — `tools` OR `activity` — so a tools-hidden config drops the wrapper cleanly. - messageLine.tsx (showDetails): still keyed off the global `detailsMode !== 'hidden'`, so per-section overrides like `sections.thinking: expanded` couldn't escape global hidden for assistant messages with reasoning + tool metadata. Recomputed via resolved per-section modes (`thinkingMode`/`toolsMode`). - types.ts: rewrote the SectionVisibility doc comment to reflect the actual resolution order (explicit override → SECTION_DEFAULTS → global), so the docstring stops claiming "missing keys fall back to the global mode" when SECTION_DEFAULTS now layers in between. All three lookups (thinking/tools/activity) are computed once at the top of MessageLine and shared by every branch.	2026-04-24 03:01:06 -05:00
Brooklyn Nicholson	67bfd4b828	feat(tui): stream thinking + tools expanded by default Extends SECTION_DEFAULTS so the out-of-the-box TUI shows the turn as a live transcript (reasoning + tool calls streaming inline) instead of a wall of `▸` chevrons the user has to click every turn. Final default matrix: - thinking: expanded - tools: expanded - activity: hidden (unchanged from the previous commit) - subagents: falls through to details_mode (collapsed by default) Everything explicit in `display.sections` still wins, so anyone who already pinned an override keeps their layout. One-line revert is `display.sections.<name>: collapsed`.	2026-04-24 02:53:44 -05:00
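The resolution order these TUI commits describe (explicit override, then SECTION_DEFAULTS, then the global mode) can be sketched as follows. The real logic lives in TypeScript under `ui-tui/src/domain/details.ts`; this Python rendering, including the `section_mode` and `all_hidden` function names, is an illustrative assumption and not a port of the actual code.

```python
from typing import Mapping, Optional

MODES = ("hidden", "collapsed", "expanded")
SECTION_NAMES = ("thinking", "tools", "subagents", "activity")

# Final default matrix from the commit message; subagents has no entry,
# so it falls through to the global details_mode.
SECTION_DEFAULTS = {"thinking": "expanded", "tools": "expanded", "activity": "hidden"}


def section_mode(
    name: str,
    global_mode: str,
    overrides: Optional[Mapping[str, str]] = None,
) -> str:
    """Explicit override wins, then the built-in default, then the global mode."""
    override = (overrides or {}).get(name)
    if override in MODES:
        return override
    return SECTION_DEFAULTS.get(name, global_mode)


def all_hidden(global_mode: str, overrides: Optional[Mapping[str, str]] = None) -> bool:
    """True when every section resolves to hidden (the early-return condition)."""
    return all(
        section_mode(name, global_mode, overrides) == "hidden" for name in SECTION_NAMES
    )
```

With this ordering, `details_mode: hidden` combined with `sections.tools: expanded` resolves tools to `expanded`, which is the override-escapes-hidden behaviour the regression test pins.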
Brooklyn Nicholson	70925363b6	fix(tui): per-section overrides escape global details_mode: hidden Copilot review on #14968 caught that the early returns gated on the global `detailsMode === 'hidden'` short-circuited every render path before sectionMode() got a chance to apply per-section overrides — so `details_mode: hidden` + `sections.tools: expanded` was silently a no-op. Three call sites had the same bug shape; all now key off the resolved section modes: - ToolTrail: replace the `detailsMode === 'hidden'` early return with an `allHidden = every section resolved to hidden` check. When that's true, fall back to the floating-alert backstop (errors/warnings) so quiet-mode users aren't blind to ambient failures, and update the comment block to match the actual condition. - messageLine.tsx: drop the same `detailsMode === 'hidden'` pre-check on `msg.kind === 'trail'`; only skip rendering the wrapper when every section resolves to hidden (`SECTION_NAMES.some(...) !== 'hidden'`). - useMainApp.ts: rebuild `showProgressArea` around `anyPanelVisible` instead of branching on the global mode. This also fixes the suppressed Copilot concern about an empty wrapper Box rendering above the streaming area when ToolTrail returns null. Regression test in details.test.ts pins the override-escapes-hidden behaviour for tools/thinking/activity. 271/271 vitest, lints clean.	2026-04-24 02:49:58 -05:00
Brooklyn Nicholson	005cc29e98	refactor(tui): /clean pass on per-section visibility plumbing - domain/details: extract `norm()`, fold parseDetailsMode + resolveSections into terser functional form, reject array values for resolveSections - slash /details: destructure tokens, factor reset/mode into one dispatch, drop DETAIL_MODES set + DetailsMode/SectionName imports (parseDetailsMode + isSectionName narrow + return), centralize usage strings - ToolTrail: collapse 4 separate xxxSection vars into one memoized `visible` map; effect deps stabilize on the memo identity instead of 4 primitives	2026-04-24 02:42:03 -05:00
Brooklyn Nicholson	728767e910	feat(tui): hide the activity panel by default The activity panel (gateway hints, terminal-parity nudges, background notifications) is noise for the typical day-to-day user, who only cares about thinking + tools + streamed content. Make `hidden` the built-in default for that section so users land on the quiet mode out of the box. Tool failures still render inline on the failing tool row, so this default suppresses the noise feed without losing the signal. Opt back in with `display.sections.activity: collapsed` (chevron) or `expanded` (always open) in `~/.hermes/config.yaml`, or live with `/details activity collapsed`. Implementation: SECTION_DEFAULTS in domain/details.ts, applied as the fallback in `sectionMode()` between the explicit override and the global details_mode. Existing `display.sections.activity` overrides take precedence — no migration needed for users who already set it.	2026-04-24 02:37:42 -05:00
Brooklyn Nicholson	78481ac124	feat(tui): per-section visibility for the details accordion Adds optional per-section overrides on top of the existing global details_mode (hidden | collapsed | expanded). Lets users keep the accordion collapsed by default while auto-expanding tools, or hide the activity panel entirely without touching thinking/tools/subagents. Config (~/.hermes/config.yaml): display: details_mode: collapsed sections: thinking: expanded tools: expanded activity: hidden Slash command: /details show current global + overrides /details [hidden|collapsed|expanded] set global mode (existing) /details <section> <mode|reset> per-section override (new) /details <section> reset clear override Sections: thinking, tools, subagents, activity. Implementation: - ui-tui/src/types.ts SectionName + SectionVisibility - ui-tui/src/domain/details.ts parseSectionMode / resolveSections / sectionMode + SECTION_NAMES - ui-tui/src/app/uiStore.ts + app/interfaces.ts + app/useConfigSync.ts sections threaded into UiState - ui-tui/src/components/thinking.tsx ToolTrail consults per-section mode for hidden/expanded behaviour; expandAll skips hidden sections; floating-alert fallback respects activity:hidden - ui-tui/src/components/messageLine.tsx + appLayout.tsx pass sections through render tree - ui-tui/src/app/slash/commands/core.ts /details <section> <mode|reset> syntax - tui_gateway/server.py config.set details_mode.<section> writes to display.sections.<section> (empty value clears the override) - website/docs/user-guide/tui.md documented Tests: 14 new (4 domain, 4 useConfigSync, 3 slash, 3 gateway). Total: 269/269 vitest, all gateway tests pass.	2026-04-24 02:34:32 -05:00
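The `/details` grammar listed in this commit has three forms: no argument, a single mode, or a section followed by a mode or `reset`. A rough sketch of a parser for that grammar follows; the real implementation is TypeScript in the slash-command module, and `parse_details_command` together with its tuple return shapes are hypothetical choices made only for this illustration.

```python
from typing import Tuple

MODES = ("hidden", "collapsed", "expanded")
SECTION_NAMES = ("thinking", "tools", "subagents", "activity")


def parse_details_command(arg_string: str) -> Tuple[str, ...]:
    """Map the text after '/details' to an action tuple."""
    tokens = arg_string.split()
    if not tokens:
        return ("show",)
    if len(tokens) == 1 and tokens[0] in MODES:
        return ("set_global", tokens[0])
    if len(tokens) == 2 and tokens[0] in SECTION_NAMES:
        section, value = tokens
        if value == "reset":
            return ("clear_override", section)
        if value in MODES:
            return ("set_override", section, value)
    return ("usage_error",)
```

For example, `parse_details_command("activity collapsed")` yields a per-section override, while an unknown section name falls through to the usage error.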
Teknium	6051fba9dc	feat(banner): hyperlink startup banner title to latest GitHub release (#14945) Wrap the existing version label in the welcome-banner panel title ('Hermes Agent v… · upstream … · local …') with an OSC-8 terminal hyperlink pointing at the latest git tag's GitHub release page (https://github.com/NousResearch/hermes-agent/releases/tag/<tag>). Clickable in modern terminals (iTerm2, WezTerm, Windows Terminal, GNOME Terminal, Kitty, etc.); degrades to plain text on terminals without OSC-8 support. No new line added to the banner. New get_latest_release_tag() helper runs 'git describe --tags --abbrev=0' in the Hermes checkout (3s timeout, per-process cache, silent fallback for non-git/pip installs and forks without tags).	2026-04-23 23:28:34 -07:00
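For readers unfamiliar with OSC-8, the sketch below shows the escape sequence and a tag lookup along the lines the commit describes. `osc8_link`, `release_url`, and `latest_release_tag` are hypothetical helper names, not the repository's `get_latest_release_tag()`; the 3-second timeout and per-process cache follow the message, and everything else is an assumption.

```python
import subprocess
from functools import lru_cache
from typing import Optional


def osc8_link(text: str, url: str) -> str:
    """Wrap text in an OSC-8 hyperlink: ESC ] 8 ; ; URL ST text ESC ] 8 ; ; ST."""
    return f"\x1b]8;;{url}\x1b\\{text}\x1b]8;;\x1b\\"


def release_url(tag: str) -> str:
    return f"https://github.com/NousResearch/hermes-agent/releases/tag/{tag}"


@lru_cache(maxsize=None)  # per-process cache, keyed on the checkout path
def latest_release_tag(repo_dir: str) -> Optional[str]:
    """Return the latest tag, or None for non-git installs and tagless forks."""
    try:
        proc = subprocess.run(
            ["git", "describe", "--tags", "--abbrev=0"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            timeout=3,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    tag = proc.stdout.strip()
    return tag if proc.returncode == 0 and tag else None
```

A caller would fall back to the plain label whenever `latest_release_tag` returns `None`, so the banner text is the same with or without a link.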
Teknium	2acc8783d1	fix(errors): classify OpenRouter privacy-guardrail 404s distinctly (#14943) OpenRouter returns a 404 with the specific message 'No endpoints available matching your guardrail restrictions and data policy. Configure: https://openrouter.ai/settings/privacy' when a user's account-level privacy setting excludes the only endpoint serving a model (e.g. DeepSeek V4 Pro, which today is hosted only by DeepSeek's own endpoint that may log inputs). Before this change we classified it as model_not_found, which was misleading (the model exists) and triggered provider fallback (useless, because the same account setting applies to every OpenRouter call). Now it classifies as a new FailoverReason.provider_policy_blocked with retryable=False, should_fallback=False. The error body already contains the fix URL, so the user still gets actionable guidance.	2026-04-23 23:26:29 -07:00
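The classification change can be sketched as a message check that runs before the generic 404 path. Only `FailoverReason.provider_policy_blocked`, `model_not_found`, `retryable`, and `should_fallback` come from the commit message; the `Classification` dataclass, the `classify_404` function, and the `retryable=False` value on the generic branch are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class FailoverReason(Enum):
    model_not_found = "model_not_found"
    provider_policy_blocked = "provider_policy_blocked"


@dataclass(frozen=True)
class Classification:
    reason: FailoverReason
    retryable: bool
    should_fallback: bool


# Lower-cased prefix of the OpenRouter message quoted in the commit.
_GUARDRAIL_MARKER = "no endpoints available matching your guardrail restrictions"


def classify_404(message: str) -> Classification:
    if _GUARDRAIL_MARKER in message.lower():
        # Account-level policy: retrying or switching provider cannot help.
        return Classification(FailoverReason.provider_policy_blocked, False, False)
    # Generic 404: the previous behaviour, which still triggers fallback.
    return Classification(FailoverReason.model_not_found, False, True)
```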
brooklyn!	acdcb167fb	fix(tui): harden terminal dimming and multiplexer copy (#14906) - disable ANSI dim on VTE terminals by default so dark-background reasoning and accents stay readable - suppress local multiplexer OSC52 echo while preserving remote passthrough and add regression coverage	2026-04-23 22:46:28 -07:00
Teknium	51f4c9827f	fix(context): resolve real Codex OAuth context windows (272k, not 1M) (#14935 ) On ChatGPT Codex OAuth every gpt-5.x slug actually caps at 272,000 tokens, but Hermes was resolving gpt-5.5 / gpt-5.4 to 1,050,000 (from models.dev) because openai-codex aliases to the openai entry there. At 1.05M the compressor never fires and requests hard-fail with 'context window exceeded' around the real 272k boundary. Verified live against chatgpt.com/backend-api/codex/models: gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.3-codex, gpt-5.2-codex, gpt-5.2, gpt-5.1-codex-max → context_window = 272000 Changes: - agent/model_metadata.py: * _fetch_codex_oauth_context_lengths() — probe the Codex /models endpoint with the OAuth bearer token and read context_window per slug (1h in-memory TTL). * _resolve_codex_oauth_context_length() — prefer the live probe, fall back to hardcoded _CODEX_OAUTH_CONTEXT_FALLBACK (all 272k). * Wire into get_model_context_length() when provider=='openai-codex', running BEFORE the models.dev lookup (which returns 1.05M). Result persists via save_context_length() so subsequent lookups skip the probe entirely. * Fixed the now-wrong comment on the DEFAULT_CONTEXT_LENGTHS gpt-5.5 entry (400k was never right for Codex; it's the catch-all for providers we can't probe live). Tests (4 new in TestCodexOAuthContextLength): - fallback table used when no token is available (no models.dev leakage) - live probe overrides the fallback - probe failure (non-200) falls back to hardcoded 272k - non-codex providers (openrouter, direct openai) unaffected Non-codex context resolution is unchanged — the Codex branch only fires when provider=='openai-codex'.	2026-04-23 22:39:47 -07:00
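The probe-then-fallback order for Codex OAuth context windows might look like the sketch below. `CodexContextResolver`, its injected `probe` callable, and the injectable clock are hypothetical; the 272,000 fallback and the one-hour in-memory TTL come from the commit message. Not caching failed probes is an assumption, and the persistence step via `save_context_length()` is omitted.

```python
import time
from typing import Callable, Dict, Optional

CODEX_OAUTH_CONTEXT_FALLBACK = 272_000
_TTL_SECONDS = 3600.0


class CodexContextResolver:
    """Prefer a live per-slug probe; fall back to the hardcoded 272k."""

    def __init__(
        self,
        probe: Callable[[], Optional[Dict[str, int]]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._probe = probe  # hypothetical: returns {slug: context_window} or None
        self._clock = clock
        self._cached: Dict[str, int] = {}
        self._fetched_at: Optional[float] = None

    def _live(self) -> Dict[str, int]:
        now = self._clock()
        if self._fetched_at is not None and now - self._fetched_at < _TTL_SECONDS:
            return self._cached
        result = self._probe()
        if not result:
            return {}  # probe failed: do not cache, use the fallback below
        self._cached, self._fetched_at = result, now
        return result

    def resolve(self, slug: str) -> int:
        return self._live().get(slug, CODEX_OAUTH_CONTEXT_FALLBACK)
```

Running this branch before any generic metadata lookup is what keeps the 1.05M figure from leaking in when the provider is `openai-codex`.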
Teknium	2e78a2b6b2	feat(models): add deepseek-v4-pro and deepseek-v4-flash (#14934) - OpenRouter: deepseek/deepseek-v4-pro, deepseek/deepseek-v4-flash - Nous Portal (fallback list): same two slugs - Native DeepSeek provider: bare deepseek-v4-pro, deepseek-v4-flash alongside existing deepseek-chat/deepseek-reasoner Context length resolves via existing 'deepseek' substring entry (128K) in DEFAULT_CONTEXT_LENGTHS.	2026-04-23 22:35:04 -07:00
Teknium	5a1c599412	feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540 ) * docs: browser CDP supervisor design (for upcoming PR) Design doc ahead of implementation — dialog + iframe detection/interaction via a persistent CDP supervisor. Covers backend capability matrix (verified live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split, non-goals, and test plan. Supersedes #12550. No code changes in this commit. * feat(browser): add persistent CDP supervisor for dialog + frame detection Single persistent CDP WebSocket per Hermes task_id that subscribes to Page/Runtime/Target events and maintains thread-safe state for pending dialogs, frame tree, and console errors. Supervisor lives in its own daemon thread running an asyncio loop; external callers use sync API (snapshot(), respond_to_dialog()) that bridges onto the loop. Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true} and enables Page+Runtime on each so iframe-origin dialogs surface through the same supervisor. Dialog policies: must_respond (default, 300s safety timeout), auto_dismiss, auto_accept. Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot payloads bounded on ad-heavy pages. E2E verified against real Chrome via smoke test — detects + responds to main-frame alerts, iframe-contentWindow alerts, preserves frame tree, graceful no-dialog error path, clean shutdown. No agent-facing tool wiring in this commit (comes next). * feat(browser): add browser_dialog tool wired to CDP supervisor Agent-facing response-only tool. Schema: action: 'accept' \| 'dismiss' (required) prompt_text: response for prompt() dialogs (optional) dialog_id: disambiguate when multiple dialogs queued (optional) Handler: SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...) check_fn shares _browser_cdp_check with browser_cdp so both surface and hide together. 
When no supervisor is attached (Camofox, default Playwright, or no browser session started yet), tool is hidden; if somehow invoked it returns a clear error pointing the agent to browser_navigate / /browser connect. Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp / hermes-api-server toolsets alongside browser_cdp. * feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot Supervisor lifecycle: * _get_session_info lazy-starts the supervisor after a session row is materialized — covers every backend code path (Browserbase, cdp_url override, /browser connect, future providers) with one hook. * cleanup_browser(task_id) stops the supervisor for that task first (before the backend tears down CDP). * cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all(). * /browser connect eagerly starts the supervisor for task 'default' so the first snapshot already shows pending_dialogs. * /browser disconnect stops the supervisor. CDP URL resolution for the supervisor: 1. BROWSER_CDP_URL / browser.cdp_url override. 2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase). browser_snapshot merges supervisor state (pending_dialogs + frame_tree) into its JSON output when a supervisor is active — the agent reads pending_dialogs from the snapshot it already requests, then calls browser_dialog to respond. No extra tool surface. Config defaults: * browser.dialog_policy: 'must_respond' (new) * browser.dialog_timeout_s: 300 (new) No version bump — new keys deep-merge into existing browser section. Deadlock fix in supervisor event dispatch: * _on_dialog_opening and _on_target_attached used to await CDP calls while the reader was still processing an event — but only the reader can set the response Future, so the call timed out. * Both now fire asyncio.create_task(...) so the reader stays pumping. * auto_dismiss/auto_accept now actually close the dialog immediately. 
Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome): * supervisor start/snapshot * main-frame alert detection + dismiss * iframe.contentWindow alert * prompt() with prompt_text reply * respond with no pending dialog -> clean error * auto_dismiss clears on event * registry idempotency * registry stop -> snapshot reports inactive * browser_dialog tool no-supervisor error * browser_dialog invalid action * browser_dialog end-to-end via tool handler xdist-safe: chrome_cdp fixture uses a per-worker port. Skipped when google-chrome/chromium isn't installed. * docs(browser): document browser_dialog tool + CDP supervisor - user-guide/features/browser.md: new browser_dialog section with workflow, availability gate, and dialog_policy table - reference/tools-reference.md: row for browser_dialog, tool count bumped 53 -> 54, browser tools count 11 -> 12 - reference/toolsets-reference.md: browser_dialog added to browser toolset row with note on pending_dialogs / frame_tree snapshot fields Full design doc lives at developer-guide/browser-supervisor.md (committed earlier). * fix(browser): reconnect loop + recent_dialogs for Browserbase visibility Found via Browserbase E2E test that revealed two production-critical issues: 1. Supervisor WebSocket drops when other clients disconnect. Browserbase's CDP proxy tears down our long-lived WebSocket whenever a short-lived client (e.g. agent-browser CLI's per-command CDP connection) disconnects. Fixed with a reconnecting _run loop that re-attaches with exponential backoff on drops. _page_session_id and _child_sessions are reset on each reconnect; pending_dialogs and frames are preserved across reconnects. 2. Browserbase auto-dismisses dialogs server-side within ~10ms. Their Playwright-based CDP proxy dismisses alert/confirm/prompt before our Page.handleJavaScriptDialog call can respond. So pending_dialogs is empty by the time the agent reads a snapshot on Browserbase. 
Added a recent_dialogs ring buffer (capacity 20) that retains a DialogRecord for every dialog that opened, with a closed_by tag: * 'agent' — agent called browser_dialog * 'auto_policy' — local auto_dismiss/auto_accept fired * 'watchdog' — must_respond timeout auto-dismissed (300s default) * 'remote' — browser/backend closed it on us (Browserbase) Agents on Browserbase now see the dialog history with closed_by='remote' so they at least know a dialog fired, even though they couldn't respond. 3. Page.javascriptDialogClosed matching bug. The event doesn't include a 'message' field (CDP spec has only 'result' and 'userInput') but our _on_dialog_closed was matching on message. Fixed to match by session_id + oldest-first, with a safety assumption that only one dialog is in flight per session (the JS thread is blocked while a dialog is up). Docs + tests updated: * browser.md: new availability matrix showing the three backends and which mode (pending / recent / response) each supports * developer-guide/browser-supervisor.md: three-field snapshot schema with closed_by semantics * test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12 passing against real Chrome) E2E verified both backends: * Local Chrome via /browser connect: detect + respond full workflow (smoke_supervisor.py all 7 scenarios pass) * Browserbase: detect via recent_dialogs with closed_by='remote' (smoke_supervisor_browserbase_v2.py passes) Camofox remains out of scope (REST-only, no CDP) — tracked for upstream PR 3. * feat(browser): XHR bridge for dialog response on Browserbase (FIXED) Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so Page.handleJavaScriptDialog calls lose the race. Solution: bypass native dialogs entirely. The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a JavaScript override for window.alert/confirm/prompt. Those overrides perform a synchronous XMLHttpRequest to a magic host ('hermes-dialog-bridge.invalid'). 
We intercept those XHRs via Fetch.enable with a requestStage=Request pattern. Flow when a page calls alert('hi'): 1. window.alert override intercepts, builds XHR GET to http://hermes-dialog-bridge.invalid/?kind=alert&message=hi 2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics) 3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces it as a pending dialog with bridge_request_id set 4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog 5. Supervisor calls Fetch.fulfillRequest with JSON body: {accept: true\|false, prompt_text: '...', dialog_id: 'd-N'} 6. The injected script parses the body, returns the appropriate value from the override (undefined for alert, bool for confirm, string\|null for prompt) This works identically on Browserbase AND local Chrome — no native dialog ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog policies (must_respond / auto_dismiss / auto_accept) all still work. Bridge is installed on every attached session (main page + OOPIF child sessions) so iframe dialogs are captured too. Native-dialog path kept as a fallback for backends that don't auto-dismiss (so a page that somehow bypasses our override — e.g. iframes that load after Fetch.enable but before the init-script runs — still gets observed via Page.javascriptDialogOpening). 
E2E VERIFIED: * Local Chrome: 13/13 pytest tests green (12 original + new test_bridge_captures_prompt_and_returns_reply_text that asserts window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds) * Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS: - alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓ - prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY' → page.prompt_ret === 'AGENT-REPLY' ✓ - confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓ - confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓ Docs updated in browser.md and developer-guide/browser-supervisor.md — availability matrix now shows Browserbase at full parity with local Chrome for both detection and response. * feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...) Adds iframe interaction to the CDP supervisor PR (was queued as PR 2). Design: browser_cdp gets an optional frame_id parameter. When set, the tool looks up the frame in the supervisor's frame_tree, grabs its child cdp_session_id (OOPIF session), and dispatches the CDP call through the supervisor's already-connected WebSocket via run_coroutine_threadsafe. Why not stateless: on Browserbase, each fresh browser_cdp WebSocket must re-negotiate against a signed connectUrl. The session info carries a specific URL that can expire while the supervisor's long-lived connection stays valid. Routing via the supervisor sidesteps this. Agent workflow: 1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true 2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>, params={'expression': 'document.title', 'returnByValue': True}) 3. 
Supervisor dispatches the call on the OOPIF's child session Supervisor state fixes needed along the way: * _on_frame_detached now skips reason='swap' (frame migrating processes) * _on_frame_detached also skips when the frame is an OOPIF with a live child session — Browserbase fires spurious remove events when a same-origin iframe gets promoted to OOPIF * _on_target_detached clears cdp_session_id but KEEPS the frame record so the agent still sees the OOPIF in frame_tree during transient session flaps E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py): browser_cdp(method='Runtime.evaluate', params={'expression': 'document.title', 'returnByValue': True}, frame_id=<OOPIF>) → {'success': True, 'result': {'value': 'Example Domain'}} The iframe is <iframe src='https://example.com/'> inside a top-level data: URL page on a real Browserbase session. The agent Runtime.evaluates INSIDE the cross-origin iframe and gets example.com's title back. Tests (tests/tools/test_browser_supervisor.py — 16 pass total): * test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF, verifies routing via supervisor, Runtime.evaluate returns 1+1=2 * test_browser_cdp_frame_id_missing_supervisor — clean error when no supervisor attached * test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad frame_id Docs (browser.md and developer-guide/browser-supervisor.md) updated with the iframe workflow, availability matrix now shows OOPIF eval as shipped for local Chrome + Browserbase. * test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process When asked 'did you test the iframe stuff' I had only done a mocked pytest (fake injected OOPIF) plus a Browserbase E2E. 
Closed the local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/ smoke_local_oopif.py: * 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906) * Chrome with --site-per-process so the cross-origin iframe becomes a real OOPIF in its own process * Navigate, find OOPIF in supervisor.frame_tree, call browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes through the supervisor's child session * Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the inner page, retrieved via OOPIF eval) PASSED on 2026-04-23. Tried to embed this as a pytest but hit an asyncio version quirk between venv (3.11) and the system python (3.13) — Page.navigate hangs in the pytest harness but works in standalone. Left a self-documenting skip test that points to the smoke script + describes the verification. chrome_cdp fixture now passes --site-per-process so future iframe tests can rely on OOPIF behavior. Result: 16 pass + 1 documented-skip = 17 tests in tests/tools/test_browser_supervisor.py. * docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count Pre-merge docs audit revealed two gaps: 1. user-guide/configuration.md browser config example was missing the two new dialog_* knobs. Added with a short table explaining must_respond / auto_dismiss / auto_accept semantics and a link to the feature page for the full workflow. 2. reference/tools-reference.md header said '54 built-in tools' — real count on main is 54, this branch adds browser_dialog so it's 55. Fixed the header. (browser count was already correctly bumped 11 -> 12 in the earlier docs commit.) No code changes.	2026-04-23 22:23:37 -07:00
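The `recent_dialogs` ring buffer from the supervisor commit above can be sketched with a bounded deque. `DialogLog` and the fields on this `DialogRecord` are hypothetical; the capacity of 20 and the four `closed_by` tags are taken from the commit message.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional

CLOSED_BY = ("agent", "auto_policy", "watchdog", "remote")


@dataclass
class DialogRecord:
    dialog_id: str
    kind: str  # 'alert', 'confirm', or 'prompt'
    message: str
    closed_by: Optional[str] = None  # None while the dialog is still pending


class DialogLog:
    """Tracks pending dialogs plus a bounded history of every dialog opened."""

    def __init__(self, capacity: int = 20) -> None:
        self.pending: Dict[str, DialogRecord] = {}
        self.recent: Deque[DialogRecord] = deque(maxlen=capacity)

    def opened(self, record: DialogRecord) -> None:
        self.pending[record.dialog_id] = record
        self.recent.append(record)  # oldest entry drops off when full

    def closed(self, dialog_id: str, closed_by: str) -> Optional[DialogRecord]:
        if closed_by not in CLOSED_BY:
            raise ValueError(f"unknown closed_by tag: {closed_by!r}")
        record = self.pending.pop(dialog_id, None)
        if record is not None:
            record.closed_by = closed_by  # same object is visible in recent
        return record
```

Because the history holds the same record objects as the pending map, a dialog that the backend dismisses within milliseconds still appears in the history with `closed_by='remote'` even though the pending map is empty by the time a snapshot is read.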
Teknium	0f6eabb890	docs(website): dedicated page per bundled + optional skill (#14929 ) Generates a full dedicated Docusaurus page for every one of the 132 skills (73 bundled + 59 optional) under website/docs/user-guide/skills/{bundled,optional}/<category>/. Each page carries the skill's description, metadata (version, author, license, dependencies, platform gating, tags, related skills cross-linked to their own pages), and the complete SKILL.md body that Hermes loads at runtime. Previously the two catalog pages just listed skills with a one-line blurb and no way to see what the skill actually did — users had to go read the source repo. Now every skill has a browsable, searchable, cross-linked reference in the docs. - website/scripts/generate-skill-docs.py — generator that reads skills/ and optional-skills/, writes per-skill pages, regenerates both catalog indexes, and rewrites the Skills section of sidebars.ts. Handles MDX escaping (outside fenced code blocks: curly braces, unsafe HTML-ish tags) and rewrites relative references/*.md links to point at the GitHub source. - website/docs/reference/skills-catalog.md — regenerated; each row links to the new dedicated page. - website/docs/reference/optional-skills-catalog.md — same. - website/sidebars.ts — Skills section now has Bundled / Optional subtrees with one nested category per skill folder. - .github/workflows/{docs-site-checks,deploy-site}.yml — run the generator before docusaurus build so CI stays in sync with the source SKILL.md files. Build verified locally with `npx docusaurus build`. Only remaining warnings are pre-existing broken link/anchor issues in unrelated pages.	2026-04-23 22:22:11 -07:00
Austin Pickett	809868e628	feat: refac	2026-04-24 01:04:19 -04:00
Teknium	eb93f88e1d	chore(release): add MattMaximo to AUTHOR_MAP for PR #10450 salvage	2026-04-23 22:01:24 -07:00
Matt Maximo	3ccda2aa05	fix(mcp): seed protocol header before HTTP initialize	2026-04-23 22:01:24 -07:00
Austin Pickett	e5d2815b41	feat: add sidebar	2026-04-24 00:56:19 -04:00
Teknium	983bbe2d40	feat(skills): add design-md skill for Google's DESIGN.md spec (#14876 ) * feat(config): make tool output truncation limits configurable Port from anomalyco/opencode#23770: expose a new `tool_output` config section so users can tune the hardcoded truncation caps that apply to terminal output and read_file pagination. Three knobs under `tool_output`: - max_bytes (default 50_000) — terminal stdout/stderr cap - max_lines (default 2000) — read_file pagination cap - max_line_length (default 2000) — per-line cap in line-numbered view All three keep their existing hardcoded values as defaults, so behaviour is unchanged when the section is absent. Power users on big-context models can raise them; small-context local models can lower them. Implementation: - New `tools/tool_output_limits.py` reads the section with defensive fallback (missing/invalid values → defaults, never raises). - `tools/terminal_tool.py` MAX_OUTPUT_CHARS now comes from get_max_bytes(). - `tools/file_operations.py` normalize_read_pagination() and _add_line_numbers() now pull the limits at call time. - `hermes_cli/config.py` DEFAULT_CONFIG gains the `tool_output` section so `hermes setup` writes defaults into fresh configs. - Docs page `user-guide/configuration.md` gains a "Tool Output Truncation Limits" section with large-context and small-context example configs. Tests (18 new in tests/tools/test_tool_output_limits.py): - Default resolution with missing / malformed / non-dict config. - Full and partial user overrides. - Coercion of bad values (None, negative, wrong type, str int). - Shortcut accessors delegate correctly. - DEFAULT_CONFIG exposes the section with the right defaults. - Integration: normalize_read_pagination clamps to the configured max_lines. 
* feat(skills): add design-md skill for Google's DESIGN.md spec Built-in skill under skills/creative/ that teaches the agent to author, lint, diff, and export DESIGN.md files — Google's open-source (Apache-2.0) format for describing a visual identity to coding agents. Covers: - YAML front matter + markdown body anatomy - Full token schema (colors, typography, rounded, spacing, components) - Canonical section order + duplicate-heading rejection - Component property whitelist + variants-as-siblings pattern - CLI workflow via 'npx @google/design.md' (lint/diff/export/spec) - Lint rule reference including WCAG contrast checks - Common YAML pitfalls (quoted hex, negative dimensions, dotted refs) - Starter template at templates/starter.md Package verified live on npm (@google/design.md@0.1.1).	2026-04-23 21:51:19 -07:00
Teknium	379b2273d9	fix(mcp): route stdio subprocess stderr to log file, not user TTY (#14901) MCP stdio servers' stderr was being dumped directly onto the user's terminal during hermes launch. Servers like FastMCP-based ones print a large ASCII banner at startup; slack-mcp-server emits JSON logs; etc. With prompt_toolkit / Rich rendering the TUI concurrently, these unsolicited writes corrupt the terminal state — hanging the session ~80% of the time for one user with Google Ads Tools + slack-mcp configured, forcing Ctrl+C and restart loops. Root cause: `stdio_client(server_params)` in tools/mcp_tool.py was called without `errlog=`, and the SDK's default is `sys.stderr` — i.e. the real parent-process stderr, which is the TTY. Fix: open a shared, append-mode log at $HERMES_HOME/logs/mcp-stderr.log (created once per process, line-buffered, real fd required by asyncio's subprocess machinery) and pass it as `errlog` to every stdio_client. Each server's spawn writes a timestamped header so the shared log stays readable when multiple servers are running. Falls back to /dev/null if the log file cannot be opened. Verified by E2E spawning a subprocess with the log fd as its stderr: banner lines land in the log file, nothing reaches the calling TTY.	2026-04-23 21:50:25 -07:00
ethernet	7db2703b33	Merge pull request #14895 from NousResearch/tui-resume fix(tui): keep FloatingOverlays visible when input is blocked	2026-04-24 01:44:50 -03:00
Ari Lotter	7c59e1a871	fix(tui): keep FloatingOverlays visible when input is blocked FloatingOverlays (SessionPicker, ModelPicker, SkillsHub, pager, completions) was nested inside the !isBlocked guard in ComposerPane. When any overlay opened, isBlocked became true, which removed the entire composer box from the tree — including the overlay that was trying to render. This made /resume with no args appear to do nothing (the input line vanished and no picker appeared). Since `99d859ce` (feat: refactor by splitting up app and doing proper state), isBlocked gated only the text input lines so that approval/clarify prompts and pickers rendered above a hidden composer. The regression happened in `408fc893` (fix(tui): tighten composer — status sits directly above input, overlays anchor to input) when FloatingOverlays was moved into the input row for anchoring but accidentally kept inside the !isBlocked guard. This fix renders FloatingOverlays outside the !isBlocked guard but inside the same position:relative Box, so overlays stay visible even when text input is hidden. Only the actual input buffer lines and TextInput are gated now. Fixes: /resume, /history, /logs, /model, /skills, and completion dropdowns when blocked overlays are active.	2026-04-23 23:44:52 -04:00
brooklyn!	6fdbf2f2d7	Merge pull request #14820 from NousResearch/bb/tui-at-fuzzy-match fix(tui): @<name> fuzzy-matches filenames across the repo	2026-04-23 19:40:43 -05:00
Brooklyn Nicholson	0a679cb7ad	fix(tui): restore voice/panic handlers + scope fuzzy paths to cwd Two fixes on top of the fuzzy-@ branch: (1) Rebase artefact: re-apply only the fuzzy additions on top of fresh `tui_gateway/server.py`. The earlier commit was cut from a base 58 commits behind main and clobbered ~170 lines of voice.toggle / voice.record handlers and the gateway crash hooks (`_panic_hook`, `_thread_panic_hook`). Reset server.py to origin/main and re-add only: - `_FUZZY_*` constants + `_list_repo_files` + `_fuzzy_basename_rank` - the new fuzzy branch in the `complete.path` handler (2) Path scoping (Copilot review): `git ls-files` returns repo-root- relative paths, but completions need to resolve under the gateway's cwd. When hermes is launched from a subdirectory, the previous code surfaced `@file:apps/web/src/foo.tsx` even though the agent would resolve that relative to `apps/web/` and miss. Fix: - `git -C root rev-parse --show-toplevel` to get repo top - `git -C top ls-files …` for the listing - `os.path.relpath(top + p, root)` per result, dropping anything starting with `../` so the picker stays scoped to cwd-and-below (matches Cmd-P workspace semantics) `apps/web/src/foo.tsx` ends up as `@file:src/foo.tsx` from inside `apps/web/`, and sibling subtrees + parent-of-cwd files don't leak. New test `test_fuzzy_paths_relative_to_cwd_inside_subdir` builds a 3-package mono-repo, runs from `apps/web/`, and verifies completion paths are subtree-relative + outside-of-cwd files don't appear. Copilot review threads addressed: #3134675504 (path scoping), #3134675532 (`voice.toggle` regression), #3134675541 (`voice.record` regression — both were stale-base artefacts, not behavioural changes).	2026-04-23 19:38:33 -05:00
Brooklyn Nicholson	41b4d69167	Merge branch 'main' of github.com:NousResearch/hermes-agent into bb/tui-at-fuzzy-match	2026-04-23 19:35:18 -05:00
brooklyn!	3f343cf7cf	Merge pull request #14822 from NousResearch/bb/tui-inline-diff-segment-anchor fix(tui): anchor inline_diff to the segment where the edit happened	2026-04-23 19:32:21 -05:00
Brooklyn Nicholson	4ae5b58cb1	fix(tui): restore voice handlers + address copilot review Rebase-artefact cleanup on this branch: - Restore `voice.status` and `voice.transcript` cases in createGatewayEventHandler plus the `voice` / `submission` / `composer.setInput` ctx destructuring. They were added to main in the 58-commit gap that this branch was originally cut behind; dropping them was unintentional. - Rebase the test ctx shape to match main (voice.* fakes, submission.submitRef, composer.setInput) and apply the same segment-anchor test rewrites on top. - Drop the `#14XXX` placeholder from the tool.complete comment; replace with a plain-English rationale. - Rewrite the broken mid-word "pushInlineDiff- Segment" in turnController's dedupe comment to refer to pushInlineDiffSegment and `kind: 'diff'` plainly. - Collapse the filter predicate in recordMessageComplete from a 4-line if/return into one boolean expression — same semantics, reads left-to-right as a single predicate. Copilot review threads resolved: #3134668789, #3134668805, #3134668822.	2026-04-23 19:22:41 -05:00
Brooklyn Nicholson	2258a181f0	fix(tui): give inline_diff segments blank-line breathing room Visual polish on top of the segment-anchor change: diff blocks were butting up against the narration around them. Tag diff-only segments with `kind: 'diff'` (extended on Msg) and give them `marginTop={1}` + `marginBottom={1}` in MessageLine, matching the spacing we already use for user messages. Also swaps the regex-based `diffSegmentBody` check for an explicit `kind === 'diff'` guard so the dedupe path is clearer.	2026-04-23 19:11:59 -05:00
Brooklyn Nicholson	11b2942f16	fix(tui): anchor inline_diff to the segment where the edit happened Revisits #13729. That PR buffered each `tool.complete`'s inline_diff and merged them into the final assistant message body as a fenced ```diff block. The merge-at-end placement reads as "the agent wrote this after the summary", even when the edit fired mid-turn — which is both misleading and (per blitz feedback) feels like noise tacked onto the end of every task. Segment-anchored placement instead: - On tool.complete with inline_diff, `pushInlineDiffSegment` calls `flushStreamingSegment` first (so any in-progress narration lands as its own segment), then pushes the ```diff block as its own segment into segmentMessages. The diff is now anchored BETWEEN the narration that preceded the edit and whatever the agent streams afterwards, which is where the edit actually happened. - `recordMessageComplete` no longer merges buffered diffs. The only remaining dedupe is "drop diff-only segments whose body the final assistant text narrates verbatim (or whose diff fence the final text already contains)" — same tradeoff as before, kept so an agent that narrates its own diff doesn't render two stacked copies. - Drops `pendingInlineDiffs` and `queueInlineDiff` — buffer + end- merge machinery is gone; segmentMessages is now the only source of truth. Side benefit: Ctrl+C interrupt (`interruptTurn`) iterates segmentMessages, so diff segments are now preserved in the transcript when the user cancels after an edit. Previously the pending buffer was silently dropped on interrupt. Reported by Teknium during blitz usage: "no diffs are ever at the end because it didn't make this file edit after the final message".	2026-04-23 19:02:44 -05:00
Brooklyn Nicholson	b08cbc7a79	fix(tui): @<name> fuzzy-matches filenames across the repo Typing `@appChrome` in the composer should surface `ui-tui/src/components/appChrome.tsx` without requiring the user to first type the full directory path — matches the Cmd-P behaviour users expect from modern editors. The gateway's `complete.path` handler was doing a plain `os.listdir(".")` + `startswith` prefix match, so basenames only resolved inside the current working directory. This reworks it to: - enumerate repo files via `git ls-files -z --cached --others --exclude-standard` (fast, honours `.gitignore`); fall back to a bounded `os.walk` that skips common vendor / build dirs when the working dir isn't a git repo. Results cached per-root with a 5s TTL so rapid keystrokes don't respawn git processes. - rank basenames with a 5-tier scorer: exact → prefix → camelCase / word-boundary → substring → subsequence. Shorter basenames win ties; shorter rel paths break basename-length ties. - only take the fuzzy branch when the query is bare (no `/`), is a context reference (`@...`), and isn't `@folder:` — path-ish queries and folder tags fall through to the existing directory-listing path so explicit navigation intent is preserved. Completion rows now carry `display = basename`, `meta = directory`, so the picker renders `appChrome.tsx ui-tui/src/components` on one row (basename bold, directory dim) — the meta column was previously "dir" / "" and is a more useful signal for fuzzy hits. Reported by Ben Barclay during the TUI v2 blitz test.	2026-04-23 19:01:27 -05:00
ethernet	c95c6bdb7c	Merge pull request #14818 from NousResearch/ink-perf perf(ink): cache text measurements across yoga flex re-passes	2026-04-23 20:58:54 -03:00
Ari Lotter	bd929ea514	perf(ink): cache text measurements across yoga flex re-passes Adds a per-ink-text measurement cache keyed by width|widthMode to avoid re-squashing and re-wrapping the same text when yoga calls measureFunc multiple times per frame with different widths during flex layout re-pass.	2026-04-23 19:45:10 -04:00
Teknium	6a20e187dd	test,chore: cover stringified array/object coercion + AUTHOR_MAP entry Follow-up to the cherry-picked coercion commit: adds 9 regression tests covering array/object parsing, invalid-JSON passthrough, wrong-shape preservation, and the issue #3947 gmail-mcp scenario end-to-end. Adds dan@danlynn.com -> danklynn to scripts/release.py AUTHOR_MAP so the salvage PR's contributor attribution doesn't break CI.	2026-04-23 16:38:38 -07:00
Dan Lynn	9ff21437a0	fix(mcp): coerce stringified arrays/objects in tool args When a tool schema declares `type: array` or `type: object` and the model emits the value as a JSON string (common with complex oneOf discriminated unions), the MCP server rejects it with -32602 "expected array, received string". Extend `_coerce_value` to attempt `json.loads` for these types and replace the string with the parsed value before dispatch. Root cause confirmed via live testing: `add_reminders.reminders` uses a oneOf discriminated union (relative/absolute/location) that triggers model output drift. Sending a real array passes validation; sending a string reproduces the exact error. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 16:38:38 -07:00
0xbyt4	44a0cbe525	fix(tui): voice mode starts OFF each launch (CLI parity) The voice.toggle handler was persisting display.voice_enabled / display.voice_tts to config.yaml, so a TUI session that ever turned voice on would re-open with it already on (and the mic badge lit) on every subsequent launch. cli.py treats voice strictly as runtime state: _voice_mode = False at __init__, only /voice on flips it, and nothing writes it back to disk. Drop the _write_config_key calls in voice.toggle on/off/tts and the config.yaml fallback in _voice_mode_enabled / _voice_tts_enabled. State is now env-var-only (HERMES_VOICE / HERMES_VOICE_TTS), scoped to the live gateway subprocess — the next launch starts clean.	2026-04-23 16:18:15 -07:00
0xbyt4	2af0848f3c	fix(tui): ignore SIGPIPE so stderr back-pressure can't kill the gateway Crash-log stack trace (tui_gateway_crash.log) from the user's session pinned the regression: SIGPIPE arrived while main thread was blocked on for-raw-in-sys.stdin — i.e., a background thread (debug print to stderr, most likely from HERMES_VOICE_DEBUG=1) wrote to a pipe whose buffer the TUI hadn't drained yet, and SIG_DFL promptly killed the process. Two fixes that together restore CLI parity: - entry.py: SIGPIPE → SIG_IGN instead of the _log_signal handler that then exited. With SIG_IGN, Python raises BrokenPipeError on the offending write, which write_json already handles with a clean exit via _log_exit. SIGTERM / SIGHUP still route through _log_signal so real termination signals remain diagnosable. - hermes_cli/voice.py:_debug: wrap the stderr print in a BrokenPipeError / OSError try/except. This runs from daemon threads (silence callback, TTS playback, beep), so a broken stderr must not escape and ride up into the main event loop. Verified by spawning the gateway subprocess locally: voice.toggle status → 200 OK, process stays alive, clean exit on stdin close logs "reason=stdin EOF" instead of a silent reap.	2026-04-23 16:18:15 -07:00
0xbyt4	7baf370d3d	chore(tui): capture signal-triggered gateway exits in crash log SIG_DFL for SIGPIPE means the kernel reaps the gateway subprocess the instant a background thread (TTS playback, silence callback, voice status emitter) writes to a stdout the TUI stopped reading — before the Python interpreter can run excepthook, threading.excepthook, atexit, or the entry.py post-loop _log_exit. Replace the three SIG_DFL / SIG_IGN bindings with a _log_signal handler that: - records which signal (SIGPIPE / SIGTERM / SIGHUP) fired and when; - dumps the main-thread stack at signal delivery AND every live thread's stack via sys._current_frames — the background-thread write that provoked SIGPIPE is almost always visible here; - writes everything to ~/.hermes/logs/tui_gateway_crash.log and prints a [gateway-signal] breadcrumb to stderr so the TUI Activity surfaces it as well. SIGINT stays ignored (TUI handles Ctrl+C for the user).	2026-04-23 16:18:15 -07:00
0xbyt4	eeda18a9b7	chore(tui): record gateway exit reason in crash log Gateway exits weren't reaching the panic hook because entry.py calls sys.exit(0) on broken stdout — clean termination, no exception. That left "gateway exited" in the TUI with zero forensic trail when pipe breaks happened mid-turn. Entry.py now tags each exit path — startup-write failure, parse-error- response write failure, per-method response write failure, stdin EOF — with a one-line entry in ~/.hermes/logs/tui_gateway_crash.log and a gateway.stderr breadcrumb. Includes the JSON-RPC method name on the dispatch path, which is the only way to tell "died right after handling voice.toggle on" from "died emitting the second message.complete".	2026-04-23 16:18:15 -07:00
0xbyt4	3a9598337f	chore(tui): dump gateway crash traces to ~/.hermes/logs/tui_gateway_crash.log When the gateway subprocess raises an unhandled exception during a voice-mode turn, nothing survives: stdout is the JSON-RPC pipe, stderr flushes but the process is already exiting, and no log file catches Python's default traceback print. The user is left with an undiagnosable "gateway exited" banner. Install: - sys.excepthook → write full traceback to tui_gateway_crash.log + echo the first line to stderr (which the TUI pumps into Activity as a gateway.stderr event). Chains to the default hook so the process still terminates. - threading.excepthook → same, tagged with the thread name so it's clear when the crash came from a daemon thread (beep playback, TTS, silence callback, etc.). - Turn-dispatcher except block now also appends a traceback to the crash log before emitting the user-visible error event — str(e) alone was too terse to identify where in the voice pipeline the failure happened. Zero behavioural change on the happy path; purely forensics.	2026-04-23 16:18:15 -07:00
0xbyt4	98418afd5d	fix(tui): break TTS→STT feedback loop + colorize REC badge TTS feedback loop (hermes_cli/voice.py) The VAD loop kept the microphone live while speak_text played the agent's reply over the speakers, so the reply itself was picked up, transcribed, and submitted — the agent then replied to its own echo ("Ha, looks like we're in a loop"). Ported cli.py:_voice_tts_done synchronisation: - _tts_playing: threading.Event (initially set = "not playing"). - speak_text cancels the active recorder before opening the speakers, clears _tts_playing, and on exit waits 300 ms before re-starting the recorder — long enough for the OS audio device to settle so afplay and sounddevice don't race for it. - _continuous_on_silence now waits on _tts_playing (up to 60 s) before re-arming the mic with another 300 ms gap, mirroring cli.py:10619-10621. If the user flips voice off during the wait the loop exits cleanly instead of fighting for the device. Without both halves the loop races: if the silence callback fires before TTS starts it re-arms immediately; if TTS is already playing the pause-and-resume path catches it. Red REC badge (ui-tui appChrome + useMainApp) Classic CLI (cli.py:_get_voice_status_fragments) renders "● REC" in red and "◉ STT" in amber. TUI was showing a dim "REC" with no dot, making it hard to spot at a glance. voiceLabel now emits the same glyphs and appChrome colours them via t.color.error / t.color.warn, falling back to dim for the idle label.	2026-04-23 16:18:15 -07:00
0xbyt4	42ff785771	fix(tui): voice TTS speak-back + transcript-key bug + auto-submit Three issues surfaced during end-to-end testing of the CLI-parity voice loop and are fixed together because they all blocked "speak → agent responds → TTS reads it back" from working at all: 1. Wrong result key (hermes_cli/voice.py) transcribe_recording() returns {"success": bool, "transcript": str}, matching cli.py:_voice_stop_and_transcribe. The wrapper was reading result.get("text"), which is None, so every successful Groq / local STT response was thrown away and the 3-strikes halt fired after three silent-looking cycles. Fixed by reading "transcript" and also honouring "success" like the CLI does. Updated the loop simulation tests to return the correct shape. 2. TTS speak-back was missing (tui_gateway/server.py + hermes_cli/voice.py) The TUI had a voice.toggle "tts" subcommand but nothing downstream actually read the flag — agent replies never spoke. Mirrored cli.py:8747-8754's dispatch: on message.complete with status == "complete", if _voice_tts_enabled() is true, spawn a daemon thread running speak_text(response). Rewrote speak_text as a full port of cli.py:_voice_speak_response — same markdown-strip regex pipeline (code blocks, links, bold/italic, inline code, headers, list bullets, horizontal rules, excessive newlines), same 4000-char cap, same explicit mp3 output path, same MP3-over-OGG playback choice (afplay misbehaves on OGG), same cleanup of both extensions. Keeps TUI TTS audible output byte-for-byte identical to the classic CLI. 3. Auto-submit swallowed on non-empty composer (createGatewayEventHandler.ts) The voice.transcript handler branched on prev input via a setInput updater and fired submitRef.current inside the updater when prev was empty. React strict mode double-invokes state updaters, which would queue the submit twice; and when the composer had any content the transcript was merely appended — the agent never saw it. 
CLI _pending_input.put(transcript) unconditionally feeds the transcript as the next turn, so match that: always clear the composer and setTimeout(() => submitRef.current(text), 0) outside any updater. Side effect can't run twice this way, and a half-typed draft on the rare occasion is a fair trade vs. silently dropping the turn. Also added peak_rms to the rec.stop debug line so "recording too quiet" is diagnosable at a glance when HERMES_VOICE_DEBUG=1.	2026-04-23 16:18:15 -07:00
0xbyt4	04c489b587	feat(tui): match CLI's voice slash + VAD-continuous recording model The TUI had drifted from the CLI's voice model in two ways: - /voice on was lighting up the microphone immediately and Ctrl+B was interpreted as a mode toggle. The CLI separates the two: /voice on just flips the umbrella bit, recording only starts once the user presses Ctrl+B, which also sets _voice_continuous so the VAD loop auto-restarts until the user presses Ctrl+B again or three silent cycles pass. - /voice tts was missing entirely, so users couldn't turn agent reply speech on/off from inside the TUI. This commit brings the TUI to parity. Python - hermes_cli/voice.py: continuous-mode API (start_continuous, stop_continuous, is_continuous_active) layered on the existing PTT wrappers. The silence callback transcribes, fires on_transcript, tracks consecutive no-speech cycles, and auto-restarts — mirroring cli.py:_voice_stop_and_transcribe + _restart_recording. - tui_gateway/server.py: - voice.toggle now supports on / off / tts / status. The umbrella bit lives in HERMES_VOICE + display.voice_enabled; tts lives in HERMES_VOICE_TTS + display.voice_tts. /voice off also tears down any active continuous loop so a toggle-off really releases the microphone. - voice.record start/stop now drives start_continuous/stop_continuous. start is refused with a clear error when the mode is off, matching cli.py:handle_voice_record's early return on `not _voice_mode`. - New voice.transcript / voice.status events emit through _voice_emit (remembers the sid that last enabled the mode so events land in the right session). TypeScript - gatewayTypes.ts: voice.status + voice.transcript event discriminants; VoiceToggleResponse gains tts; VoiceRecordResponse gains status for the new "started/stopped" responses. 
- interfaces.ts: GatewayEventHandlerContext gains composer.setInput + submission.submitRef + voice.{setRecording, setProcessing, setVoiceEnabled}; InputHandlerContext.voice gains enabled + setVoiceEnabled for the mode-aware Ctrl+B handler. - createGatewayEventHandler.ts: voice.status drives REC/STT badges; voice.transcript auto-submits when the composer is empty (CLI _pending_input.put parity) and appends when a draft is in flight. no_speech_limit flips voice off + sys line. - useInputHandlers.ts: Ctrl+B now calls voice.record (start/stop), not voice.toggle, and nudges the user with a sys line when the mode is off instead of silently flipping it on. - useMainApp.ts: wires the new event-handler context fields. - slash/commands/session.ts: /voice handles on / off / tts / status with CLI-matching output ("voice: mode on · tts off"). Backward compat preserved for voice.record (was always PTT shape; gateway still honours start/stop with mode-gating added).	2026-04-23 16:18:15 -07:00
0xbyt4	0bb460b070	fix(tui): add missing hermes_cli.voice wrapper for gateway RPC tui_gateway/server.py:3486/3491/3509 imports start_recording, stop_and_transcribe, and speak_text from hermes_cli.voice, but the module never existed (not in git history — never shipped, never deleted). Every voice.record / voice.tts RPC call hit the ImportError branch and the TUI surfaced it as "voice module not available — install audio dependencies" even on boxes with sounddevice / faster-whisper / numpy installed. Adds a thin wrapper on top of tools.voice_mode (recording + transcription) and tools.tts_tool (text-to-speech): - start_recording() — idempotent; stores the active AudioRecorder in a module-global guarded by a Lock so repeat Ctrl+B presses don't fight over the mic. - stop_and_transcribe() — returns None for no-op / no-speech / Whisper-hallucination cases so the TUI's existing "no speech detected" path keeps working unchanged. - speak_text(text) — lazily imports tts_tool (optional provider SDKs stay unloaded until the first /voice tts call), parses the tool's JSON result, and plays the audio via play_audio_file. Paired with the Ctrl+B keybinding fix in the prior commit, the TUI voice pipeline now works end-to-end for the first time.	2026-04-23 16:18:15 -07:00
0xbyt4	3504bd401b	fix(tui): route Ctrl+B to voice toggle, not composer input When the user runs /voice and then presses Ctrl+B in the TUI, three handlers collaborate to consume the chord and none of them dispatch voice.record: - isAction() is platform-aware — on macOS it requires Cmd (meta/super), so Ctrl+B fails the match in useInputHandlers and never triggers voiceStart/voiceStop. - TextInput's Ctrl+B pass-through list doesn't include 'b', so the keystroke falls through to the wordMod backward-word branch on Linux and to the printable-char insertion branch on macOS — the latter is exactly what timmie reported ("enters a b into the tui"). - /voice emits "voice: on" with no hint, so the user has no way to know Ctrl+B is the recording toggle. Introduces isVoiceToggleKey(key, ch) in lib/platform.ts that matches raw Ctrl+B on every platform (mirrors tips.py and config.yaml's voice.record_key default) and additionally accepts Cmd+B on macOS so existing muscle memory keeps working. Wires it into useInputHandlers, adds Ctrl+B to TextInput's pass-through list so the global handler actually receives the chord, and appends "press Ctrl+B to record" to the /voice on message. Empirically verified with hermes --tui: Ctrl+B no longer leaks 'b' into the composer and now dispatches the voice.record RPC (the downstream ImportError for hermes_cli.voice is a separate upstream bug — follow-up patch).	2026-04-23 16:18:15 -07:00
Teknium	50d97edbe1	feat(delegation): bump default child_timeout_seconds to 600s (#14809) The 300s default was too tight for high-reasoning models on non-trivial delegated tasks — e.g. gpt-5.5 xhigh reviewing 12 files would burn >5min on reasoning tokens before issuing its first tool call, tripping the hard wall-clock timeout with 0 api_calls logged. - tools/delegate_tool.py: DEFAULT_CHILD_TIMEOUT 300 -> 600 - hermes_cli/config.py: surface delegation.child_timeout_seconds in DEFAULT_CONFIG so it's discoverable (previously the key was read by _get_child_timeout() but absent from the default config schema) Users can still override via config.yaml delegation.child_timeout_seconds or DELEGATION_CHILD_TIMEOUT_SECONDS env var (floor 30s, no ceiling).	2026-04-23 16:14:55 -07:00
Teknium	e26c4f0e34	fix(kimi,mcp): Moonshot schema sanitizer + MCP schema robustness (#14805) Fixes a broader class of 'tools.function.parameters is not a valid moonshot flavored json schema' errors on Nous / OpenRouter aggregators routing to moonshotai/kimi-k2.6 with MCP tools loaded. ## Moonshot sanitizer (agent/moonshot_schema.py, new) Model-name-routed (not base-URL-routed) so Nous / OpenRouter users are covered alongside api.moonshot.ai. Applied in ChatCompletionsTransport.build_kwargs when is_moonshot_model(model). Two repairs: 1. Fill missing 'type' on every property / items / anyOf-child schema node (structural walk — only schema-position dicts are touched, not container maps like properties/$defs). 2. Strip 'type' at anyOf parents; Moonshot rejects it. ## MCP normalizer hardened (tools/mcp_tool.py) Draft-07 $ref rewrite from PR #14802 now also does: - coerce missing / null 'type' on object-shaped nodes (salvages #4897) - prune 'required' arrays to names that exist in 'properties' (salvages #4651; Gemini 400s on dangling required) - apply recursively, not just top-level These repairs are provider-agnostic so the same MCP schema is valid on OpenAI, Anthropic, Gemini, and Moonshot in one pass. ## Crash fix: safe getattr for Tool.inputSchema _convert_mcp_schema now uses getattr(t, 'inputSchema', None) so MCP servers whose Tool objects omit the attribute entirely no longer abort registration (salvages #3882).
## Validation - tests/agent/test_moonshot_schema.py: 27 new tests (model detection, missing-type fill, anyOf-parent strip, non-mutation, real-world MCP shape) - tests/tools/test_mcp_tool.py: 7 new tests (missing / null type, required pruning, nested repair, safe getattr) - tests/agent/transports/test_chat_completions.py: 2 new integration tests (Moonshot route sanitizes, non-Moonshot route doesn't) - Targeted suite: 49 passed - E2E via execute_code with a realistic MCP tool carrying all three Moonshot rejection modes + dangling required + draft-07 refs: sanitizer produces a schema valid on Moonshot and Gemini	2026-04-23 16:11:57 -07:00
helix4u	24f139e16a	fix(mcp): rewrite definitions refs to $defs in input schemas	2026-04-23 15:56:57 -07:00
Teknium	ef5eaf8d87	feat(cron): honor `hermes tools` config for the cron platform (#14798) Cron now resolves its toolset from the same per-platform config the gateway uses — `_get_platform_tools(cfg, 'cron')` — instead of blindly loading every default toolset. Existing cron jobs without a per-job override automatically lose `moa`, `homeassistant`, and `rl` (the `_DEFAULT_OFF_TOOLSETS` set), which stops the "surprise $4.63 mixture_of_agents run" class of bug (Norbert, Discord). Precedence inside `run_job`: 1. per-job `enabled_toolsets` (PR #14767 / #6130) — wins if set 2. `_get_platform_tools(cfg, 'cron')` — new, the blanket gate 3. `None` fallback (legacy) — only on resolver exception Changes: - hermes_cli/platforms.py: register 'cron' with default_toolset 'hermes-cron' - toolsets.py: add 'hermes-cron' toolset (mirrors 'hermes-cli'; `_get_platform_tools` then filters via `_DEFAULT_OFF_TOOLSETS`) - cron/scheduler.py: add `_resolve_cron_enabled_toolsets(job, cfg)`, call it at the `AIAgent(...)` kwargs site - tests/cron/test_scheduler.py: replace the 'None when not set' test (outdated contract) with an invariant ('moa not in default cron toolset') + new per-job-wins precedence test - tests/hermes_cli/test_tools_config.py: mark 'cron' as non-messaging in the gateway-toolset-coverage test	2026-04-23 15:48:50 -07:00
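
The three-step precedence described in ef5eaf8d87 can be sketched roughly as follows. This is an illustration only, not the code in cron/scheduler.py: the function name `resolve_cron_enabled_toolsets`, the injected `get_platform_tools` callable, the toy resolvers, the config shape, and the toolset names `terminal` and `web` are all assumptions made for the example. Only the ordering (per-job override, then the per-platform resolver, then `None` when the resolver raises) and the default-off set (`moa`, `homeassistant`, `rl`) come from the commit message. Treating "set" as "not `None`" is also an assumption.

```python
from typing import Callable, List, Optional


def resolve_cron_enabled_toolsets(
    job: dict,
    cfg: dict,
    get_platform_tools: Callable[[dict, str], List[str]],
) -> Optional[List[str]]:
    """Hypothetical sketch of the precedence listed in the commit message."""
    # 1. A per-job `enabled_toolsets` override wins if set
    #    (assumption: "set" means the key is present and not None).
    per_job = job.get("enabled_toolsets")
    if per_job is not None:
        return list(per_job)
    # 2. Otherwise ask the per-platform resolver for the 'cron' platform.
    try:
        return list(get_platform_tools(cfg, "cron"))
    except Exception:
        # 3. Legacy fallback: None, reached only when the resolver raises.
        return None


# Toy stand-ins for the real resolver (assumptions, for demonstration only).
DEFAULT_OFF = {"moa", "homeassistant", "rl"}


def toy_platform_tools(cfg: dict, platform: str) -> List[str]:
    # Drop default-off toolsets from whatever the platform lists.
    return [t for t in cfg["toolsets"][platform] if t not in DEFAULT_OFF]


def broken_resolver(cfg: dict, platform: str) -> List[str]:
    raise RuntimeError("resolver failed")


cfg = {"toolsets": {"cron": ["terminal", "web", "moa"]}}
override = resolve_cron_enabled_toolsets(
    {"enabled_toolsets": ["moa"]}, cfg, toy_platform_tools
)
default = resolve_cron_enabled_toolsets({}, cfg, toy_platform_tools)
fallback = resolve_cron_enabled_toolsets({}, cfg, broken_resolver)
print(override, default, fallback)
```

In this sketch a job that explicitly asks for `moa` still gets it, while a job with no override loses it through the platform filter, which matches the "per-job wins" and "moa not in default cron toolset" tests mentioned in the commit.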


5807 commits