hermes-agent

Mirror of https://github.com/NousResearch/hermes-agent.git, synced 2026-06-27 11:22:03 +00:00


DavidMetcalfe 865a09a610 fix(agent): detect thinking-timeout for reasoning models and surface actionable guidance instead of misleading file-write advice

Two-part fix:

Part 1 (classifier override at agent/error_classifier.py:720-738): A transport disconnect on a reasoning model, even on a large session, now routes to FailoverReason.timeout instead of context_overflow. Without this, large-session reasoning-model disconnects route to the compression branch and silently delete conversation history on a phantom context-length error. The override is strictly targeted: non-reasoning models (gpt-4o, claude-3-5-sonnet, llama-3.3-70b, etc.) still route to context_overflow on large sessions, which is the existing intentional behavior for chat models whose proxy doesn't idle-kill during prefill/generation.

Part 2 (new agent/thinking_timeout_guidance.py + integration at agent/conversation_loop.py:3488-3567): New is_thinking_timeout() and build_thinking_timeout_guidance() helpers. When a known reasoning model (NVIDIA Nemotron 3 Ultra, OpenAI o1/o3, Anthropic Opus 4.x thinking, DeepSeek R1, Qwen QwQ, xAI Grok reasoning) hits a transport kill, either on a small session (where the classifier says timeout directly) or on a large session once Part 1 routes it correctly, the user now sees reasoning-specific guidance with three actionable workarounds in priority order:
1. Set providers.<provider>.models.<model>.stale_timeout_seconds: 900 in ~/.hermes/config.yaml (Hermes's built-in floor is already 600s for known reasoning models; raise it further if the upstream timeout is even tighter).
2. Lower reasoning_budget or set reasoning_effort: medium on this model if the provider supports it.
3. Use a smaller / faster reasoning model if the task doesn't require deep thinking.

The new guidance takes precedence via if/elif over the existing _is_stream_drop block, so a reasoning-model user with a transport-kill message sees actionable advice instead of the misleading "try execute_code with Python's open() for large files" advice (which is correct for the unrelated large-file-write stream-drop case but actively wrong for the thinking-timeout case).

Verified:
- 478 tests passing across 9 directly relevant files (49 new + 429 existing, zero regressions).
- Ruff lint clean on all 4 modified/new files.
- Negative tests: 6 parametrized regression guards confirm non-reasoning models still route to context_overflow on large sessions; 4 parametrized gates confirm non-timeout classifier reasons never trigger the guidance; 5 parametrized cases confirm non-transport messages never trigger it.
- Regression guard: the new guidance message does NOT contain "execute_code" or "open()"; the misleading advice is fully replaced, not appended alongside.
- Cross-vendor dual review via agy -p:
  - Gemini 3.5 Flash (Medium): passed: true, zero blockers, one SHOULD-FIX (vprint block duplication, fixed by extracting detection into a helper module).
  - GPT-OSS 120B (Medium): passed: true, zero blockers, two nits (test placement, adopted at tests/agent/test_thinking_timeout_guidance.py; primary-model capture, accepted as a non-issue per Flash's nit).

Dependency note for maintainers: This PR includes agent/reasoning_timeouts.py (the reasoning-model allowlist module from PR #52238) because the Layer 1 override is load-bearing on get_reasoning_stale_timeout_floor(). After PR #52238 lands on main, this PR's duplicate agent/reasoning_timeouts.py should be rebased away. Either PR can land first; rebasing the other is mechanical.

Fixes #52271.		2026-06-25 19:00:48 -07:00
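The routing and precedence rules in the commit message above can be sketched in a few lines. This is a minimal illustration, not the code in agent/error_classifier.py or agent/thinking_timeout_guidance.py: `FailoverReason` here is a two-member stand-in, and `classify_transport_disconnect`, `pick_guidance`, `REASONING_PREFIXES`, `LARGE_SESSION_TOKENS`, `TRANSPORT_MARKERS` and the marker strings are all hypothetical names and values chosen for the example.

```python
from enum import Enum
from typing import Optional


class FailoverReason(Enum):
    # Hypothetical stand-in for the enum named in the commit; only the two
    # members discussed there are modeled.
    timeout = "timeout"
    context_overflow = "context_overflow"


# Hypothetical allowlist, session-size threshold and transport-kill markers.
# The real allowlist lives in agent/reasoning_timeouts.py.
REASONING_PREFIXES = ("o1", "o3", "deepseek-r1", "qwq", "nemotron")
LARGE_SESSION_TOKENS = 100_000
TRANSPORT_MARKERS = ("server disconnected", "connection reset", "incomplete chunked read")


def is_reasoning_model(model: str) -> bool:
    # Drop an optional "vendor/" prefix, then match known reasoning families.
    name = model.lower().rsplit("/", 1)[-1]
    return name.startswith(REASONING_PREFIXES)


def classify_transport_disconnect(model: str, session_tokens: int) -> FailoverReason:
    # Part 1: reasoning models route to timeout even on large sessions, so the
    # compression branch never fires on a phantom context-length error.
    if is_reasoning_model(model):
        return FailoverReason.timeout
    # Chat models keep the existing behavior: a large session reads as overflow.
    if session_tokens >= LARGE_SESSION_TOKENS:
        return FailoverReason.context_overflow
    return FailoverReason.timeout


def pick_guidance(provider: str, model: str, reason: FailoverReason,
                  message: str, is_stream_drop: bool) -> Optional[str]:
    is_transport = any(m in message.lower() for m in TRANSPORT_MARKERS)
    # Part 2: thinking-timeout guidance wins via if/elif over the stream-drop hint,
    # so the file-write advice is replaced rather than appended.
    if reason is FailoverReason.timeout and is_reasoning_model(model) and is_transport:
        return (
            f"1. Set providers.{provider}.models.{model}.stale_timeout_seconds: 900 "
            "in ~/.hermes/config.yaml\n"
            "2. Lower reasoning_budget or set reasoning_effort: medium\n"
            "3. Use a smaller / faster reasoning model"
        )
    elif is_stream_drop:
        return "Try execute_code with Python's open() for large files."
    return None
```

The if/elif ordering is the point of the sketch: a reasoning model that is disconnected mid-thought never falls through to the stream-drop advice, while a chat model with a genuine stream drop still gets it.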
acp	fix(codex): seed app-server sessions with configured cwd	2026-06-21 16:39:02 -07:00
acp_adapter
agent	fix(agent): detect thinking-timeout for reasoning models and surface actionable guidance instead of misleading file-write advice	2026-06-25 19:00:48 -07:00
ci	fix(ci): classify should default to no MCP	2026-06-23 10:32:27 -07:00
cli	feat(cli): note background delegate_task dispatch in _on_tool_complete	2026-06-25 19:57:58 -05:00
computer_use	feat(computer_use): disable cua-driver telemetry by default, add opt-in (#50842)	2026-06-22 09:57:16 -07:00
cron	fix(cron): add default retention to per-run job output (#52383) (#52646)	2026-06-25 16:00:13 -07:00
docker	fix(docker): replace dashboard --insecure with basic-auth provider	2026-06-21 19:05:27 -07:00
e2e	refactor(gateway): migrate slack/dingtalk/whatsapp/matrix/feishu/telegram/wecom/email/sms adapters to bundled plugins	2026-06-20 10:26:45 -07:00
fakes
fixtures/plugins/example-dashboard/dashboard	feat(dashboard): nous-blue theme, bulk sessions, schedule picker (#37383)	2026-06-02 12:37:40 -04:00
gateway	fix(gateway): defer cross-process cache cleanup off the cache lock (#52197) (#52761)	2026-06-25 18:58:47 -07:00
hermes_cli	fix(cron): detect partial job loss in restore_cron_jobs_if_emptied (#52144)	2026-06-25 18:49:18 -07:00
hermes_state	fix(state): exclude delegate/branch/tool children from resume walk + reconcile salvaged fixes	2026-06-25 16:29:09 -07:00
honcho_plugin	feat(memory): Honcho OAuth connect — desktop and CLI flows + token refresh (#44335)	2026-06-22 19:16:47 -05:00
integration	refactor(gateway): migrate Home Assistant adapter to bundled plugin	2026-06-06 11:46:24 -07:00
openviking_plugin	feat(openviking): add full recall prefetch policy	2026-06-24 18:53:49 +05:30
plugins	fix shape	2026-06-25 12:38:33 -07:00
providers	fix(models): pass model.base_url to fetch_models in /model picker	2026-06-16 13:09:40 -07:00
run_agent	feat(moa): expose MoA presets as selectable virtual models (#46081)	2026-06-25 13:52:06 -07:00
scripts	fix(skills-hub): stop shipping a degenerate index when GitHub taps collapse (#42347)	2026-06-08 15:21:28 -07:00
skills	feat(skills): add cloudflare-temporary-deploy optional skill (#50849)	2026-06-22 12:14:30 -07:00
stress	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
tools	fix(approval): fold Windows absolute home paths in dangerous-command detection	2026-06-25 17:49:39 -07:00
tui_gateway	feat(gateway): build authoritative project tree	2026-06-25 16:40:27 -05:00
website	feat(skills): fix browse cap, add source links + copy buttons + category cleanup (#37143)	2026-06-01 19:52:28 -07:00
__init__.py
conftest.py	feat(managed-scope): add managed_scope module (resolver, loaders, key helpers)	2026-06-19 07:46:33 -07:00
run_interrupt_test.py
test_account_usage.py
test_assistant_ui_tap_compat.py	test(deps): guard @assistant-ui cluster on one tap version	2026-06-15 11:55:02 -04:00
test_atomic_replace_symlinks.py	fix(utils): copy fallback for atomic replace across devices (#43852)	2026-06-13 14:50:05 -07:00
test_base_url_hostname.py
test_batch_runner_checkpoint.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_bitwarden_secrets.py	fix(bitwarden): prevent zip-slip path traversal when extracting bws binary (#40569)	2026-06-06 18:33:44 -07:00
test_cli_file_drop.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_cli_manual_compress.py
test_cli_skin_integration.py
test_code_skew.py	fix(gateway): refuse model switch on stale checkout to avoid env_float ImportError	2026-06-24 04:16:54 +05:30
test_ctx_halving_fix.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_dashboard_sidecar_close_on_disconnect.py	fix(dashboard): hide sidecar sessions from history (#49269)	2026-06-19 18:06:38 -04:00
test_delegate_cascade_49148.py	fix(agent): stop delegate cascade from deleting the parent session	2026-06-21 12:09:16 -07:00
test_desktop_electron_pin.py	fix(desktop): resolve electronDist dynamically + self-heal blocked installs (supersedes #48081/#48082) (#48091)	2026-06-17 18:48:35 -05:00
test_desktop_mac_entitlements.py	test(desktop): assert macOS device entitlements are inherited	2026-06-03 07:32:00 +07:00
test_dispatch_session_id.py	fix(dispatch): forward session_id into registry.dispatch (#28479)	2026-06-14 00:27:59 -04:00
test_docker_home_override_scripts.py	Repair cron ownership on container restart (#41976)	2026-06-10 15:32:34 +10:00
test_docker_stage2_browser_discovery.py	fix(docker): discover Playwright headless_shell browser (#35717)	2026-06-01 16:06:44 +10:00
test_docker_webui_install_surface.py	fix(docker): support WebUI installs from read-only sources (#48541)	2026-06-19 10:52:16 +10:00
test_dockerfile_tini_compat_shim.py	fix(docker): add /usr/bin/tini compatibility shim for legacy wrappers (#34192) (#34382)	2026-06-01 13:32:55 +10:00
test_empty_model_fallback.py	test(models): guard Nous silent default against expensive-flagship escalation	2026-06-05 02:54:34 -07:00
test_empty_session_hygiene.py	fix: in-memory transcript blocks empty-session prune	2026-06-10 17:37:34 -07:00
test_env_loader_secret_sources.py	fix(secrets): only apply external secrets once per HERMES_HOME per process (#32271)	2026-05-25 15:18:55 -07:00
test_evidence_store.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_gateway_streaming_nested_config.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_get_tool_definitions_cache_isolation.py	fix(gateway): close residual memory-leak sites under heavy scheduled workload	2026-06-08 06:32:42 -07:00
test_hermes_bootstrap.py	fix(tui): stop a cwd package named utils/proxy/ui from crashing the gateway child (#51693)	2026-06-23 23:29:45 -07:00
test_hermes_constants.py	fix(browser): validate agent-browser is runnable, not just present (#51740)	2026-06-24 00:14:49 -07:00
test_hermes_home_profile_warning.py
test_hermes_logging.py	refactor(gateway): migrate slack/dingtalk/whatsapp/matrix/feishu/telegram/wecom/email/sms adapters to bundled plugins	2026-06-20 10:26:45 -07:00
test_hermes_state.py	Merge pull request #49037 from NousResearch/bb/projects-paradigm	2026-06-25 17:49:05 -05:00
test_hermes_state_compression_locks.py	fix(compression): prevent session-id fork from concurrent compressions (#34351)	2026-05-28 21:40:39 -07:00
test_hermes_state_wal_fallback.py	fix(kanban): skip redundant WAL pragma on already-WAL connections	2026-05-27 14:31:55 -07:00
test_honcho_client_concurrency.py	fix(plugins): thread-safe lazy-singleton helpers; fix honcho TOCTOU (#24759) (#42150)	2026-06-08 09:35:22 -07:00
test_honcho_client_config.py	fix(honcho): harden self-hosted setup paths	2026-05-29 22:29:48 -07:00
test_honcho_session_context.py	fix(honcho): align user context peer perspective	2026-05-27 10:49:33 -07:00
test_honcho_startup_fail_open.py	fix: make Honcho startup fail open	2026-06-01 20:13:42 -07:00
test_install_no_initial_commit.py	fix(install): move broken checkout aside instead of deleting it	2026-06-08 02:18:21 -07:00
test_install_ps1_native_stderr_eap.py	fix(install): fail fast when uv venv genuinely fails under relaxed EAP	2026-06-18 22:11:35 +05:30
test_install_ps1_python_fallback_venv.py	test(installer): lock Python-fallback propagation into the venv stage (#50769)	2026-06-23 21:33:08 -07:00
test_install_ps1_uv_powershell_host.py	test(install): lock uv installer to a resolved PowerShell host	2026-06-18 16:26:34 +07:00
test_install_sh_browser_install.py	test(install): assert no system-browser auto-detect + snap override repair	2026-06-23 10:38:15 -07:00
test_install_sh_install_method_stamp.py	fix(update): scope install-method stamp to the code tree, not $HERMES_HOME (#48188)	2026-06-18 14:14:41 +10:00
test_install_sh_node_global_prefix.py	fix(install): repair existing managed-Node global prefix on re-run	2026-06-14 17:34:11 +07:00
test_install_sh_pythonpath_sanitization.py
test_install_sh_root_fhs_uv_python_path.py	test(install): harden uv-python-path regression test against future drift	2026-05-27 13:55:51 -07:00
test_install_sh_setup_wizard_tty_probe.py
test_install_sh_symlink_stomp.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_install_sh_termux_network_prereqs.py
test_install_unmerged_index.py	fix(install): harden venv-resident process sweep on Windows	2026-06-24 13:25:44 -04:00
test_ipv4_preference.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_lazy_session_regressions.py	fix(gateway): surface retry hint instead of silently dropping turn after /stop (#31884)	2026-06-24 23:51:31 +05:30
test_lint_config.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_live_system_guard_self_test.py
test_mcp_serve.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_mini_swe_runner.py
test_minimax_model_validation.py
test_minimax_oauth.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_minisweagent_path.py
test_model_forces_max_completion_tokens.py	fix(params): send max_completion_tokens for newer OpenAI families on custom endpoints	2026-06-09 23:22:10 -07:00
test_model_picker_scroll.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_model_tools.py	feat(moa): expose MoA presets as selectable virtual models (#46081)	2026-06-25 13:52:06 -07:00
test_model_tools_async_bridge.py	fix(web): run URL SSRF checks off the event loop in async paths	2026-06-04 18:04:47 -07:00
test_ollama_num_ctx.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_output_cap_parsing.py	test(agent): cover char-based output-cap overflow parsing (#42741)	2026-06-09 03:17:12 -07:00
test_package_json_lazy_deps.py
test_packaging_metadata.py	feat(mcp-catalog): add official Unreal Engine 5.8 MCP server	2026-06-18 09:16:40 -07:00
test_plugin_skills.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_plugin_utils.py	fix(plugins): thread-safe lazy-singleton helpers; fix honcho TOCTOU (#24759) (#42150)	2026-06-08 09:35:22 -07:00
test_process_loop_event_loop_warning.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_project_metadata.py	fix(deps): align anthropic extra pin with lazy pin + guard whole pin surface (#42335)	2026-06-08 12:11:54 -07:00
test_retry_utils.py	fix: handle named custom providers and Z.AI overload retries	2026-06-25 00:17:17 -07:00
test_run_tests_parallel.py	fix(ci): remove pytest-timeout, use per-file timeout only	2026-06-12 13:42:42 -04:00
test_sanitize_tool_error.py
test_slash_worker_watchdog.py	feat(slash-worker): self-terminate on parent death via create_time watchdog	2026-06-08 07:03:12 -07:00
test_sql_injection.py
test_stale_utils_module_import.py	fix(gateway): refuse model switch on stale checkout to avoid env_float ImportError	2026-06-24 04:16:54 +05:30
test_state_db_malformed_repair.py	fix(state.db): recover from malformed sqlite_master so hidden sessions reappear (#43149)	2026-06-09 18:49:08 -05:00
test_subprocess_home_isolation.py	fix: make profile subprocess HOME policy explicit	2026-06-14 03:20:21 -07:00
test_termux_all_extra_compat.py
test_timezone.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_toolset_distributions.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_toolsets.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_trajectory_compressor.py	fix(research): keep tool_call/tool_response pairs intact when compressing trajectories	2026-06-07 05:01:27 -07:00
test_trajectory_compressor_async.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_transform_llm_output_hook.py
test_transform_tool_result_hook.py	test: stub has_hook in transform_tool_result hook tests	2026-06-03 06:36:46 -07:00
test_tui_gateway_queue_on_busy.py	fix(tui_gateway): queue mid-turn prompts instead of dropping them on a busy retry	2026-06-25 12:29:49 -05:00
test_tui_gateway_server.py	fix(state): exclude delegate/branch/tool children from resume walk + reconcile salvaged fixes	2026-06-25 16:29:09 -07:00
test_tui_gateway_ws.py	feat(desktop): composer status stack, live subagent windows, editable prompts (#44630)	2026-06-12 08:30:06 -05:00
test_tui_mcp_late_refresh.py	fix(tui): refresh tool snapshot when MCP discovery lands after agent build (#48403)	2026-06-18 05:41:23 -07:00
test_utils_truthy_values.py
test_web_server.py	fix(dashboard): serve uvicorn on SelectorEventLoop on Windows (#50641) (#51717)	2026-06-23 23:43:24 -07:00
test_wheel_locales_e2e.py	fix(packaging): ship locales/ i18n catalogs in wheel, sdist, and Nix (#38383)	2026-06-03 12:00:27 -07:00
test_yaml_indent_consistency_31999.py	fix(utils): unify YAML list indent across all config writers (#31999)	2026-06-25 23:27:44 +05:30
test_yuanbao_integration.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_yuanbao_markdown.py
test_yuanbao_pipeline.py	feat(Yuanbao): support wechat forward msg (#43508)	2026-06-12 02:06:47 -07:00
test_yuanbao_proto.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_yuanbao_shutdown.py	fix(yuanbao): bound ws.close() so an idle server can't stall shutdown ~5s (#40607)	2026-06-07 17:49:38 -07:00