hermes-agent/tests
Teknium 80a899a8e2
fix: enable fine-grained tool streaming for Claude/OpenRouter + retry SSE errors (#3497)
Root cause: Anthropic buffers entire tool call arguments and goes silent
for minutes while thinking (verified: 167s gap with zero SSE events on
direct API).  OpenRouter's upstream proxy times out after ~125s of
inactivity and drops the connection with 'Network connection lost'.

Fix: Send the x-anthropic-beta: fine-grained-tool-streaming-2025-05-14
header for Claude models on OpenRouter.  This makes Anthropic stream
tool call arguments token-by-token instead of buffering them, keeping
the connection alive through OpenRouter's proxy.

Live-tested: the exact prompt that consistently failed at ~128s now
completes successfully — 2,972 lines written, 49K tokens, 8 minutes.

Additional improvements:

1. Send explicit max_tokens for Claude through OpenRouter.  Without it,
   OpenRouter defaults to 65,536 (confirmed via echo_upstream_body) —
   only half of Opus 4.6's 128K limit.

2. Classify SSE 'Network connection lost' as retryable in the streaming
   inner retry loop.  The OpenAI SDK raises APIError from SSE error
   events, which was bypassing our transient error retry logic.

3. Actionable diagnostic guidance when stream-drop retries exhaust.
2026-03-28 08:01:37 -07:00
..
acp fix(acp): preserve session provider when switching models 2026-03-21 15:54:10 -07:00
agent feat: tool-use enforcement + strip budget warnings from history (#3528) 2026-03-28 07:38:36 -07:00
cron fix(cron): prevent recurring job re-fire on gateway crash/restart loop (#3396) 2026-03-27 08:02:58 -07:00
fakes fix: streaming tool call parsing, error handling, and fake HA state mutation 2026-03-14 14:27:20 +03:00
gateway fix(gateway): scope progress thread fallback to Slack only (salvage #3414) (#3488) 2026-03-27 22:37:53 -07:00
hermes_cli fix(gateway): include user-local bin paths in systemd unit PATH (#3527) 2026-03-28 07:47:40 -07:00
honcho_integration feat(honcho): instance-local config via HERMES_HOME, default session strategy to per-directory 2026-03-21 09:34:00 -07:00
integration refactor: remove mini-swe-agent dependency — inline Docker/Modal backends (#2804) 2026-03-24 07:30:25 -07:00
skills feat(migration): comprehensive OpenClaw migration v2 — 17 new modules, terminal recap (#2906) 2026-03-24 19:44:02 -07:00
tools fix: guard aux LLM calls against None content + reasoning fallback + retry (salvage #3389) (#3449) 2026-03-27 15:28:19 -07:00
__init__.py A bit of restructuring for simplicity and organization 2025-10-01 23:29:25 +00:00
conftest.py fix(approval): show full command in dangerous command approval (#1553) 2026-03-17 02:02:33 -07:00
run_interrupt_test.py fix: thread safety for concurrent subagent delegation (#1672) 2026-03-17 02:53:33 -07:00
test_413_compression.py feat: improve context compaction handoff summaries (#1273) 2026-03-14 02:33:31 -07:00
test_860_dedup.py fix: eliminate 3x SQLite message duplication in gateway sessions (#860) 2026-03-10 15:22:44 -07:00
test_1630_context_overflow_loop.py fix: prevent infinite 400 loop on context overflow + block prompt injection via cache files (#1630, #1558) 2026-03-17 01:50:59 -07:00
test_agent_guardrails.py feat: pre-call sanitization and post-call tool guardrails (#1732) 2026-03-17 04:24:27 -07:00
test_agent_loop.py fix: salvage gateway dedup and executor cleanup from PR #993 2026-03-14 11:03:20 -07:00
test_agent_loop_tool_calling.py fix: skip hanging tests + add global test timeout 2026-03-12 01:23:28 -07:00
test_agent_loop_vllm.py test: restore vllm integration coverage and add dict-args regression 2026-03-15 08:02:29 -07:00
test_anthropic_adapter.py fix(anthropic): use model-native output limits instead of hardcoded 16K (#3426) 2026-03-27 13:02:52 -07:00
test_anthropic_error_handling.py fix(ci): pin acp <0.9 and update retry-exhaust test (#3320) 2026-03-26 19:21:34 -07:00
test_anthropic_oauth_flow.py fix: preflight Anthropic auth and prefer Claude store 2026-03-14 19:38:55 -07:00
test_anthropic_provider_persistence.py fix: preflight Anthropic auth and prefer Claude store 2026-03-14 19:38:55 -07:00
test_api_key_providers.py feat: curate HF model picker with OpenRouter analogues (#3440) 2026-03-27 13:54:46 -07:00
test_async_httpx_del_neuter.py fix: eliminate 'Event loop is closed' / 'Press ENTER to continue' during idle (#3398) 2026-03-27 09:45:25 -07:00
test_atomic_json_write.py test: cover atomic temp cleanup on interrupts 2026-03-14 22:31:51 -07:00
test_atomic_yaml_write.py test: cover atomic temp cleanup on interrupts 2026-03-14 22:31:51 -07:00
test_auth_codex_provider.py refactor(auth): transition Codex OAuth tokens to Hermes auth store 2026-03-01 19:59:24 -08:00
test_auth_nous_provider.py Fix nous refresh token rotation failure in case where api key mint/retrieval fails 2026-03-02 17:18:15 +11:00
test_auxiliary_config_bridge.py feat(compression): add summary_base_url + move compression config to YAML-only 2026-03-17 04:46:15 -07:00
test_batch_runner_checkpoint.py fix: sanitize chat payloads and provider precedence 2026-03-13 23:59:12 -07:00
test_cli_approval_ui.py fix(cli): repair dangerous command approval UI 2026-03-14 11:57:44 -07:00
test_cli_background_tui_refresh.py fix(cli): refresh TUI before background task output to prevent status bar overlap (#3048) 2026-03-25 15:00:33 -07:00
test_cli_extension_hooks.py refactor(cli): add protected TUI extension hooks for wrapper CLIs 2026-03-21 09:42:07 -07:00
test_cli_init.py feat(cli): configurable busy input mode + fix /queue always working (#3298) 2026-03-26 17:58:40 -07:00
test_cli_interrupt_subagent.py fix: thread safety for concurrent subagent delegation (#1672) 2026-03-17 02:53:33 -07:00
test_cli_loading_indicator.py fix(cli): add loading indicators for slow slash commands 2026-03-10 17:31:00 -07:00
test_cli_mcp_config_watch.py fix: auto-reload MCP tools when mcp_servers config changes without restart (#1474) 2026-03-15 19:03:34 -07:00
test_cli_new_session.py fix: complete session reset — missing compressor counters + test 2026-03-20 04:35:17 -07:00
test_cli_plan_command.py fix: save /plan output in workspace (#1381) 2026-03-14 21:28:51 -07:00
test_cli_prefix_matching.py feat: add /tools disable/enable/list slash commands with session reset (#1652) 2026-03-17 02:05:26 -07:00
test_cli_preloaded_skills.py fix: move activated skills line below welcome text 2026-03-23 06:20:19 -07:00
test_cli_provider_resolution.py feat: overhaul context length detection with models.dev and provider-aware resolution (#2158) 2026-03-20 06:04:33 -07:00
test_cli_retry.py test: lock retry replacement semantics 2026-03-14 21:19:22 -07:00
test_cli_secret_capture.py feat: secure skill env setup on load (core #688) 2026-03-13 03:14:04 -07:00
test_cli_skin_integration.py fix(test): add missing voice state attrs to CLI stub in skin tests 2026-03-14 15:00:45 +03:00
test_cli_status_bar.py fix(tui): status bar duplicates and degrades during long sessions (#3291) 2026-03-26 17:33:11 -07:00
test_cli_tools_command.py feat: add /tools disable/enable/list slash commands with session reset (#1652) 2026-03-17 02:05:26 -07:00
test_codex_execution_paths.py feat: simple fallback model for provider resilience 2026-03-08 20:22:33 -07:00
test_codex_models.py fix: add codex forward-compat model listing 2026-03-13 21:34:01 -07:00
test_compression_boundary.py fix(agent): prevent silent tool result loss during context compression (#1993) 2026-03-18 15:22:51 -07:00
test_compressor_fallback_update.py fix(agent): update context compressor limits after fallback activation (#3305) 2026-03-26 18:10:50 -07:00
test_config_env_expansion.py feat(config): support ${ENV_VAR} substitution in config.yaml (#2684) 2026-03-23 16:02:06 -07:00
test_context_pressure.py fix: cap context pressure percentage at 100% in display (#3480) 2026-03-27 21:42:09 -07:00
test_context_references.py fix(context): restrict @ references to safe workspace paths (#2601) 2026-03-23 06:40:05 -07:00
test_context_token_tracking.py fix(tests): resolve all consistently failing tests 2026-03-22 05:58:26 -07:00
test_crossloop_client_cache.py fix(agent): prevent AsyncOpenAI/httpx cross-loop deadlock in gateway mode (#2701) 2026-03-25 17:31:56 -07:00
test_dict_tool_call_args.py test: restore vllm integration coverage and add dict-args regression 2026-03-15 08:02:29 -07:00
test_display.py fix: add upstream guard for non-dict function_args + tests for build_tool_preview 2026-03-09 21:01:40 -07:00
test_evidence_store.py feat: add OSS Security Forensics skill (Skills Hub) (#1482) 2026-03-15 21:59:53 -07:00
test_exit_cleanup_interrupt.py fix: catch KeyboardInterrupt in exit cleanup handlers (#3257) 2026-03-26 14:34:31 -07:00
test_external_credential_detection.py refactor(auth): transition Codex OAuth tokens to Hermes auth store 2026-03-01 19:59:24 -08:00
test_fallback_model.py feat: upgrade MiniMax default to M2.7 + add new OpenRouter models 2026-03-18 02:42:58 -07:00
test_file_permissions.py security: enforce 0600/0700 file permissions on sensitive files (inspired by openclaw) 2026-03-09 02:19:32 -07:00
test_flush_memories_codex.py fix: update all test mocks for call_llm migration 2026-03-11 21:06:54 -07:00
test_hermes_state.py feat(sessions): add --source flag for third-party session isolation (#3255) 2026-03-26 14:35:31 -07:00
test_honcho_client_config.py fix(honcho): auto-enable when API key is present 2026-03-01 03:12:37 -05:00
test_insights.py feat: add route-aware pricing estimates (#1695) 2026-03-17 03:44:44 -07:00
test_interactive_interrupt.py fix: thread safety for concurrent subagent delegation (#1672) 2026-03-17 02:53:33 -07:00
test_interrupt_propagation.py fix: thread safety for concurrent subagent delegation (#1672) 2026-03-17 02:53:33 -07:00
test_managed_server_tool_support.py test: fix stale CI assumptions in parser and quick-command coverage (#1236) 2026-03-13 21:56:12 -07:00
test_minisweagent_path.py chore: remove all remaining mini-swe-agent references 2026-03-24 08:19:23 -07:00
test_model_metadata_local_ctx.py fix: prefer loaded instance context size over max for LM Studio 2026-03-19 21:24:53 +01:00
test_model_provider_persistence.py feat: integrate GitHub Copilot providers across Hermes 2026-03-17 23:40:22 -07:00
test_model_tools.py test: strengthen assertions across 3 more test files (batch 2) 2026-03-05 18:46:30 -08:00
test_model_tools_async_bridge.py fix: use per-thread persistent event loops in worker threads 2026-03-20 15:41:06 -04:00
test_openai_client_lifecycle.py fix: audit fixes — 5 bugs found and resolved 2026-03-16 06:35:46 -07:00
test_personality_none.py feat(cli,gateway): add /personality none and custom personality support 2026-03-09 17:31:54 +03:00
test_plugins.py fix(tests): resolve all consistently failing tests 2026-03-22 05:58:26 -07:00
test_plugins_cmd.py feat(cli): add hermes plugins install/remove/list command 2026-03-21 09:47:33 -07:00
test_provider_parity.py fix: align Nous Portal model slugs with OpenRouter naming (#3253) 2026-03-26 13:49:43 -07:00
test_quick_commands.py fix: thread safety for concurrent subagent delegation (#1672) 2026-03-17 02:53:33 -07:00
test_real_interrupt_subagent.py fix: thread safety for concurrent subagent delegation (#1672) 2026-03-17 02:53:33 -07:00
test_reasoning_command.py fix: prevent reasoning box from rendering 3x during tool-calling loops (#3405) 2026-03-27 09:57:50 -07:00
test_redirect_stdout_issue.py fix: use session_key instead of chat_id for adapter interrupt lookups 2026-03-12 08:35:45 -07:00
test_resume_display.py feat: display previous messages when resuming a session in CLI 2026-03-08 17:45:45 -07:00
test_run_agent.py fix(agent): detect thinking-budget exhaustion on truncation, skip useless retries (#3444) 2026-03-27 15:29:30 -07:00
test_run_agent_codex_responses.py fix(codex): handle reasoning-only responses and replay path (#2070) 2026-03-19 10:34:44 -07:00
test_runtime_provider_resolution.py fix: alibaba provider default endpoint and model list (#3484) 2026-03-27 22:10:10 -07:00
test_session_reset_fix.py fix(session): clear compressor summary and turn counter on /clear and /new (#3102) 2026-03-25 18:22:21 -07:00
test_setup_model_selection.py fix(setup): remove dead code causing is_coding_plan NameError crash 2026-03-13 04:42:26 +03:00
test_sql_injection.py fix(security): eliminate SQL string formatting in execute() calls 2026-03-19 15:16:35 +01:00
test_streaming.py fix: enable fine-grained tool streaming for Claude/OpenRouter + retry SSE errors (#3497) 2026-03-28 08:01:37 -07:00
test_timezone.py fix: skip stale cron jobs on gateway restart instead of firing immediately 2026-03-16 23:48:14 -07:00
test_tool_call_parsers.py fix(mistral-parser): handle nested JSON in fallback extraction 2026-03-21 09:41:17 -07:00
test_toolset_distributions.py test: add unit tests for 8 modules (batch 2) 2026-02-26 13:54:20 +03:00
test_toolsets.py fix: add missing Platform.SIGNAL to toolset mappings, update test + config docs 2026-03-09 23:27:19 -07:00
test_trajectory_compressor.py fix: harden trajectory compressor summary content handling 2026-03-14 11:03:25 -07:00
test_worktree.py fix: harden salvaged worktree include checks 2026-03-14 21:51:27 -07:00
test_worktree_security.py fix: harden salvaged worktree include checks 2026-03-14 21:51:27 -07:00