hermes-agent/tests
Teknium c9094f5e5f
fix(stream): don't report dropped mid-tool-call streams as output truncation (#42314)
* fix(stream): don't report dropped mid-tool-call streams as output truncation

A streaming tool call whose SSE ends with no finish_reason (the upstream
delivers the tool name + opening '{' then closes the connection cleanly,
no terminator, no [DONE]) was stamped finish_reason='length' by the mock
builder. That routed it through the output-cap truncation path: 3 useless
max_tokens-boosted retries, then the misleading 'Response truncated due to
output length limit' error — even though the model never reported hitting
any cap.

Reproduced live on nvidia/nemotron-3-ultra:free via the Nous dedicated
endpoint, which stalls/drops during large tool-arg generation (50s-4m41s).

Now: when tool args are incomplete AND the provider sent no finish_reason,
tag the response as a partial-stream stub so the loop reports an honest
mid-tool-call drop and asks the model to chunk its output (existing
continuation machinery), instead of escalating output budget and lying.
A provider-reported finish_reason='length' still takes the real-truncation
path unchanged.

* test(stream): update truncated-tool-args test for drop-vs-cap split

test_truncated_tool_call_args_upgrade_finish_reason_to_length pinned the
old behaviour where ANY incomplete tool args → finish_reason='length' with
tool_calls preserved. That single-chunk-no-finish_reason scenario is exactly
the mid-tool-call stream drop now reclassified as a partial-stream stub.

Split into two tests matching the new contract:
- no finish_reason + incomplete args → PARTIAL_STREAM_STUB_ID, tool_calls=None,
  _dropped_tool_names set (the drop path)
- explicit finish_reason='length' + incomplete args → tool_calls preserved,
  'length' upgrade unchanged (the genuine output-cap path)
2026-06-08 11:56:10 -07:00
..
acp feat(acp): emit session provenance metadata for compression rotation (#41724) 2026-06-07 22:22:21 -07:00
acp_adapter
agent fix(anthropic): strip Responses-only kwargs before Messages SDK call (#31673) (#42155) 2026-06-08 09:36:38 -07:00
cli refactor(cli): extract 32 slash-command handlers into CLICommandsMixin (god-file Phase 4) 2026-06-08 02:13:07 -07:00
cron feat(cron): title cron sessions from the job, not the [IMPORTANT] hint 2026-06-06 12:51:12 -05:00
docker feat(dashboard): always enable embedded chat; remove dashboard --tui flag 2026-06-04 03:03:35 -07:00
e2e chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
fakes
fixtures/plugins/example-dashboard/dashboard feat(dashboard): nous-blue theme, bulk sessions, schedule picker (#37383) 2026-06-02 12:37:40 -04:00
gateway fix(gateway): prevent duplicate user messages in state.db 2026-06-08 11:29:53 -07:00
hermes_cli fix(desktop): stop running Hermes.exe locking win-unpacked before Windows pack (#42100) 2026-06-08 11:51:31 -07:00
hermes_state
honcho_plugin fix(plugins): thread-safe lazy-singleton helpers; fix honcho TOCTOU (#24759) (#42150) 2026-06-08 09:35:22 -07:00
integration refactor(gateway): migrate Home Assistant adapter to bundled plugin 2026-06-06 11:46:24 -07:00
openviking_plugin fix(openviking): add missing /agent/{agent}/ segment to memory URI — fixes #36969 2026-06-04 17:40:33 -07:00
plugins fix(hindsight): send only new-turn delta on append retains instead of whole session (#40605) 2026-06-07 17:41:10 -07:00
providers test(kimi): align stale parity/profile tests with thinking-xor-effort contract (#41095) 2026-06-07 01:52:49 -07:00
run_agent fix(stream): don't report dropped mid-tool-call streams as output truncation (#42314) 2026-06-08 11:56:10 -07:00
scripts
skills fix(google-workspace): fall back to uv when venv has no pip (#39516) 2026-06-05 13:30:02 +10:00
stress chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
tools fix(approval): gate resolved Hermes config paths 2026-06-08 11:55:40 -07:00
tui_gateway fix(desktop): scope in-session /model switch per-session, stop process-env leak (#41120) 2026-06-07 02:33:28 -07:00
website feat(skills): fix browse cap, add source links + copy buttons + category cleanup (#37143) 2026-06-01 19:52:28 -07:00
__init__.py
conftest.py fix: batch of small robustness/correctness fixes from @kyssta-exe 2026-06-01 19:51:03 -07:00
run_interrupt_test.py
test_account_usage.py
test_atomic_replace_symlinks.py
test_base_url_hostname.py
test_batch_runner_checkpoint.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_bitwarden_secrets.py fix(bitwarden): prevent zip-slip path traversal when extracting bws binary (#40569) 2026-06-06 18:33:44 -07:00
test_cli_file_drop.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_cli_manual_compress.py
test_cli_skin_integration.py
test_ctx_halving_fix.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_dashboard_sidecar_close_on_disconnect.py fix(tui-gateway): reap leaked slash_worker sessions on disconnect + active_list liveness (re-scoped onto current main) 2026-06-08 10:02:05 -07:00
test_desktop_mac_entitlements.py test(desktop): assert macOS device entitlements are inherited 2026-06-03 07:32:00 +07:00
test_docker_home_override_scripts.py feat(dashboard): always enable embedded chat; remove dashboard --tui flag 2026-06-04 03:03:35 -07:00
test_docker_stage2_browser_discovery.py fix(docker): discover Playwright headless_shell browser (#35717) 2026-06-01 16:06:44 +10:00
test_dockerfile_tini_compat_shim.py fix(docker): add /usr/bin/tini compatibility shim for legacy wrappers (#34192) (#34382) 2026-06-01 13:32:55 +10:00
test_empty_model_fallback.py test(models): guard Nous silent default against expensive-flagship escalation 2026-06-05 02:54:34 -07:00
test_env_loader_secret_sources.py fix(secrets): only apply external secrets once per HERMES_HOME per process (#32271) 2026-05-25 15:18:55 -07:00
test_evidence_store.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_gateway_streaming_nested_config.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_get_tool_definitions_cache_isolation.py fix(gateway): close residual memory-leak sites under heavy scheduled workload 2026-06-08 06:32:42 -07:00
test_hermes_bootstrap.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_hermes_constants.py fix(constants): use windows native default hermes home 2026-06-03 19:37:29 -07:00
test_hermes_home_profile_warning.py
test_hermes_logging.py fix(gateway): tolerate Unicode in stderr log handlers on Windows 2026-06-06 19:57:44 -07:00
test_hermes_state.py fix(cron): bound the desktop run-history query to one job (#41088) 2026-06-07 02:41:01 -07:00
test_hermes_state_compression_locks.py fix(compression): prevent session-id fork from concurrent compressions (#34351) 2026-05-28 21:40:39 -07:00
test_hermes_state_wal_fallback.py fix(kanban): skip redundant WAL pragma on already-WAL connections 2026-05-27 14:31:55 -07:00
test_honcho_client_concurrency.py fix(plugins): thread-safe lazy-singleton helpers; fix honcho TOCTOU (#24759) (#42150) 2026-06-08 09:35:22 -07:00
test_honcho_client_config.py fix(honcho): harden self-hosted setup paths 2026-05-29 22:29:48 -07:00
test_honcho_session_context.py fix(honcho): align user context peer perspective 2026-05-27 10:49:33 -07:00
test_honcho_startup_fail_open.py fix: make Honcho startup fail open 2026-06-01 20:13:42 -07:00
test_install_no_initial_commit.py fix(install): move broken checkout aside instead of deleting it 2026-06-08 02:18:21 -07:00
test_install_sh_browser_install.py
test_install_sh_pythonpath_sanitization.py
test_install_sh_root_fhs_uv_python_path.py test(install): harden uv-python-path regression test against future drift 2026-05-27 13:55:51 -07:00
test_install_sh_setup_wizard_tty_probe.py
test_install_sh_symlink_stomp.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_install_sh_termux_network_prereqs.py
test_ipv4_preference.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_lazy_session_regressions.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_lint_config.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_live_system_guard_self_test.py
test_mcp_serve.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_mini_swe_runner.py
test_minimax_model_validation.py
test_minimax_oauth.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_minisweagent_path.py
test_model_picker_scroll.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_model_tools.py feat(middleware): add adaptive execution intercepts 2026-06-03 11:22:06 -07:00
test_model_tools_async_bridge.py fix(web): run URL SSRF checks off the event loop in async paths 2026-06-04 18:04:47 -07:00
test_ollama_num_ctx.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_output_cap_parsing.py fix(stream+output-cap): guard empty streams and parse OpenRouter output-cap errors (#40589) 2026-06-07 03:52:09 -07:00
test_package_json_lazy_deps.py
test_packaging_metadata.py fix(packaging): ship locales/ i18n catalogs in wheel, sdist, and Nix (#38383) 2026-06-03 12:00:27 -07:00
test_plugin_skills.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_plugin_utils.py fix(plugins): thread-safe lazy-singleton helpers; fix honcho TOCTOU (#24759) (#42150) 2026-06-08 09:35:22 -07:00
test_process_loop_event_loop_warning.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_project_metadata.py fix(deps): align aiohttp extras pins with lazy Slack pin (3.13.4) 2026-06-08 11:30:48 -07:00
test_retry_utils.py
test_run_tests_parallel.py
test_sanitize_tool_error.py
test_slash_worker_watchdog.py feat(slash-worker): self-terminate on parent death via create_time watchdog 2026-06-08 07:03:12 -07:00
test_sql_injection.py
test_subprocess_home_isolation.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_termux_all_extra_compat.py
test_timezone.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_toolset_distributions.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_toolsets.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_trajectory_compressor.py fix(research): keep tool_call/tool_response pairs intact when compressing trajectories 2026-06-07 05:01:27 -07:00
test_trajectory_compressor_async.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_transform_llm_output_hook.py
test_transform_tool_result_hook.py test: stub has_hook in transform_tool_result hook tests 2026-06-03 06:36:46 -07:00
test_tui_gateway_server.py fix: resolve rebase conflict in _teardown_session worker cleanup 2026-06-08 10:02:05 -07:00
test_tui_gateway_ws.py fix: resolve rebase conflict in _teardown_session worker cleanup 2026-06-08 10:02:05 -07:00
test_utils_truthy_values.py
test_web_server.py fix(tui-gateway): reap leaked slash_worker sessions on disconnect + active_list liveness (re-scoped onto current main) 2026-06-08 10:02:05 -07:00
test_wheel_locales_e2e.py fix(packaging): ship locales/ i18n catalogs in wheel, sdist, and Nix (#38383) 2026-06-03 12:00:27 -07:00
test_yuanbao_integration.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_yuanbao_markdown.py
test_yuanbao_pipeline.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_yuanbao_proto.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_yuanbao_shutdown.py fix(yuanbao): bound ws.close() so an idle server can't stall shutdown ~5s (#40607) 2026-06-07 17:49:38 -07:00