hermes-agent/tests
fesalfayed 64628ea89b fix(anthropic): demote dead thinking signature when orphan-strip mutates the latest turn
Extended-thinking Claude models (4.6+, e.g. Opus 4.8) emit a signed `thinking`
block on assistant turns that also carry parallel `tool_use` blocks. Anthropic
signs that block against the full, original turn content.

When a parallel tool batch is interrupted before every `tool_result` returns,
`_strip_orphaned_tool_blocks` removes the unanswered `tool_use` on replay — which
mutates the turn. The latest-assistant branch of `_manage_thinking_signatures`
then replays the now-stale signed thinking block verbatim, and Anthropic rejects
the request with a non-retryable HTTP 400:

    messages.N.content.M: `thinking` or `redacted_thinking` blocks in the latest
    assistant message cannot be modified. These blocks must remain as they were
    in the original response.

Because the poisoned turn is rebuilt from the persisted store every turn, the
gateway crash-loops with no self-recovery (a soft session reset does not clear
it). The drifting content index in the error is the changing count of stripped
`tool_use` blocks across rebuilds.

Fix: when orphan-stripping removes a `tool_use` from a turn that also holds a
thinking/redacted_thinking block, flag the turn. `_manage_thinking_signatures`
then demotes every thinking block on that latest turn to a plain text block
(preserving the reasoning text) instead of replaying a signature that can no
longer validate. An intact turn is unaffected — its signed thinking is still
replayed verbatim. The internal flag is stripped before the payload is sent.

Adds two regression tests:
- demotion when an orphaned parallel tool_use is stripped
- control: signed thinking preserved verbatim when nothing is stripped
2026-05-31 06:14:34 -07:00
..
acp chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
acp_adapter feat(azure-foundry): add Microsoft Entra ID auth 2026-05-18 10:14:38 -07:00
agent fix(anthropic): demote dead thinking signature when orphan-strip mutates the latest turn 2026-05-31 06:14:34 -07:00
cli fix(cli): clamp post-compression token sentinel in status bar (#35858) 2026-05-31 06:03:01 -07:00
cron chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
docker fix(dashboard-auth): share /api/* public allowlist between legacy and OAuth gates 2026-05-29 12:17:12 +10:00
e2e chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
fakes
gateway fix(gateway): detach pending_watchers batch + normalize LRU caches + align test fixtures + AUTHOR_MAP 2026-05-31 00:50:19 -07:00
hermes_cli feat(tools): always show Nous Tool Gateway backends, login on select (#35792) 2026-05-31 03:39:17 -07:00
hermes_state feat(session_search): single-shape tool with discovery, scroll, browse — no LLM (#27590) 2026-05-17 23:28:45 -07:00
honcho_plugin fix(honcho): harden self-hosted setup paths 2026-05-29 22:29:48 -07:00
integration chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
openviking_plugin
plugins feat(kanban): file attachments on tasks (#35395) 2026-05-30 07:41:04 -07:00
providers chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
run_agent revert: drop cumulative-resend tool-arg heuristic from shared streaming path (#35718) (#35860) 2026-05-31 06:14:32 -07:00
scripts feat(acp-registry): switch to uvx distribution, drop npm launcher 2026-05-14 22:27:09 -07:00
skills fix(google-workspace): handle Gmail header casing case-insensitively 2026-05-30 02:38:18 -07:00
stress chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
tools fix(gateway): detach pending_watchers batch + normalize LRU caches + align test fixtures + AUTHOR_MAP 2026-05-31 00:50:19 -07:00
tui_gateway perf(tui): stop slow/dead MCP servers from freezing TUI startup 2026-05-30 02:53:37 -07:00
website
__init__.py
conftest.py fix(gateway,cron): reuse existing _HERMES_GATEWAY marker; tighten cron regex 2026-05-30 23:05:56 -07:00
run_interrupt_test.py
test_account_usage.py
test_atomic_replace_symlinks.py
test_base_url_hostname.py
test_batch_runner_checkpoint.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_bitwarden_secrets.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_cli_file_drop.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_cli_manual_compress.py fix(tests): catch up six stale tests after compression/aux/kanban changes (#28465) 2026-05-18 21:43:59 -07:00
test_cli_skin_integration.py
test_ctx_halving_fix.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_docker_home_override_scripts.py docker: opt in to dashboard --insecure via env var, never derive from bind host 2026-05-29 09:56:40 +10:00
test_empty_model_fallback.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_env_loader_secret_sources.py fix(secrets): only apply external secrets once per HERMES_HOME per process (#32271) 2026-05-25 15:18:55 -07:00
test_evidence_store.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_gateway_streaming_nested_config.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_get_tool_definitions_cache_isolation.py
test_hermes_bootstrap.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_hermes_constants.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_hermes_home_profile_warning.py
test_hermes_logging.py fix(logging): recover gateway.log handler from external rotation (#34349) 2026-05-28 22:26:00 -07:00
test_hermes_state.py fix(session): survive missing FTS5 runtimes 2026-05-30 18:59:08 -07:00
test_hermes_state_compression_locks.py fix(compression): prevent session-id fork from concurrent compressions (#34351) 2026-05-28 21:40:39 -07:00
test_hermes_state_wal_fallback.py fix(kanban): skip redundant WAL pragma on already-WAL connections 2026-05-27 14:31:55 -07:00
test_honcho_client_config.py fix(honcho): harden self-hosted setup paths 2026-05-29 22:29:48 -07:00
test_honcho_session_context.py fix(honcho): align user context peer perspective 2026-05-27 10:49:33 -07:00
test_install_sh_browser_install.py fix(install): support non-sudo service-user installs on apt distros (#25814) 2026-05-14 09:05:31 -07:00
test_install_sh_pythonpath_sanitization.py
test_install_sh_root_fhs_uv_python_path.py test(install): harden uv-python-path regression test against future drift 2026-05-27 13:55:51 -07:00
test_install_sh_setup_wizard_tty_probe.py
test_install_sh_symlink_stomp.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_install_sh_termux_network_prereqs.py
test_ipv4_preference.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_lazy_session_regressions.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_lint_config.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_live_system_guard_self_test.py chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355) 2026-05-17 02:29:41 -07:00
test_mcp_serve.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_mini_swe_runner.py
test_minimax_model_validation.py
test_minimax_oauth.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_minisweagent_path.py
test_model_picker_scroll.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_model_tools.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_model_tools_async_bridge.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_ollama_num_ctx.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_package_json_lazy_deps.py fix(update): make Camofox lazy-installed instead of eager (#27055) 2026-05-16 12:15:45 -07:00
test_packaging_metadata.py security: pin patched Starlette (>=1.0.1) for CVE-2026-48710 BadHost (#35118) 2026-05-29 23:23:54 -07:00
test_plugin_skills.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_process_loop_event_loop_warning.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_project_metadata.py fix(stt,tts): restore mistralai — 2.4.8 is clean, ban lifted (#34841) 2026-05-29 13:24:12 -07:00
test_retry_utils.py
test_run_tests_parallel.py test: use subprocesses for each test file (#29016) 2026-05-21 16:40:04 +05:30
test_sanitize_tool_error.py security: sanitize tool error strings before injecting into model context (#26823) 2026-05-16 00:57:39 -07:00
test_sql_injection.py
test_subprocess_home_isolation.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_termux_all_extra_compat.py
test_timezone.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_toolset_distributions.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_toolsets.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_trajectory_compressor.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_trajectory_compressor_async.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_transform_llm_output_hook.py
test_transform_tool_result_hook.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_tui_gateway_server.py test(tui-gateway): isolate completion_queue in poller requeue test 2026-05-29 13:29:24 -07:00
test_utils_truthy_values.py
test_yuanbao_integration.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_yuanbao_markdown.py
test_yuanbao_pipeline.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_yuanbao_proto.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00