hermes-agent/tests
Teknium 1fa44180b0
fix(moa): advisory references end on a user turn + get a reference-role system prompt (#54007)
* fix(moa): reference advisory view must end with a user turn

MoA reference calls failed with Anthropic models that don't support
assistant prefill (e.g. Claude Opus 4.8): '400 ... must end with a user
message'. The advisory view built by _reference_messages() kept the last
assistant turn's text while dropping the following tool result, leaving a
trailing assistant turn — which Anthropic (and OpenRouter->Anthropic)
interpret as an assistant prefill to continue. References are advisory and
must end on the user turn they answer.

Strip trailing assistant turns from the advisory view (preserving
intervening ones). Update the existing test that encoded the buggy shape
and add a mid-tool-loop regression test.

* feat(moa): give reference models an advisory-role system prompt

Reference models received the bare trimmed conversation with no role
framing, so they assumed they were the acting agent and refused ("I can't
access repositories/URLs from here") or tried to call tools they don't have.

Prepend a dedicated advisory system prompt to every reference call: the
model is an analyst, not the actor — it cannot execute, should not
apologize for lacking tools, and should reason about the presented state to
advise the aggregator/orchestrator on approach, next steps, tool-use
strategy, risks, and anything the acting agent missed. Its output is private
guidance for the aggregator, not a user-facing answer.
2026-06-27 22:52:25 -07:00
..
acp fix(ci): rip out some xdist legacy stuff... how did these ever work?? 2026-06-26 19:15:18 -07:00
acp_adapter
agent test(auxiliary): cover env-only proxy policy for auxiliary clients (#53702) 2026-06-27 21:22:49 -07:00
ci fix(ci): classify should default to no MCP 2026-06-23 10:32:27 -07:00
cli feat(moa): show each reference model's output as a labelled block before the aggregator (#53793) 2026-06-27 12:45:23 -07:00
computer_use feat(computer_use): disable cua-driver telemetry by default, add opt-in (#50842) 2026-06-22 09:57:16 -07:00
cron fix(cron): make per-profile cron isolation intentional and tested (#4707) (#53570) 2026-06-27 03:55:01 -07:00
docker change(ci): migrate docker smoketests to real tests 2026-06-26 19:15:18 -07:00
e2e refactor(gateway): migrate slack/dingtalk/whatsapp/matrix/feishu/telegram/wecom/email/sms adapters to bundled plugins 2026-06-20 10:26:45 -07:00
fakes
fixtures/plugins/example-dashboard/dashboard feat(dashboard): nous-blue theme, bulk sessions, schedule picker (#37383) 2026-06-02 12:37:40 -04:00
gateway test(agent,gateway): cover partial-stream recovery and restart helper salvage 2026-06-27 22:03:14 -07:00
hermes_cli feat(desktop): config-driven Electron launch flags + GPU policy 2026-06-27 22:26:43 -07:00
hermes_state fix(state): exclude delegate/branch/tool children from resume walk + reconcile salvaged fixes 2026-06-25 16:29:09 -07:00
honcho_plugin feat(memory): Honcho OAuth connect — desktop and CLI flows + token refresh (#44335) 2026-06-22 19:16:47 -05:00
integration refactor(gateway): migrate Home Assistant adapter to bundled plugin 2026-06-06 11:46:24 -07:00
openviking_plugin feat(openviking): add full recall prefetch policy 2026-06-24 18:53:49 +05:30
plugins fix(byterover): honor auto extract config 2026-06-27 04:04:15 -07:00
providers fix(models): pass model.base_url to fetch_models in /model picker 2026-06-16 13:09:40 -07:00
run_agent fix(moa): advisory references end on a user turn + get a reference-role system prompt (#54007) 2026-06-27 22:52:25 -07:00
scripts revert(windows): roll back terminal-popup PRs #53791 #53810 #53829 (#53853) 2026-06-27 15:59:00 -07:00
skills feat(skills): add cloudflare-temporary-deploy optional skill (#50849) 2026-06-22 12:14:30 -07:00
stress chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
tools fix(skills): skip shadowing when external_dirs provides the skill 2026-06-27 21:07:53 -07:00
tui_gateway fix(tui): route completion RPCs to the pool so they can't freeze the TUI (#53895) 2026-06-27 19:06:01 -07:00
website feat(skills): fix browse cap, add source links + copy buttons + category cleanup (#37143) 2026-06-01 19:52:28 -07:00
__init__.py
conftest.py feat(managed-scope): add managed_scope module (resolver, loaders, key helpers) 2026-06-19 07:46:33 -07:00
run_interrupt_test.py
test_account_usage.py
test_assistant_ui_tap_compat.py test(deps): guard @assistant-ui cluster on one tap version 2026-06-15 11:55:02 -04:00
test_atomic_replace_symlinks.py fix(utils): copy fallback for atomic replace across devices (#43852) 2026-06-13 14:50:05 -07:00
test_base_url_hostname.py
test_batch_runner_checkpoint.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_bitwarden_secrets.py fix(bitwarden): prevent zip-slip path traversal when extracting bws binary (#40569) 2026-06-06 18:33:44 -07:00
test_cli_file_drop.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_cli_manual_compress.py
test_cli_skin_integration.py
test_code_skew.py fix(gateway): refuse model switch on stale checkout to avoid env_float ImportError 2026-06-24 04:16:54 +05:30
test_ctx_halving_fix.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_dashboard_sidecar_close_on_disconnect.py fix(dashboard): hide sidecar sessions from history (#49269) 2026-06-19 18:06:38 -04:00
test_delegate_cascade_49148.py fix(agent): stop delegate cascade from deleting the parent session 2026-06-21 12:09:16 -07:00
test_desktop_electron_pin.py fix(desktop): resolve electronDist dynamically + self-heal blocked installs (supersedes #48081/#48082) (#48091) 2026-06-17 18:48:35 -05:00
test_desktop_mac_entitlements.py test(desktop): assert macOS device entitlements are inherited 2026-06-03 07:32:00 +07:00
test_dispatch_session_id.py fix(dispatch): forward session_id into registry.dispatch (#28479) 2026-06-14 00:27:59 -04:00
test_empty_model_fallback.py test(models): guard Nous silent default against expensive-flagship escalation 2026-06-05 02:54:34 -07:00
test_empty_session_hygiene.py fix: in-memory transcript blocks empty-session prune 2026-06-10 17:37:34 -07:00
test_env_loader_secret_sources.py
test_evidence_store.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_gateway_streaming_nested_config.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_get_tool_definitions_cache_isolation.py fix(gateway): close residual memory-leak sites under heavy scheduled workload 2026-06-08 06:32:42 -07:00
test_hermes_bootstrap.py revert(windows): roll back terminal-popup PRs #53791 #53810 #53829 (#53853) 2026-06-27 15:59:00 -07:00
test_hermes_constants.py revert(windows): roll back terminal-popup PRs #53791 #53810 #53829 (#53853) 2026-06-27 15:59:00 -07:00
test_hermes_home_profile_warning.py
test_hermes_logging.py refactor(gateway): migrate slack/dingtalk/whatsapp/matrix/feishu/telegram/wecom/email/sms adapters to bundled plugins 2026-06-20 10:26:45 -07:00
test_hermes_state.py Merge pull request #49037 from NousResearch/bb/projects-paradigm 2026-06-25 17:49:05 -05:00
test_hermes_state_compression_locks.py fix(compression): prevent session-id fork from concurrent compressions (#34351) 2026-05-28 21:40:39 -07:00
test_hermes_state_wal_fallback.py fix(kanban): skip redundant WAL pragma on already-WAL connections 2026-05-27 14:31:55 -07:00
test_honcho_client_concurrency.py fix(plugins): thread-safe lazy-singleton helpers; fix honcho TOCTOU (#24759) (#42150) 2026-06-08 09:35:22 -07:00
test_honcho_client_config.py fix(honcho): harden self-hosted setup paths 2026-05-29 22:29:48 -07:00
test_honcho_session_context.py
test_honcho_startup_fail_open.py fix: make Honcho startup fail open 2026-06-01 20:13:42 -07:00
test_install_lockfile_churn.py fix(install): discard managed lockfile churn before stashing 2026-06-25 23:49:11 -07:00
test_install_no_initial_commit.py fix(install): move broken checkout aside instead of deleting it 2026-06-08 02:18:21 -07:00
test_install_ps1_native_stderr_eap.py fix(install): fail fast when uv venv genuinely fails under relaxed EAP 2026-06-18 22:11:35 +05:30
test_install_ps1_python_fallback_venv.py test(installer): lock Python-fallback propagation into the venv stage (#50769) 2026-06-23 21:33:08 -07:00
test_install_ps1_uv_powershell_host.py test(install): lock uv installer to a resolved PowerShell host 2026-06-18 16:26:34 +07:00
test_install_sh_browser_install.py test(install): assert no system-browser auto-detect + snap override repair 2026-06-23 10:38:15 -07:00
test_install_sh_install_method_stamp.py fix(update): scope install-method stamp to the code tree, not $HERMES_HOME (#48188) 2026-06-18 14:14:41 +10:00
test_install_sh_node_global_prefix.py fix(hermes): heal broken managed Node tree instead of PATH fallback 2026-06-26 20:10:20 +05:30
test_install_sh_pythonpath_sanitization.py
test_install_sh_root_fhs_uv_python_path.py
test_install_sh_setup_wizard_tty_probe.py
test_install_sh_symlink_stomp.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_install_sh_termux_network_prereqs.py
test_install_unmerged_index.py fix(install): discard managed lockfile churn before stashing 2026-06-25 23:49:11 -07:00
test_ipv4_preference.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_lazy_session_regressions.py fix(gateway): surface retry hint instead of silently dropping turn after /stop (#31884) 2026-06-24 23:51:31 +05:30
test_lint_config.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_live_system_guard_self_test.py
test_mcp_serve.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_mini_swe_runner.py
test_minimax_model_validation.py
test_minimax_oauth.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_minisweagent_path.py
test_model_forces_max_completion_tokens.py fix(params): send max_completion_tokens for newer OpenAI families on custom endpoints 2026-06-09 23:22:10 -07:00
test_model_picker_scroll.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_model_tools.py feat(moa): expose MoA presets as selectable virtual models (#46081) 2026-06-25 13:52:06 -07:00
test_model_tools_async_bridge.py fix(web): run URL SSRF checks off the event loop in async paths 2026-06-04 18:04:47 -07:00
test_ollama_num_ctx.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_output_cap_parsing.py test(agent): cover char-based output-cap overflow parsing (#42741) 2026-06-09 03:17:12 -07:00
test_package_json_lazy_deps.py
test_packaging_metadata.py feat(mcp-catalog): add official Unreal Engine 5.8 MCP server 2026-06-18 09:16:40 -07:00
test_plugin_skills.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_plugin_utils.py fix(plugins): thread-safe lazy-singleton helpers; fix honcho TOCTOU (#24759) (#42150) 2026-06-08 09:35:22 -07:00
test_process_loop_event_loop_warning.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_project_metadata.py fix(deps): align anthropic extra pin with lazy pin + guard whole pin surface (#42335) 2026-06-08 12:11:54 -07:00
test_retry_utils.py fix: handle named custom providers and Z.AI overload retries 2026-06-25 00:17:17 -07:00
test_run_tests_parallel.py fix(tests): bare pytest flags pass through run_tests.sh without a '--' separator (#54008) 2026-06-27 22:43:26 -07:00
test_sanitize_tool_error.py
test_setup_temporary_outputs.py refactor(ci): rewrite docker tests to check built container 2026-06-26 19:15:18 -07:00
test_slash_worker_watchdog.py feat(slash-worker): self-terminate on parent death via create_time watchdog 2026-06-08 07:03:12 -07:00
test_sql_injection.py
test_stale_utils_module_import.py fix(gateway): refuse model switch on stale checkout to avoid env_float ImportError 2026-06-24 04:16:54 +05:30
test_state_db_malformed_repair.py fix(state): detect and repair FTS write corruption that silently drops gateway history (#52798) 2026-06-25 21:18:41 -07:00
test_subprocess_home_isolation.py fix: make profile subprocess HOME policy explicit 2026-06-14 03:20:21 -07:00
test_termux_all_extra_compat.py
test_timezone.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_toolset_distributions.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_toolsets.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_trajectory_compressor.py fix(research): keep tool_call/tool_response pairs intact when compressing trajectories 2026-06-07 05:01:27 -07:00
test_trajectory_compressor_async.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_transform_llm_output_hook.py
test_transform_tool_result_hook.py test: stub has_hook in transform_tool_result hook tests 2026-06-03 06:36:46 -07:00
test_tui_gateway_queue_on_busy.py fix(tui_gateway): queue mid-turn prompts instead of dropping them on a busy retry 2026-06-25 12:29:49 -05:00
test_tui_gateway_server.py test(tui-gateway): stop deferred-resume build thread leaking into next test 2026-06-27 21:07:53 -07:00
test_tui_gateway_ws.py feat(desktop): composer status stack, live subagent windows, editable prompts (#44630) 2026-06-12 08:30:06 -05:00
test_tui_mcp_late_refresh.py fix(tui): refresh tool snapshot when MCP discovery lands after agent build (#48403) 2026-06-18 05:41:23 -07:00
test_utils_truthy_values.py
test_web_server.py fix(dashboard): serve uvicorn on SelectorEventLoop on Windows (#50641) (#51717) 2026-06-23 23:43:24 -07:00
test_wheel_locales_e2e.py fix(packaging): ship locales/ i18n catalogs in wheel, sdist, and Nix (#38383) 2026-06-03 12:00:27 -07:00
test_yaml_indent_consistency_31999.py fix(utils): unify YAML list indent across all config writers (#31999) 2026-06-25 23:27:44 +05:30
test_yuanbao_integration.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_yuanbao_markdown.py
test_yuanbao_pipeline.py feat(Yuanbao): support wechat forward msg (#43508) 2026-06-12 02:06:47 -07:00
test_yuanbao_proto.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_yuanbao_shutdown.py fix(yuanbao): bound ws.close() so an idle server can't stall shutdown ~5s (#40607) 2026-06-07 17:49:38 -07:00