hermes-agent/tests/run_agent
Teknium 1fa44180b0
fix(moa): advisory references end on a user turn + get a reference-role system prompt (#54007)
* fix(moa): reference advisory view must end with a user turn

MoA reference calls failed with Anthropic models that don't support
assistant prefill (e.g. Claude Opus 4.8): '400 ... must end with a user
message'. The advisory view built by _reference_messages() kept the last
assistant turn's text while dropping the following tool result, leaving a
trailing assistant turn — which Anthropic (and OpenRouter->Anthropic)
interpret as an assistant prefill to continue. References are advisory and
must end on the user turn they answer.

Strip trailing assistant turns from the advisory view (preserving
intervening ones). Update the existing test that encoded the buggy shape
and add a mid-tool-loop regression test.

* feat(moa): give reference models an advisory-role system prompt

Reference models received the bare trimmed conversation with no role
framing, so they assumed they were the acting agent and refused ("I can't
access repositories/URLs from here") or tried to call tools they don't have.

Prepend a dedicated advisory system prompt to every reference call: the
model is an analyst, not the actor — it cannot execute, should not
apologize for lacking tools, and should reason about the presented state to
advise the aggregator/orchestrator on approach, next steps, tool-use
strategy, risks, and anything the acting agent missed. Its output is private
guidance for the aggregator, not a user-facing answer.
2026-06-27 22:52:25 -07:00
..
__init__.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
conftest.py ci(tests): add pytest-timeout 60s hard cap to break suite-teardown deadlock (#28861) 2026-05-19 17:27:24 -07:00
repro_48013_image_shrink_brick.py fix(agent): accept pixel-correct image downscale when bytes grow (#48013) 2026-06-19 11:37:51 -07:00
test_413_compression.py test(agent): regression for token-only compression progress (#39550, #23767) 2026-06-22 15:26:29 +05:30
test_860_dedup.py fix: harden gateway startup and turn persistence 2026-06-07 02:15:23 -07:00
test_1630_context_overflow_loop.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_18028_content_policy_blocked.py fix(agent): fallback immediately on provider content-policy blocks (#33883) 2026-05-28 07:28:24 -07:00
test_24996_fallback_exhaustion_cooldown.py test(24996): freeze monotonic clock to de-flake fallback cooldown timing 2026-06-27 21:07:53 -07:00
test_28161_anthropic_stream_pool_cleanup.py fix(streaming): rebuild Anthropic client on stream cleanup instead of OpenAI client 2026-06-27 19:37:33 -07:00
test_31273_402_not_retried.py fix(agent): abort on HTTP 402 after pool rotation and fallback fail (#31443) 2026-05-24 15:14:13 -07:00
test_agent_guardrails.py fix(agent): include name field on every role:tool message for Gemini compatibility (#16478) 2026-05-04 05:06:33 -07:00
test_anthropic_prompt_cache_policy.py fix(cache): kill long-lived prefix layout — system prompt is now byte-static within a session (#24778) 2026-05-12 20:46:04 -07:00
test_anthropic_third_party_oauth_guard.py fix(anthropic): complete third-party Anthropic-compatible provider support (#12846) 2026-04-19 22:43:09 -07:00
test_anthropic_truncation_continuation.py chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355) 2026-05-17 02:29:41 -07:00
test_api_max_retries_config.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_async_httpx_del_neuter.py fix(auxiliary): honor main fallback chain for auto tasks (#47235) 2026-06-16 06:23:24 -07:00
test_auth_provider_failover.py fix(agent): fail over to fallback provider on persistent auth failure (401/403) 2026-06-20 11:38:01 -07:00
test_background_review.py fix(session): opt the background-review fork out of session finalization 2026-06-21 11:35:09 -07:00
test_background_review_cache_parity.py chore: trim verbose comments/docstrings, add AUTHOR_MAP entry 2026-05-21 12:49:21 +05:30
test_background_review_cost_controls.py feat(background-review): aux-model selector for the self-improvement review (#49252) 2026-06-22 14:54:53 -07:00
test_background_review_summary.py fix(agent): exclude prior-history tool messages from background review summary 2026-04-24 03:10:19 -07:00
test_background_review_toolset_restriction.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_callable_api_key.py refactor(cli): extract agent-construction cluster into CLIAgentSetupMixin (god-file Phase 4) 2026-06-08 09:41:34 -07:00
test_codex_app_server_integration.py fix(codex): seed app-server sessions with configured cwd 2026-06-21 16:39:02 -07:00
test_codex_multimodal_tool_result.py feat(vision): vision_analyze returns pixels to vision-capable models, not aux text (#22955) 2026-05-09 21:06:19 -07:00
test_codex_no_tools_nonetype.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_codex_silent_hang_hint.py fix(codex): update silent-hang workaround hint 2026-05-27 01:52:34 -07:00
test_codex_xai_oauth_recovery.py fix(agent): summarize structured provider error messages 2026-06-18 21:37:52 -07:00
test_commit_memory_session_context_engine.py fix(agent): notify context engine on commit_memory_session (#22764) 2026-05-09 12:28:42 -07:00
test_compress_focus_plugin_fallback.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_compression_boundary.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_compression_boundary_hook.py test(compression): pin rotation-fallback tests to in_place=False ahead of default flip 2026-06-25 12:56:05 -07:00
test_compression_feasibility.py fix(auxiliary): honor fallback chain when compression provider auth is unavailable 2026-06-25 13:08:18 -07:00
test_compression_persistence.py fix(agent): rebaseline in-place compression flushes 2026-06-27 03:04:26 -07:00
test_compression_trigger_excludes_reasoning.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_compressor_fallback_update.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_concurrent_interrupt.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_context_token_tracking.py refactor(session-log): delete _save_session_log and all callers 2026-05-20 11:44:10 -07:00
test_copilot_native_vision_headers.py fix(copilot): mark native image requests as vision 2026-04-27 08:35:50 -07:00
test_create_openai_client_disables_sdk_retries.py fix(agent): disable OpenAI SDK auto-retry that double-fires inside the rate-limit loop 2026-06-27 19:23:15 -07:00
test_create_openai_client_kwargs_isolation.py fix(tests): make AIAgent constructor calls self-contained (#11755) 2026-04-17 12:32:03 -07:00
test_create_openai_client_proxy_env.py fix(agent): preserve copilot routed headers 2026-06-21 11:29:49 -07:00
test_create_openai_client_reuse.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_credential_pool_interrupt.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_credits_notices_toggle.py fix(credits): suppress usage gauge when top-up funds exist + add display.credits_notices toggle (#44716) 2026-06-12 01:06:46 -07:00
test_deepseek_reasoning_content_echo.py fix(agent): strip stale reasoning_content when falling back to a strict provider (#50480) 2026-06-21 18:05:07 -07:00
test_deepseek_v4_thinking_live.py fix(deepseek): preserve v4 reasoning_content on replay 2026-04-30 11:18:39 -07:00
test_dict_tool_call_args.py test(run_agent): align test_dict_tool_call_args with explainer suffix 2026-05-29 19:23:05 -07:00
test_empty_response_recovery_persistence.py refactor(session-log): delete _save_session_log and all callers 2026-05-20 11:44:10 -07:00
test_exit_cleanup_interrupt.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_fallback_credential_isolation.py fix(fallback): attach credential pool after provider switch 2026-06-27 04:39:26 -07:00
test_file_mutation_verifier.py feat(agent): require verification before finishing edits 2026-06-24 23:02:48 -05:00
test_identity_flush.py fix(agent): persist repaired-turn responses (#46071) 2026-06-14 03:20:25 -07:00
test_image_rejection_fallback.py fix(image-routing): unblock message queue on OpenRouter 'no endpoints' image 404 (#53901) 2026-06-27 19:07:02 -07:00
test_image_shrink_recovery.py fix(agent): shrink anthropic-native image history 2026-06-22 18:23:21 -05:00
test_in_place_compaction.py test(compression): pin rotation-fallback tests to in_place=False ahead of default flip 2026-06-25 12:56:05 -07:00
test_infinite_compaction_loop.py fix(compaction): prevent infinite loop when transcript fits in tail budget 2026-06-07 21:50:57 -07:00
test_init_fallback_on_exhausted_pool.py fix(agent): try fallback providers at init when primary credential pool is exhausted (#17929) 2026-05-02 02:09:46 -07:00
test_interactive_interrupt.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_interrupt_propagation.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_invalid_context_length_warning.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_iteration_budget_race.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_jsondecodeerror_retryable.py fix(agent): classify TypeError('NoneType ... not iterable') as retryable provider shape error 2026-05-27 11:30:55 -07:00
test_last_reasoning_per_turn.py test: pin per-turn reasoning extraction semantics 2026-05-05 05:00:05 -07:00
test_long_context_tier_429.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_materialize_data_url_cleanup.py fix(misc): three small defensive fixes from PR #1974 2026-05-10 22:28:01 -07:00
test_memory_nudge_counter_hydration.py refactor(agent): extract run_conversation prologue into agent/turn_context.py 2026-06-07 22:17:35 -07:00
test_memory_provider_init.py feat(memory): improve OpenViking setup UX 2026-06-17 01:04:26 +08:00
test_memory_sync_interrupted.py fix(openviking): sanitize skill memory input 2026-06-16 10:37:37 -07:00
test_message_sequence_repair.py fix(agent): rewind flush cursor exactly when repair compacts before the cursor 2026-06-12 16:29:01 -07:00
test_moa_loop_mode.py fix(moa): advisory references end on a user turn + get a reference-role system prompt (#54007) 2026-06-27 22:52:25 -07:00
test_multimodal_tool_content_recovery.py fix(vision): proactive downgrade for providers rejecting list-type tool content (#41072) 2026-06-07 21:50:57 -07:00
test_nonretryable_error_html_summary.py test(agent): cover non-retryable error HTML summarization 2026-06-18 15:46:19 +05:30
test_notice_spine.py feat(credits): usage-aware credits — in-session notices, /usage view, dev readout (#40011) 2026-06-06 13:18:18 +05:30
test_nous_429_fallback_reentry.py test: regression guard for Nous 429 fallback re-entry; AUTHOR_MAP entry 2026-06-12 12:21:29 -07:00
test_openai_client_lifecycle.py fix(codex): size and propagate timeouts for Responses-API requests; lower stale defaults 2026-05-25 01:47:55 -07:00
test_partial_stream_finish_reason.py fix(stream): don't report dropped mid-tool-call streams as output truncation (#42314) 2026-06-08 11:56:10 -07:00
test_percentage_clamp.py test: repoint percentage-clamp source guard to gateway/slash_commands.py 2026-06-08 01:25:35 -07:00
test_plugin_context_engine_init.py fix: expose context engine tools with saved toolsets 2026-05-28 00:28:42 -07:00
test_primary_runtime_restore.py Add Hermes desktop app (#20059) 2026-05-31 17:46:56 -05:00
test_provider_attribution_headers.py fix(agent): preserve copilot routed headers 2026-06-21 11:29:49 -07:00
test_provider_fallback.py fix(fallback): skip chain entries matching current provider/model/base_url (#22780) 2026-05-09 12:48:19 -07:00
test_provider_parity.py fix Nous auth refresh for idle agents 2026-06-21 22:43:48 -07:00
test_real_interrupt_subagent.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_redirect_stdout_issue.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_repair_tool_call_arguments.py revert: drop cumulative-resend tool-arg heuristic from shared streaming path (#35718) (#35860) 2026-05-31 06:14:32 -07:00
test_repair_tool_call_name.py fix(volcengine): strip XML attribute fragments from tool_use.name (#33007) 2026-06-07 22:22:01 -07:00
test_retry_status_buffer.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_review_prompt_class_first.py fix(review): tell background reviewer not to capture transient env failures as skills (#23004) 2026-05-09 22:51:25 -07:00
test_run_agent.py test(agent,gateway): cover partial-stream recovery and restart helper salvage 2026-06-27 22:03:14 -07:00
test_run_agent_codex_responses.py test: cover request debug dump redaction 2026-06-15 05:31:21 -07:00
test_run_agent_multimodal_prologue.py refactor: unify transport dispatch + collapse normalize shims 2026-04-22 18:34:25 -07:00
test_sequential_chats_live.py test: regression guards for the keepalive/transport bug class (#10933) (#11266) 2026-04-16 16:36:33 -07:00
test_session_id_env.py feat: expose HERMES_SESSION_ID to agent tools via ContextVar + env (#23847) 2026-05-12 00:16:45 +05:30
test_session_meta_filtering.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_session_reset_fix.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_session_source.py fix(dashboard): hide sidecar sessions from history (#49269) 2026-06-19 18:06:38 -04:00
test_steer.py fix(agent): make mid-turn /steer trusted, not read as injection 2026-06-05 20:59:36 -05:00
test_stream_drop_logging.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_stream_interrupt_retry.py fix: /stop now immediately aborts streaming retry loop 2026-04-25 09:51:39 -07:00
test_streaming.py test(streaming): repoint anthropic stream-cleanup test to close+rebuild path 2026-06-27 19:37:33 -07:00
test_streaming_tool_call_repair.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_strict_api_validation.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_strip_reasoning_tags_cli.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_switch_model_context.py test(ci): stabilize shared optional dependency baselines 2026-05-13 17:32:22 -07:00
test_switch_model_fallback_prune.py fix(agent): default missing fallback chain on switch 2026-04-24 05:35:43 -07:00
test_switch_model_pool_reload_52727.py fix(agent): reload credential pool on switch_model provider change (#52727) 2026-06-27 04:39:26 -07:00
test_switch_model_rollback.py fix(agent): roll back switch_model() state when client rebuild fails (#33228) 2026-05-27 05:43:20 -07:00
test_thinking_only_sanitizer.py fix(agent): treat Codex reasoning items as thinking-only 2026-06-13 14:35:00 -07:00
test_thinking_sig_recovery_persistence.py fix(agent): strip api_messages in thinking-signature recovery so the retry actually omits thinking blocks 2026-06-10 12:39:44 -07:00
test_tls_fd_recycle_corruption.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_token_persistence_non_cli.py fix: make session search initialize session db 2026-05-09 14:36:58 -07:00
test_tool_arg_coercion.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_tool_call_args_sanitizer.py ci(tests): add pytest-timeout 60s hard cap to break suite-teardown deadlock (#28861) 2026-05-19 17:27:24 -07:00
test_tool_call_guardrail_runtime.py test(guardrail): assert halt message reaches stream_delta_callback 2026-05-24 07:38:24 -07:00
test_tool_call_incremental_persistence.py fix(agent): persist tool calls before turn-end flush 2026-06-24 02:15:57 +05:30
test_tool_executor_contextvar_propagation.py fix(code-exec): propagate agent-turn context into tool worker threads 2026-05-29 03:44:49 -07:00
test_tool_name_db_persistence.py fix(agent): set tool_name on tool-result messages at construction time 2026-05-19 20:49:11 +01:00
test_turn_completion_explainer.py test(agent,gateway): cover partial-stream recovery and restart helper salvage 2026-06-27 22:03:14 -07:00
test_unicode_ascii_codec.py fix: always retry on ASCII codec UnicodeEncodeError — don't gate on per-component sanitization 2026-04-15 15:03:28 -07:00
test_vision_aware_preprocessing.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_vision_tool_messages.py fix(vision): proactive downgrade for providers rejecting list-type tool content (#41072) 2026-06-07 21:50:57 -07:00