hermes-agent/tests/run_agent
Teknium e5af1dd633
fix(review): tell background reviewer not to capture transient env failures as skills (#23004)
Closes #6051.

Reported failure mode: agent migrated to WSL2, browser launch failed
because Playwright wasn't installed yet. Background reviewer captured
the failure as a durable skill (`browser-tool-launch-issue`) and the
agent kept refusing the browser tool for weeks after Playwright was
installed and verified working. Negative claims also propagated into
unrelated skills ("browser tools do not work", "cannot use Y from
execute_code").

Root cause: `_SKILL_REVIEW_PROMPT` and `_COMBINED_REVIEW_PROMPT` both
lean hard on "be active, save things, a pass that does nothing is a
missed learning opportunity." Neither distinguished durable knowledge
from transient environment state. The reviewer was doing what it was
told.

Fix at the write site — both prompts now carry a "Do NOT capture"
section calling out:
  • Environment-dependent failures (missing binaries, fresh-install
    errors, post-migration path mismatches, 'command not found',
    unconfigured credentials, uninstalled packages)
  • Negative claims about tools or features ("X does not work")
    that harden into self-cited refusals
  • Session-specific transient errors that resolved before the
    conversation ended
  • One-off task narratives ("summarize today's market", "analyze
    this PR") — also addresses the #12812 / #4538 family

Plus a positive-reframing line: when a tool fails because of setup
state, capture the FIX (install command, config step, env var)
under an existing setup/troubleshooting skill — never "this tool
doesn't work" as a standalone constraint.

Targeted tests: 24/24 passing in tests/run_agent/test_review_prompt_class_first.py
(2 new + all existing review-prompt assertions). Substring-based
checks so future prompt edits don't false-fail.
2026-05-09 22:51:25 -07:00
..
__init__.py
conftest.py test: speed up slow tests (backoff + subprocess + IMDS network) (#11797) 2026-04-17 14:21:22 -07:00
test_413_compression.py fix(agent): surface preflight compression status 2026-05-04 01:41:51 -07:00
test_860_dedup.py fix: lazy session creation — defer DB row until first message (#18370) 2026-05-01 18:39:12 +05:30
test_1630_context_overflow_loop.py fix(tests): make AIAgent constructor calls self-contained (#11755) 2026-04-17 12:32:03 -07:00
test_agent_guardrails.py fix(agent): include name field on every role:tool message for Gemini compatibility (#16478) 2026-05-04 05:06:33 -07:00
test_agent_loop.py
test_agent_loop_tool_calling.py
test_agent_loop_vllm.py
test_anthropic_error_handling.py feat(providers): extend request_timeout_seconds to all client paths 2026-04-19 11:23:00 -07:00
test_anthropic_prompt_cache_policy.py fix(minimax): enable Anthropic prompt caching for MiniMax's own models (#17425) 2026-04-29 04:56:55 -07:00
test_anthropic_third_party_oauth_guard.py fix(anthropic): complete third-party Anthropic-compatible provider support (#12846) 2026-04-19 22:43:09 -07:00
test_anthropic_truncation_continuation.py refactor: remove _nr_to_assistant_message shim + fix flush_memories guard 2026-04-23 02:30:05 -07:00
test_api_max_retries_config.py feat(agent): make API retry count configurable via agent.api_max_retries (#14730) 2026-04-23 13:59:32 -07:00
test_async_httpx_del_neuter.py fix(copilot): send vision header for Copilot vision requests 2026-04-27 08:35:50 -07:00
test_background_review.py fix(cli): surface self-improvement review summaries from bg thread 2026-04-30 14:07:22 -07:00
test_background_review_summary.py fix(agent): exclude prior-history tool messages from background review summary 2026-04-24 03:10:19 -07:00
test_background_review_toolset_restriction.py fix(ci): stabilize main test suite regressions (#17660) 2026-04-29 23:18:55 -07:00
test_codex_multimodal_tool_result.py feat(vision): vision_analyze returns pixels to vision-capable models, not aux text (#22955) 2026-05-09 21:06:19 -07:00
test_commit_memory_session_context_engine.py fix(agent): notify context engine on commit_memory_session (#22764) 2026-05-09 12:28:42 -07:00
test_compress_focus_plugin_fallback.py refactor(memory): remove flush_memories entirely (#15696) 2026-04-25 08:21:14 -07:00
test_compression_boundary.py
test_compression_boundary_hook.py fix: signal compression boundary to context engine 2026-04-26 19:07:18 -07:00
test_compression_feasibility.py refactor(memory): remove flush_memories entirely (#15696) 2026-04-25 08:21:14 -07:00
test_compression_persistence.py fix(tests): make AIAgent constructor calls self-contained (#11755) 2026-04-17 12:32:03 -07:00
test_compression_trigger_excludes_reasoning.py fix(compression): exclude completion tokens from compression trigger (#12026) 2026-04-20 05:12:10 -07:00
test_compressor_fallback_update.py
test_concurrent_interrupt.py test: remove 50 stale/broken tests to unblock CI (#22098) 2026-05-08 14:55:40 -07:00
test_context_token_tracking.py feat(providers): extend request_timeout_seconds to all client paths 2026-04-19 11:23:00 -07:00
test_copilot_native_vision_headers.py fix(copilot): mark native image requests as vision 2026-04-27 08:35:50 -07:00
test_create_openai_client_kwargs_isolation.py fix(tests): make AIAgent constructor calls self-contained (#11755) 2026-04-17 12:32:03 -07:00
test_create_openai_client_proxy_env.py test(proxy): regression tests for NO_PROXY bypass on keepalive client 2026-04-24 03:04:42 -07:00
test_create_openai_client_reuse.py fix(tests): make AIAgent constructor calls self-contained (#11755) 2026-04-17 12:32:03 -07:00
test_deepseek_reasoning_content_echo.py fix(deepseek): use non-empty reasoning_content placeholder for V4 Pro thinking mode 2026-04-30 23:04:23 -07:00
test_deepseek_v4_thinking_live.py fix(deepseek): preserve v4 reasoning_content on replay 2026-04-30 11:18:39 -07:00
test_dict_tool_call_args.py fix(tests): fix 78 CI test failures and remove dead test (#9036) 2026-04-13 10:50:24 -07:00
test_empty_response_recovery_persistence.py fix(run_agent): break permanent empty-response loop from orphan tool-tail (#21385) 2026-05-07 08:35:10 -07:00
test_exit_cleanup_interrupt.py test: speed up slow tests (backoff + subprocess + IMDS network) (#11797) 2026-04-17 14:21:22 -07:00
test_fallback_model.py fix(fallback): resolve api_key_env in fallback chain entries (carve-out of #22665) 2026-05-09 17:53:56 -07:00
test_image_rejection_fallback.py fix(computer-use): harden image-rejection fallback + AUTHOR_MAP 2026-05-08 11:07:38 -07:00
test_image_shrink_recovery.py feat(image-input): native multimodal routing based on model vision capability (#16506) 2026-04-27 06:27:59 -07:00
test_init_fallback_on_exhausted_pool.py fix(agent): try fallback providers at init when primary credential pool is exhausted (#17929) 2026-05-02 02:09:46 -07:00
test_interactive_interrupt.py
test_interrupt_propagation.py test: stop testing mutable data — convert change-detectors to invariants (#13363) 2026-04-20 23:20:33 -07:00
test_invalid_context_length_warning.py fix(tests): resolve CI test failures — pool auto-seeding, stale assertions, mock isolation 2026-04-15 22:05:21 -07:00
test_iteration_budget_race.py fix(run_agent): acquire lock in IterationBudget.used property 2026-05-04 12:37:28 -07:00
test_jsondecodeerror_retryable.py fix(agent): retry on json.JSONDecodeError instead of treating it as a local validation error (#15107) 2026-04-24 05:02:58 -07:00
test_last_reasoning_per_turn.py test: pin per-turn reasoning extraction semantics 2026-05-05 05:00:05 -07:00
test_long_context_tier_429.py
test_memory_nudge_counter_hydration.py fix(agent): hydrate memory-nudge counters from conversation_history (#22774) 2026-05-09 12:48:03 -07:00
test_memory_provider_init.py fix(memory): keep Honcho provider opt-in 2026-04-18 22:50:55 -07:00
test_memory_sync_interrupted.py feat(memory): notify providers on mid-process session_id rotation (#17409) 2026-04-29 04:57:22 -07:00
test_message_sequence_repair.py fix(run_agent): break permanent empty-response loop from orphan tool-tail (#21385) 2026-05-07 08:35:10 -07:00
test_openai_client_lifecycle.py
test_percentage_clamp.py fix: update 6 test files broken by dead code removal 2026-04-10 03:44:43 -07:00
test_plugin_context_engine_init.py fix(tests): make AIAgent constructor calls self-contained (#11755) 2026-04-17 12:32:03 -07:00
test_primary_runtime_restore.py fix(agent): only set rate-limit cooldown when leaving primary; add tests 2026-04-24 05:35:43 -07:00
test_provider_attribution_headers.py refactor(gmi): move User-Agent to profile.default_headers 2026-05-08 03:22:11 -07:00
test_provider_fallback.py fix(fallback): skip chain entries matching current provider/model/base_url (#22780) 2026-05-09 12:48:19 -07:00
test_provider_parity.py fix(aux): remove hardcoded Codex fallback model, drop Codex from auto chain (#17765) 2026-04-29 23:23:50 -07:00
test_real_interrupt_subagent.py fix(tests): fix 78 CI test failures and remove dead test (#9036) 2026-04-13 10:50:24 -07:00
test_redirect_stdout_issue.py
test_repair_tool_call_arguments.py fix(run_agent): handle unescaped control chars in tool_call arguments (#15356) 2026-04-24 15:06:41 -07:00
test_repair_tool_call_name.py fix(agent): repair CamelCase + _tool suffix tool-call emissions (#15124) 2026-04-24 05:32:08 -07:00
test_review_prompt_class_first.py fix(review): tell background reviewer not to capture transient env failures as skills (#23004) 2026-05-09 22:51:25 -07:00
test_run_agent.py fix(agent): extract thinking from content-list blocks for DeepSeek V4 Pro 2026-05-09 13:36:12 -07:00
test_run_agent_codex_responses.py fix(memory): drop scrub from interim commentary + final response 2026-04-27 12:37:33 -07:00
test_run_agent_multimodal_prologue.py refactor: unify transport dispatch + collapse normalize shims 2026-04-22 18:34:25 -07:00
test_sequential_chats_live.py test: regression guards for the keepalive/transport bug class (#10933) (#11266) 2026-04-16 16:36:33 -07:00
test_session_meta_filtering.py
test_session_reset_fix.py
test_steer.py refactor(steer): simplify injection marker to 'User guidance:' prefix (#13340) 2026-04-20 22:18:49 -07:00
test_stream_drop_logging.py feat(stream-retry): add upstream + timing diagnostics to drop log (#23005) 2026-05-09 22:49:35 -07:00
test_stream_interrupt_retry.py fix: /stop now immediately aborts streaming retry loop 2026-04-25 09:51:39 -07:00
test_streaming.py fix(copilot-acp): disable streaming path for CopilotACPClient 2026-04-28 11:33:07 -07:00
test_streaming_tool_call_repair.py fix: repair malformed tool call args in streaming assembly before flagging as truncated 2026-04-24 15:03:07 -07:00
test_strict_api_validation.py
test_strip_reasoning_tags_cli.py fix(display): strip standalone tool-call XML tags from visible text 2026-04-22 18:12:42 -07:00
test_switch_model_context.py fix: pass config_context_length to switch_model context compressor 2026-04-10 05:52:45 -07:00
test_switch_model_fallback_prune.py fix(agent): default missing fallback chain on switch 2026-04-24 05:35:43 -07:00
test_thinking_only_sanitizer.py fix(agent): drop thinking-only assistant turns before provider call (#16959) 2026-04-28 03:50:51 -07:00
test_token_persistence_non_cli.py fix: make session search initialize session db 2026-05-09 14:36:58 -07:00
test_tool_arg_coercion.py fix(tools): wrap bare scalars in single-element list for array-typed args 2026-05-04 05:00:37 -07:00
test_tool_call_args_sanitizer.py fix(agent): include name field on every role:tool message for Gemini compatibility (#16478) 2026-05-04 05:06:33 -07:00
test_tool_call_guardrail_runtime.py fix(agent): make tool loop guardrails warning-first 2026-04-30 20:43:15 -07:00
test_tool_executor_contextvar_propagation.py fix(agent): propagate ContextVars to concurrent tool worker threads (#18123) 2026-04-30 16:26:26 -07:00
test_unicode_ascii_codec.py fix: always retry on ASCII codec UnicodeEncodeError — don't gate on per-component sanitization 2026-04-15 15:03:28 -07:00
test_vision_aware_preprocessing.py feat(image-input): native multimodal routing based on model vision capability (#16506) 2026-04-27 06:27:59 -07:00