hermes-agent/tests/run_agent
Teknium 4093ee9c62
fix(codex): detect leaked tool-call text in assistant content (#15347)
gpt-5.x on the Codex Responses API sometimes degenerates and emits
Harmony-style `to=functions.<name> {json}` serialization as plain
assistant-message text instead of a structured `function_call` item.
The intent never makes it into `response.output` as a function_call,
so `tool_calls` is empty and `_normalize_codex_response()` returns
the leaked text as the final content. Downstream (e.g. delegate_task),
this surfaces as a confident-looking summary with `tool_trace: []`
because no tools actually ran — the Taiwan-embassy-email bug report.

Detect the pattern, scrub the content, and return finish_reason=
'incomplete' so the existing Codex-incomplete continuation path
(run_agent.py:11331, 3 retries) gets a chance to re-elicit a proper
function_call item. Encrypted reasoning items are preserved so the
model keeps its chain-of-thought on the retry.

Regression tests: leaked text triggers incomplete, real tool calls
alongside leak-looking text are preserved, clean responses pass
through unchanged.

Reported on Discord (gpt-5.4 / openai-codex).
2026-04-24 14:39:59 -07:00
..
__init__.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
conftest.py test: speed up slow tests (backoff + subprocess + IMDS network) (#11797) 2026-04-17 14:21:22 -07:00
test_413_compression.py test: speed up slow tests (backoff + subprocess + IMDS network) (#11797) 2026-04-17 14:21:22 -07:00
test_860_dedup.py fix(tests): make AIAgent constructor calls self-contained (#11755) 2026-04-17 12:32:03 -07:00
test_1630_context_overflow_loop.py fix(tests): make AIAgent constructor calls self-contained (#11755) 2026-04-17 12:32:03 -07:00
test_agent_guardrails.py fix(delegate): make max_concurrent_children configurable + error on excess 2026-04-10 13:38:14 -07:00
test_agent_loop.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_agent_loop_tool_calling.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_agent_loop_vllm.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_anthropic_error_handling.py feat(providers): extend request_timeout_seconds to all client paths 2026-04-19 11:23:00 -07:00
test_anthropic_prompt_cache_policy.py fix(cache): enable prompt caching for Qwen on OpenCode/OpenCode-Go/Alibaba (#13528) 2026-04-21 06:40:58 -07:00
test_anthropic_third_party_oauth_guard.py fix(anthropic): complete third-party Anthropic-compatible provider support (#12846) 2026-04-19 22:43:09 -07:00
test_anthropic_truncation_continuation.py refactor: remove _nr_to_assistant_message shim + fix flush_memories guard 2026-04-23 02:30:05 -07:00
test_api_max_retries_config.py feat(agent): make API retry count configurable via agent.api_max_retries (#14730) 2026-04-23 13:59:32 -07:00
test_async_httpx_del_neuter.py fix: bound auxiliary client cache to prevent fd exhaustion in long-running gateways (#10200) (#10470) 2026-04-15 13:16:28 -07:00
test_background_review_summary.py fix(agent): exclude prior-history tool messages from background review summary 2026-04-24 03:10:19 -07:00
test_compress_focus_plugin_fallback.py fix(compress): don't reach into ContextCompressor privates from /compress (#15039) 2026-04-24 02:55:43 -07:00
test_compression_boundary.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_compression_feasibility.py fix(compression): enforce 64k floor on aux model + auto-correct threshold (#12898) 2026-04-20 00:56:04 -07:00
test_compression_persistence.py fix(tests): make AIAgent constructor calls self-contained (#11755) 2026-04-17 12:32:03 -07:00
test_compression_trigger_excludes_reasoning.py fix(compression): exclude completion tokens from compression trigger (#12026) 2026-04-20 05:12:10 -07:00
test_compressor_fallback_update.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_concurrent_interrupt.py fix(tests): resolve 17 persistent CI test failures (#15084) 2026-04-24 03:46:46 -07:00
test_context_token_tracking.py feat(providers): extend request_timeout_seconds to all client paths 2026-04-19 11:23:00 -07:00
test_create_openai_client_kwargs_isolation.py fix(tests): make AIAgent constructor calls self-contained (#11755) 2026-04-17 12:32:03 -07:00
test_create_openai_client_proxy_env.py test(proxy): regression tests for NO_PROXY bypass on keepalive client 2026-04-24 03:04:42 -07:00
test_create_openai_client_reuse.py fix(tests): make AIAgent constructor calls self-contained (#11755) 2026-04-17 12:32:03 -07:00
test_dict_tool_call_args.py fix(tests): fix 78 CI test failures and remove dead test (#9036) 2026-04-13 10:50:24 -07:00
test_exit_cleanup_interrupt.py test: speed up slow tests (backoff + subprocess + IMDS network) (#11797) 2026-04-17 14:21:22 -07:00
test_fallback_model.py test: speed up slow tests (backoff + subprocess + IMDS network) (#11797) 2026-04-17 14:21:22 -07:00
test_flush_memories_codex.py fix(memory): add write origin metadata 2026-04-24 14:37:55 -07:00
test_interactive_interrupt.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_interrupt_propagation.py test: stop testing mutable data — convert change-detectors to invariants (#13363) 2026-04-20 23:20:33 -07:00
test_invalid_context_length_warning.py fix(tests): resolve CI test failures — pool auto-seeding, stale assertions, mock isolation 2026-04-15 22:05:21 -07:00
test_jsondecodeerror_retryable.py fix(agent): retry on json.JSONDecodeError instead of treating it as a local validation error (#15107) 2026-04-24 05:02:58 -07:00
test_long_context_tier_429.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_memory_provider_init.py fix(memory): keep Honcho provider opt-in 2026-04-18 22:50:55 -07:00
test_openai_client_lifecycle.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_percentage_clamp.py fix: update 6 test files broken by dead code removal 2026-04-10 03:44:43 -07:00
test_plugin_context_engine_init.py fix(tests): make AIAgent constructor calls self-contained (#11755) 2026-04-17 12:32:03 -07:00
test_primary_runtime_restore.py fix(agent): only set rate-limit cooldown when leaving primary; add tests 2026-04-24 05:35:43 -07:00
test_provider_attribution_headers.py fix(providers): send user agent to routermint endpoints 2026-04-24 03:02:16 -07:00
test_provider_fallback.py fix(agent): fall back on rate limit when pool has no rotation room 2026-04-24 05:20:05 -07:00
test_provider_parity.py feat: add ResponsesApiTransport + wire all Codex transport paths 2026-04-21 19:48:56 -07:00
test_real_interrupt_subagent.py fix(tests): fix 78 CI test failures and remove dead test (#9036) 2026-04-13 10:50:24 -07:00
test_redirect_stdout_issue.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_repair_tool_call_arguments.py fix: extract _repair_tool_call_arguments helper, add tests, bound loop 2026-04-20 05:12:55 -07:00
test_repair_tool_call_name.py fix(agent): repair CamelCase + _tool suffix tool-call emissions (#15124) 2026-04-24 05:32:08 -07:00
test_run_agent.py feat: read prompt caching cache_ttl from config 2026-04-24 03:21:29 -07:00
test_run_agent_codex_responses.py fix(codex): detect leaked tool-call text in assistant content (#15347) 2026-04-24 14:39:59 -07:00
test_run_agent_multimodal_prologue.py refactor: unify transport dispatch + collapse normalize shims 2026-04-22 18:34:25 -07:00
test_sequential_chats_live.py test: regression guards for the keepalive/transport bug class (#10933) (#11266) 2026-04-16 16:36:33 -07:00
test_session_meta_filtering.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_session_reset_fix.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_steer.py refactor(steer): simplify injection marker to 'User guidance:' prefix (#13340) 2026-04-20 22:18:49 -07:00
test_streaming.py fix(streaming): silent retry when stream dies mid tool-call (#14151) 2026-04-22 13:47:33 -07:00
test_strict_api_validation.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_strip_reasoning_tags_cli.py fix(display): strip standalone tool-call XML tags from visible text 2026-04-22 18:12:42 -07:00
test_switch_model_context.py fix: pass config_context_length to switch_model context compressor 2026-04-10 05:52:45 -07:00
test_switch_model_fallback_prune.py fix(agent): default missing fallback chain on switch 2026-04-24 05:35:43 -07:00
test_token_persistence_non_cli.py fix(tests): make AIAgent constructor calls self-contained (#11755) 2026-04-17 12:32:03 -07:00
test_tool_arg_coercion.py test,chore: cover stringified array/object coercion + AUTHOR_MAP entry 2026-04-23 16:38:38 -07:00
test_unicode_ascii_codec.py fix: always retry on ASCII codec UnicodeEncodeError — don't gate on per-component sanitization 2026-04-15 15:03:28 -07:00