hermes-agent/tests/run_agent
Teknium b6ca56f651
fix(codex-responses): gracefully recover from invalid_encrypted_content (salvage #10144) (#33035)
* fix(codex-responses): gracefully recover from invalid_encrypted_content (salvage #10144)

When an OpenAI-compatible Responses API surface accepts an initial
request but later rejects the replayed `codex_reasoning_items`
encrypted blob with HTTP 400 `invalid_encrypted_content`, the
session previously got stuck retrying the same poisoned payload.

Recovery: classify the error as a dedicated FailoverReason, and on the
first hit disable encrypted reasoning replay for the rest of the
session, strip cached items from message history, and retry once.
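
A minimal sketch of that retry-once flow, for illustration only. The
exception class, the `send` callable, and the `state` dict are
hypothetical stand-ins, not the project's actual types; only the
`codex_reasoning_items` key comes from this commit message.

```python
class InvalidEncryptedContent(Exception):
    """Hypothetical stand-in for an HTTP 400 `invalid_encrypted_content` error."""


def strip_reasoning_items(messages):
    """Remove cached encrypted reasoning items from every message, in place."""
    for message in messages:
        message.pop("codex_reasoning_items", None)


def call_with_recovery(send, messages, state):
    """Call `send`; on the first rejection, disable replay and retry once."""
    try:
        return send(messages, replay=state["replay_enabled"])
    except InvalidEncryptedContent:
        if not state["replay_enabled"]:
            # Replay was already disabled earlier in this session, so the
            # cached blob cannot be the cause; let the normal path handle it.
            raise
        state["replay_enabled"] = False  # stays off for the rest of the session
        strip_reasoning_items(messages)
        return send(messages, replay=False)
```

Because the flag is never turned back on, a second rejection in the same
session propagates instead of looping on the same payload.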

Changes:
* error_classifier: add a FailoverReason.invalid_encrypted_content
  branch to _classify_400 (placed before context_overflow, so messages
  that mention 'encrypted content … could not be verified' don't trip
  the context heuristics) and to _classify_by_error_code. Extend
  _extract_error_code to peek inside JSON wrapped in error.message and
  to ignore a bare '400' as a code.
* agent_init: initialize `_codex_reasoning_replay_enabled = True` on
  every agent.
* run_agent: add AIAgent._disable_codex_reasoning_replay() helper
  that flips the flag and pops cached items.
* codex_responses_adapter: thread a `replay_encrypted_reasoning`
  kwarg through _chat_messages_to_responses_input so that when the
  flag is False we don't replay codex_reasoning_items.
* transports/codex.py: read `replay_encrypted_reasoning` from params,
  thread it into the adapter, and gate the
  `include=['reasoning.encrypted_content']` request hint on it.
* chat_completion_helpers: pass the agent's replay flag through to
  the transport.
* conversation_loop: in the retry loop, add an
  invalid_encrypted_content recovery branch that fires once per
  session, only when api_mode == codex_responses, only when replay is
  still enabled, and only when at least one assistant message in
  history actually carries cached reasoning items (otherwise the 400
  has nothing to do with our cache and the normal retry path handles
  it).
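
The gating conditions in the conversation_loop item, and the
request-hint gate in the transports/codex.py item, could be sketched as
below. This is an assumption-laden illustration: the function names and
the plain-string reason are hypothetical, and only the
`codex_responses` mode, the `codex_reasoning_items` key, and the
`reasoning.encrypted_content` include value are taken from the text
above.

```python
def has_cached_reasoning(messages):
    """True if at least one assistant message carries cached reasoning items."""
    return any(
        m.get("role") == "assistant" and m.get("codex_reasoning_items")
        for m in messages
    )


def should_recover(reason, api_mode, replay_enabled, messages):
    """All conditions must hold before the recovery branch fires."""
    return bool(
        reason == "invalid_encrypted_content"
        and api_mode == "codex_responses"
        and replay_enabled  # once per session: the flag is flipped on first use
        and has_cached_reasoning(messages)
    )


def build_include_hint(replay_encrypted_reasoning):
    """Ask for encrypted reasoning back only while replay is still enabled."""
    return ["reasoning.encrypted_content"] if replay_encrypted_reasoning else []
```

Checking the history for cached items keeps an unrelated 400 from
disabling replay for a session that never replayed anything.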

Tests:
* test_error_classifier: new wrapped-JSON _extract_error_code case;
  new TestClassifyApiError cases proving the 400 is retryable with
  no fallback, that the broad message match doesn't catch a generic
  'parsed' message, and that the error code match is
  case-insensitive.
* test_run_agent_codex_responses: end-to-end test of the recovery
  branch firing once and disabling replay, plus a sibling test that
  proves the branch does *not* fire (and the flag stays True) when
  history has no cached reasoning items.
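
For readers unfamiliar with the wrapped-JSON case, the classifier
behavior these tests pin down might look roughly like the sketch below.
The error-body shape and both function names are assumptions made for
illustration; the real _extract_error_code may differ.

```python
import json


def extract_error_code(body):
    """Best-effort error code from a provider error body (assumed shape)."""
    error = body.get("error") if isinstance(body, dict) else None
    if not isinstance(error, dict):
        return ""
    code = str(error.get("code") or "").strip()
    if code and code != "400":
        return code  # a bare '400' is an HTTP status, not a useful code
    # Some gateways wrap the upstream JSON error inside error.message.
    message = error.get("message")
    if isinstance(message, str):
        start = message.find("{")
        if start != -1:
            try:
                inner = json.loads(message[start:])
            except ValueError:
                return ""
            return extract_error_code(inner)
    return ""


def is_invalid_encrypted_content(body):
    """Case-insensitive match on the extracted error code."""
    return extract_error_code(body).lower() == "invalid_encrypted_content"
```

Matching on the extracted code, rather than on a broad substring of the
message, is what keeps a generic 'parsed' message from being
misclassified in this sketch.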

Salvages PR #10144 onto the post-refactor module layout
(error_classifier / codex_responses_adapter / transports/codex /
conversation_loop / agent_init) since the original diff was written
against the pre-refactor monolithic run_agent.py.

* chore(release): map victorGPT in AUTHOR_MAP for #10144 salvage

---------

Co-authored-by: victorGPT <wuxuebin1993@gmail.com>
2026-05-26 22:01:17 -07:00
__init__.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
conftest.py ci(tests): add pytest-timeout 60s hard cap to break suite-teardown deadlock (#28861) 2026-05-19 17:27:24 -07:00
test_413_compression.py test+polish(compression): pin anti-thrash gate and gateway session_id persistence 2026-05-25 01:44:46 -07:00
test_860_dedup.py refactor(gateway): stop writing JSONL in append_to_transcript / rewrite_transcript 2026-05-20 13:00:57 -07:00
test_1630_context_overflow_loop.py fix(tests): make AIAgent constructor calls self-contained (#11755) 2026-04-17 12:32:03 -07:00
test_31273_402_not_retried.py fix(agent): abort on HTTP 402 after pool rotation and fallback fail (#31443) 2026-05-24 15:14:13 -07:00
test_agent_guardrails.py fix(agent): include name field on every role:tool message for Gemini compatibility (#16478) 2026-05-04 05:06:33 -07:00
test_anthropic_prompt_cache_policy.py fix(cache): kill long-lived prefix layout — system prompt is now byte-static within a session (#24778) 2026-05-12 20:46:04 -07:00
test_anthropic_third_party_oauth_guard.py fix(anthropic): complete third-party Anthropic-compatible provider support (#12846) 2026-04-19 22:43:09 -07:00
test_anthropic_truncation_continuation.py chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355) 2026-05-17 02:29:41 -07:00
test_api_max_retries_config.py feat(agent): make API retry count configurable via agent.api_max_retries (#14730) 2026-04-23 13:59:32 -07:00
test_async_httpx_del_neuter.py fix(dashboard): UI polish — modals, layout, consistency, test fixes 2026-05-12 13:59:22 -04:00
test_background_review.py fix(run_agent): isolate background review fork from external memory plugins (#27190) 2026-05-16 20:33:38 -07:00
test_background_review_cache_parity.py chore: trim verbose comments/docstrings, add AUTHOR_MAP entry 2026-05-21 12:49:21 +05:30
test_background_review_summary.py fix(agent): exclude prior-history tool messages from background review summary 2026-04-24 03:10:19 -07:00
test_background_review_toolset_restriction.py chore: trim verbose comments/docstrings, add AUTHOR_MAP entry 2026-05-21 12:49:21 +05:30
test_callable_api_key.py feat(azure-foundry): add Microsoft Entra ID auth 2026-05-18 10:14:38 -07:00
test_codex_app_server_integration.py fix(codex-runtime): retire wedged sessions + post-tool watchdog + OAuth refresh classify (#25769) 2026-05-14 07:55:09 -07:00
test_codex_multimodal_tool_result.py feat(vision): vision_analyze returns pixels to vision-capable models, not aux text (#22955) 2026-05-09 21:06:19 -07:00
test_codex_silent_hang_hint.py fix(codex): surface actionable hint when stale-call detector fires on known silent-reject pattern 2026-05-25 04:49:22 -07:00
test_codex_xai_oauth_recovery.py test(xai-oauth): regression coverage for the bad-credentials disambiguator (#29344) 2026-05-23 02:48:13 -07:00
test_commit_memory_session_context_engine.py fix(agent): notify context engine on commit_memory_session (#22764) 2026-05-09 12:28:42 -07:00
test_compress_focus_plugin_fallback.py refactor(memory): remove flush_memories entirely (#15696) 2026-04-25 08:21:14 -07:00
test_compression_boundary.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_compression_boundary_hook.py fix(tests): catch up six stale tests after compression/aux/kanban changes (#28465) 2026-05-18 21:43:59 -07:00
test_compression_feasibility.py perf(compression): defer feasibility check to first compression attempt (#28957) 2026-05-19 17:27:17 -07:00
test_compression_persistence.py fix(tests): make AIAgent constructor calls self-contained (#11755) 2026-04-17 12:32:03 -07:00
test_compression_trigger_excludes_reasoning.py fix(compression): exclude completion tokens from compression trigger (#12026) 2026-04-20 05:12:10 -07:00
test_compressor_fallback_update.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_concurrent_interrupt.py test: remove 50 stale/broken tests to unblock CI (#22098) 2026-05-08 14:55:40 -07:00
test_context_token_tracking.py refactor(session-log): delete _save_session_log and all callers 2026-05-20 11:44:10 -07:00
test_copilot_native_vision_headers.py fix(copilot): mark native image requests as vision 2026-04-27 08:35:50 -07:00
test_create_openai_client_kwargs_isolation.py fix(tests): make AIAgent constructor calls self-contained (#11755) 2026-04-17 12:32:03 -07:00
test_create_openai_client_proxy_env.py test(proxy): regression tests for NO_PROXY bypass on keepalive client 2026-04-24 03:04:42 -07:00
test_create_openai_client_reuse.py fix(force_close_tcp_sockets): shutdown only, do not release FD (#29507) 2026-05-23 02:31:10 -07:00
test_credential_pool_interrupt.py fix(credential-pool): rotate immediately when credential already exhausted 2026-05-25 06:21:28 -07:00
test_deepseek_reasoning_content_echo.py fix(deepseek): use non-empty reasoning_content placeholder for V4 Pro thinking mode 2026-04-30 23:04:23 -07:00
test_deepseek_v4_thinking_live.py fix(deepseek): preserve v4 reasoning_content on replay 2026-04-30 11:18:39 -07:00
test_dict_tool_call_args.py fix(tests): fix 78 CI test failures and remove dead test (#9036) 2026-04-13 10:50:24 -07:00
test_empty_response_recovery_persistence.py refactor(session-log): delete _save_session_log and all callers 2026-05-20 11:44:10 -07:00
test_exit_cleanup_interrupt.py test: speed up slow tests (backoff + subprocess + IMDS network) (#11797) 2026-04-17 14:21:22 -07:00
test_file_mutation_verifier.py fix: classify landed file mutations with diagnostics 2026-05-13 06:46:23 -07:00
test_image_rejection_fallback.py fix(agent): catch ChatGPT-account Codex data-URL rejection so images are stripped instead of cascading to compression (#23602) 2026-05-11 07:37:22 -07:00
test_image_shrink_recovery.py feat(image-input): native multimodal routing based on model vision capability (#16506) 2026-04-27 06:27:59 -07:00
test_init_fallback_on_exhausted_pool.py fix(agent): try fallback providers at init when primary credential pool is exhausted (#17929) 2026-05-02 02:09:46 -07:00
test_interactive_interrupt.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_interrupt_propagation.py test: stop testing mutable data — convert change-detectors to invariants (#13363) 2026-04-20 23:20:33 -07:00
test_invalid_context_length_warning.py fix(tests): resolve CI test failures — pool auto-seeding, stale assertions, mock isolation 2026-04-15 22:05:21 -07:00
test_iteration_budget_race.py fix(run_agent): acquire lock in IterationBudget.used property 2026-05-04 12:37:28 -07:00
test_jsondecodeerror_retryable.py refactor(run_agent): review fixes — keyword-forward __init__, drop dead code, tighten guards 2026-05-16 22:55:49 -07:00
test_last_reasoning_per_turn.py test: pin per-turn reasoning extraction semantics 2026-05-05 05:00:05 -07:00
test_long_context_tier_429.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_materialize_data_url_cleanup.py fix(misc): three small defensive fixes from PR #1974 2026-05-10 22:28:01 -07:00
test_memory_nudge_counter_hydration.py refactor(run_agent): review fixes — keyword-forward __init__, drop dead code, tighten guards 2026-05-16 22:55:49 -07:00
test_memory_provider_init.py fix(memory): keep Honcho provider opt-in 2026-04-18 22:50:55 -07:00
test_memory_sync_interrupted.py feat(memory): notify providers on mid-process session_id rotation (#17409) 2026-04-29 04:57:22 -07:00
test_message_sequence_repair.py fix(run_agent): break permanent empty-response loop from orphan tool-tail (#21385) 2026-05-07 08:35:10 -07:00
test_multimodal_tool_content_recovery.py fix(agent): recover from providers rejecting list-type tool content (#27344) (#30259) 2026-05-21 23:40:16 -07:00
test_openai_client_lifecycle.py fix(codex): size and propagate timeouts for Responses-API requests; lower stale defaults 2026-05-25 01:47:55 -07:00
test_partial_stream_finish_reason.py fix(streaming): route mid-tool-call partial-stream-stub through length continuation (#31998) (#32012) 2026-05-25 17:43:10 +05:30
test_percentage_clamp.py fix: update 6 test files broken by dead code removal 2026-04-10 03:44:43 -07:00
test_plugin_context_engine_init.py fix(compressor): ABC compliance — total_tokens, api_mode, logger consistency 2026-05-23 17:38:19 -07:00
test_primary_runtime_restore.py fix(agent): reset _fallback_index at turn start even when no fallback activated 2026-05-16 17:12:48 -07:00
test_provider_attribution_headers.py feat(nvidia): add NIM billing origin header 2026-05-15 14:06:51 -07:00
test_provider_fallback.py fix(fallback): skip chain entries matching current provider/model/base_url (#22780) 2026-05-09 12:48:19 -07:00
test_provider_parity.py fix(tests): stabilize xai env and provider parity 2026-05-17 11:55:25 -07:00
test_real_interrupt_subagent.py fix(tests): fix 78 CI test failures and remove dead test (#9036) 2026-04-13 10:50:24 -07:00
test_redirect_stdout_issue.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_repair_tool_call_arguments.py fix(run_agent): handle unescaped control chars in tool_call arguments (#15356) 2026-04-24 15:06:41 -07:00
test_repair_tool_call_name.py fix(agent): repair CamelCase + _tool suffix tool-call emissions (#15124) 2026-04-24 05:32:08 -07:00
test_review_prompt_class_first.py fix(review): tell background reviewer not to capture transient env failures as skills (#23004) 2026-05-09 22:51:25 -07:00
test_run_agent.py fix(credential-pool): correct pool rotation when weekly usage limit is reached 2026-05-25 06:32:30 -07:00
test_run_agent_codex_responses.py fix(codex-responses): gracefully recover from invalid_encrypted_content (salvage #10144) (#33035) 2026-05-26 22:01:17 -07:00
test_run_agent_multimodal_prologue.py refactor: unify transport dispatch + collapse normalize shims 2026-04-22 18:34:25 -07:00
test_sequential_chats_live.py test: regression guards for the keepalive/transport bug class (#10933) (#11266) 2026-04-16 16:36:33 -07:00
test_session_id_env.py feat: expose HERMES_SESSION_ID to agent tools via ContextVar + env (#23847) 2026-05-12 00:16:45 +05:30
test_session_meta_filtering.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_session_reset_fix.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_steer.py refactor(steer): simplify injection marker to 'User guidance:' prefix (#13340) 2026-04-20 22:18:49 -07:00
test_stream_drop_logging.py feat(stream-retry): add upstream + timing diagnostics to drop log (#23005) 2026-05-09 22:49:35 -07:00
test_stream_interrupt_retry.py fix: /stop now immediately aborts streaming retry loop 2026-04-25 09:51:39 -07:00
test_streaming.py fix(xai): surface provider 'error' SSE frame in Codex fallback stream (#27184) 2026-05-16 17:09:41 -07:00
test_streaming_tool_call_repair.py chore: remove Atropos RL environments and tinker-atropos integration (#26106) 2026-05-15 10:36:38 +05:30
test_strict_api_validation.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_strip_reasoning_tags_cli.py fix(display): strip standalone tool-call XML tags from visible text 2026-04-22 18:12:42 -07:00
test_switch_model_context.py test(ci): stabilize shared optional dependency baselines 2026-05-13 17:32:22 -07:00
test_switch_model_fallback_prune.py fix(agent): default missing fallback chain on switch 2026-04-24 05:35:43 -07:00
test_thinking_only_sanitizer.py fix(agent): drop thinking-only assistant turns before provider call (#16959) 2026-04-28 03:50:51 -07:00
test_tls_fd_recycle_corruption.py test(tls-fd-recycle): pin shutdown-only + thread-aware close contract (#29507) 2026-05-23 02:31:10 -07:00
test_token_persistence_non_cli.py fix: make session search initialize session db 2026-05-09 14:36:58 -07:00
test_tool_arg_coercion.py fix(tools): wrap bare scalars in single-element list for array-typed args 2026-05-04 05:00:37 -07:00
test_tool_call_args_sanitizer.py ci(tests): add pytest-timeout 60s hard cap to break suite-teardown deadlock (#28861) 2026-05-19 17:27:24 -07:00
test_tool_call_guardrail_runtime.py test(guardrail): assert halt message reaches stream_delta_callback 2026-05-24 07:38:24 -07:00
test_tool_executor_contextvar_propagation.py refactor(run_agent): extract tool execution to agent/tool_executor.py 2026-05-16 18:24:05 -07:00
test_tool_name_db_persistence.py fix(agent): set tool_name on tool-result messages at construction time 2026-05-19 20:49:11 +01:00
test_unicode_ascii_codec.py fix: always retry on ASCII codec UnicodeEncodeError — don't gate on per-component sanitization 2026-04-15 15:03:28 -07:00
test_vision_aware_preprocessing.py fix(agent): resolve supports_vision override for named custom providers 2026-05-20 23:27:10 -07:00