hermes-agent/agent/turn_retry_state.py
teknium1 578e3989d4 fix(agent): route content-filter stream stalls to fallback chain (#32421)
When a provider's output-layer safety filter (MiniMax "output new_sensitive
(1027)", Azure content_filter, etc.) kills a streaming response after deltas
were already sent, interruptible_streaming_api_call swallows the raw error
into a finish_reason=length partial-stream stub. The conversation loop then
burned 3 continuation retries against the SAME primary — re-hitting the
content-deterministic filter every time — and gave up with "Response remained
truncated after 3 continuation attempts", never consulting fallback_providers.

Builds on @595650661's classifier change (cherry-picked) so error_classifier
recognizes the filter; then:
- chat_completion_helpers: run the swallowed error through error_classifier at
  the stub-creation point and stamp _content_filter_terminated on the stub
  (single source of truth — no parallel pattern list).
- conversation_loop: read the tag and activate the fallback chain BEFORE
  burning any continuation retries; roll partial content back to the last
  clean turn and re-issue against the new provider (restart_with_rebuilt_messages).
  Plain network stalls are unaffected (only content_policy_blocked is tagged).

Credits #32479 (@sweetcornna) and #33845 (@Tranquil-Flow) which fixed the
same issue via the stub-tag and loop-escalation approaches respectively.

Live E2E confirmed: before, _try_activate_fallback called 0x; after, fallback
fires on the first stub and the fallback provider completes the turn.
2026-06-28 01:15:21 -07:00

79 lines
4 KiB
Python

"""Per-attempt recovery bookkeeping for the conversation turn loop.
The inner retry loop in ``run_conversation`` (``while retry_count <
max_retries``) makes several distinct recovery attempts on a single model API
call: a credential-pool 429 retry, a per-provider OAuth refresh (codex,
anthropic, nous, copilot), a long-context compression restart, a length-
continuation restart, and a handful of format-recovery branches (thinking-
signature stripping, multimodal-tool-content stripping, llama.cpp grammar
fallback, image shrink, invalid-encrypted-content, 1M-beta header).
Each of those branches is guarded by a one-shot boolean so it fires at most
once per attempt. They used to be ~16 bare ``*_attempted`` / ``has_retried_*``
/ ``restart_with_*`` locals declared inline before the loop and threaded
through its 2,400-line body. ``TurnRetryState`` collapses them into one object
the loop mutates in place (``state.codex_auth_retry_attempted = True``), giving
the recovery bookkeeping a single named, testable home.
Loop-control variables (``retry_count``, ``max_retries``,
``max_compression_attempts``) intentionally stay as plain locals — they are the
``while`` mechanics, not recovery bookkeeping, and putting them on the object
would add indirection without clarifying anything.
This module is dependency-free so it can be unit-tested in isolation and
imported by the turn loop without an import cycle.
"""
from __future__ import annotations
from dataclasses import dataclass, fields
@dataclass
class TurnRetryState:
"""One-shot recovery guards + restart signals for a single API-call attempt.
A fresh instance is created for each iteration of the outer turn loop
(once per ``api_call_count``). Each guard fires its recovery branch at most
once; the ``restart_with_*`` signals are read by the loop after the attempt
to decide whether to rebuild the request and retry.
"""
# ── Per-provider OAuth / credential refresh guards ───────────────────
codex_auth_retry_attempted: bool = False
anthropic_auth_retry_attempted: bool = False
nous_auth_retry_attempted: bool = False
nous_paid_entitlement_refresh_attempted: bool = False
copilot_auth_retry_attempted: bool = False
# ── Format / payload recovery guards ─────────────────────────────────
thinking_sig_retry_attempted: bool = False
invalid_encrypted_content_retry_attempted: bool = False
image_shrink_retry_attempted: bool = False
multimodal_tool_content_retry_attempted: bool = False
oauth_1m_beta_retry_attempted: bool = False
llama_cpp_grammar_retry_attempted: bool = False
# ── Transport / rate-limit recovery ──────────────────────────────────
primary_recovery_attempted: bool = False
has_retried_429: bool = False
# ── Auth-failure provider failover ───────────────────────────────────
# Set once we've escalated a persistent 401/403 (after the per-provider
# credential-refresh attempt above failed) to the fallback chain, so we
# don't loop on the same auth failover within one attempt.
auth_failover_attempted: bool = False
# ── Restart signals (read by the outer loop after the attempt) ───────
restart_with_compressed_messages: bool = False
restart_with_length_continuation: bool = False
# Set when a content-filter stream stall (e.g. MiniMax "new_sensitive")
# has been escalated to the fallback chain: the partial-stream content
# was rolled back off ``messages`` and the loop should re-issue the API
# call against the newly-activated provider (#32421).
restart_with_rebuilt_messages: bool = False
def __iter__(self):
# Convenience for debugging / tests: iterate (name, value) pairs.
for f in fields(self):
yield f.name, getattr(self, f.name)