mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-07-01 12:02:05 +00:00
When a provider's output-layer safety filter (MiniMax "output new_sensitive (1027)", Azure content_filter, etc.) kills a streaming response after deltas have already been sent, interruptible_streaming_api_call swallows the raw error into a finish_reason=length partial-stream stub. The conversation loop then burned 3 continuation retries against the SAME primary, re-hitting the content-deterministic filter every time, and gave up with "Response remained truncated after 3 continuation attempts", never consulting fallback_providers.

Builds on @595650661's classifier change (cherry-picked) so error_classifier recognizes the filter; then:

- chat_completion_helpers: run the swallowed error through error_classifier at the stub-creation point and stamp _content_filter_terminated on the stub (single source of truth, no parallel pattern list).
- conversation_loop: read the tag and activate the fallback chain BEFORE burning any continuation retries; roll partial content back to the last clean turn and re-issue against the new provider (restart_with_rebuilt_messages).

Plain network stalls are unaffected (only content_policy_blocked is tagged).

Credits #32479 (@sweetcornna) and #33845 (@Tranquil-Flow), which fixed the same issue via the stub-tag and loop-escalation approaches respectively.

Live E2E confirmed: before, _try_activate_fallback was called 0x; after, fallback fires on the first stub and the fallback provider completes the turn.
79 lines
4 KiB
Python
"""Per-attempt recovery bookkeeping for the conversation turn loop.

The inner retry loop in ``run_conversation`` (``while retry_count <
max_retries``) makes several distinct recovery attempts on a single model API
call: a credential-pool 429 retry, a per-provider OAuth refresh (codex,
anthropic, nous, copilot), a long-context compression restart, a length-
continuation restart, and a handful of format-recovery branches (thinking-
signature stripping, multimodal-tool-content stripping, llama.cpp grammar
fallback, image shrink, invalid-encrypted-content, 1M-beta header).

Each of those branches is guarded by a one-shot boolean so it fires at most
once per attempt. They used to be ~16 bare ``*_attempted`` / ``has_retried_*``
/ ``restart_with_*`` locals declared inline before the loop and threaded
through its 2,400-line body. ``TurnRetryState`` collapses them into one object
the loop mutates in place (``state.codex_auth_retry_attempted = True``), giving
the recovery bookkeeping a single named, testable home.

Loop-control variables (``retry_count``, ``max_retries``,
``max_compression_attempts``) intentionally stay as plain locals: they are the
``while`` mechanics, not recovery bookkeeping, and putting them on the object
would add indirection without clarifying anything.

This module is dependency-free so it can be unit-tested in isolation and
imported by the turn loop without an import cycle.
"""

from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class TurnRetryState:
    """One-shot recovery guards + restart signals for a single API-call attempt.

    A fresh instance is created for each iteration of the outer turn loop
    (once per ``api_call_count``). Each guard fires its recovery branch at most
    once; the ``restart_with_*`` signals are read by the loop after the attempt
    to decide whether to rebuild the request and retry.
    """

    # ── Per-provider OAuth / credential refresh guards ───────────────────
    codex_auth_retry_attempted: bool = False
    anthropic_auth_retry_attempted: bool = False
    nous_auth_retry_attempted: bool = False
    nous_paid_entitlement_refresh_attempted: bool = False
    copilot_auth_retry_attempted: bool = False

    # ── Format / payload recovery guards ─────────────────────────────────
    thinking_sig_retry_attempted: bool = False
    invalid_encrypted_content_retry_attempted: bool = False
    image_shrink_retry_attempted: bool = False
    multimodal_tool_content_retry_attempted: bool = False
    oauth_1m_beta_retry_attempted: bool = False
    llama_cpp_grammar_retry_attempted: bool = False

    # ── Transport / rate-limit recovery ──────────────────────────────────
    primary_recovery_attempted: bool = False
    has_retried_429: bool = False

    # ── Auth-failure provider failover ───────────────────────────────────
    # Set once we've escalated a persistent 401/403 (after the per-provider
    # credential-refresh attempt above failed) to the fallback chain, so we
    # don't loop on the same auth failover within one attempt.
    auth_failover_attempted: bool = False

    # ── Restart signals (read by the outer loop after the attempt) ───────
    restart_with_compressed_messages: bool = False
    restart_with_length_continuation: bool = False
    # Set when a content-filter stream stall (e.g. MiniMax "new_sensitive")
    # has been escalated to the fallback chain: the partial-stream content
    # was rolled back off ``messages`` and the loop should re-issue the API
    # call against the newly-activated provider (#32421).
    restart_with_rebuilt_messages: bool = False

    def __iter__(self):
        # Convenience for debugging / tests: iterate (name, value) pairs.
        for f in fields(self):
            yield f.name, getattr(self, f.name)
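The one-shot guard pattern described in the module docstring can be sketched as follows. This is a minimal illustration, not the code in ``run_conversation``: the ``TurnRetryState`` here is a reduced stand-in carrying only three of the real fields, and ``run_attempt`` with its canned list of status codes is hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class TurnRetryState:
    # Reduced stand-in for illustration: three of the real class's fields.
    has_retried_429: bool = False
    auth_failover_attempted: bool = False
    restart_with_rebuilt_messages: bool = False

    def __iter__(self):
        # Same convenience as the real class: yield (name, value) pairs.
        for f in fields(self):
            yield f.name, getattr(self, f.name)


def run_attempt(statuses, max_retries=3):
    """Hypothetical inner loop that consumes canned HTTP status codes."""
    state = TurnRetryState()   # fresh guards for this attempt
    pending = iter(statuses)
    retry_count = 0            # loop mechanics stay as plain locals
    last = None
    while retry_count < max_retries:
        last = next(pending, None)
        if last == 429 and not state.has_retried_429:
            state.has_retried_429 = True  # guard fires at most once
            retry_count += 1
            continue
        break  # success, or a failure whose guard has already fired
    return last, state


status, state = run_attempt([429, 200])
snapshot = dict(state)  # __iter__ yields pairs, so dict() accepts the object
```

Because ``__iter__`` yields ``(name, value)`` pairs, a test can snapshot every guard with ``dict(state)`` and assert on the whole set at once instead of checking each flag by hand.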