mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-22 10:32:00 +00:00
When the active provider returns a 401/403 that survives its per-provider
credential-refresh attempt (revoked OAuth, blocked/expired key, or an
account pinned to a dead/staging inference endpoint), the conversation
loop now escalates to the configured fallback chain instead of dead-ending.
Before: the generic failover dispatch fired only for {rate_limit, billing};
auth/auth_permanent fell through to 'switch providers manually' advice and
never called _try_activate_fallback(). A user whose primary credential was
broken kept thrashing on the same dead credential every turn — the main
agent appeared 'stuck in fallback mode' while never actually failing over.
This also affected auxiliary tasks (compression, vision, title-gen), since
auto-resolved aux follows the main provider.
After: a persistent auth failure with a configured fallback chain switches
to the next provider (mirroring the rate-limit/billing failover path),
guarded one-shot per attempt by TurnRetryState.auth_failover_attempted.
When no fallback is configured the behavior is unchanged — it falls through
to the existing terminal handling and provider-specific troubleshooting
guidance.
Tests: test_auth_provider_failover.py — 401/403 classify as auth, the
gating condition fires only with a chain present + guard unset, the guard
blocks repeats, and non-auth (500) errors do not trigger auth failover.
74 lines
3.7 KiB
Python
74 lines
3.7 KiB
Python
"""Per-attempt recovery bookkeeping for the conversation turn loop.
|
|
|
|
The inner retry loop in ``run_conversation`` (``while retry_count <
|
|
max_retries``) makes several distinct recovery attempts on a single model API
|
|
call: a credential-pool 429 retry, a per-provider OAuth refresh (codex,
|
|
anthropic, nous, copilot), a long-context compression restart, a length-
|
|
continuation restart, and a handful of format-recovery branches (thinking-
|
|
signature stripping, multimodal-tool-content stripping, llama.cpp grammar
|
|
fallback, image shrink, invalid-encrypted-content, 1M-beta header).
|
|
|
|
Each of those branches is guarded by a one-shot boolean so it fires at most
|
|
once per attempt. They used to be ~16 bare ``*_attempted`` / ``has_retried_*``
|
|
/ ``restart_with_*`` locals declared inline before the loop and threaded
|
|
through its 2,400-line body. ``TurnRetryState`` collapses them into one object
|
|
the loop mutates in place (``state.codex_auth_retry_attempted = True``), giving
|
|
the recovery bookkeeping a single named, testable home.
|
|
|
|
Loop-control variables (``retry_count``, ``max_retries``,
|
|
``max_compression_attempts``) intentionally stay as plain locals — they are the
|
|
``while`` mechanics, not recovery bookkeeping, and putting them on the object
|
|
would add indirection without clarifying anything.
|
|
|
|
This module is dependency-free so it can be unit-tested in isolation and
|
|
imported by the turn loop without an import cycle.
|
|
"""
|
|
|
|
from __future__ import annotations
|
|
|
|
from dataclasses import dataclass, fields
|
|
|
|
|
|
@dataclass
|
|
class TurnRetryState:
|
|
"""One-shot recovery guards + restart signals for a single API-call attempt.
|
|
|
|
A fresh instance is created for each iteration of the outer turn loop
|
|
(once per ``api_call_count``). Each guard fires its recovery branch at most
|
|
once; the ``restart_with_*`` signals are read by the loop after the attempt
|
|
to decide whether to rebuild the request and retry.
|
|
"""
|
|
|
|
# ── Per-provider OAuth / credential refresh guards ───────────────────
|
|
codex_auth_retry_attempted: bool = False
|
|
anthropic_auth_retry_attempted: bool = False
|
|
nous_auth_retry_attempted: bool = False
|
|
nous_paid_entitlement_refresh_attempted: bool = False
|
|
copilot_auth_retry_attempted: bool = False
|
|
|
|
# ── Format / payload recovery guards ─────────────────────────────────
|
|
thinking_sig_retry_attempted: bool = False
|
|
invalid_encrypted_content_retry_attempted: bool = False
|
|
image_shrink_retry_attempted: bool = False
|
|
multimodal_tool_content_retry_attempted: bool = False
|
|
oauth_1m_beta_retry_attempted: bool = False
|
|
llama_cpp_grammar_retry_attempted: bool = False
|
|
|
|
# ── Transport / rate-limit recovery ──────────────────────────────────
|
|
primary_recovery_attempted: bool = False
|
|
has_retried_429: bool = False
|
|
|
|
# ── Auth-failure provider failover ───────────────────────────────────
|
|
# Set once we've escalated a persistent 401/403 (after the per-provider
|
|
# credential-refresh attempt above failed) to the fallback chain, so we
|
|
# don't loop on the same auth failover within one attempt.
|
|
auth_failover_attempted: bool = False
|
|
|
|
# ── Restart signals (read by the outer loop after the attempt) ───────
|
|
restart_with_compressed_messages: bool = False
|
|
restart_with_length_continuation: bool = False
|
|
|
|
def __iter__(self):
|
|
# Convenience for debugging / tests: iterate (name, value) pairs.
|
|
for f in fields(self):
|
|
yield f.name, getattr(self, f.name)
|