fix(xai-oauth): quarantine terminal refresh errors so dead tokens are not replayed across sessions

When refresh_xai_oauth_pure raises a terminal error (HTTP 400/401/403,
i.e. a revoked or reused refresh token), the existing race-recovery path
in _refresh_entry re-syncs from auth.json and returns early if another
process has already rotated the tokens.  If auth.json still held the same
stale token pair, however, the function fell through to _mark_exhausted,
which left the dead credentials in auth.json.  On the next Hermes startup,
_seed_from_singletons re-seeded the pool from those stale tokens, so every
session hit the same failure loop.
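The failure loop can be reproduced with a toy model. This is a minimal
sketch, not Hermes code: the auth.json dict shape, the "source" field on
pool entries, and the helpers seed_pool and run_sessions_pre_fix are
hypothetical stand-ins for _seed_from_singletons and the fall-through to
_mark_exhausted.

```python
from typing import Any, Dict, List


def seed_pool(auth: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for _seed_from_singletons: one entry per stored pair."""
    tokens = auth["providers"]["xai-oauth"].get("tokens", {})
    if not tokens.get("refresh_token"):
        return []
    return [{"source": "loopback_pkce",
             "refresh_token": tokens["refresh_token"],
             "exhausted": False}]


def run_sessions_pre_fix(auth: Dict[str, Any], sessions: int) -> List[List[str]]:
    """Return the refresh tokens seeded at each startup under the pre-fix behaviour."""
    seeded = []
    for _ in range(sessions):
        pool = seed_pool(auth)  # every startup re-seeds from auth.json
        seeded.append([entry["refresh_token"] for entry in pool])
        for entry in pool:
            # Terminal refresh failure: auth.json still holds the same pair, so
            # the race-recovery check finds nothing newer and only the in-memory
            # entry is marked exhausted (the _mark_exhausted path). auth.json is
            # never touched, so the next session sees the same dead token.
            entry["exhausted"] = True
    return seeded


auth_json = {"providers": {"xai-oauth": {"tokens": {
    "access_token": "at-old", "refresh_token": "rt-dead"}}}}
print(run_sessions_pre_fix(auth_json, 3))  # [['rt-dead'], ['rt-dead'], ['rt-dead']]
```

Because marking an entry exhausted only affects the in-memory pool, the
dead refresh token is seeded again on every run.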

Fix: after the auth.json re-sync check in the xAI-oauth error handler,
detect terminal errors with the new _is_terminal_xai_oauth_refresh_error
helper and apply a quarantine:
- Clear access_token and refresh_token from providers["xai-oauth"]["tokens"]
  in auth.json so they are not re-seeded.
- Write a last_auth_error entry for hermes doctor / auth status diagnostics.
- Remove all loopback_pkce entries from the in-memory pool so the current
  session stops retrying with the dead credentials.
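The three quarantine steps, together with the check order in the error
handler, can be sketched as below. This is a hypothetical model under
stated assumptions: AuthError is a minimal stand-in exposing only provider,
code and relogin_required; auth.json and pool entries are plain dicts; where
last_auth_error is stored and what it contains are guesses; and
quarantine_xai_oauth and handle_refresh_error are illustrative names, not
functions from this commit. The predicate repeats the conditions of
_is_terminal_xai_oauth_refresh_error from the diff.

```python
from typing import Any, Dict, List


class AuthError(Exception):
    """Hypothetical stand-in carrying only the fields the predicate reads."""

    def __init__(self, message: str, *, provider: str, code: str,
                 relogin_required: bool = False) -> None:
        super().__init__(message)
        self.provider = provider
        self.code = code
        self.relogin_required = relogin_required


def is_terminal_xai_oauth_refresh_error(exc: Exception) -> bool:
    # Same four conditions as _is_terminal_xai_oauth_refresh_error in the diff.
    return (
        isinstance(exc, AuthError)
        and exc.provider == "xai-oauth"
        and exc.code in {"xai_refresh_failed", "xai_auth_missing_refresh_token"}
        and bool(exc.relogin_required)
    )


def quarantine_xai_oauth(auth: Dict[str, Any], pool: List[Dict[str, Any]],
                         error: AuthError) -> List[Dict[str, Any]]:
    """Hypothetical quarantine: mutates `auth` in place, returns the filtered pool."""
    provider_state = auth["providers"]["xai-oauth"]
    tokens = provider_state.setdefault("tokens", {})
    # 1. Clear the dead token pair so the next startup has nothing to re-seed.
    tokens.pop("access_token", None)
    tokens.pop("refresh_token", None)
    # 2. Record the failure for diagnostics (this field layout is an assumption).
    provider_state["last_auth_error"] = {"code": error.code, "message": str(error)}
    # 3. Drop loopback_pkce entries so the current session stops retrying them.
    return [entry for entry in pool if entry.get("source") != "loopback_pkce"]


def handle_refresh_error(auth: Dict[str, Any], pool: List[Dict[str, Any]],
                         entry: Dict[str, Any], exc: Exception) -> List[Dict[str, Any]]:
    """Check order: re-sync from auth.json first, then quarantine, else exhaust."""
    on_disk = auth["providers"]["xai-oauth"].get("tokens", {})
    if on_disk.get("refresh_token") and on_disk["refresh_token"] != entry["refresh_token"]:
        # Another process already rotated the tokens: adopt them and stop.
        entry["refresh_token"] = on_disk["refresh_token"]
        return pool
    if is_terminal_xai_oauth_refresh_error(exc):
        return quarantine_xai_oauth(auth, pool, exc)
    # Non-terminal failure (modelled as relogin_required=False): _mark_exhausted path.
    entry["exhausted"] = True
    return pool


auth_json = {"providers": {"xai-oauth": {"tokens": {
    "access_token": "at-old", "refresh_token": "rt-dead"}}}}
pool = [{"source": "loopback_pkce", "refresh_token": "rt-dead"},
        {"source": "manual", "refresh_token": "rt-other"}]  # "manual" is hypothetical
err = AuthError("invalid_grant", provider="xai-oauth",
                code="xai_refresh_failed", relogin_required=True)
pool = handle_refresh_error(auth_json, pool, pool[0], err)
print(auth_json["providers"]["xai-oauth"]["tokens"])  # {}
print([e["source"] for e in pool])                    # ['manual']
```

Running the re-sync check first presumably matters because, if another
process has already rotated the tokens, the error refers to a stale pair
and quarantining would wipe credentials that are still good.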

Mirrors the identical quarantine already in place for Nous OAuth
(c90556262).

Closes the parity gap introduced when c90556262 added Nous-only terminal
error handling without a corresponding xAI-oauth path.
EloquentBrush0x 2026-05-18 12:15:00 +03:00 committed by Teknium
parent 226680500d
commit 5e40f83cb7
3 changed files with 200 additions and 1 deletion


@@ -4044,6 +4044,23 @@ def _is_terminal_nous_refresh_error(exc: Exception) -> bool:
    )


def _is_terminal_xai_oauth_refresh_error(exc: Exception) -> bool:
    """True when retrying the same xAI OAuth refresh token cannot succeed.
    ``xai_refresh_failed`` covers HTTP 400/401/403 from the token endpoint
    (invalid_grant, token revoked, refresh_token_reused).
    ``xai_auth_missing_refresh_token`` means the pool entry has no refresh
    token at all, so retrying will never work.
    Both carry ``relogin_required=True``; transient failures (429, 5xx) do not.
    """
    return (
        isinstance(exc, AuthError)
        and exc.provider == "xai-oauth"
        and exc.code in {"xai_refresh_failed", "xai_auth_missing_refresh_token"}
        and bool(exc.relogin_required)
    )


def _quarantine_nous_oauth_state(
    state: Dict[str, Any],
    error: AuthError,