Mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-06-05 07:41:39 +00:00)
fix(codex-oauth): quarantine terminal refresh errors so dead tokens are not replayed across sessions
When a Codex OAuth refresh token is permanently invalidated (HTTP 400/401/403,
token revoked or reused), _mark_exhausted was called but auth.json was left with
the dead credentials. On the next session, _seed_from_singletons re-read
auth.json and re-seeded the pool with the same revoked token, triggering the
same terminal failure in a loop.
Add _is_terminal_codex_oauth_refresh_error to auth.py and a matching quarantine
block in _refresh_entry: when a terminal error is detected and auth.json holds
no newer tokens, clear access_token/refresh_token from auth.json and remove all
device_code-sourced pool entries from memory. Mirrors the Nous quarantine added
in c90556262 and the xAI quarantine in #28116.
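The quarantine rule can be modeled in isolation. This is a minimal sketch, not the actual _refresh_entry block. The PoolEntry dataclass, the quarantine_codex_oauth helper, the "manual" source value, and the flat auth.json layout with top-level access_token/refresh_token keys are all hypothetical. "No newer tokens" is assumed to mean the refresh token on disk still equals the one that just failed.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class PoolEntry:
    # Hypothetical stand-in for an in-memory credential-pool entry.
    source: str  # "device_code" marks entries seeded from auth.json
    refresh_token: str


def quarantine_codex_oauth(
    auth_path: Path, failed_refresh_token: str, pool: List[PoolEntry]
) -> List[PoolEntry]:
    """Hypothetical sketch: clear dead Codex tokens after a terminal refresh error.

    Assumes auth.json keeps the tokens as top-level "access_token" and
    "refresh_token" keys; the real file layout may differ.
    """
    state = json.loads(auth_path.read_text())
    on_disk = state.get("refresh_token")
    # A different token on disk means another process already refreshed:
    # the failure only applied to our stale copy, so change nothing.
    if on_disk is not None and on_disk != failed_refresh_token:
        return pool
    # auth.json still holds the dead token: clear it so the next session
    # cannot re-seed the pool from it.
    state.pop("access_token", None)
    state.pop("refresh_token", None)
    auth_path.write_text(json.dumps(state))
    # Drop every in-memory entry that was seeded from auth.json.
    return [entry for entry in pool if entry.source != "device_code"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = Path(d) / "auth.json"
        path.write_text(json.dumps({"access_token": "at", "refresh_token": "dead"}))
        pool = [PoolEntry("device_code", "dead"), PoolEntry("manual", "other")]
        pool = quarantine_codex_oauth(path, "dead", pool)
        print(json.loads(path.read_text()))  # {}
        print([entry.source for entry in pool])  # ['manual']
```

The freshness check is what keeps this safe with several processes: a process holding a stale copy must not wipe a token that a sibling process rotated successfully.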
Also add a pre-refresh sync from auth.json before calling refresh_codex_oauth_pure,
matching the xAI and Nous patterns, to avoid refresh_token_reused races when
multiple Hermes processes share the same auth.json singleton.
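The pre-refresh sync can be sketched the same way. This is an illustrative model under stated assumptions: CodexTokens and refresh_with_presync are hypothetical names, do_refresh stands in for refresh_codex_oauth_pure (whose real signature is not shown here), and the auth.json layout is assumed flat. The sketch does no file locking, so two processes reading auth.json at the same instant can still collide; it only shows why reading the shared file first avoids redeeming a token that a sibling process already rotated.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Tuple


@dataclass
class CodexTokens:
    # Hypothetical in-memory copy of the Codex OAuth tokens.
    access_token: str
    refresh_token: str


def refresh_with_presync(
    auth_path: Path,
    cached: CodexTokens,
    do_refresh: Callable[[str], Tuple[str, str]],
) -> CodexTokens:
    """Hypothetical sketch: sync from auth.json, then refresh.

    do_refresh takes a refresh token and returns
    (new_access_token, new_refresh_token).
    """
    on_disk = json.loads(auth_path.read_text())
    disk_rt = on_disk.get("refresh_token")
    # Another process may have rotated the token since we cached it.
    # Redeeming our stale copy would be rejected as reused, so adopt
    # the newer on-disk token first.
    if disk_rt and disk_rt != cached.refresh_token:
        cached = CodexTokens(on_disk.get("access_token", ""), disk_rt)
    access, refresh = do_refresh(cached.refresh_token)
    # Persist the rotated pair so sibling processes pick it up.
    on_disk.update(access_token=access, refresh_token=refresh)
    auth_path.write_text(json.dumps(on_disk))
    return CodexTokens(access, refresh)


if __name__ == "__main__":
    used = set()

    def fake_token_endpoint(rt: str) -> Tuple[str, str]:
        # Mimics rotation: each refresh token may be redeemed only once.
        if rt in used:
            raise RuntimeError("refresh_token_reused")
        used.add(rt)
        n = len(used)
        return f"access-{n}", f"refresh-{n}"

    with tempfile.TemporaryDirectory() as d:
        path = Path(d) / "auth.json"
        path.write_text(json.dumps({"access_token": "a0", "refresh_token": "r0"}))
        proc_a = CodexTokens("a0", "r0")
        proc_b = CodexTokens("a0", "r0")  # second process, same starting cache
        proc_a = refresh_with_presync(path, proc_a, fake_token_endpoint)
        # Without the pre-sync, proc_b would redeem "r0" again and fail.
        proc_b = refresh_with_presync(path, proc_b, fake_token_endpoint)
        print(proc_a.refresh_token, proc_b.refresh_token)  # refresh-1 refresh-2
```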
Salvaged from #27911 by @EloquentBrush0x — contributor's branch was severely
stale (would have reverted ~5000 LOC across azure/kanban/i18n subsystems);
fix re-applied surgically on current main with their predicate and tests preserved.
parent 9aae59feab
commit b570e0fdd0
3 changed files with 237 additions and 0 deletions
```diff
@@ -4061,6 +4061,29 @@ def _is_terminal_xai_oauth_refresh_error(exc: Exception) -> bool:
     )
 
 
+def _is_terminal_codex_oauth_refresh_error(exc: Exception) -> bool:
+    """True when retrying the same Codex OAuth refresh token cannot succeed.
+
+    ``codex_refresh_failed`` covers HTTP 400/401/403 from the token endpoint
+    (invalid_grant, token revoked, refresh_token_reused).
+    ``codex_auth_missing_refresh_token`` means the pool entry has no refresh
+    token at all — retrying will never work.
+    Both carry ``relogin_required=True``; transient failures (429, 5xx) do not.
+    """
+    return (
+        isinstance(exc, AuthError)
+        and exc.provider == "openai-codex"
+        and exc.code in {
+            "codex_refresh_failed",
+            "codex_auth_missing_refresh_token",
+            "invalid_grant",
+            "invalid_token",
+            "refresh_token_reused",
+        }
+        and bool(exc.relogin_required)
+    )
+
+
 def _quarantine_nous_oauth_state(
     state: Dict[str, Any],
     error: AuthError,
```