fix(auth): self-heal Codex refresh_token rotation by reimporting from ~/.codex

Hermes keeps its own copy of the Codex OAuth token per profile and at the
top level, separate from the Codex CLI's ~/.codex/auth.json. OAuth
refresh_tokens are single-use, so when the Codex CLI (or another Hermes
process) rotates the shared token, the frozen copy's refresh_token goes
stale and refresh_codex_oauth_pure fails with a relogin-required error
(invalid_grant / refresh_token_reused / 401). Today that surfaces as a hard
401 on the turn: idle profiles and desktop sessions fail with "token_expired"
until a manual re-auth, even though ~/.codex/auth.json holds a fresh token.
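
The failure mode described above can be reproduced with a toy model. This is
only an illustrative sketch: ToyAuthServer, InvalidGrant, and the two store
variables are hypothetical stand-ins, not Hermes or Codex code, and the real
authorization server's behavior is assumed to match the single-use rotation
described in this message.

```python
import itertools


class InvalidGrant(Exception):
    """Hypothetical error for a refresh_token the server no longer accepts."""


class ToyAuthServer:
    """Toy authorization server that issues single-use refresh tokens."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._current_refresh_token = self._mint("rt")

    def _mint(self, prefix: str) -> str:
        # Deterministic token values keep the example easy to follow.
        return f"{prefix}-{next(self._counter)}"

    def issue(self) -> dict:
        return {
            "access_token": self._mint("at"),
            "refresh_token": self._current_refresh_token,
        }

    def refresh(self, refresh_token: str) -> dict:
        # Only the most recently issued refresh_token is accepted, and
        # accepting it rotates it out.
        if refresh_token != self._current_refresh_token:
            raise InvalidGrant("refresh_token_reused")
        self._current_refresh_token = self._mint("rt")
        return self.issue()


server = ToyAuthServer()
cli_store = server.issue()       # stands in for ~/.codex/auth.json
hermes_store = dict(cli_store)   # stands in for Hermes' separate copy

# The CLI refreshes first, which rotates the shared refresh_token.
cli_store = server.refresh(cli_store["refresh_token"])

# Hermes still holds the old refresh_token, so its refresh is rejected.
try:
    server.refresh(hermes_store["refresh_token"])
    hermes_refresh_failed = False
except InvalidGrant:
    hermes_refresh_failed = True
```

In this model the only way for the second store to recover without a new
login is to copy the tokens from the store that performed the rotation, which
is the approach this commit takes.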

_refresh_codex_auth_tokens now falls back to _import_codex_cli_tokens() (the
canonical Codex CLI store) when the stored refresh_token is rejected, adopts
and persists the fresh token, and lets the in-flight retry succeed. This
complements PR #6525 (force relogin on 401/403): we attempt automatic
recovery before surfacing a relogin prompt. Transient failures (e.g. 429
quota, relogin_required=False) are never self-healed because the stored
token is still valid there; they re-raise unchanged. The original error is
also re-raised when ~/.codex/auth.json yields no usable access_token. The
happy path is untouched.
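
The control flow described above can be sketched in isolation. This is a
minimal sketch, not the actual Hermes code: the AuthError class here is a
stand-in that models only the code and relogin_required attributes, and
refresh_with_self_heal with its injected refresh, import_cli_tokens, and save
callables is a hypothetical shape chosen so the example runs without the real
stores.

```python
from typing import Callable, Optional


class AuthError(Exception):
    """Stand-in for the real AuthError; only the attributes used below."""

    def __init__(self, message: str, *, code: str = "auth_error",
                 relogin_required: bool = False) -> None:
        super().__init__(message)
        self.code = code
        self.relogin_required = relogin_required


def refresh_with_self_heal(
    tokens: dict,
    refresh: Callable[[str, str], dict],
    import_cli_tokens: Callable[[], Optional[dict]],
    save: Callable[[dict], None],
) -> dict:
    """Refresh tokens, falling back to the CLI store on a relogin-required error."""
    try:
        refreshed = refresh(
            str(tokens.get("access_token", "") or ""),
            str(tokens.get("refresh_token", "") or ""),
        )
    except AuthError as exc:
        # Transient failures leave the stored token valid: do not self-heal.
        if not exc.relogin_required:
            raise
        imported = import_cli_tokens()
        # No usable token in the CLI store: surface the original error.
        if not (imported and str(imported.get("access_token", "") or "").strip()):
            raise
        save(imported)
        return dict(imported)
    # Happy path: merge the refreshed pair into the stored tokens.
    updated = dict(tokens)
    updated["access_token"] = refreshed["access_token"]
    updated["refresh_token"] = refreshed["refresh_token"]
    save(updated)
    return updated
```

The bare raise statements re-raise the AuthError that is currently being
handled, so callers see the same exception they would have seen before the
fallback existed.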

Adds tests/hermes_cli/test_auth_codex_self_heal.py covering: self-heal on
invalid_grant, no self-heal on 429 quota, re-raise when ~/.codex is absent,
and happy-path-unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Kennedy Umege 2026-06-12 23:47:15 +01:00 committed by Teknium
parent 2681c5a12d
commit bd66e7e3fb
2 changed files with 151 additions and 5 deletions

@@ -3660,11 +3660,37 @@ def _refresh_codex_auth_tokens(
     Saves the new tokens to Hermes auth store automatically.
     """
-    refreshed = refresh_codex_oauth_pure(
-        str(tokens.get("access_token", "") or ""),
-        str(tokens.get("refresh_token", "") or ""),
-        timeout_seconds=timeout_seconds,
-    )
+    try:
+        refreshed = refresh_codex_oauth_pure(
+            str(tokens.get("access_token", "") or ""),
+            str(tokens.get("refresh_token", "") or ""),
+            timeout_seconds=timeout_seconds,
+        )
+    except AuthError as exc:
+        # Self-heal cross-store refresh_token rotation. Hermes keeps its OWN
+        # Codex OAuth token (per profile + top-level), separate from the Codex
+        # CLI's ~/.codex/auth.json. OAuth refresh_tokens are single-use, so when
+        # the Codex CLI (or another Hermes process) rotates the shared token,
+        # this frozen copy's refresh_token goes stale and the refresh fails with
+        # a relogin-required error (invalid_grant / refresh_token_reused / 401).
+        # Before surfacing that as a hard 401 to the turn, adopt the canonical
+        # fresh token from ~/.codex/auth.json (the Codex CLI keeps it current) so
+        # idle profiles / desktop sessions recover automatically instead of
+        # 401'ing until a manual re-auth. Transient failures (e.g. 429 quota)
+        # keep relogin_required=False — the stored token is still valid there, so
+        # we never self-heal those and re-raise unchanged.
+        if not getattr(exc, "relogin_required", False):
+            raise
+        imported = _import_codex_cli_tokens()
+        if not (imported and str(imported.get("access_token", "") or "").strip()):
+            raise
+        logger.info(
+            "Codex refresh_token rejected (%s); recovered from ~/.codex/auth.json.",
+            getattr(exc, "code", None) or "auth_error",
+        )
+        _save_codex_tokens(imported)
+        return dict(imported)
     updated_tokens = dict(tokens)
     updated_tokens["access_token"] = refreshed["access_token"]
     updated_tokens["refresh_token"] = refreshed["refresh_token"]