fix(auth): self-heal Codex refresh_token rotation by reimporting from ~/.codex

Hermes keeps its own copy of the Codex OAuth token per profile and at the top level, separate from the Codex CLI's ~/.codex/auth.json. OAuth refresh_tokens are single-use, so when the Codex CLI (or another Hermes process) rotates the shared token, the frozen copy's refresh_token goes stale and refresh_codex_oauth_pure fails with a relogin-required error (invalid_grant / refresh_token_reused / 401). Today that surfaces as a hard 401 on the turn — idle profiles and desktop sessions 401 "token_expired" until a manual re-auth — even though ~/.codex/auth.json holds a fresh token. _refresh_codex_auth_tokens now falls back to _import_codex_cli_tokens() (the canonical Codex CLI store) when the stored refresh_token is rejected, adopts and persists the fresh token, and lets the in-flight retry succeed. This complements PR #6525 (force relogin on 401/403): we attempt automatic recovery before surfacing a relogin prompt. Transient failures (e.g. 429 quota, relogin_required=False) are never self-healed — the stored token is still valid there — so they re-raise unchanged, and the happy path is untouched. Adds tests/hermes_cli/test_auth_codex_self_heal.py covering: self-heal on invalid_grant, no self-heal on 429 quota, re-raise when ~/.codex is absent, and happy-path-unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 09:31:37 +00:00 · 2026-06-12 23:47:15 +01:00 · 2026-06-12 23:47:15 +01:00 · bd66e7e3fb
commit bd66e7e3fb
parent 2681c5a12d
2 changed files with 151 additions and 5 deletions
--- a/hermes_cli/auth.py
+++ b/hermes_cli/auth.py
@ -3660,11 +3660,37 @@ def _refresh_codex_auth_tokens(
    
    Saves the new tokens to Hermes auth store automatically.
    """
-    refreshed = refresh_codex_oauth_pure(
-        str(tokens.get("access_token", "") or ""),
-        str(tokens.get("refresh_token", "") or ""),
-        timeout_seconds=timeout_seconds,
-    )
+    try:
+        refreshed = refresh_codex_oauth_pure(
+            str(tokens.get("access_token", "") or ""),
+            str(tokens.get("refresh_token", "") or ""),
+            timeout_seconds=timeout_seconds,
+        )
+    except AuthError as exc:
+        # Self-heal cross-store refresh_token rotation. Hermes keeps its OWN
+        # Codex OAuth token (per profile + top-level), separate from the Codex
+        # CLI's ~/.codex/auth.json. OAuth refresh_tokens are single-use, so when
+        # the Codex CLI (or another Hermes process) rotates the shared token,
+        # this frozen copy's refresh_token goes stale and the refresh fails with
+        # a relogin-required error (invalid_grant / refresh_token_reused / 401).
+        # Before surfacing that as a hard 401 to the turn, adopt the canonical
+        # fresh token from ~/.codex/auth.json (the Codex CLI keeps it current) so
+        # idle profiles / desktop sessions recover automatically instead of
+        # 401'ing until a manual re-auth. Transient failures (e.g. 429 quota)
+        # keep relogin_required=False — the stored token is still valid there, so
+        # we never self-heal those and re-raise unchanged.
+        if not getattr(exc, "relogin_required", False):
+            raise
+        imported = _import_codex_cli_tokens()
+        if not (imported and str(imported.get("access_token", "") or "").strip()):
+            raise
+        logger.info(
+            "Codex refresh_token rejected (%s); recovered from ~/.codex/auth.json.",
+            getattr(exc, "code", None) or "auth_error",
+        )
+        _save_codex_tokens(imported)
+        return dict(imported)
+
    updated_tokens = dict(tokens)
    updated_tokens["access_token"] = refreshed["access_token"]
    updated_tokens["refresh_token"] = refreshed["refresh_token"]