test(xai-oauth): regression coverage for the bad-credentials disambiguator (#29344)

Eleven new tests pinning the #29344 fix.  Layout mirrors the existing
"Fix D" entitlement section so the bad-credentials disambiguator
sits alongside the entitlement-block tests it complements.

Classifier-level coverage:

* ``test_is_entitlement_failure_false_for_bad_credentials_wke_suffix``
  — verbatim shape from the reporter's wire capture
  (``{code: 'caller does not have permission', error: 'OAuth2 access
  token could not be validated. [WKE=unauthenticated:bad-credentials]'}``)
  ↦ classifier must return False so the refresh path runs.
* ``test_is_entitlement_failure_false_for_wke_suffix_in_normalized_shape``
  — same body after ``_extract_api_error_context`` has rewritten it
  to ``{reason, message}``.  The disambiguator must fire in BOTH
  shapes; without this guard the production call site at
  ``_recover_with_credential_pool`` (which goes through the
  normalised extractor) would still misclassify.
* ``test_is_entitlement_failure_false_for_any_wke_unauthenticated_variant``
  — parametrised forward-compat: ``bad-credentials``,
  ``expired-token``, ``revoked``, ``some-future-reason``.  xAI
  documents the prefix as stable, the suffix after the colon as a
  reason code that can grow; every variant under
  ``unauthenticated:`` must route to refresh.
* ``test_is_entitlement_failure_false_via_oauth2_validation_phrase_alone``
  — belt-and-braces guard: if a future API revision drops the WKE
  suffix but keeps "OAuth2 access token could not be validated", we
  still classify correctly.
* ``test_is_entitlement_failure_wke_signal_overrides_entitlement_keywords``
  — defensive: if a body ever carries BOTH the WKE suffix and
  entitlement language, the WKE signal wins.  Auth is recoverable;
  entitlement isn't, and a refreshed token will resurface the
  entitlement message on the next request.
* ``test_is_entitlement_failure_case_insensitive_wke_match`` —
  pins that the classifier lowercases the haystack so a future xAI
  build that uppercases the prefix doesn't reintroduce the bug.

Recovery-path coverage (end-to-end through
``_recover_with_credential_pool``):

* ``test_recover_with_credential_pool_refreshes_on_xai_bad_credentials_403``
  — the headline test the reporter requested: a bad-credentials 403
  with the exact wire body must call ``try_refresh_current()``
  exactly once and ``_swap_credential`` once.  Pre-fix this returned
  ``(False, _)`` because the entitlement classifier over-matched and
  short-circuited the refresh path.
* ``test_recover_with_credential_pool_still_blocks_real_entitlement``
  — companion regression guard for #26847: a pure unsubscribed-
  account body (no WKE suffix, no OAuth2-validation phrase) must
  still surface as entitlement and skip refresh.  The new
  disambiguator must not weaken the original loop-protection it
  was added to preserve.

The scaffolding reuses ``_make_codex_agent``, ``_FakePool``, and the
existing ``MagicMock`` patterns from the surrounding tests so the
new section reads as a natural extension of "Fix D" rather than a
separate test file.
This commit is contained in:
xxxigm 2026-05-20 21:48:45 +07:00 committed by Teknium
parent 8b3cb930c9
commit b5ea6a5c80

View file

@ -621,6 +621,246 @@ def test_recover_with_credential_pool_still_refreshes_genuine_auth_failure():
assert refresh_calls["n"] == 1
# ---------------------------------------------------------------------------
# Fix D-bis: bad-credentials 403 must NOT be classified as entitlement (#29344)
#
# xAI returns the same permission-denied ``code`` text for two distinct
# conditions: unsubscribed account vs. stale OAuth access token. The
# ``error`` field's ``[WKE=unauthenticated:...]`` suffix (and the
# accompanying "OAuth2 access token could not be validated" phrasing) is
# xAI's authoritative disambiguator — when present, the body is an auth
# failure, not entitlement, and the credential-pool refresh path must
# run. Pre-fix, long-running TUI sessions stuck on a stale token
# surfaced as a non-retryable client error; the workaround was to exit
# and reopen the TUI so the startup-resolve path refreshed.
# ---------------------------------------------------------------------------
def test_is_entitlement_failure_false_for_bad_credentials_wke_suffix():
"""403 with ``[WKE=unauthenticated:bad-credentials]`` is auth, not entitlement.
Verbatim shape from the #29344 reporter — the ``code`` text matches
the entitlement permission-denied heuristic, but the ``error`` field
carries xAI's explicit "this is a credential validation failure"
signal. Classifier must honor it.
"""
from run_agent import AIAgent
assert not AIAgent._is_entitlement_failure(
{
"code": "The caller does not have permission to execute the specified operation",
"error": "The OAuth2 access token could not be validated. [WKE=unauthenticated:bad-credentials]",
},
403,
)
def test_is_entitlement_failure_false_for_wke_suffix_in_normalized_shape():
"""The same body after ``_extract_api_error_context`` normalisation.
Real runtime paths feed the classifier through
``_extract_api_error_context``, which converts the raw body to
``{message, reason, reset_at}``. The disambiguator must fire in
BOTH the raw-body shape (test above) and the normalised shape so
the fix actually reaches the production call site at
``_recover_with_credential_pool``.
"""
from run_agent import AIAgent
assert not AIAgent._is_entitlement_failure(
{
"reason": "The caller does not have permission to execute the specified operation",
"message": "The OAuth2 access token could not be validated. [WKE=unauthenticated:bad-credentials]",
},
403,
)
@pytest.mark.parametrize("wke_variant", [
# The headline variant — what xAI returns today.
"[WKE=unauthenticated:bad-credentials]",
# Forward-compat: xAI documents the WKE prefix as a stable shape,
# the suffix after the colon is the "reason code" and could grow
# new values. Anything under ``unauthenticated:`` must route to
# the refresh path.
"[WKE=unauthenticated:expired-token]",
"[WKE=unauthenticated:revoked]",
"[WKE=unauthenticated:some-future-reason]",
])
def test_is_entitlement_failure_false_for_any_wke_unauthenticated_variant(wke_variant):
from run_agent import AIAgent
assert not AIAgent._is_entitlement_failure(
{
"code": "The caller does not have permission to execute the specified operation",
"error": f"Token rejected. {wke_variant}",
},
403,
)
def test_is_entitlement_failure_false_via_oauth2_validation_phrase_alone():
"""Second disambiguator: the "OAuth2 access token could not be
validated" phrase by itself (no WKE suffix) must also route to
refresh. This is a belt-and-braces guard against xAI dropping or
reformatting the WKE suffix in a future API revision without
changing the human-readable error text."""
from run_agent import AIAgent
assert not AIAgent._is_entitlement_failure(
{
"code": "The caller does not have permission to execute the specified operation",
"error": "The OAuth2 access token could not be validated.",
},
403,
)
def test_is_entitlement_failure_wke_signal_overrides_entitlement_keywords():
"""Defensive: if a future xAI body somehow carries BOTH the WKE
suffix AND entitlement language, the WKE signal wins. Auth is
recoverable; entitlement isn't. If the refreshed token still
can't access the resource, the next 403 (without WKE) lands on
the entitlement path correctly."""
from run_agent import AIAgent
assert not AIAgent._is_entitlement_failure(
{
"code": "The caller does not have permission to execute the specified operation",
"error": (
"do not have an active Grok subscription. "
"[WKE=unauthenticated:bad-credentials]"
),
},
403,
)
def test_is_entitlement_failure_case_insensitive_wke_match():
"""Substring match is case-insensitive — the classifier lowercases
everything before matching, so a future xAI build that uppercases
the prefix wouldn't reintroduce the misclassification."""
from run_agent import AIAgent
assert not AIAgent._is_entitlement_failure(
{
"code": "The caller does not have permission to execute the specified operation",
"error": "[wke=Unauthenticated:Bad-Credentials]",
},
403,
)
def test_recover_with_credential_pool_refreshes_on_xai_bad_credentials_403():
"""End-to-end #29344: a bad-credentials 403 from xai-oauth MUST
call ``try_refresh_current()`` so the long-running TUI session
recovers without an exit/reopen cycle.
Mirrors the scaffolding of
``test_recover_with_credential_pool_still_refreshes_genuine_auth_failure``
but with the exact 403 body shape xAI ships for stale tokens
the very body that pre-fix tripped the entitlement classifier
and short-circuited the refresh path.
"""
from run_agent import AIAgent
from agent.error_classifier import FailoverReason
agent = _make_codex_agent()
refresh_calls = {"n": 0}
class _FakePool:
def try_refresh_current(self):
refresh_calls["n"] += 1
entry = MagicMock()
entry.id = "entry_refreshed_after_stale"
return entry
def mark_exhausted_and_rotate(self, **_kwargs):
return None
def has_available(self):
return False
agent._credential_pool = _FakePool()
agent._swap_credential = MagicMock()
# Normalised shape that ``_extract_api_error_context`` would
# produce for the reporter's wire-level body.
error_context = {
"reason": (
"The caller does not have permission to execute the specified operation"
),
"message": (
"The OAuth2 access token could not be validated. "
"[WKE=unauthenticated:bad-credentials]"
),
}
recovered, _retried_429 = agent._recover_with_credential_pool(
status_code=403,
has_retried_429=False,
classified_reason=FailoverReason.auth,
error_context=error_context,
)
assert recovered is True, (
"Stale OAuth token (bad-credentials 403) must trigger refresh — "
"pre-fix this returned False because the entitlement classifier "
"over-matched on the permission-denied code text"
)
assert refresh_calls["n"] == 1, "try_refresh_current must run exactly once"
agent._swap_credential.assert_called_once()
def test_recover_with_credential_pool_still_blocks_real_entitlement():
"""Companion regression guard for the #29344 fix: the original
#26847 protection — entitlement 403 must NOT refresh — must
survive the new disambiguator. A real unsubscribed-account body
has no WKE suffix and no OAuth2-validation phrase, so the
classifier still classifies it as entitlement and short-circuits."""
from run_agent import AIAgent
from agent.error_classifier import FailoverReason
agent = _make_codex_agent()
refresh_calls = {"n": 0}
class _FakePool:
def try_refresh_current(self):
refresh_calls["n"] += 1
return MagicMock(id="should_not_be_called")
def mark_exhausted_and_rotate(self, **_kwargs):
return None
def has_available(self):
return False
agent._credential_pool = _FakePool()
# Pure entitlement body — no WKE suffix, no OAuth2 phrase.
error_context = {
"reason": (
"The caller does not have permission to execute the specified operation"
),
"message": (
"You have either run out of available resources or do not have an "
"active Grok subscription. Manage at https://grok.com"
),
}
recovered, _retried_429 = agent._recover_with_credential_pool(
status_code=403,
has_retried_429=False,
classified_reason=FailoverReason.auth,
error_context=error_context,
)
assert recovered is False, "Entitlement 403 must surface, not refresh"
assert refresh_calls["n"] == 0
# ---------------------------------------------------------------------------
# Fix E: grok-4.3 context length must be 1M, not 256K
# ---------------------------------------------------------------------------