fix(agent): abort on HTTP 402 after pool rotation and fallback fail (#31443)

Closes #31273.

HTTP 402 (insufficient credits) was retried up to agent.api_max_retries
times (default 3), burning paid requests against an exhausted balance.
Real-world impact: ~$40 in 48h on a 24/7 Telegram+Discord gateway.

Root cause: FailoverReason.billing was in the is_client_error
exclusion set in agent/conversation_loop.py, which prevents the
non-retryable-abort branch from firing.

By the time control reaches that predicate:
  * credential-pool rotation has already run for billing and either
    continued the loop or returned False (pool exhausted/absent)
  * the eager-fallback branch has also fired on billing and either
    continued the loop or fell through (no fallback configured)

Falling through to the backoff retry from here has no recovery
mechanism left — it just burns more paid requests.  Removing billing
from the exclusion set makes 402 abort cleanly once pool+fallback
recovery has failed, mirroring how 401/403 (also should_fallback=True)
already behave.

Added tests/run_agent/test_31273_402_not_retried.py which mirrors the
is_client_error predicate shape from the source and asserts the
invariant (plus a source-inspection guard against accidental
re-introduction).
This commit is contained in:
Teknium 2026-05-24 15:14:13 -07:00 committed by GitHub
parent 5b52e26d18
commit 8065e70274
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
2 changed files with 162 additions and 1 deletions

View file

@ -2806,6 +2806,21 @@ def run_conversation(
# retryable=True mapping takes effect instead.
and not isinstance(api_error, ssl.SSLError)
)
# ``FailoverReason.billing`` (HTTP 402) is NOT in this
# exclusion set. By the time we reach this block:
# • credential-pool rotation (line ~2031) has already
# fired for billing and either ``continue``d or
# returned (False, ...) — pool is exhausted or absent.
# • the eager-fallback branch above (line ~2422) also
# fires on billing and ``continue``s if a fallback
# provider is configured.
# Falling through to here means BOTH recovery paths
# gave up. Treating 402 as retryable from this point
# just burns more paid requests against a depleted
# balance with no recovery mechanism left — see #31273
# (real-world: ~$40 in 48h on a 24/7 gateway). Aborting
# mirrors how 401/403 (also ``should_fallback=True``)
# already behave once their recovery paths have failed.
is_client_error = (
is_local_validation_error
or (
@ -2813,7 +2828,6 @@ def run_conversation(
and not classified.should_compress
and classified.reason not in {
FailoverReason.rate_limit,
FailoverReason.billing,
FailoverReason.overloaded,
FailoverReason.context_overflow,
FailoverReason.payload_too_large,