fix(error-classifier): route 429 'overloaded' messages to FailoverReason.overloaded (#15297)

Some providers — notably Z.AI / Zhipu — return HTTP 429 with messages
like "The service may be temporarily overloaded, please try again later"
to signal **server-wide overload**, not per-credential rate limiting.
The two conditions need different recovery strategies:

| Reason | Correct strategy |
|---|---|
| ``rate_limit`` | Retry same credential once (the limit may have reset), then rotate |
| ``overloaded`` | Skip retry, rotate immediately (the whole endpoint is saturated) |

Before this fix:

- ``error_classifier.py`` mapped every 429 to ``FailoverReason.rate_limit``
  regardless of message body.
- ``FailoverReason.overloaded`` already existed as an enum value (and was
  produced for 503/529) but no production path emitted it for 429.
- ``_recover_with_credential_pool`` had no handler for ``overloaded`` —
  an ``overloaded`` classification fell through to the default no-op
  ``return False, has_retried_429`` line.

Net effect: every overload-language 429 spent the one ``has_retried_429``
retry on the same saturated credential before rotating, and cron jobs
(one turn each) often used their entire execution on that wasted retry.

### Fix

Two narrow, additive changes:

1. ``agent/error_classifier.py`` — new ``_OVERLOADED_PATTERNS`` list
   containing provider-language overload phrases (``"temporarily
   overloaded"``, ``"server is overloaded"``, ``"at capacity"``, …).
   The 429 branch now checks the error body against this list and
   emits ``FailoverReason.overloaded`` when matched, preserving
   ``rate_limit`` for everything else.  Kept phrases narrow and
   provider-language-flavoured so a normal rate-limit message
   ("you have been rate-limited") doesn't hit this bucket.

2. ``run_agent.py::_recover_with_credential_pool`` — new branch for
   ``effective_reason == FailoverReason.overloaded``.  Same shape as
   the existing ``billing`` handler: rotate immediately via
   ``mark_exhausted_and_rotate(...)``, no retry-on-same-credential
   first.  The 503/529 ``overloaded`` classifications produced by the
   existing code now also flow through this branch.
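
A minimal sketch of the 429 disambiguation from item 1, not the code in
``agent/error_classifier.py``: the enum is cut down to two members, the
pattern list is a subset of the real one, and ``classify_429`` is a
hypothetical helper name. Lowercasing inside the helper is an assumption
about how the real classifier normalises ``error_msg``.

```python
from enum import Enum


class FailoverReason(Enum):
    # Only the two members relevant to the 429 branch.
    rate_limit = "rate_limit"
    overloaded = "overloaded"


# Subset of the phrases added by this commit.
_OVERLOADED_PATTERNS = [
    "temporarily overloaded",
    "server is overloaded",
    "at capacity",
]


def classify_429(error_msg: str) -> FailoverReason:
    """Hypothetical helper: pick a reason for an HTTP 429 from its body."""
    # Assumption: normalise case here; the real code may do this earlier.
    msg = error_msg.lower()
    if any(pattern in msg for pattern in _OVERLOADED_PATTERNS):
        return FailoverReason.overloaded
    # Everything else keeps the pre-existing rate-limit classification.
    return FailoverReason.rate_limit
```

Substring matching keeps the check cheap and easy to audit; the cost is
that each new provider phrasing has to be added to the list by hand.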

### Tests (15 new, all passing on py3.11 venv)

``tests/agent/test_error_classifier.py`` (11 cases):

- The exact #15297 Z.AI message → ``overloaded``
- 9-phrase parametrised matrix (server/service/upstream overloaded,
  at/over capacity, "currently overloaded", …) → all ``overloaded``
- Plain "Rate limit reached for requests per minute" 429 → still
  ``rate_limit`` (regression guard against the disambiguation
  silently broadening)
- Plain "Too Many Requests" 429 → still ``rate_limit``
- 503 path unchanged → still ``overloaded`` (negative control)

``tests/agent/test_credential_pool_routing.py`` —
``TestPoolOverloadedRotation`` (4 cases):

- Overloaded on first failure → rotates immediately (the fix)
- Overloaded with ``has_retried_429=True`` → still rotates (no retry-
  flag dependence — the rate-limit gate is specific to ``rate_limit``)
- Single-entry pool overload → ``recovered=False`` so outer fallback
  takes over rather than spinning
- ``rate_limit`` on first failure → still uses retry-first path
  (regression guard against the new branch broadening)
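
The four routing cases can be traced against a toy pool. This is a sketch
under assumptions, not the body of ``_recover_with_credential_pool``:
``ToyPool``, ``recover``, and the zero-argument
``mark_exhausted_and_rotate`` signature are hypothetical, and the
``(recovered, has_retried_429)`` values returned on the ``rate_limit``
path are assumed from the description above.

```python
from enum import Enum


class FailoverReason(Enum):
    rate_limit = "rate_limit"
    overloaded = "overloaded"


class ToyPool:
    """Hypothetical stand-in for the credential pool."""

    def __init__(self, credentials):
        self.credentials = list(credentials)
        self.index = 0

    def mark_exhausted_and_rotate(self):
        """Move to the next credential; return it, or None if none is left."""
        if self.index + 1 >= len(self.credentials):
            return None
        self.index += 1
        return self.credentials[self.index]


def recover(reason, has_retried_429, pool):
    """Return (recovered, has_retried_429) for one failed attempt."""
    if reason is FailoverReason.rate_limit and not has_retried_429:
        # Assumed retry-first gate: stay on the same credential once.
        return False, True
    if reason in (FailoverReason.rate_limit, FailoverReason.overloaded):
        # Overloaded skips the gate above and always lands here.
        if pool.mark_exhausted_and_rotate() is not None:
            return True, False
    # Nothing left to rotate to: let the outer fallback take over.
    return False, has_retried_429
```

With a two-entry pool, ``recover(FailoverReason.overloaded, False, pool)``
rotates on the first failure, while the same call with
``FailoverReason.rate_limit`` leaves the pool where it is and only sets
the retry flag.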

**Verified regression guards**: temporarily reverted the classifier's
overload branch; 10 of the 11 new classifier tests correctly failed
with clear assertion messages.  Restored fix → all 145 tests pass
(``test_error_classifier.py`` + ``test_credential_pool_routing.py``,
existing + new).

Closes #15297

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brian D. Evans 2026-04-24 16:39:48 -07:00
parent 4fade39c90
commit 24461ec0bc
4 changed files with 265 additions and 1 deletions

@@ -147,6 +147,29 @@ _PAYLOAD_TOO_LARGE_PATTERNS = [
     "error code: 413",
 ]

+# Patterns that indicate provider-side overload, NOT per-credential rate
+# limiting. Some providers (e.g. Z.AI / Zhipu) return HTTP 429 for
+# server-wide overload — same status code as a true rate limit but the
+# correct recovery is "rotate immediately, the whole endpoint is down"
+# rather than "back off and retry the same key". Match against the
+# message body alongside the 429 status code in the classifier. Keep
+# phrases narrow and provider-language-flavoured so a normal rate-limit
+# message ("you have been rate-limited") doesn't accidentally hit this
+# bucket — only true overload language does (#15297).
+_OVERLOADED_PATTERNS = [
+    "temporarily overloaded",
+    "server is overloaded",
+    "server overloaded",
+    "service overloaded",
+    "service is overloaded",
+    "service may be temporarily overloaded",
+    "upstream overloaded",
+    "is overloaded, please try again",
+    "at capacity",
+    "over capacity",
+    "currently overloaded",
+]
+
 # Context overflow patterns
 _CONTEXT_OVERFLOW_PATTERNS = [
     "context length",
@@ -589,7 +612,20 @@ def _classify_by_status(
         )

     if status_code == 429:
-        # Already checked long_context_tier above; this is a normal rate limit
+        # Already checked long_context_tier above. Some providers (notably
+        # Z.AI / Zhipu) reuse 429 for server-wide overload — correct recovery
+        # is "skip retry, rotate immediately" rather than "back off, then
+        # rotate" (which is the right answer for a per-credential rate
+        # limit). Disambiguate on the error body so the retry loop picks
+        # the matching strategy via the corresponding ``FailoverReason``
+        # handler (#15297).
+        if any(p in error_msg for p in _OVERLOADED_PATTERNS):
+            return result_fn(
+                FailoverReason.overloaded,
+                retryable=True,
+                should_rotate_credential=True,
+                should_fallback=True,
+            )
         return result_fn(
             FailoverReason.rate_limit,
             retryable=True,