fix(error-classifier): route 429 'overloaded' messages to FailoverReason.overloaded (#15297)

Some providers, notably Z.AI / Zhipu, return HTTP 429 with messages like "The service may be temporarily overloaded, please try again later" to signal **server-wide overload**, not per-credential rate limiting. The two conditions need different recovery strategies:

| Reason | Correct strategy |
|---|---|
| ``rate_limit`` | Retry same credential once (the limit may have reset), then rotate |
| ``overloaded`` | Skip retry, rotate immediately (the whole endpoint is saturated) |

Before this fix:

- ``error_classifier.py`` mapped every 429 to ``FailoverReason.rate_limit`` regardless of message body.
- ``FailoverReason.overloaded`` already existed as an enum value (and was produced for 503/529) but no production path emitted it for 429.
- ``_recover_with_credential_pool`` had no handler for ``overloaded``: an ``overloaded`` classification fell through to the default no-op ``return False, has_retried_429`` line.

Net effect: every overload-language 429 burned a ``has_retried_429`` slot on the same saturated credential before the rotation happened, and cron jobs (one turn each) often used their entire execution on that wasted retry.

### Fix

Two narrow, additive changes:

1. ``agent/error_classifier.py``: new ``_OVERLOADED_PATTERNS`` list containing provider-language overload phrases (``"temporarily overloaded"``, ``"server is overloaded"``, ``"at capacity"``, …). The 429 branch now checks the error body against this list and emits ``FailoverReason.overloaded`` when matched, preserving ``rate_limit`` for everything else. Kept phrases narrow and provider-language-flavoured so a normal rate-limit message ("you have been rate-limited") doesn't hit this bucket.
2. ``run_agent.py::_recover_with_credential_pool``: new branch for ``effective_reason == FailoverReason.overloaded``. Same shape as the existing ``billing`` handler: rotate immediately via ``mark_exhausted_and_rotate(...)``, no retry-on-same-credential first. The 503/529 ``overloaded`` classifications produced by the existing code now also flow through this branch.

### Tests (15 new, all passing on py3.11 venv)

``tests/agent/test_error_classifier.py`` (7 cases):

- The exact #15297 Z.AI message → ``overloaded``
- 9-phrase parametrised matrix (server/service/upstream overloaded, at/over capacity, "currently overloaded", …) → all ``overloaded``
- Plain "Rate limit reached for requests per minute" 429 → still ``rate_limit`` (regression guard against the disambiguation silently broadening)
- Plain "Too Many Requests" 429 → still ``rate_limit``
- 503 path unchanged → still ``overloaded`` (negative control)

``tests/agent/test_credential_pool_routing.py``, ``TestPoolOverloadedRotation`` (4 cases):

- Overloaded on first failure → rotates immediately (the fix)
- Overloaded with ``has_retried_429=True`` → still rotates (no retry-flag dependence; the rate-limit gate is specific to ``rate_limit``)
- Single-entry pool overload → ``recovered=False`` so outer fallback takes over rather than spinning
- ``rate_limit`` on first failure → still uses retry-first path (regression guard against the new branch broadening)

**Verified regression guards**: temporarily reverted the classifier's overload branch; 10 of the 11 new classifier tests correctly failed with clear assertion messages. Restored fix → all 145 tests pass (``test_error_classifier.py`` + ``test_credential_pool_routing.py``, existing + new).

Closes #15297

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
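
For readers skimming the description, the 429 disambiguation and the resulting retry decision can be sketched roughly as follows. This is a simplified illustration, not the code in this commit: ``classify_429`` and ``should_retry_same_credential`` are hypothetical names, the enum is cut down to two members, the pattern list is abbreviated, and lowercasing the message before matching is an assumption about how the body is normalised.

```python
from enum import Enum


class FailoverReason(Enum):
    # Reduced stand-in for the real enum; only the two reasons discussed here.
    rate_limit = "rate_limit"
    overloaded = "overloaded"


# Abbreviated copy of the overload phrases named in the description.
OVERLOADED_PATTERNS = [
    "temporarily overloaded",
    "server is overloaded",
    "at capacity",
]


def classify_429(error_msg: str) -> FailoverReason:
    """Hypothetical helper: pick a reason for an HTTP 429 from its message body."""
    msg = error_msg.lower()  # assumption: matching is case-insensitive
    if any(pattern in msg for pattern in OVERLOADED_PATTERNS):
        return FailoverReason.overloaded
    return FailoverReason.rate_limit


def should_retry_same_credential(reason: FailoverReason, has_retried_429: bool) -> bool:
    """Hypothetical helper: True means retry the same credential before rotating."""
    if reason is FailoverReason.overloaded:
        # The whole endpoint is saturated, so a same-credential retry is wasted.
        return False
    # A rate limit may have reset, so allow exactly one same-credential retry.
    return reason is FailoverReason.rate_limit and not has_retried_429


if __name__ == "__main__":
    zai_msg = "The service may be temporarily overloaded, please try again later"
    reason = classify_429(zai_msg)
    print(reason.name, should_retry_same_credential(reason, has_retried_429=False))
```

The point of keeping the two helpers separate is that the classifier only labels the failure, while the recovery step decides what the label means for the credential pool.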
2026-05-29 06:31:32 +00:00 · 2026-04-24 16:39:48 -07:00 · 2026-04-24 16:39:48 -07:00 · 24461ec0bc
commit 24461ec0bc
parent 4fade39c90
4 changed files with 265 additions and 1 deletions
--- a/agent/error_classifier.py
+++ b/agent/error_classifier.py
@@ -147,6 +147,29 @@ _PAYLOAD_TOO_LARGE_PATTERNS = [
    "error code: 413",
 ]

+# Patterns that indicate provider-side overload, NOT per-credential rate
+# limiting.  Some providers (e.g. Z.AI / Zhipu) return HTTP 429 for
+# server-wide overload — same status code as a true rate limit but the
+# correct recovery is "rotate immediately, the whole endpoint is down"
+# rather than "back off and retry the same key".  Match against the
+# message body alongside the 429 status code in the classifier.  Keep
+# phrases narrow and provider-language-flavoured so a normal rate-limit
+# message ("you have been rate-limited") doesn't accidentally hit this
+# bucket — only true overload language does (#15297).
+_OVERLOADED_PATTERNS = [
+    "temporarily overloaded",
+    "server is overloaded",
+    "server overloaded",
+    "service overloaded",
+    "service is overloaded",
+    "service may be temporarily overloaded",
+    "upstream overloaded",
+    "is overloaded, please try again",
+    "at capacity",
+    "over capacity",
+    "currently overloaded",
+]
+
 # Context overflow patterns
 _CONTEXT_OVERFLOW_PATTERNS = [
    "context length",
@@ -589,7 +612,20 @@ def _classify_by_status(
        )

    if status_code == 429:
-        # Already checked long_context_tier above; this is a normal rate limit
+        # Already checked long_context_tier above.  Some providers (notably
+        # Z.AI / Zhipu) reuse 429 for server-wide overload — correct recovery
+        # is "skip retry, rotate immediately" rather than "back off, then
+        # rotate" (which is the right answer for a per-credential rate
+        # limit).  Disambiguate on the error body so the retry loop picks
+        # the matching strategy via the corresponding ``FailoverReason``
+        # handler (#15297).
+        if any(p in error_msg for p in _OVERLOADED_PATTERNS):
+            return result_fn(
+                FailoverReason.overloaded,
+                retryable=True,
+                should_rotate_credential=True,
+                should_fallback=True,
+            )
        return result_fn(
            FailoverReason.rate_limit,
            retryable=True,