mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-27 01:11:40 +00:00
fix(error-classifier): route 429 'overloaded' messages to FailoverReason.overloaded (#15297)
Some providers — notably Z.AI / Zhipu — return HTTP 429 with messages
like "The service may be temporarily overloaded, please try again later"
to signal **server-wide overload**, not per-credential rate limiting.
The two conditions need different recovery strategies:
| Reason | Correct strategy |
|---|---|
| ``rate_limit`` | Retry same credential once (the limit may have reset), then rotate |
| ``overloaded`` | Skip retry, rotate immediately (the whole endpoint is saturated) |
Before this fix:
- ``error_classifier.py`` mapped every 429 to ``FailoverReason.rate_limit``
regardless of message body.
- ``FailoverReason.overloaded`` already existed as an enum value (and was
produced for 503/529) but no production path emitted it for 429.
- ``_recover_with_credential_pool`` had no handler for ``overloaded`` —
an ``overloaded`` classification fell through to the default no-op
``return False, has_retried_429`` line.
Net effect: every overload-language 429 burned a ``has_retried_429``
slot on the same saturated credential before the rotation happened, and
cron jobs (one turn each) often used their entire execution on that
wasted retry.
### Fix
Two narrow, additive changes:
1. ``agent/error_classifier.py`` — new ``_OVERLOADED_PATTERNS`` list
containing provider-language overload phrases (``"temporarily
overloaded"``, ``"server is overloaded"``, ``"at capacity"``, …).
The 429 branch now checks the error body against this list and
emits ``FailoverReason.overloaded`` when matched, preserving
``rate_limit`` for everything else. Kept phrases narrow and
provider-language-flavoured so a normal rate-limit message
("you have been rate-limited") doesn't hit this bucket.
2. ``run_agent.py::_recover_with_credential_pool`` — new branch for
``effective_reason == FailoverReason.overloaded``. Same shape as
the existing ``billing`` handler: rotate immediately via
``mark_exhausted_and_rotate(...)``, no retry-on-same-credential
first. The 503/529 ``overloaded`` classifications produced by the
existing code now also flow through this branch.
### Tests (15 new, all passing on py3.11 venv)
``tests/agent/test_error_classifier.py`` — 7 cases:
- The exact #15297 Z.AI message → ``overloaded``
- 9-phrase parametrised matrix (server/service/upstream overloaded,
at/over capacity, "currently overloaded", …) → all ``overloaded``
- Plain "Rate limit reached for requests per minute" 429 → still
``rate_limit`` (regression guard against the disambiguation
silently broadening)
- Plain "Too Many Requests" 429 → still ``rate_limit``
- 503 path unchanged → still ``overloaded`` (negative control)
``tests/agent/test_credential_pool_routing.py`` —
``TestPoolOverloadedRotation`` (4 cases):
- Overloaded on first failure → rotates immediately (the fix)
- Overloaded with ``has_retried_429=True`` → still rotates (no retry-
flag dependence — the rate-limit gate is specific to ``rate_limit``)
- Single-entry pool overload → ``recovered=False`` so outer fallback
takes over rather than spinning
- ``rate_limit`` on first failure → still uses retry-first path
(regression guard against the new branch broadening)
**Verified regression guards**: temporarily reverted the classifier's
overload branch; 10 of the 11 new classifier tests correctly failed
with clear assertion messages. Restored fix → all 145 tests pass
(``test_error_classifier.py`` + ``test_credential_pool_routing.py``,
existing + new).
Closes #15297
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
4fade39c90
commit
24461ec0bc
4 changed files with 265 additions and 1 deletions
|
|
@ -250,6 +250,85 @@ class TestClassifyApiError:
|
|||
assert result.reason == FailoverReason.rate_limit
|
||||
assert result.should_fallback is True
|
||||
|
||||
# ── 429 overload disambiguation (#15297) ──
|
||||
#
|
||||
# Some providers (notably Z.AI / Zhipu) reuse 429 for server-side
|
||||
# overload rather than per-credential rate limiting. Recovery for
|
||||
# these is "rotate immediately, the whole endpoint is down" — not
|
||||
# the rate-limit "back off, retry same key once, then rotate" path.
|
||||
# The classifier disambiguates on the message body so the retry loop
|
||||
# picks the right ``FailoverReason`` handler.
|
||||
|
||||
def test_429_zai_temporarily_overloaded(self):
|
||||
"""The exact message from #15297's Z.AI / Zhipu reproduction."""
|
||||
msg = (
|
||||
"HTTP 429: The service may be temporarily overloaded, "
|
||||
"please try again later"
|
||||
)
|
||||
e = MockAPIError(msg, status_code=429)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.overloaded, (
|
||||
f"Z.AI overload message must classify as overloaded, got "
|
||||
f"{result.reason!r} — recovery would burn a same-credential "
|
||||
f"retry instead of rotating immediately"
|
||||
)
|
||||
assert result.should_rotate_credential is True
|
||||
assert result.should_fallback is True
|
||||
assert result.retryable is True
|
||||
|
||||
@pytest.mark.parametrize("phrase", [
|
||||
"Service is overloaded, please try again later",
|
||||
"server is overloaded",
|
||||
"server overloaded — retry",
|
||||
"Service overloaded",
|
||||
"Upstream overloaded",
|
||||
"API is currently overloaded",
|
||||
"is overloaded, please try again",
|
||||
"The endpoint is at capacity",
|
||||
"Provider over capacity",
|
||||
])
|
||||
def test_429_overload_phrases_route_to_overloaded(self, phrase):
|
||||
"""Phrase matrix: any provider-language overload phrase on a 429
|
||||
must route to ``overloaded`` not ``rate_limit``. Parametrised
|
||||
so a future provider quirk that reads "overloaded" but is really
|
||||
rate limiting can be surfaced and reclassified narrowly."""
|
||||
e = MockAPIError(phrase, status_code=429)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.overloaded, (
|
||||
f"phrase {phrase!r} should map to overloaded; got {result.reason!r}"
|
||||
)
|
||||
|
||||
def test_429_plain_rate_limit_remains_rate_limit(self):
|
||||
"""Regression guard: a plain 429 with rate-limit-flavoured text
|
||||
(no overload phrase) must STILL classify as rate_limit so the
|
||||
backoff-then-rotate path keeps working for normal per-key
|
||||
throttling. Without this guard, the disambiguation could
|
||||
silently broaden and break every 429 recovery."""
|
||||
e = MockAPIError(
|
||||
"Rate limit reached for requests per minute", status_code=429,
|
||||
)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.rate_limit, (
|
||||
"vanilla rate-limit message must NOT route to overloaded — "
|
||||
"the recovery strategies are different (retry-then-rotate vs "
|
||||
"rotate-immediately)"
|
||||
)
|
||||
|
||||
def test_429_too_many_requests_remains_rate_limit(self):
|
||||
"""The literal HTTP 429 reason phrase ('Too Many Requests') must
|
||||
route to rate_limit too — it carries no overload language."""
|
||||
e = MockAPIError("Too Many Requests", status_code=429)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.rate_limit
|
||||
|
||||
def test_503_overloaded_path_unchanged(self):
|
||||
"""Sanity check: HTTP 503 already mapped to ``overloaded`` and
|
||||
must keep doing so. Adding the 429 overload branch must not
|
||||
accidentally regress the 503 path."""
|
||||
e = MockAPIError("Service Unavailable", status_code=503)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.overloaded
|
||||
|
||||
def test_alibaba_rate_increased_too_quickly(self):
|
||||
"""Alibaba/DashScope returns a unique throttling message.
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue