fix(error_classifier): classify 'overloaded' as FailoverReason.overloaded before rate_limit

When a provider (e.g. Z.AI) returns 'The service may be temporarily
overloaded, please try again later' with HTTP 200 or HTTP 400, the error
matched _RATE_LIMIT_PATTERNS (which includes
'servicequotaexceededexception') and was classified as rate_limit with
should_rotate_credential=True. After 2 failures the single API key was
marked exhausted, so all further retries failed.
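
A minimal sketch of that failure mode, under stated assumptions:
CredentialPool, report_failure and run are hypothetical names invented
for illustration and are not the project's API. Only the threshold of 2
failures and the should_rotate_credential flag come from the
description above.

```python
from dataclasses import dataclass, field

MAX_FAILURES = 2  # from the commit message: key is exhausted after 2 failures


@dataclass
class CredentialPool:
    """Hypothetical stand-in for a credential pool (not the real class)."""
    keys: list
    failures: dict = field(default_factory=dict)

    def available(self):
        # A key stays usable until it has accumulated MAX_FAILURES failures.
        return [k for k in self.keys if self.failures.get(k, 0) < MAX_FAILURES]

    def report_failure(self, key, should_rotate_credential):
        # Only failures that request credential rotation count against the key.
        if should_rotate_credential:
            self.failures[key] = self.failures.get(key, 0) + 1


def run(pool, attempts, should_rotate_credential):
    """Simulate `attempts` failing responses; return how many requests were sent."""
    sent = 0
    for _ in range(attempts):
        usable = pool.available()
        if not usable:
            break  # every key is exhausted, so no retry can be sent
        sent += 1
        pool.report_failure(usable[0], should_rotate_credential)
    return sent


# Classified as rate_limit: the single key is burned after two attempts.
sent_when_rotating = run(CredentialPool(["only-key"]), 5, True)
# Classified as overloaded: the key survives and all retries are sent.
sent_when_not_rotating = run(CredentialPool(["only-key"]), 5, False)
```

With a single key, the rotating case stops after two requests while the
non-rotating case sends all five, which is why the classification
matters more than the retry budget here.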

The fix adds an 'overloaded' / 'temporarily overloaded' pattern check
BEFORE the rate_limit check in both _classify_400 and
_classify_by_message. Overloaded errors now get FailoverReason.overloaded
(retryable=True, should_fallback=True) instead of rate_limit, preventing
unnecessary credential rotation.
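
The effect of the ordering can be sketched in isolation. This is an
illustration only, not the project's classifier: the FailoverReason enum
here is a stand-in, classify is a hypothetical helper, and the
'try again' entry in the pattern tuple is an assumption (the commit
message names only 'servicequotaexceededexception' and does not say
which pattern matched the overloaded message).

```python
from enum import Enum


class FailoverReason(Enum):
    """Stand-in for the project's enum; only the members used here."""
    overloaded = "overloaded"
    rate_limit = "rate_limit"
    unknown = "unknown"


# ASSUMPTION: 'try again' is a hypothetical pattern added so the example
# message matches; the real pattern list is not shown in this commit.
_RATE_LIMIT_PATTERNS = ("servicequotaexceededexception", "try again")


def classify(error_msg, overloaded_first=True):
    """Classify a provider error message by substring matching."""
    msg = error_msg.lower()
    # New behavior: the overloaded check runs before the rate-limit patterns.
    if overloaded_first and "overloaded" in msg:
        return FailoverReason.overloaded
    if any(p in msg for p in _RATE_LIMIT_PATTERNS):
        return FailoverReason.rate_limit
    # Old ordering: only reached when no rate-limit pattern matched first.
    if "overloaded" in msg:
        return FailoverReason.overloaded
    return FailoverReason.unknown


MESSAGE = "The service may be temporarily overloaded, please try again later"

after_fix = classify(MESSAGE)                           # overloaded check first
before_fix = classify(MESSAGE, overloaded_first=False)  # rate-limit check first
```

Because 'temporarily overloaded' contains 'overloaded', a single
substring test covers both phrases in this sketch.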

Closes #14038
pander 2026-04-23 00:06:02 +08:00
parent 5e8262da26
commit 05f53f4e6a

@@ -590,6 +590,16 @@ def _classify_400(
     # Some providers return rate limit / billing errors as 400 instead of 429/402.
     # Check these patterns before falling through to format_error.
+    # Overloaded patterns — server-side overload, NOT a credential/billing issue.
+    # Must come before rate_limit check to avoid rotating credentials unnecessarily.
+    if "overloaded" in error_msg or "temporarily overloaded" in error_msg:
+        return result_fn(
+            FailoverReason.overloaded,
+            retryable=True,
+            should_fallback=True,
+        )
     if any(p in error_msg for p in _RATE_LIMIT_PATTERNS):
         return result_fn(
             FailoverReason.rate_limit,
@@ -723,7 +733,14 @@ def _classify_by_message(
             should_fallback=True,
         )
-    # Rate limit patterns
+    # Rate limit patterns — but overloaded must come first to avoid credential rotation.
+    if "overloaded" in error_msg or "temporarily overloaded" in error_msg:
+        return result_fn(
+            FailoverReason.overloaded,
+            retryable=True,
+            should_fallback=True,
+        )
     if any(p in error_msg for p in _RATE_LIMIT_PATTERNS):
         return result_fn(
             FailoverReason.rate_limit,