mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-04-25 00:51:20 +00:00)
fix(error_classifier): classify 'overloaded' as FailoverReason.overloaded before rate_limit
When a provider (e.g. Z.AI) returns 'The service may be temporarily overloaded, please try again later' as HTTP 200 or HTTP 400, the error was matched against _RATE_LIMIT_PATTERNS (which includes 'servicequotaexceededexception') and classified as rate_limit with should_rotate_credential=True. After 2 failures the single API key was marked exhausted and all further retries failed.

The fix adds an 'overloaded' / 'temporarily overloaded' pattern check BEFORE the rate_limit check in both _classify_400 and _classify_by_message. Overloaded errors now get FailoverReason.overloaded (retryable, should_fallback) instead of rate_limit, preventing unnecessary credential rotation.

Closes #14038
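The effect of the check order can be illustrated with a minimal, self-contained sketch. Everything in it is a stand-in: the `FailoverReason` enum, the `classify` helper, its `overloaded_first` flag, and the contents of `_RATE_LIMIT_PATTERNS` (in particular the `"try again"` entry) are assumptions for illustration, not the project's actual definitions.

```python
from enum import Enum


class FailoverReason(Enum):
    # Stand-in for the project's enum; only the members discussed here.
    overloaded = "overloaded"
    rate_limit = "rate_limit"
    unknown = "unknown"


# Hypothetical pattern list. The real _RATE_LIMIT_PATTERNS is not shown in
# this commit; "try again" is an assumed entry that happens to match the
# overload message.
_RATE_LIMIT_PATTERNS = ("rate limit", "servicequotaexceededexception", "try again")


def classify(error_msg: str, overloaded_first: bool = True) -> FailoverReason:
    """Classify a provider error message by substring matching."""
    msg = error_msg.lower()
    if overloaded_first and "overloaded" in msg:
        # Server-side overload: retry or fall back, but keep the credential.
        return FailoverReason.overloaded
    if any(p in msg for p in _RATE_LIMIT_PATTERNS):
        return FailoverReason.rate_limit
    if "overloaded" in msg:
        return FailoverReason.overloaded
    return FailoverReason.unknown


msg = "The service may be temporarily overloaded, please try again later"
print(classify(msg, overloaded_first=False))  # FailoverReason.rate_limit
print(classify(msg))                          # FailoverReason.overloaded
```

With the overload check placed after the rate-limit check, the same message falls into `rate_limit`, which is the misclassification the commit message describes.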
parent 5e8262da26
commit 05f53f4e6a
1 changed file with 18 additions and 1 deletion
@@ -590,6 +590,16 @@ def _classify_400(
 
     # Some providers return rate limit / billing errors as 400 instead of 429/402.
     # Check these patterns before falling through to format_error.
+
+    # Overloaded patterns — server-side overload, NOT a credential/billing issue.
+    # Must come before rate_limit check to avoid rotating credentials unnecessarily.
+    if "overloaded" in error_msg or "temporarily overloaded" in error_msg:
+        return result_fn(
+            FailoverReason.overloaded,
+            retryable=True,
+            should_fallback=True,
+        )
+
     if any(p in error_msg for p in _RATE_LIMIT_PATTERNS):
         return result_fn(
             FailoverReason.rate_limit,
@@ -723,7 +733,14 @@ def _classify_by_message(
             should_fallback=True,
         )
 
-    # Rate limit patterns
+    # Rate limit patterns — but overloaded must come first to avoid credential rotation.
+    if "overloaded" in error_msg or "temporarily overloaded" in error_msg:
+        return result_fn(
+            FailoverReason.overloaded,
+            retryable=True,
+            should_fallback=True,
+        )
+
     if any(p in error_msg for p in _RATE_LIMIT_PATTERNS):
         return result_fn(
             FailoverReason.rate_limit,