mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-27 01:11:40 +00:00
Some providers — notably Z.AI / Zhipu — return HTTP 429 with messages
like "The service may be temporarily overloaded, please try again later"
to signal **server-wide overload**, not per-credential rate limiting.
The two conditions need different recovery strategies:
| Reason | Correct strategy |
|---|---|
| ``rate_limit`` | Retry same credential once (the limit may have reset), then rotate |
| ``overloaded`` | Skip retry, rotate immediately (the whole endpoint is saturated) |
Before this fix:
- ``error_classifier.py`` mapped every 429 to ``FailoverReason.rate_limit``
regardless of message body.
- ``FailoverReason.overloaded`` already existed as an enum value (and was
produced for 503/529) but no production path emitted it for 429.
- ``_recover_with_credential_pool`` had no handler for ``overloaded`` —
an ``overloaded`` classification fell through to the default no-op
``return False, has_retried_429`` line.
Net effect: every overload-language 429 burned a ``has_retried_429``
slot on the same saturated credential before the rotation happened, and
cron jobs (one turn each) often used their entire execution on that
wasted retry.
### Fix
Two narrow, additive changes:
1. ``agent/error_classifier.py`` — new ``_OVERLOADED_PATTERNS`` list
containing provider-language overload phrases (``"temporarily
overloaded"``, ``"server is overloaded"``, ``"at capacity"``, …).
The 429 branch now checks the error body against this list and
emits ``FailoverReason.overloaded`` when matched, preserving
``rate_limit`` for everything else. Kept phrases narrow and
provider-language-flavoured so a normal rate-limit message
("you have been rate-limited") doesn't hit this bucket.
2. ``run_agent.py::_recover_with_credential_pool`` — new branch for
``effective_reason == FailoverReason.overloaded``. Same shape as
the existing ``billing`` handler: rotate immediately via
``mark_exhausted_and_rotate(...)``, no retry-on-same-credential
first. The 503/529 ``overloaded`` classifications produced by the
existing code now also flow through this branch.
### Tests (15 new, all passing on py3.11 venv)
``tests/agent/test_error_classifier.py`` — 7 cases:
- The exact #15297 Z.AI message → ``overloaded``
- 9-phrase parametrised matrix (server/service/upstream overloaded,
at/over capacity, "currently overloaded", …) → all ``overloaded``
- Plain "Rate limit reached for requests per minute" 429 → still
``rate_limit`` (regression guard against the disambiguation
silently broadening)
- Plain "Too Many Requests" 429 → still ``rate_limit``
- 503 path unchanged → still ``overloaded`` (negative control)
``tests/agent/test_credential_pool_routing.py`` —
``TestPoolOverloadedRotation`` (4 cases):
- Overloaded on first failure → rotates immediately (the fix)
- Overloaded with ``has_retried_429=True`` → still rotates (no retry-
flag dependence — the rate-limit gate is specific to ``rate_limit``)
- Single-entry pool overload → ``recovered=False`` so outer fallback
takes over rather than spinning
- ``rate_limit`` on first failure → still uses retry-first path
(regression guard against the new branch broadening)
**Verified regression guards**: temporarily reverted the classifier's
overload branch; 10 of the 11 new classifier tests correctly failed
with clear assertion messages. Restored fix → all 145 tests pass
(``test_error_classifier.py`` + ``test_credential_pool_routing.py``,
existing + new).
Closes #15297
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|---|---|---|
| .. | ||
| transports | ||
| __init__.py | ||
| test_anthropic_adapter.py | ||
| test_anthropic_keychain.py | ||
| test_auxiliary_client.py | ||
| test_auxiliary_client_anthropic_custom.py | ||
| test_auxiliary_config_bridge.py | ||
| test_auxiliary_main_first.py | ||
| test_auxiliary_named_custom_providers.py | ||
| test_bedrock_adapter.py | ||
| test_bedrock_integration.py | ||
| test_codex_cloudflare_headers.py | ||
| test_compress_focus.py | ||
| test_context_compressor.py | ||
| test_context_engine.py | ||
| test_context_references.py | ||
| test_copilot_acp_client.py | ||
| test_credential_pool.py | ||
| test_credential_pool_routing.py | ||
| test_crossloop_client_cache.py | ||
| test_direct_provider_url_detection.py | ||
| test_display.py | ||
| test_display_emoji.py | ||
| test_error_classifier.py | ||
| test_external_skills.py | ||
| test_gemini_cloudcode.py | ||
| test_gemini_free_tier_gate.py | ||
| test_gemini_native_adapter.py | ||
| test_gemini_schema.py | ||
| test_image_gen_registry.py | ||
| test_insights.py | ||
| test_kimi_coding_anthropic_thinking.py | ||
| test_local_stream_timeout.py | ||
| test_memory_provider.py | ||
| test_memory_user_id.py | ||
| test_minimax_auxiliary_url.py | ||
| test_minimax_provider.py | ||
| test_model_metadata.py | ||
| test_model_metadata_local_ctx.py | ||
| test_model_metadata_ssl.py | ||
| test_models_dev.py | ||
| test_moonshot_schema.py | ||
| test_nous_rate_guard.py | ||
| test_prompt_builder.py | ||
| test_prompt_caching.py | ||
| test_proxy_and_url_validation.py | ||
| test_rate_limit_tracker.py | ||
| test_redact.py | ||
| test_shell_hooks.py | ||
| test_shell_hooks_consent.py | ||
| test_skill_commands.py | ||
| test_subagent_progress.py | ||
| test_subagent_stop_hook.py | ||
| test_subdirectory_hints.py | ||
| test_title_generator.py | ||
| test_usage_pricing.py | ||
| test_vision_resolved_args.py | ||