hermes-agent/agent
Brian D. Evans 24461ec0bc fix(error-classifier): route 429 'overloaded' messages to FailoverReason.overloaded (#15297)
Some providers — notably Z.AI / Zhipu — return HTTP 429 with messages
like "The service may be temporarily overloaded, please try again later"
to signal **server-wide overload**, not per-credential rate limiting.
The two conditions need different recovery strategies:

| Reason | Correct strategy |
|---|---|
| ``rate_limit`` | Retry same credential once (the limit may have reset), then rotate |
| ``overloaded`` | Skip retry, rotate immediately (the whole endpoint is saturated) |

Before this fix:

- ``error_classifier.py`` mapped every 429 to ``FailoverReason.rate_limit``
  regardless of message body.
- ``FailoverReason.overloaded`` already existed as an enum value (and was
  produced for 503/529) but no production path emitted it for 429.
- ``_recover_with_credential_pool`` had no handler for ``overloaded`` —
  an ``overloaded`` classification fell through to the default no-op
  ``return False, has_retried_429`` line.

Net effect: every overload-language 429 burned a ``has_retried_429``
slot on the same saturated credential before the rotation happened, and
cron jobs (one turn each) often used their entire execution on that
wasted retry.

### Fix

Two narrow, additive changes:

1. ``agent/error_classifier.py`` — new ``_OVERLOADED_PATTERNS`` list
   containing provider-language overload phrases (``"temporarily
   overloaded"``, ``"server is overloaded"``, ``"at capacity"``, …).
   The 429 branch now checks the error body against this list and
   emits ``FailoverReason.overloaded`` when matched, preserving
   ``rate_limit`` for everything else.  The phrases are kept narrow and
   specific to provider overload wording, so a normal rate-limit
   message ("you have been rate-limited") doesn't hit this bucket.

2. ``run_agent.py::_recover_with_credential_pool`` — new branch for
   ``effective_reason == FailoverReason.overloaded``.  Same shape as
   the existing ``billing`` handler: rotate immediately via
   ``mark_exhausted_and_rotate(...)``, no retry-on-same-credential
   first.  The 503/529 ``overloaded`` classifications produced by the
   existing code now also flow through this branch.
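
A minimal sketch of the classifier change described in item 1, not the actual ``error_classifier.py`` code. The function name ``classify_status``, the trimmed-down enum, and the case-insensitive substring match are assumptions made for illustration; only the three phrases quoted above are taken from the commit message, and the real list is longer.

```python
from enum import Enum


class FailoverReason(Enum):
    # Trimmed-down stand-in for the real enum; only the members needed here.
    rate_limit = "rate_limit"
    overloaded = "overloaded"
    unknown = "unknown"


# Only the phrases quoted in the commit message; the real list has more.
_OVERLOADED_PATTERNS = (
    "temporarily overloaded",
    "server is overloaded",
    "at capacity",
)


def classify_status(status_code: int, body: str) -> FailoverReason:
    """Hypothetical helper: map an HTTP status and error body to a reason."""
    text = body.lower()  # assumption: matching is case-insensitive
    if status_code == 429:
        # Overload wording wins; every other 429 stays a rate limit.
        if any(pattern in text for pattern in _OVERLOADED_PATTERNS):
            return FailoverReason.overloaded
        return FailoverReason.rate_limit
    if status_code in (503, 529):
        # Unchanged behaviour: these were already classified as overloaded.
        return FailoverReason.overloaded
    return FailoverReason.unknown
```

With this shape, the Z.AI message from #15297 classifies as ``overloaded``, while "Too Many Requests" still classifies as ``rate_limit``.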

### Tests (15 new, all passing on py3.11 venv)

``tests/agent/test_error_classifier.py`` — 7 cases:

- The exact #15297 Z.AI message → ``overloaded``
- 9-phrase parametrised matrix (server/service/upstream overloaded,
  at/over capacity, "currently overloaded", …) → all ``overloaded``
- Plain "Rate limit reached for requests per minute" 429 → still
  ``rate_limit`` (regression guard against the disambiguation
  silently broadening)
- Plain "Too Many Requests" 429 → still ``rate_limit``
- 503 path unchanged → still ``overloaded`` (negative control)

``tests/agent/test_credential_pool_routing.py`` —
``TestPoolOverloadedRotation`` (4 cases):

- Overloaded on first failure → rotates immediately (the fix)
- Overloaded with ``has_retried_429=True`` → still rotates (no
  retry-flag dependence; the rate-limit gate is specific to ``rate_limit``)
- Single-entry pool overload → ``recovered=False`` so outer fallback
  takes over rather than spinning
- ``rate_limit`` on first failure → still uses retry-first path
  (regression guard against the new branch broadening)
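
The four routing behaviours listed above can be sketched with a toy pool. This is an illustration under stated assumptions, not the code in ``run_agent.py``: ``ToyPool``, ``recover``, the argument-free ``mark_exhausted_and_rotate()``, and the exact flag values returned after a rotation are all hypothetical. Here the first element of the returned tuple means "the pool moved to a fresh credential".

```python
from enum import Enum
from typing import List, Tuple


class FailoverReason(Enum):
    rate_limit = "rate_limit"
    overloaded = "overloaded"


class ToyPool:
    """Hypothetical stand-in for the credential pool."""

    def __init__(self, credentials: List[str]) -> None:
        self.credentials = list(credentials)
        self.index = 0

    @property
    def current(self) -> str:
        return self.credentials[self.index]

    def mark_exhausted_and_rotate(self) -> bool:
        """Advance to the next credential; return False if none is left."""
        if self.index + 1 >= len(self.credentials):
            return False
        self.index += 1
        return True


def recover(pool: ToyPool, reason: FailoverReason,
            has_retried_429: bool) -> Tuple[bool, bool]:
    """Return (rotated, has_retried_429) for this sketch."""
    if reason is FailoverReason.overloaded:
        # Whole endpoint is saturated: rotate at once, ignore the retry flag.
        return pool.mark_exhausted_and_rotate(), False
    if reason is FailoverReason.rate_limit:
        if not has_retried_429:
            # First 429: stay on the same credential and record the retry.
            return False, True
        # Second 429 on the same credential: rotate.
        return pool.mark_exhausted_and_rotate(), False
    # Default no-op for reasons without a handler.
    return False, has_retried_429
```

A single-entry pool returns ``False`` on overload in this sketch, which is the signal for an outer fallback to take over instead of spinning.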

**Verified regression guards**: temporarily reverted the classifier's
overload branch; 10 of the 11 new classifier tests correctly failed
with clear assertion messages.  Restored fix → all 145 tests pass
(``test_error_classifier.py`` + ``test_credential_pool_routing.py``,
existing + new).

Closes #15297

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 16:39:48 -07:00

| Name | Last commit | Date |
|---|---|---|
| ``transports`` | fix(kimi,mcp): Moonshot schema sanitizer + MCP schema robustness (#14805) | 2026-04-23 16:11:57 -07:00 |
| ``__init__.py`` | Refactor Terminal and AIAgent cleanup | 2026-02-21 22:31:43 -08:00 |
| ``account_usage.py`` | feat(account-usage): add per-provider account limits module | 2026-04-21 01:56:35 -07:00 |
| ``anthropic_adapter.py`` | fix(anthropic): auto-detect Bedrock model IDs in normalize_model_name (#12295) | 2026-04-24 07:26:07 -07:00 |
| ``auxiliary_client.py`` | fix(agent): handle aws_sdk auth type in resolve_provider_client | 2026-04-24 07:26:07 -07:00 |
| ``bedrock_adapter.py`` | fix(bedrock): evict cached boto3 client on stale-connection errors | 2026-04-24 07:26:07 -07:00 |
| ``codex_responses_adapter.py`` | fix(codex): detect leaked tool-call text in assistant content (#15347) | 2026-04-24 14:39:59 -07:00 |
| ``context_compressor.py`` | fix(aux): surface auxiliary failures in UI | 2026-04-24 14:31:21 -07:00 |
| ``context_engine.py`` | fix(compress): don't reach into ContextCompressor privates from /compress (#15039) | 2026-04-24 02:55:43 -07:00 |
| ``context_references.py`` | fix(agent): fall back when rg is blocked for @folder references | 2026-04-20 01:56:41 -07:00 |
| ``copilot_acp_client.py`` | fix: set HOME for Copilot ACP subprocesses | 2026-04-24 05:09:08 -07:00 |
| ``credential_pool.py`` | fix(credential_pool): add Nous OAuth cross-process auth-store sync | 2026-04-24 05:20:05 -07:00 |
| ``credential_sources.py`` | fix(auth): unify credential source removal — every source sticks (#13427) | 2026-04-21 01:52:49 -07:00 |
| ``display.py`` | fix(display): render <missing old_text> in memory previews instead of empty quotes (#12852) | 2026-04-19 22:45:47 -07:00 |
| ``error_classifier.py`` | fix(error-classifier): route 429 'overloaded' messages to FailoverReason.overloaded (#15297) | 2026-04-24 16:39:48 -07:00 |
| ``file_safety.py`` | fix(security): apply file safety to copilot acp fs | 2026-04-21 01:31:58 -07:00 |
| ``gemini_cloudcode_adapter.py`` | refactor: remove redundant local imports already available at module level | 2026-04-21 00:50:58 -07:00 |
| ``gemini_native_adapter.py`` | fix(gemini): fail fast on missing API key + surface it in hermes dump (#15133) | 2026-04-24 05:35:17 -07:00 |
| ``gemini_schema.py`` | fix(gemini): drop integer/number/boolean enums from tool schemas (#15082) | 2026-04-24 03:40:00 -07:00 |
| ``google_code_assist.py`` | fix(gemini-cli): surface MODEL_CAPACITY_EXHAUSTED cleanly + drop retired gemma-4-26b (#11833) | 2026-04-17 15:34:12 -07:00 |
| ``google_oauth.py`` | feat(gemini): add Google Gemini CLI OAuth provider via Cloud Code Assist (free + paid tiers) (#11270) | 2026-04-16 16:49:00 -07:00 |
| ``image_gen_provider.py`` | feat(plugins): pluggable image_gen backends + OpenAI provider (#13799) | 2026-04-21 21:30:10 -07:00 |
| ``image_gen_registry.py`` | feat(plugins): pluggable image_gen backends + OpenAI provider (#13799) | 2026-04-21 21:30:10 -07:00 |
| ``insights.py`` | Merge branch 'main' into feat/dashboard-skill-analytics | 2026-04-20 05:25:49 -07:00 |
| ``manual_compression_feedback.py`` | fix(gateway): make manual compression feedback truthful | 2026-04-10 21:16:53 -07:00 |
| ``memory_manager.py`` | fix(memory): add write origin metadata | 2026-04-24 14:37:55 -07:00 |
| ``memory_provider.py`` | fix(memory): add write origin metadata | 2026-04-24 14:37:55 -07:00 |
| ``model_metadata.py`` | fix(bedrock): resolve context length via static table before custom-endpoint probe | 2026-04-24 07:26:07 -07:00 |
| ``models_dev.py`` | fix: normalize provider in list_provider_models to support aliases | 2026-04-23 01:59:20 -07:00 |
| ``moonshot_schema.py`` | fix(kimi,mcp): Moonshot schema sanitizer + MCP schema robustness (#14805) | 2026-04-23 16:11:57 -07:00 |
| ``nous_rate_guard.py`` | fix: Nous Portal rate limit guard — prevent retry amplification (#10568) | 2026-04-15 16:31:48 -07:00 |
| ``prompt_builder.py`` | feat(agent): add PLATFORM_HINTS for matrix, mattermost, and feishu (#14428) | 2026-04-23 12:50:22 +05:30 |
| ``prompt_caching.py`` | fix(prompt-caching): skip top-level cache_control on role:tool for OpenRouter | 2026-03-21 16:54:43 -07:00 |
| ``rate_limit_tracker.py`` | refactor: remove dead code — 1,784 lines across 77 files (#9180) | 2026-04-13 16:32:04 -07:00 |
| ``redact.py`` | feat: replace kimi-k2.5 with kimi-k2.6 on OpenRouter and Nous Portal (#13148) | 2026-04-20 11:49:54 -07:00 |
| ``retry_utils.py`` | feat(agent): add jittered retry backoff | 2026-04-08 00:41:36 -07:00 |
| ``shell_hooks.py`` | feat: shell hooks — wire shell scripts as Hermes hook callbacks | 2026-04-20 20:53:51 -07:00 |
| ``skill_commands.py`` | fix(skills): apply inline shell in skill_view | 2026-04-24 15:15:07 -07:00 |
| ``skill_preprocessing.py`` | fix(skills): apply inline shell in skill_view | 2026-04-24 15:15:07 -07:00 |
| ``skill_utils.py`` | fix(skills): follow symlinks in iter_skill_index_files | 2026-04-22 17:43:30 -07:00 |
| ``subdirectory_hints.py`` | fix(agent): catch PermissionError in subdirectory hint discovery | 2026-04-09 03:10:30 -07:00 |
| ``title_generator.py`` | fix: increase max_tokens for GLM 5.1 reasoning headroom | 2026-04-22 18:44:07 -07:00 |
| ``trajectory.py`` | Refactor Terminal and AIAgent cleanup | 2026-02-21 22:31:43 -08:00 |
| ``usage_pricing.py`` | fix(usage): read top-level Anthropic cache fields from OAI-compatible proxies | 2026-04-22 17:40:49 -07:00 |