Some providers — notably Z.AI / Zhipu — return HTTP 429 with messages
like "The service may be temporarily overloaded, please try again later"
to signal **server-wide overload**, not per-credential rate limiting.
The two conditions need different recovery strategies:
| Reason | Correct strategy |
|---|---|
| ``rate_limit`` | Retry same credential once (the limit may have reset), then rotate |
| ``overloaded`` | Skip retry, rotate immediately (the whole endpoint is saturated) |
Before this fix:
- ``error_classifier.py`` mapped every 429 to ``FailoverReason.rate_limit``
regardless of message body.
- ``FailoverReason.overloaded`` already existed as an enum value (and was
produced for 503/529) but no production path emitted it for 429.
- ``_recover_with_credential_pool`` had no handler for ``overloaded`` —
an ``overloaded`` classification fell through to the default no-op
``return False, has_retried_429`` line.
Net effect: every overload-language 429 burned a ``has_retried_429``
slot on the same saturated credential before the rotation happened, and
cron jobs (one turn each) often used their entire execution on that
wasted retry.
### Fix
Two narrow, additive changes:
1. ``agent/error_classifier.py`` — new ``_OVERLOADED_PATTERNS`` list
containing provider-language overload phrases (``"temporarily
overloaded"``, ``"server is overloaded"``, ``"at capacity"``, …).
The 429 branch now checks the error body against this list and
emits ``FailoverReason.overloaded`` when matched, preserving
``rate_limit`` for everything else. Kept phrases narrow and
provider-language-flavoured so a normal rate-limit message
("you have been rate-limited") doesn't hit this bucket.
2. ``run_agent.py::_recover_with_credential_pool`` — new branch for
``effective_reason == FailoverReason.overloaded``. Same shape as
the existing ``billing`` handler: rotate immediately via
``mark_exhausted_and_rotate(...)``, no retry-on-same-credential
first. The 503/529 ``overloaded`` classifications produced by the
existing code now also flow through this branch.
### Tests (15 new, all passing on py3.11 venv)
``tests/agent/test_error_classifier.py`` — 7 cases:
- The exact #15297 Z.AI message → ``overloaded``
- 9-phrase parametrised matrix (server/service/upstream overloaded,
at/over capacity, "currently overloaded", …) → all ``overloaded``
- Plain "Rate limit reached for requests per minute" 429 → still
``rate_limit`` (regression guard against the disambiguation
silently broadening)
- Plain "Too Many Requests" 429 → still ``rate_limit``
- 503 path unchanged → still ``overloaded`` (negative control)
``tests/agent/test_credential_pool_routing.py`` —
``TestPoolOverloadedRotation`` (4 cases):
- Overloaded on first failure → rotates immediately (the fix)
- Overloaded with ``has_retried_429=True`` → still rotates (no retry-
flag dependence — the rate-limit gate is specific to ``rate_limit``)
- Single-entry pool overload → ``recovered=False`` so outer fallback
takes over rather than spinning
- ``rate_limit`` on first failure → still uses retry-first path
(regression guard against the new branch broadening)
**Verified regression guards**: temporarily reverted the classifier's
overload branch; 10 of the 11 new classifier tests correctly failed
with clear assertion messages. Restored fix → all 145 tests pass
(``test_error_classifier.py`` + ``test_credential_pool_routing.py``,
existing + new).
Closes#15297
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Smart model routing (auto-routing short/simple turns to a cheap model
across providers) was opt-in and disabled by default. This removes the
feature wholesale: the routing module, its config keys, docs, tests, and
the orchestration scaffolding it required in cli.py / gateway/run.py /
cron/scheduler.py.
The /fast (Priority Processing / Anthropic fast mode) feature kept its
hooks into _resolve_turn_agent_config — those still build a route dict
and attach request_overrides when the model supports it; the route now
just always uses the session's primary model/provider rather than
running prompts through choose_cheap_model_route() first.
Also removed:
- DEFAULT_CONFIG['smart_model_routing'] block and matching commented-out
example sections in hermes_cli/config.py and cli-config.yaml.example
- _load_smart_model_routing() / self._smart_model_routing on GatewayRunner
- self._smart_model_routing / self._active_agent_route_signature on
HermesCLI (signature kept; just no longer initialised through the
smart-routing pipeline)
- route_label parameter on HermesCLI._init_agent (only set by smart
routing; never read elsewhere)
- 'Smart Model Routing' section in website/docs/integrations/providers.md
- tip in hermes_cli/tips.py
- entries in hermes_cli/dump.py + hermes_cli/web_server.py
- row in skills/autonomous-ai-agents/hermes-agent/SKILL.md
Tests:
- Deleted tests/agent/test_smart_model_routing.py
- Rewrote tests/agent/test_credential_pool_routing.py to target the
simplified _resolve_turn_agent_config directly (preserves credential
pool propagation + 429 rotation coverage)
- Dropped 'cheap model' test from test_cli_provider_resolution.py
- Dropped resolve_turn_route patches from cli + gateway test_fast_command
— they now exercise the real method end-to-end
- Removed _smart_model_routing stub assignments from gateway/cron test
helpers
Targeted suites: 74/74 in the directly affected test files;
tests/agent + tests/cron + tests/cli pass except 5 failures that
already exist on main (cron silent-delivery + alias quick-command).
* refactor: re-architect tests to mirror the codebase
* Update tests.yml
* fix: add missing tool_error imports after registry refactor
* fix(tests): replace patch.dict with monkeypatch to prevent env var leaks under xdist
patch.dict(os.environ) can leak TERMINAL_ENV across xdist workers,
causing test_code_execution tests to hit the Modal remote path.
* fix(tests): fix update_check and telegram xdist failures
- test_update_check: replace patch("hermes_cli.banner.os.getenv") with
monkeypatch.setenv("HERMES_HOME") — banner.py no longer imports os
directly, it uses get_hermes_home() from hermes_constants.
- test_telegram_conflict/approval_buttons: provide real exception classes
for telegram.error mock (NetworkError, TimedOut, BadRequest) so the
except clause in connect() doesn't fail with "catching classes that do
not inherit from BaseException" when xdist pollutes sys.modules.
* fix(tests): accept unavailable_models kwarg in _prompt_model_selection mock
2026-04-07 17:19:07 -07:00
Renamed from tests/test_credential_pool_routing.py (Browse further)