hermes-agent

5831 commits 942 branches 10 tags 2.1 GiB

Author	SHA1	Message	Date
Brian D. Evans	24461ec0bc	fix(error-classifier): route 429 'overloaded' messages to FailoverReason.overloaded (#15297 ) Some providers — notably Z.AI / Zhipu — return HTTP 429 with messages like "The service may be temporarily overloaded, please try again later" to signal server-wide overload, not per-credential rate limiting. The two conditions need different recovery strategies: \| Reason \| Correct strategy \| \|---\|---\| \| ``rate_limit`` \| Retry same credential once (the limit may have reset), then rotate \| \| ``overloaded`` \| Skip retry, rotate immediately (the whole endpoint is saturated) \| Before this fix: - ``error_classifier.py`` mapped every 429 to ``FailoverReason.rate_limit`` regardless of message body. - ``FailoverReason.overloaded`` already existed as an enum value (and was produced for 503/529) but no production path emitted it for 429. - ``_recover_with_credential_pool`` had no handler for ``overloaded`` — an ``overloaded`` classification fell through to the default no-op ``return False, has_retried_429`` line. Net effect: every overload-language 429 burned a ``has_retried_429`` slot on the same saturated credential before the rotation happened, and cron jobs (one turn each) often used their entire execution on that wasted retry. ### Fix Two narrow, additive changes: 1. ``agent/error_classifier.py`` — new ``_OVERLOADED_PATTERNS`` list containing provider-language overload phrases (``"temporarily overloaded"``, ``"server is overloaded"``, ``"at capacity"``, …). The 429 branch now checks the error body against this list and emits ``FailoverReason.overloaded`` when matched, preserving ``rate_limit`` for everything else. Kept phrases narrow and provider-language-flavoured so a normal rate-limit message ("you have been rate-limited") doesn't hit this bucket. 2. ``run_agent.py::_recover_with_credential_pool`` — new branch for ``effective_reason == FailoverReason.overloaded``. Same shape as the existing ``billing`` handler: rotate immediately via ``mark_exhausted_and_rotate(...)``, no retry-on-same-credential first. The 503/529 ``overloaded`` classifications produced by the existing code now also flow through this branch. ### Tests (15 new, all passing on py3.11 venv) ``tests/agent/test_error_classifier.py`` — 7 cases: - The exact #15297 Z.AI message → ``overloaded`` - 9-phrase parametrised matrix (server/service/upstream overloaded, at/over capacity, "currently overloaded", …) → all ``overloaded`` - Plain "Rate limit reached for requests per minute" 429 → still ``rate_limit`` (regression guard against the disambiguation silently broadening) - Plain "Too Many Requests" 429 → still ``rate_limit`` - 503 path unchanged → still ``overloaded`` (negative control) ``tests/agent/test_credential_pool_routing.py`` — ``TestPoolOverloadedRotation`` (4 cases): - Overloaded on first failure → rotates immediately (the fix) - Overloaded with ``has_retried_429=True`` → still rotates (no retry- flag dependence — the rate-limit gate is specific to ``rate_limit``) - Single-entry pool overload → ``recovered=False`` so outer fallback takes over rather than spinning - ``rate_limit`` on first failure → still uses retry-first path (regression guard against the new branch broadening) Verified regression guards: temporarily reverted the classifier's overload branch; 10 of the 11 new classifier tests correctly failed with clear assertion messages. Restored fix → all 145 tests pass (``test_error_classifier.py`` + ``test_credential_pool_routing.py``, existing + new). Closes #15297 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 16:39:48 -07:00
Teknium	424e9f36b0	refactor: remove smart_model_routing feature (#12732 ) Smart model routing (auto-routing short/simple turns to a cheap model across providers) was opt-in and disabled by default. This removes the feature wholesale: the routing module, its config keys, docs, tests, and the orchestration scaffolding it required in cli.py / gateway/run.py / cron/scheduler.py. The /fast (Priority Processing / Anthropic fast mode) feature kept its hooks into _resolve_turn_agent_config — those still build a route dict and attach request_overrides when the model supports it; the route now just always uses the session's primary model/provider rather than running prompts through choose_cheap_model_route() first. Also removed: - DEFAULT_CONFIG['smart_model_routing'] block and matching commented-out example sections in hermes_cli/config.py and cli-config.yaml.example - _load_smart_model_routing() / self._smart_model_routing on GatewayRunner - self._smart_model_routing / self._active_agent_route_signature on HermesCLI (signature kept; just no longer initialised through the smart-routing pipeline) - route_label parameter on HermesCLI._init_agent (only set by smart routing; never read elsewhere) - 'Smart Model Routing' section in website/docs/integrations/providers.md - tip in hermes_cli/tips.py - entries in hermes_cli/dump.py + hermes_cli/web_server.py - row in skills/autonomous-ai-agents/hermes-agent/SKILL.md Tests: - Deleted tests/agent/test_smart_model_routing.py - Rewrote tests/agent/test_credential_pool_routing.py to target the simplified _resolve_turn_agent_config directly (preserves credential pool propagation + 429 rotation coverage) - Dropped 'cheap model' test from test_cli_provider_resolution.py - Dropped resolve_turn_route patches from cli + gateway test_fast_command — they now exercise the real method end-to-end - Removed _smart_model_routing stub assignments from gateway/cron test helpers Targeted suites: 74/74 in the directly affected test files; tests/agent + tests/cron + tests/cli pass except 5 failures that already exist on main (cron silent-delivery + alias quick-command).	2026-04-19 18:12:55 -07:00
Siddharth Balyan	f3006ebef9	refactor(tests): re-architect tests + fix CI failures (#5946 ) * refactor: re-architect tests to mirror the codebase * Update tests.yml * fix: add missing tool_error imports after registry refactor * fix(tests): replace patch.dict with monkeypatch to prevent env var leaks under xdist patch.dict(os.environ) can leak TERMINAL_ENV across xdist workers, causing test_code_execution tests to hit the Modal remote path. * fix(tests): fix update_check and telegram xdist failures - test_update_check: replace patch("hermes_cli.banner.os.getenv") with monkeypatch.setenv("HERMES_HOME") — banner.py no longer imports os directly, it uses get_hermes_home() from hermes_constants. - test_telegram_conflict/approval_buttons: provide real exception classes for telegram.error mock (NetworkError, TimedOut, BadRequest) so the except clause in connect() doesn't fail with "catching classes that do not inherit from BaseException" when xdist pollutes sys.modules. * fix(tests): accept unavailable_models kwarg in _prompt_model_selection mock	2026-04-07 17:19:07 -07:00

Author

SHA1

Message

Date

Brian D. Evans

24461ec0bc

fix(error-classifier): route 429 'overloaded' messages to FailoverReason.overloaded (#15297 )

Some providers — notably Z.AI / Zhipu — return HTTP 429 with messages
like "The service may be temporarily overloaded, please try again later"
to signal **server-wide overload**, not per-credential rate limiting.
The two conditions need different recovery strategies:

| Reason | Correct strategy |
|---|---|
| ``rate_limit`` | Retry same credential once (the limit may have reset), then rotate |
| ``overloaded`` | Skip retry, rotate immediately (the whole endpoint is saturated) |

Before this fix:

- ``error_classifier.py`` mapped every 429 to ``FailoverReason.rate_limit``
  regardless of message body.
- ``FailoverReason.overloaded`` already existed as an enum value (and was
  produced for 503/529) but no production path emitted it for 429.
- ``_recover_with_credential_pool`` had no handler for ``overloaded`` —
  an ``overloaded`` classification fell through to the default no-op
  ``return False, has_retried_429`` line.

Net effect: every overload-language 429 burned a ``has_retried_429``
slot on the same saturated credential before the rotation happened, and
cron jobs (one turn each) often used their entire execution on that
wasted retry.

### Fix

Two narrow, additive changes:

1. ``agent/error_classifier.py`` — new ``_OVERLOADED_PATTERNS`` list
   containing provider-language overload phrases (``"temporarily
   overloaded"``, ``"server is overloaded"``, ``"at capacity"``, …).
   The 429 branch now checks the error body against this list and
   emits ``FailoverReason.overloaded`` when matched, preserving
   ``rate_limit`` for everything else.  Kept phrases narrow and
   provider-language-flavoured so a normal rate-limit message
   ("you have been rate-limited") doesn't hit this bucket.

2. ``run_agent.py::_recover_with_credential_pool`` — new branch for
   ``effective_reason == FailoverReason.overloaded``.  Same shape as
   the existing ``billing`` handler: rotate immediately via
   ``mark_exhausted_and_rotate(...)``, no retry-on-same-credential
   first.  The 503/529 ``overloaded`` classifications produced by the
   existing code now also flow through this branch.

### Tests (15 new, all passing on py3.11 venv)

``tests/agent/test_error_classifier.py`` — 7 cases:

- The exact #15297 Z.AI message → ``overloaded``
- 9-phrase parametrised matrix (server/service/upstream overloaded,
  at/over capacity, "currently overloaded", …) → all ``overloaded``
- Plain "Rate limit reached for requests per minute" 429 → still
  ``rate_limit`` (regression guard against the disambiguation
  silently broadening)
- Plain "Too Many Requests" 429 → still ``rate_limit``
- 503 path unchanged → still ``overloaded`` (negative control)

``tests/agent/test_credential_pool_routing.py`` —
``TestPoolOverloadedRotation`` (4 cases):

- Overloaded on first failure → rotates immediately (the fix)
- Overloaded with ``has_retried_429=True`` → still rotates (no retry-
  flag dependence — the rate-limit gate is specific to ``rate_limit``)
- Single-entry pool overload → ``recovered=False`` so outer fallback
  takes over rather than spinning
- ``rate_limit`` on first failure → still uses retry-first path
  (regression guard against the new branch broadening)

**Verified regression guards**: temporarily reverted the classifier's
overload branch; 10 of the 11 new classifier tests correctly failed
with clear assertion messages.  Restored fix → all 145 tests pass
(``test_error_classifier.py`` + ``test_credential_pool_routing.py``,
existing + new).

Closes #15297

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-24 16:39:48 -07:00

Teknium

424e9f36b0

refactor: remove smart_model_routing feature (#12732 )

Smart model routing (auto-routing short/simple turns to a cheap model
across providers) was opt-in and disabled by default.  This removes the
feature wholesale: the routing module, its config keys, docs, tests, and
the orchestration scaffolding it required in cli.py / gateway/run.py /
cron/scheduler.py.

The /fast (Priority Processing / Anthropic fast mode) feature kept its
hooks into _resolve_turn_agent_config — those still build a route dict
and attach request_overrides when the model supports it; the route now
just always uses the session's primary model/provider rather than
running prompts through choose_cheap_model_route() first.

Also removed:
- DEFAULT_CONFIG['smart_model_routing'] block and matching commented-out
  example sections in hermes_cli/config.py and cli-config.yaml.example
- _load_smart_model_routing() / self._smart_model_routing on GatewayRunner
- self._smart_model_routing / self._active_agent_route_signature on
  HermesCLI (signature kept; just no longer initialised through the
  smart-routing pipeline)
- route_label parameter on HermesCLI._init_agent (only set by smart
  routing; never read elsewhere)
- 'Smart Model Routing' section in website/docs/integrations/providers.md
- tip in hermes_cli/tips.py
- entries in hermes_cli/dump.py + hermes_cli/web_server.py
- row in skills/autonomous-ai-agents/hermes-agent/SKILL.md

Tests:
- Deleted tests/agent/test_smart_model_routing.py
- Rewrote tests/agent/test_credential_pool_routing.py to target the
  simplified _resolve_turn_agent_config directly (preserves credential
  pool propagation + 429 rotation coverage)
- Dropped 'cheap model' test from test_cli_provider_resolution.py
- Dropped resolve_turn_route patches from cli + gateway test_fast_command
  — they now exercise the real method end-to-end
- Removed _smart_model_routing stub assignments from gateway/cron test
  helpers

Targeted suites: 74/74 in the directly affected test files;
tests/agent + tests/cron + tests/cli pass except 5 failures that
already exist on main (cron silent-delivery + alias quick-command).

2026-04-19 18:12:55 -07:00

Siddharth Balyan

f3006ebef9

refactor(tests): re-architect tests + fix CI failures (#5946 )

* refactor: re-architect tests to mirror the codebase

* Update tests.yml

* fix: add missing tool_error imports after registry refactor

* fix(tests): replace patch.dict with monkeypatch to prevent env var leaks under xdist

patch.dict(os.environ) can leak TERMINAL_ENV across xdist workers,
causing test_code_execution tests to hit the Modal remote path.

* fix(tests): fix update_check and telegram xdist failures

- test_update_check: replace patch("hermes_cli.banner.os.getenv") with
  monkeypatch.setenv("HERMES_HOME") — banner.py no longer imports os
  directly, it uses get_hermes_home() from hermes_constants.

- test_telegram_conflict/approval_buttons: provide real exception classes
  for telegram.error mock (NetworkError, TimedOut, BadRequest) so the
  except clause in connect() doesn't fail with "catching classes that do
  not inherit from BaseException" when xdist pollutes sys.modules.

* fix(tests): accept unavailable_models kwarg in _prompt_model_selection mock

2026-04-07 17:19:07 -07:00

Renamed from tests/test_credential_pool_routing.py (Browse further)

3 commits