Commit graph

12 commits

Author SHA1 Message Date
Teknium
77e04a29d5
fix(error_classifier): don't classify generic 404 as model_not_found (#14013)
The 404 branch in _classify_by_status had dead code: the generic
fallback below the _MODEL_NOT_FOUND_PATTERNS check returned the
exact same classification (model_not_found + should_fallback=True),
so every 404 — regardless of message — was treated as a missing model.

This bites local-endpoint users (llama.cpp, Ollama, vLLM) whose 404s
usually mean a wrong endpoint path, proxy routing glitch, or transient
backend issue — not a missing model. Claiming 'model not found' misleads
the next turn and silently falls back to another provider when the real
problem was a URL typo the user should see.

Fix: only classify 404 as model_not_found when the message actually
matches _MODEL_NOT_FOUND_PATTERNS ("invalid model", "model not found",
etc.). Otherwise fall through as unknown (retryable) so the real error
surfaces in the retry loop.

Test updated to match the new behavior. 103 error_classifier tests pass.
2026-04-22 06:11:47 -07:00
Linux2010
b869bf206c fix(error_classifier): handle dict-typed message fields without crashing
When API providers return Pydantic-style validation errors where
body['message'] or body['error']['message'] is a dict (e.g.
{"detail": [...]}), the error classifier was crashing with
AttributeError: 'dict' object has no attribute 'lower'.

The 'or ""' fallback only handles None/falsy values. A non-empty
dict is truthy and passes through to .lower(), which fails.

Fix: Wrap all 5 call sites with str() before calling .lower().
This is a no-op for strings and safely converts dicts to their
repr for pattern matching (no false positives on classification
patterns like 'rate limit', 'context length', etc.).

Closes #11233
2026-04-20 02:40:20 -07:00
JiaDe WU
0cb8c51fa5 feat: native AWS Bedrock provider via Converse API
Salvaged from PR #7920 by JiaDe-Wu — cherry-picked Bedrock-specific
additions onto current main, skipping stale-branch reverts (293 commits
behind).

Dual-path architecture:
  - Claude models → AnthropicBedrock SDK (prompt caching, thinking budgets)
  - Non-Claude models → Converse API via boto3 (Nova, DeepSeek, Llama, Mistral)

Includes:
  - Core adapter (agent/bedrock_adapter.py, 1098 lines)
  - Full provider registration (auth, models, providers, config, runtime, main)
  - IAM credential chain + Bedrock API Key auth modes
  - Dynamic model discovery via ListFoundationModels + ListInferenceProfiles
  - Streaming with delta callbacks, error classification, guardrails
  - hermes doctor + hermes auth integration
  - /usage pricing for 7 Bedrock models
  - 130 automated tests (79 unit + 28 integration + follow-up fixes)
  - Documentation (website/docs/guides/aws-bedrock.md)
  - boto3 optional dependency (pip install hermes-agent[bedrock])

Co-authored-by: JiaDe WU <40445668+JiaDe-Wu@users.noreply.github.com>
2026-04-15 16:17:17 -07:00
Teknium
f324222b79
fix: add vLLM/local server error patterns + MCP initial connection retry (#9281)
Port two improvements inspired by Kilo-Org/kilocode analysis:

1. Error classifier: add context overflow patterns for vLLM, Ollama,
   and llama.cpp/llama-server. These local inference servers return
   different error formats than cloud providers (e.g., 'exceeds the
   max_model_len', 'context length exceeded', 'slot context'). Without
   these patterns, context overflow errors from local servers are
   misclassified as format errors, causing infinite retries instead
   of triggering compression.

2. MCP initial connection retry: previously, if the very first
   connection attempt to an MCP server failed (e.g., transient DNS
   blip at startup), the server was permanently marked as failed with
   no retry. Post-connect reconnection had 5 retries with exponential
   backoff, but initial connection had zero. Now initial connections
   retry up to 3 times with backoff before giving up, matching the
   resilience of post-connect reconnection.
   (Inspired by Kilo Code's MCP server disappearing fix in v1.3.3)

Tests: 6 new error classifier tests, 4 new MCP retry tests, 1
updated existing test. All 276 affected tests pass.
2026-04-13 18:46:14 -07:00
Teknium
8d023e43ed
refactor: remove dead code — 1,784 lines across 77 files (#9180)
Deep scan with vulture, pyflakes, and manual cross-referencing identified:
- 41 dead functions/methods (zero callers in production)
- 7 production-dead functions (only test callers, tests deleted)
- 5 dead constants/variables
- ~35 unused imports across agent/, hermes_cli/, tools/, gateway/

Categories of dead code removed:
- Refactoring leftovers: _set_default_model, _setup_copilot_reasoning_selection,
  rebuild_lookups, clear_session_context, get_logs_dir, clear_session
- Unused API surface: search_models_dev, get_pricing, skills_categories,
  get_read_files_summary, clear_read_tracker, menu_labels, get_spinner_list
- Dead compatibility wrappers: schedule_cronjob, list_cronjobs, remove_cronjob
- Stale debug helpers: get_debug_session_info copies in 4 tool files
  (centralized version in debug_helpers.py already exists)
- Dead gateway methods: send_emote, send_notice (matrix), send_reaction
  (bluebubbles), _normalize_inbound_text (feishu), fetch_room_history
  (matrix), _start_typing_indicator (signal), parse_feishu_post_content
- Dead constants: NOUS_API_BASE_URL, SKILLS_TOOL_DESCRIPTION,
  FILE_TOOLS, VALID_ASPECT_RATIOS, MEMORY_DIR
- Unused UI code: _interactive_provider_selection,
  _interactive_model_selection (superseded by prompt_toolkit picker)

Test suite verified: 609 tests covering affected files all pass.
Tests for removed functions deleted. Tests using removed utilities
(clear_read_tracker, MEMORY_DIR) updated to use internal APIs directly.
2026-04-13 16:32:04 -07:00
Teknium
5fc5ced972 fix: add Alibaba/DashScope rate-limit pattern to error classifier
Port from anomalyco/opencode#21355: Alibaba's DashScope API returns a
unique throttling message ('Request rate increased too quickly...') that
doesn't match standard rate-limit patterns ('rate limit', 'too many
requests'). This caused Alibaba errors to fall through to the 'unknown'
category rather than being properly classified as rate_limit with
appropriate backoff/rotation.

Add 'rate increased too quickly' to _RATE_LIMIT_PATTERNS and test with
the exact error message observed from the Alibaba provider.
2026-04-10 05:52:45 -07:00
alt-glitch
96c060018a fix: remove 115 verified dead code symbols across 46 production files
Automated dead code audit using vulture + coverage.py + ast-grep intersection,
confirmed by Opus deep verification pass. Every symbol verified to have zero
production callers (test imports excluded from reachability analysis).

Removes ~1,534 lines of dead production code across 46 files and ~1,382 lines
of stale test code. 3 entire files deleted (agent/builtin_memory_provider.py,
hermes_cli/checklist.py, tests/hermes_cli/test_setup_model_selection.py).

Co-authored-by: alt-glitch <balyan.sid@gmail.com>
2026-04-10 03:44:43 -07:00
aaronagent
738f0bac13 fix: align auth-by-message classification with status-code path, decode URLs before secret check
error_classifier.py: Message-only auth errors ("invalid api key", "unauthorized",
etc.) were classified as retryable=True (line 707), inconsistent with the HTTP 401
path (line 432) which correctly uses retryable=False + should_fallback=True.  The
mismatch causes 3 wasted retries with the same broken credential before fallback,
while 401 errors immediately attempt fallback.  Align the message-based path to
match: retryable=False, should_fallback=True.

web_tools.py: The _PREFIX_RE secret-detection check in web_extract_tool() runs
against the raw URL string (line 1196).  URL-encoded secrets like %73k-1234... (
sk-1234...) bypass the filter because the regex expects literal ASCII.  Add
urllib.parse.unquote() before the check so percent-encoded variants are also caught.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 03:05:04 -07:00
Cocoon-Break
45034b746f
fix: set retryable=False for message-based auth errors in _classify_by_message() (#7027)
Auth errors matched by message pattern were incorrectly marked retryable=True, causing futile retry loops. Aligns with _classify_by_status() which already sets retryable=False for 401/403. Fixes #7026. Contributed by @kuishou68.
2026-04-10 02:48:45 -07:00
sprmn24
e053433c84 fix(error_classifier): disambiguate usage-limit patterns in _classify_by_message
_classify_by_message had no handling for _USAGE_LIMIT_PATTERNS, so
messages like 'usage limit exceeded, try again in 5 minutes' arriving
without an HTTP status code fell through to FailoverReason.unknown
instead of rate_limit.

Apply the same billing/rate-limit disambiguation that _classify_402
already uses: USAGE_LIMIT_PATTERNS + transient signal → rate_limit,
USAGE_LIMIT_PATTERNS alone → billing.

Add 4 tests covering the no-status-code usage-limit path.
2026-04-09 16:24:13 -07:00
Teknium
3007174a61
fix: prevent 400 format errors from triggering compression loop on Codex Responses API (#6751)
The error classifier's generic-400 heuristic only extracted err_body_msg from
the nested body structure (body['error']['message']), missing the flat body
format used by OpenAI's Responses API (body['message']). This caused
descriptive 400 errors like 'Invalid input[index].name: string does not match
pattern' to appear generic when the session was large, misclassifying them as
context overflow and triggering an infinite compression loop.

Added flat-body fallback in _classify_400() consistent with the parent
classify_api_error() function's existing handling at line 297-298.
2026-04-09 11:11:34 -07:00
Teknium
1a3ae6ac6e
feat: structured API error classification for smart failover (#6514)
Add agent/error_classifier.py with a priority-ordered classification
pipeline that replaces scattered inline string-matching in the retry
loop with structured error taxonomy and recovery hints.

FailoverReason enum (14 categories): auth, auth_permanent, billing,
rate_limit, overloaded, server_error, timeout, context_overflow,
payload_too_large, model_not_found, format_error, thinking_signature,
long_context_tier, unknown.

ClassifiedError dataclass carries reason + recovery action hints
(retryable, should_compress, should_rotate_credential, should_fallback).

Key improvements over inline matching:
- 402 disambiguation: 'insufficient credits' = billing (immediate rotate),
  'usage limit, try again' = rate_limit (backoff first)
- OpenRouter 403 'key limit exceeded' correctly classified as billing
- Error cause chain walking (walks __cause__/__context__ up to 5 levels)
- Body message included in pattern matching (SDK str() misses it)
- Server disconnect + large session check ordered before generic transport
  catch so RemoteProtocolError triggers compression when appropriate
- Chinese error message support for context overflow

run_agent.py: replaced 6 inline detection blocks with classifier calls,
net -55 lines. All recovery actions (pool rotation, fallback activation,
compression, transport recovery) unchanged.

65 new unit tests + 10 E2E tests + live tests with real SDK error objects.
Inspired by OpenClaw's failover error classification system.
2026-04-09 04:10:11 -07:00