hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-26 01:01:40 +00:00

Author	SHA1	Message	Date
Teknium	be4f049f46	fix: salvage follow-ups for Weixin adapter (#6747 ) - Remove sys.path.insert hack (leftover from standalone dev) - Add token lock (acquire_scoped_lock/release_scoped_lock) in connect()/disconnect() to prevent duplicate pollers across profiles - Fix get_connected_platforms: WEIXIN check must precede generic token/api_key check (requires both token AND account_id) - Add WEIXIN_HOME_CHANNEL_NAME to _EXTRA_ENV_KEYS - Add gateway setup wizard with QR login flow - Add platform status check for partially configured state - Add weixin.md docs page with full adapter documentation - Update environment-variables.md reference with all 11 env vars - Update sidebars.ts to include weixin docs page - Wire all gateway integration points onto current main Salvaged from PR #6747 by Zihan Huang.	2026-04-10 05:54:37 -07:00
Kenny Xie	b730c2955a	fix(model): normalize direct provider ids in auxiliary routing	2026-04-10 05:52:45 -07:00
Teknium	5fc5ced972	fix: add Alibaba/DashScope rate-limit pattern to error classifier Port from anomalyco/opencode#21355: Alibaba's DashScope API returns a unique throttling message ('Request rate increased too quickly...') that doesn't match standard rate-limit patterns ('rate limit', 'too many requests'). This caused Alibaba errors to fall through to the 'unknown' category rather than being properly classified as rate_limit with appropriate backoff/rotation. Add 'rate increased too quickly' to _RATE_LIMIT_PATTERNS and test with the exact error message observed from the Alibaba provider.	2026-04-10 05:52:45 -07:00
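The pattern-list approach this commit describes can be sketched as follows. This is a minimal illustration, not the project's code: the tuple below is a hypothetical, trimmed-down stand-in for `_RATE_LIMIT_PATTERNS`, and `classify_message` is an assumed helper name.

```python
# Hypothetical, shortened pattern list; the real _RATE_LIMIT_PATTERNS
# contents are not shown in the commit log.
RATE_LIMIT_PATTERNS = (
    "rate limit",
    "too many requests",
    "rate increased too quickly",  # DashScope throttling phrase named in the commit
)


def classify_message(message: str) -> str:
    """Return 'rate_limit' if any known pattern occurs in the message, else 'unknown'."""
    lowered = message.lower()
    if any(pattern in lowered for pattern in RATE_LIMIT_PATTERNS):
        return "rate_limit"
    return "unknown"
```

Without the third entry, the DashScope message would fall through to `unknown`, which is the behavior the commit reports fixing.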
Teknium	6d2fa03837	fix: UTF-8 config encoding, pairing hint, credential_pool key, header normalization (#7174 ) Four small fixes: (1) UTF-8 encoding for config open (@zhangchn #7063), (2) pairing hint placeholders (@konsisumer #7057), (3) missing credential_pool in cheap route (@kuishou68 #7025), (4) case-insensitive rate limit headers (@kuishou68 #7019).	2026-04-10 05:33:48 -07:00
xwp	5a1cce53e4	fix(auxiliary): skip anthropic in fallback chain when not explicitly configured _resolve_api_key_provider() now checks is_provider_explicitly_configured before calling _try_anthropic(). Previously, any auxiliary fallback (e.g. when kimi-coding key was invalid) would silently discover and use Claude Code OAuth tokens — consuming the user's Claude Max subscription without their knowledge. This is the auxiliary-client counterpart of the setup-wizard gate in PR #4210. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 05:19:21 -07:00
xwp	419b719c2b	fix(auth): make 'auth remove' for claude_code prevent re-seeding Previously, removing a claude_code credential from the anthropic pool only printed a note — the next load_pool() re-seeded it from ~/.claude/.credentials.json. Now writes a 'suppressed_sources' flag to auth.json that _seed_from_singletons checks before seeding. Follows the pattern of env: source removal (clears .env var) and device_code removal (clears auth store state). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 05:19:21 -07:00
xwp	f3fb3eded4	fix(auth): gate Claude Code credential seeding behind explicit provider config _seed_from_singletons('anthropic') now checks is_provider_explicitly_configured('anthropic') before reading ~/.claude/.credentials.json. Without this, the auxiliary client fallback chain silently discovers and uses Claude Code tokens when the user's primary provider key is invalid — consuming their Claude Max subscription quota without consent. Follows the same gating pattern as PR #4210 (setup wizard gate) but applied to the credential pool seeding path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 05:19:21 -07:00
alt-glitch	96c060018a	fix: remove 115 verified dead code symbols across 46 production files Automated dead code audit using vulture + coverage.py + ast-grep intersection, confirmed by Opus deep verification pass. Every symbol verified to have zero production callers (test imports excluded from reachability analysis). Removes ~1,534 lines of dead production code across 46 files and ~1,382 lines of stale test code. 3 entire files deleted (agent/builtin_memory_provider.py, hermes_cli/checklist.py, tests/hermes_cli/test_setup_model_selection.py). Co-authored-by: alt-glitch <balyan.sid@gmail.com>	2026-04-10 03:44:43 -07:00
aaronagent	9afe1784bd	fix: hidden_div regex bypass with newlines, credential config silent failure, webhook route error severity prompt_builder.py: The `hidden_div` detection pattern uses `.` which does not match newlines in Python regex (re.DOTALL is not passed). An attacker can bypass detection by splitting the style attribute across lines: `<div style="color:red;\ndisplay: none">injected content</div>` Replace `.` with `[\s\S]*?` to match across line boundaries. credential_files.py: `_load_config_files()` catches all exceptions at DEBUG level (line 171), making YAML parse failures invisible in production logs. Users whose credential files silently fail to mount into sandboxes have no diagnostic clue. Promote to WARNING to match the severity pattern used by the path validation warnings at lines 150 and 158 in the same function. webhook.py: `_reload_dynamic_routes()` logs JSON parse failures at WARNING (line 265) but the impact — stale/corrupted dynamic routes persisting silently — warrants ERROR level to ensure operator visibility in alerting pipelines. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 03:05:04 -07:00
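The newline bypass described in this commit can be reproduced with a small sketch. Both patterns below are illustrative assumptions; the actual `hidden_div` pattern in prompt_builder.py is not shown in the log.

```python
import re

# Illustrative patterns only. DOT_PATTERN uses '.', which does not match
# newlines unless re.DOTALL is passed; SPAN_PATTERN uses [\s\S]*? instead.
DOT_PATTERN = re.compile(r'<div\s+style=".*display:\s*none', re.IGNORECASE)
SPAN_PATTERN = re.compile(r'<div\s+style="[\s\S]*?display:\s*none', re.IGNORECASE)

# Payload from the commit message: the style attribute is split across lines.
payload = '<div style="color:red;\ndisplay: none">injected content</div>'


def is_hidden(pattern: re.Pattern, text: str) -> bool:
    """True if the pattern finds a hidden-div style anywhere in the text."""
    return pattern.search(text) is not None
```

Here `is_hidden(DOT_PATTERN, payload)` misses the payload, while `is_hidden(SPAN_PATTERN, payload)` catches it. Passing `re.DOTALL` to the first pattern would be an alternative fix.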
aaronagent	738f0bac13	fix: align auth-by-message classification with status-code path, decode URLs before secret check error_classifier.py: Message-only auth errors ("invalid api key", "unauthorized", etc.) were classified as retryable=True (line 707), inconsistent with the HTTP 401 path (line 432) which correctly uses retryable=False + should_fallback=True. The mismatch causes 3 wasted retries with the same broken credential before fallback, while 401 errors immediately attempt fallback. Align the message-based path to match: retryable=False, should_fallback=True. web_tools.py: The _PREFIX_RE secret-detection check in web_extract_tool() runs against the raw URL string (line 1196). URL-encoded secrets like %73k-1234... ( sk-1234...) bypass the filter because the regex expects literal ASCII. Add urllib.parse.unquote() before the check so percent-encoded variants are also caught. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 03:05:04 -07:00
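The URL-decoding half of this commit can be sketched as below. `SECRET_PREFIX_RE` is a hypothetical stand-in for `_PREFIX_RE`, whose real definition is not shown, and `url_contains_secret` is an assumed helper name.

```python
import re
from urllib.parse import unquote

# Hypothetical stand-in for _PREFIX_RE: matches an "sk-" style key prefix.
SECRET_PREFIX_RE = re.compile(r"sk-[A-Za-z0-9]{4,}")


def url_contains_secret(url: str) -> bool:
    """Percent-decode the URL first so encoded variants such as %73k- are caught."""
    return SECRET_PREFIX_RE.search(unquote(url)) is not None
```

For a URL such as `https://example.com/?q=%73k-1234abcd`, matching against the raw string finds nothing, while matching after `unquote()` finds the decoded `sk-1234abcd`.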
Julien Talbot	b577697189	fix(model_metadata): add xAI Grok context length fallbacks xAI /v1/models does not return context_length metadata, so Hermes probes down to the 128k default whenever a user configures a custom provider pointing at https://api.x.ai/v1. This forces every xAI user to manually override model.context_length in config.yaml (2M for Grok 4.20 / 4.1-fast / 4-fast) or lose most of the usable context window. Add DEFAULT_CONTEXT_LENGTHS entries for the Grok family so the fallback lookup returns the correct value via substring matching. Values sourced from models.dev (2026-04) and cross-checked against the xAI /v1/models listing: - grok-4.20-* 2,000,000 (reasoning, non-reasoning, multi-agent) - grok-4-1-fast-* 2,000,000 - grok-4-fast-* 2,000,000 - grok-4 / grok-4-0709 256,000 - grok-code-fast-1 256,000 - grok-3* 131,072 - grok-2 / latest 131,072 - grok-2-vision* 8,192 - grok (catch-all) 131,072 Keys are ordered longest-first so that specific variants match before the catch-all, consistent with the existing Claude/Gemma/MiniMax entries. Add TestDefaultContextLengths.test_grok_models_context_lengths and test_grok_substring_matching to pin the values and verify the full lookup path. All 77 tests in test_model_metadata.py pass.	2026-04-10 03:04:19 -07:00
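The longest-first substring lookup described above might look roughly like this. The values come from the commit message; the dictionary name, the helper, and the explicit sort are assumptions made for illustration (the commit says the real keys are simply ordered longest-first in `DEFAULT_CONTEXT_LENGTHS`).

```python
# Values taken from the commit message; the name and layout are hypothetical.
GROK_CONTEXT_LENGTHS = {
    "grok-code-fast-1": 256_000,
    "grok-4-1-fast": 2_000_000,
    "grok-2-vision": 8_192,
    "grok-4-fast": 2_000_000,
    "grok-4.20": 2_000_000,
    "grok-4": 256_000,
    "grok-3": 131_072,
    "grok-2": 131_072,
    "grok": 131_072,  # catch-all
}


def lookup_context_length(model: str, default: int = 128_000) -> int:
    """Return the context length of the longest key that is a substring of the model id."""
    name = model.lower()
    # Sorting by key length guarantees specific variants win over the catch-all.
    for key in sorted(GROK_CONTEXT_LENGTHS, key=len, reverse=True):
        if key in name:
            return GROK_CONTEXT_LENGTHS[key]
    return default
```

If the catch-all `grok` were checked first, every variant would resolve to 131,072, which is why ordering matters.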
Cocoon-Break	45034b746f	fix: set retryable=False for message-based auth errors in _classify_by_message() (#7027 ) Auth errors matched by message pattern were incorrectly marked retryable=True, causing futile retry loops. Aligns with _classify_by_status() which already sets retryable=False for 401/403. Fixes #7026. Contributed by @kuishou68.	2026-04-10 02:48:45 -07:00
kshitijk4poor	9431f82aff	fix: update Kimi Coding User-Agent to KimiCLI/1.30.0 The hardcoded User-Agent 'KimiCLI/1.3' is outdated — Kimi CLI is now at v1.30.0. The stale version string causes intermittent 403 errors from Kimi's coding endpoint ('only available for Coding Agents'). Update all 8 occurrences across run_agent.py, auxiliary_client.py, and doctor.py to 'KimiCLI/1.30.0' to match the current official Kimi CLI.	2026-04-10 02:37:28 -07:00
Teknium	8779a268a7	feat: add Anthropic Fast Mode support to /fast command (#7037 ) Extends the /fast command to support Anthropic's Fast Mode beta in addition to OpenAI Priority Processing. When enabled on Claude Opus 4.6, adds speed:"fast" and the fast-mode-2026-02-01 beta header to API requests for ~2.5x faster output token throughput. Changes: - hermes_cli/models.py: Add _ANTHROPIC_FAST_MODE_MODELS registry, model_supports_fast_mode() now recognizes Claude Opus 4.6, resolve_fast_mode_overrides() returns {speed: fast} for Anthropic vs {service_tier: priority} for OpenAI - agent/anthropic_adapter.py: Add _FAST_MODE_BETA constant, build_anthropic_kwargs() accepts fast_mode=True which injects speed:fast + beta header via extra_headers (skipped for third-party Anthropic-compatible endpoints like MiniMax) - run_agent.py: Pass fast_mode to build_anthropic_kwargs in the anthropic_messages path of _build_api_kwargs() - cli.py: Update _handle_fast_command with provider-aware messaging (shows 'Anthropic Fast Mode' vs 'Priority Processing') - hermes_cli/commands.py: Update /fast description to mention both providers - tests: 13 new tests covering Anthropic model detection, override resolution, CLI availability, routing, adapter kwargs, and third-party endpoint safety	2026-04-10 02:32:15 -07:00
emozilla	bda9aa17cb	fix(streaming): prevent <think> in prose from suppressing response output When the model mentions <think> as literal text in its response (e.g. "(/think not producing <think> tags)"), the streaming display treated it as a reasoning block opener and suppressed everything after it. The response box would close with truncated content and no error — the API response was complete but the display ate it. Root cause: _stream_delta() matched <think> anywhere in the text stream regardless of position. Real reasoning blocks always start at the beginning of a line; mentions in prose appear mid-sentence. Fix: track line position across streaming deltas with a _stream_last_was_newline flag. Only enter reasoning suppression when the tag appears at a block boundary (start of stream, after a newline, or after only whitespace on the current line). Add a _flush_stream() safety net that recovers buffered content if no closing tag is found by end-of-stream. Also fixes three related issues discovered during investigation: - anthropic_adapter: _get_anthropic_max_output() now normalizes dots to hyphens so 'claude-opus-4.6' matches the 'claude-opus-4-6' table key (was returning 32K instead of 128K) - run_agent: send explicit max_tokens for Claude models on Nous Portal, same as OpenRouter — both proxy to Anthropic's API which requires it. Without it the backend defaults to a low limit that truncates responses. - run_agent: reset truncated_tool_call_retries after successful tool execution so a single truncation doesn't poison the entire conversation.	2026-04-09 22:16:36 -07:00
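The block-boundary rule from this commit can be illustrated on a complete buffer. This is a simplified sketch: the real `_stream_delta()` tracks line position across streaming deltas with a flag, whereas the hypothetical helper below inspects the whole text at once.

```python
def opens_reasoning_block(text: str, tag: str = "<think>") -> bool:
    """True only if the first occurrence of the tag sits at a block boundary:
    start of text, right after a newline, or after only whitespace on its line."""
    idx = text.find(tag)
    if idx == -1:
        return False
    # rfind returns -1 when there is no earlier newline, so line_start becomes 0.
    line_start = text.rfind("\n", 0, idx) + 1
    return text[line_start:idx].strip() == ""
```

With this rule, the prose example from the commit, `(/think not producing <think> tags)`, is not treated as a reasoning opener because non-whitespace text precedes the tag on the same line.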
Teknium	4caa635803	fix: add auth.json write-back for Codex retry and valid-token early-return paths The Codex retry block and valid-token short-circuit in _refresh_entry() both return early, bypassing the auth.json sync at the end of the method. This adds _sync_device_code_entry_to_auth_store() calls on both paths so refreshed/synced tokens are written back to auth.json regardless of which code path succeeds.	2026-04-09 21:48:50 -07:00
Ben Barclay	a64d8a83e1	fix: proactive Codex CLI sync before refresh + retry on failure	2026-04-09 21:48:50 -07:00
Ben Barclay	dfde4058cf	fix: sync refreshed OAuth tokens from pool back to auth.json providers	2026-04-09 21:48:50 -07:00
kshitijk4poor	08e2a1a51e	fix(anthropic): omit tool-streaming beta on MiniMax endpoints MiniMax's Anthropic-compatible endpoints reject requests that include the fine-grained-tool-streaming beta header — every tool-use message triggers a connection error (~18s timeout). Regular chat works fine. Add _common_betas_for_base_url() that filters out the tool-streaming beta for Bearer-auth (MiniMax) endpoints while keeping all other betas. All four client-construction branches now use the filtered list. Based on #6528 by @HiddenPuppy. Original cherry-picked from PR #6688 by kshitijk4poor. Fixes #6510, fixes #6555.	2026-04-09 17:53:52 -07:00
sprmn24	e053433c84	fix(error_classifier): disambiguate usage-limit patterns in _classify_by_message _classify_by_message had no handling for _USAGE_LIMIT_PATTERNS, so messages like 'usage limit exceeded, try again in 5 minutes' arriving without an HTTP status code fell through to FailoverReason.unknown instead of rate_limit. Apply the same billing/rate-limit disambiguation that _classify_402 already uses: USAGE_LIMIT_PATTERNS + transient signal → rate_limit, USAGE_LIMIT_PATTERNS alone → billing. Add 4 tests covering the no-status-code usage-limit path.	2026-04-09 16:24:13 -07:00
Teknium	97308707e9	fix: insert static fallback when compression summary fails When _generate_summary() failed (no provider, timeout, model error), the compressor silently dropped all middle turns with just a debug log. The agent would then see head + tail with no explanation of the gap, causing total context amnesia (generic greetings instead of continuing the conversation). Now generates a static fallback marker that tells the model context was lost and to continue from the recent tail messages. The fallback flows through the same role-alternation logic as a real summary so message structure stays valid.	2026-04-09 14:28:56 -07:00
Teknium	c6974fd108	fix: allow custom endpoint users to use main model for auxiliary tasks Step 1 of _resolve_auto() explicitly excluded 'custom' providers, forcing custom endpoint users through the fragile fallback chain instead of using their known-working main model credentials. This caused silent compression failures for users on local OpenAI- compatible endpoints — the summary generation would fail, middle turns would be silently dropped, and the agent would lose all conversation context. Remove 'custom' from the exclusion list so custom endpoint users get the same main-model-first treatment as DeepSeek, Anthropic, Gemini, and other direct providers.	2026-04-09 13:23:56 -07:00
KUSH42	34d06a9802	fix(compaction): don't halve context_length on output-cap-too-large errors When the API returns "max_tokens too large given prompt" (input tokens are within the context window, but input + requested output > window), the old code incorrectly routed through the same handler as "prompt too long" errors, calling get_next_probe_tier() and permanently halving context_length. This made things worse: the window was fine, only the requested output size needed trimming for that one call. Two distinct error classes now handled separately: Prompt too long: input itself exceeds context window. Fix: compress history + halve context_length (existing behaviour, unchanged). Output cap too large: input OK, but input + max_tokens > window. Fix: parse available_tokens from the error message, set a one-shot _ephemeral_max_output_tokens override for the retry, and leave context_length completely untouched. Changes: - agent/model_metadata.py: add parse_available_output_tokens_from_error() that detects Anthropic's "available_tokens: N" error format and returns the available output budget, or None for all other error types. - run_agent.py: call the new parser first in the is_context_length_error block; if it fires, set _ephemeral_max_output_tokens (with a 64-token safety margin) and break to retry without touching context_length. _build_api_kwargs consumes the ephemeral value exactly once then clears it so subsequent calls use self.max_tokens normally. - agent/anthropic_adapter.py: expand build_anthropic_kwargs docstring to clearly document the max_tokens (output cap) vs context_length (total window) distinction, which is a persistent source of confusion due to the OpenAI-inherited "max_tokens" name. - cli-config.yaml.example: add inline comments explaining both keys side by side where users are most likely to look. - website/docs/integrations/providers.md: add a callout box at the top of "Context Length Detection" and clarify the troubleshooting entry. - tests/test_ctx_halving_fix.py: 24 tests across four classes covering the parser, build_anthropic_kwargs clamping, ephemeral one-shot consumption, and the invariant that context_length is never mutated on output-cap errors.	2026-04-09 11:27:41 -07:00
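The output-cap handling in this commit can be sketched as a parser plus a one-shot override. The regex, the helper names, and the sample error text are assumptions; only the `available_tokens: N` fragment and the 64-token margin come from the commit message.

```python
import re
from typing import Optional

# Assumed shape of the fragment; the commit only says the parser detects
# an "available_tokens: N" format in the error message.
_AVAILABLE_RE = re.compile(r"available_tokens:\s*(\d+)")
SAFETY_MARGIN = 64  # safety margin stated in the commit


def parse_available_output_tokens(message: str) -> Optional[int]:
    """Return the available output budget, or None for any other error type."""
    match = _AVAILABLE_RE.search(message)
    return int(match.group(1)) if match else None


def ephemeral_max_output(message: str) -> Optional[int]:
    """Compute a one-shot max_tokens override; context_length is never touched."""
    available = parse_available_output_tokens(message)
    if available is None:
        return None
    return max(1, available - SAFETY_MARGIN)
```

A `None` result means the error is not an output-cap error, so the caller would fall back to the existing "prompt too long" path.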
Teknium	3007174a61	fix: prevent 400 format errors from triggering compression loop on Codex Responses API (#6751 ) The error classifier's generic-400 heuristic only extracted err_body_msg from the nested body structure (body['error']['message']), missing the flat body format used by OpenAI's Responses API (body['message']). This caused descriptive 400 errors like 'Invalid input[index].name: string does not match pattern' to appear generic when the session was large, misclassifying them as context overflow and triggering an infinite compression loop. Added flat-body fallback in _classify_400() consistent with the parent classify_api_error() function's existing handling at line 297-298.	2026-04-09 11:11:34 -07:00
Yang Zhi	110cdd573a	fix(auxiliary_client): inject KimiCLI User-Agent for custom endpoint sync clients When is explicitly set to , the custom-endpoint path in creates a plain client without provider-specific headers. This means sync vision calls (e.g. ) use the generic User-Agent and get rejected by Kimi's coding endpoint with a 403: 'Kimi For Coding is currently only available for Coding Agents such as Kimi CLI...' The async converter already injects , and the auto-detected API-key provider path also injects it, but the explicit custom endpoint shortcut was missing it entirely. This patch adds the same injection to the custom endpoint branch, and updates all existing Kimi header sites to for consistency. Fixes <issue number to be filled in>	2026-04-09 11:11:25 -07:00
Yang Zhi	4d1b988070	fix(credential_pool): use _resolve_kimi_base_url when seeding kimi-coding pool The credential pool seeder (_seed_from_env) hardcoded the base URL for API-key providers without running provider-specific auto-detection. For kimi-coding, this caused sk-kimi- prefixed keys to be seeded with the legacy api.moonshot.ai/v1 endpoint instead of api.kimi.com/coding/v1, resulting in HTTP 401 on the first request. Import and call _resolve_kimi_base_url for kimi-coding so the pool uses the correct endpoint based on the key prefix, matching the runtime credential resolver behavior. Also fix a comment: sk-kimi- keys are issued by kimi.com/code, not platform.kimi.ai. Fixes #5561	2026-04-09 11:11:25 -07:00
Teknium	1ec1f6a68a	fix: model fallback — stale model on Nous login + connection error fallback (#6554 ) Two bugs in the model fallback system: 1. Nous login leaves stale model in config (provider=nous, model=opus from previous OpenRouter setup). Fixed by deferring the config.yaml provider write until AFTER model selection completes, and passing the selected model atomically via _update_config_for_provider's default_model parameter. Previously, _update_config_for_provider was called before model selection — if selection failed (free tier, no models, exception), config stayed as nous+opus permanently. 2. Codex/stale providers in auxiliary fallback can't connect but block the auto-detection chain. Added _is_connection_error() detection (APIConnectionError, APITimeoutError, DNS failures, connection refused) alongside the existing _is_payment_error() check in call_llm(). When a provider endpoint is unreachable, the system now falls back to the next available provider instead of crashing.	2026-04-09 10:38:53 -07:00
Teknium	1a3ae6ac6e	feat: structured API error classification for smart failover (#6514 ) Add agent/error_classifier.py with a priority-ordered classification pipeline that replaces scattered inline string-matching in the retry loop with structured error taxonomy and recovery hints. FailoverReason enum (14 categories): auth, auth_permanent, billing, rate_limit, overloaded, server_error, timeout, context_overflow, payload_too_large, model_not_found, format_error, thinking_signature, long_context_tier, unknown. ClassifiedError dataclass carries reason + recovery action hints (retryable, should_compress, should_rotate_credential, should_fallback). Key improvements over inline matching: - 402 disambiguation: 'insufficient credits' = billing (immediate rotate), 'usage limit, try again' = rate_limit (backoff first) - OpenRouter 403 'key limit exceeded' correctly classified as billing - Error cause chain walking (walks __cause__/__context__ up to 5 levels) - Body message included in pattern matching (SDK str() misses it) - Server disconnect + large session check ordered before generic transport catch so RemoteProtocolError triggers compression when appropriate - Chinese error message support for context overflow run_agent.py: replaced 6 inline detection blocks with classifier calls, net -55 lines. All recovery actions (pool rotation, fallback activation, compression, transport recovery) unchanged. 65 new unit tests + 10 E2E tests + live tests with real SDK error objects. Inspired by OpenClaw's failover error classification system.	2026-04-09 04:10:11 -07:00
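A reduced sketch of the taxonomy described above, assuming a subset of the 14 categories and a simplified 402 rule. The field names follow the commit message; the default values and the `classify_402` body are illustrative guesses, not the contents of agent/error_classifier.py.

```python
from dataclasses import dataclass
from enum import Enum


class FailoverReason(Enum):
    # Subset of the 14 categories listed in the commit.
    auth = "auth"
    billing = "billing"
    rate_limit = "rate_limit"
    unknown = "unknown"


@dataclass(frozen=True)
class ClassifiedError:
    reason: FailoverReason
    retryable: bool = False
    should_compress: bool = False
    should_rotate_credential: bool = False
    should_fallback: bool = False


def classify_402(message: str) -> ClassifiedError:
    """Disambiguate a 402: a transient usage limit backs off, anything else is billing."""
    lowered = message.lower()
    if "usage limit" in lowered and "try again" in lowered:
        return ClassifiedError(FailoverReason.rate_limit, retryable=True)
    return ClassifiedError(FailoverReason.billing, should_rotate_credential=True)
```

Carrying the recovery hints on the result object lets the retry loop act on flags instead of re-matching strings at each call site.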
Teknium	8dfc96dbbb	feat: capture provider rate limit headers and show in /usage (#6541 ) Parse x-ratelimit-* headers from inference API responses (Nous Portal, OpenRouter, OpenAI-compatible) and display them in the /usage command. - New agent/rate_limit_tracker.py: parse 12 rate limit headers (RPM/RPH/ TPM/TPH limits, remaining, reset timers), format as progress bars (CLI) or compact one-liner (gateway) - Hook into streaming path in run_agent.py: stream.response.headers is available on the OpenAI SDK Stream object before chunks are consumed - CLI /usage: appends rate limit section with progress bars + warnings when any bucket exceeds 80% - Gateway /usage: appends compact rate limit summary - 24 unit tests covering parsing, formatting, edge cases Headers captured per response: x-ratelimit-{limit,remaining,reset}-{requests,tokens}{,-1h} Example CLI display: Nous Rate Limits (captured just now): Requests/min [░░░░░░░░░░░░░░░░░░░░] 0.1% 1/800 used (799 left, resets in 59s) Tokens/hr [░░░░░░░░░░░░░░░░░░░░] 0.0% 49/336.0M (336.0M left, resets in 52m)	2026-04-09 03:43:14 -07:00
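Header capture of this kind can be approximated as follows. This is a sketch under assumptions: both helper names are hypothetical, only numeric values are parsed, and reset timers (whose wire format the commit does not specify) are skipped when they are not plain numbers.

```python
from typing import Dict, Mapping, Optional

PREFIX = "x-ratelimit-"


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Dict[str, float]:
    """Collect numeric x-ratelimit-* headers, keyed by the part after the prefix."""
    parsed: Dict[str, float] = {}
    for name, value in headers.items():
        key = name.lower()  # header names are case-insensitive
        if not key.startswith(PREFIX):
            continue
        try:
            parsed[key[len(PREFIX):]] = float(value)
        except (TypeError, ValueError):
            continue  # e.g. a duration string; not handled in this sketch
    return parsed


def used_fraction(parsed: Mapping[str, float], bucket: str = "requests") -> Optional[float]:
    """Fraction of the bucket already consumed, or None if data is missing."""
    limit = parsed.get(f"limit-{bucket}")
    remaining = parsed.get(f"remaining-{bucket}")
    if not limit or remaining is None:
        return None
    return (limit - remaining) / limit
```

A limit of 800 with 799 remaining yields about 0.00125, in line with the "0.1% 1/800 used" line in the commit's example display.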
konsisumer	3c8ec7037c	fix(agent): catch PermissionError in subdirectory hint discovery Wrap is_dir() in _is_valid_subdir() and is_file() in _load_hints_for_directory() with OSError handlers so that inaccessible directories (e.g. /root from a non-root Daytona host user) are silently skipped instead of crashing the agent. The existing PermissionError PRs for prompt_builder.py (#6247, #6321, #6355) do not cover subdirectory_hints.py, which was identified as a separate crash path in the #6214 comments. Ref: #6214	2026-04-09 03:10:30 -07:00
Teknium	b408379e9d	fix: reduce credential exhaustion TTL from 24 hours to 1 hour (#6504 ) The 24-hour default cooldown for 402-exhausted credentials was far too aggressive — if a user tops up credits or the 402 was caused by an oversized max_tokens request rather than true billing exhaustion, they shouldn't have to wait a full day. Reduce to 1 hour (matching the existing 429 TTL). Inspired by PR #6493 (michalkomar).	2026-04-09 02:37:23 -07:00
Cherif Yaya	5cf4fac2aa	fix: restore codex fallback auth-store lookup	2026-04-09 01:56:10 -07:00
Hunter B	894e8c8a8f	fix: resolve opencode.ai context window to 1M and clean up display formatting Two issues resolved: 1. Add opencode.ai to _URL_TO_PROVIDER mapping so base_url routes through models.dev lookup (which has mimo-v2-pro at 1M context) instead of falling back to probing /models (404) and defaulting to 128K. 2. Fix _format_context_length to round cleanly: 1048576 → '1M' instead of '1.048576M'. Applies same rounding logic to K values.	2026-04-09 01:43:22 -07:00
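One way to get the clean rounding this commit describes is shown below. The function name and the one-decimal rounding rule are assumptions; the commit only states that 1048576 should render as '1M' and that K values get the same treatment.

```python
def format_context_length(tokens: int) -> str:
    """Render a token count compactly, e.g. 1048576 -> '1M', 128000 -> '128K'."""
    if tokens >= 1_000_000:
        value, suffix = tokens / 1_000_000, "M"
    elif tokens >= 1_000:
        value, suffix = tokens / 1_000, "K"
    else:
        return str(tokens)
    # Round to one decimal, then drop a trailing ".0" so whole values stay short.
    text = f"{round(value, 1):.1f}".rstrip("0").rstrip(".")
    return f"{text}{suffix}"
```

Stripping zeros before the decimal point matters: `"200.0"` must become `"200"`, not `"2"`, which is why the two `rstrip` calls are applied in that order on a string that always contains a decimal point.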
BongSuCHOI	d12f8db0b8	fix(compaction): token-budget primary tail protection Tail protection was effectively message-count based despite having a token budget, because protect_last_n=20 acted as a hard floor. A single 50K-token tool output would cause all 20 recent messages to be preserved regardless of budget, leaving little room for summarization. Changes: - _find_tail_cut_by_tokens: min_tail reduced from protect_last_n (20) to 3; token budget is now the primary criterion - Soft ceiling at 1.5x budget to avoid cutting mid-oversized-message - _prune_old_tool_results: accepts optional protect_tail_tokens so pruning also respects the token budget instead of a fixed count - compress() minimum message check relaxed from protect_first_n + protect_last_n + 1 to protect_first_n + 3 + 1 - Tool group alignment (no splitting tool_call/result) preserved	2026-04-08 23:54:23 -07:00
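The token-budget-first tail cut can be sketched as a backward walk over per-message token counts. This is a loose interpretation, not the real `_find_tail_cut_by_tokens`: the function name, the exact stopping rules, and the way the 1.5x soft ceiling is applied are assumptions, and tool-group alignment is omitted.

```python
from typing import Sequence


def find_tail_cut(
    token_counts: Sequence[int],
    budget: int,
    min_tail: int = 3,
    soft_ceiling: float = 1.5,
) -> int:
    """Return the index where the protected tail begins.

    Always keeps the last `min_tail` messages; beyond that, keeps adding older
    messages until the budget is met, without exceeding soft_ceiling * budget.
    """
    total = 0
    cut = len(token_counts)
    for i in range(len(token_counts) - 1, -1, -1):
        kept = len(token_counts) - cut
        cost = token_counts[i]
        if kept >= min_tail:
            if total >= budget or total + cost > budget * soft_ceiling:
                break
        total += cost
        cut = i
    return cut
```

With a single 50K-token tool output among the last three messages and a small budget, only those three messages are protected, rather than a fixed 20.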
Teknium	d97f6cec7f	feat(gateway): add BlueBubbles iMessage platform adapter (#6437 ) Adds Apple iMessage as a gateway platform via BlueBubbles macOS server. Architecture: - Webhook-based inbound (event-driven, no polling/dedup needed) - Email/phone → chat GUID resolution for user-friendly addressing - Private API safety (checks helper_connected before tapback/typing) - Inbound attachment downloading (images, audio, documents cached locally) - Markdown stripping for clean iMessage delivery - Smart progress suppression for platforms without message editing Based on PR #5869 by @benjaminsehl (webhook architecture, GUID resolution, Private API safety, progress suppression) with inbound attachment downloading from PR #4588 by @1960697431 (attachment cache routing). Integration points: Platform enum, env config, adapter factory, auth maps, cron delivery, send_message routing, channel directory, platform hints, toolset definition, setup wizard, status display. 27 tests covering config, adapter, webhook parsing, GUID resolution, attachment download routing, toolset consistency, and prompt hints.	2026-04-08 23:54:03 -07:00
SHL0MS	8567031433	fix: improve context compression quality — named constants, tool tracking, degradation warning Three targeted improvements to the compression system: 1. Replace hardcoded truncation limits with named class constants (_CONTENT_MAX=6000, _CONTENT_HEAD=4000, _CONTENT_TAIL=1500, _TOOL_ARGS_MAX=1500, _TOOL_ARGS_HEAD=1200). Previous limits (3000/500) heavily truncated the summarizer's input — a 200-line edit got cut to 3000 chars before the summarizer ever saw it. 2. Add '## Tools & Patterns' section to both compression prompt templates (first-pass and iterative). Preserves working tool invocations, preferred flags, and tool-specific discoveries across compaction boundaries. 3. Warn users on 2nd+ compression: 'Session compressed N times — accuracy may degrade. Consider /new to start fresh.' Ref #499	2026-04-08 20:54:23 -07:00
kshitijk4poor	875a72e4c8	fix: normalize httpx.URL base_url + strip thinking signatures for third-party endpoints Two linked fixes for MiniMax Anthropic-compatible fallback: 1. Normalize httpx.URL to str before calling .rstrip() in auth/provider detection helpers. Some client objects expose base_url as httpx.URL, not str — crashed with AttributeError in _requires_bearer_auth() and _is_third_party_anthropic_endpoint(). Also fixes _try_activate_fallback() to use the already-stringified fb_base_url instead of raw httpx.URL. 2. Strip Anthropic-proprietary thinking block signatures when targeting third-party Anthropic-compatible endpoints (MiniMax, Azure AI Foundry, self-hosted proxies). These endpoints cannot validate Anthropic's signatures and reject them with HTTP 400 'Invalid signature in thinking block'. Now threads base_url through convert_messages_to_anthropic() → build_anthropic_kwargs() so signature management is endpoint-aware. Based on PR #4945 by kshitijk4poor (rstrip fix). Fixes #4944.	2026-04-08 16:39:29 -07:00
Teknium	7156f8d866	fix: CI test failures — metadata key, cli console, docker env, vision order (#6294 ) Fixes 9 test failures on current main, incorporating ideas from PR stack #6219-#6222 by xinbenlv with corrections: - model_metadata: sync HF context length key casing (minimaxai/minimax-m2.5 → MiniMaxAI/MiniMax-M2.5) - cli.py: route quick command error output through self.console instead of creating a new ChatConsole() instance - docker.py: explicit docker_forward_env entries now bypass the Hermes secret blocklist (intentional opt-in wins over generic filter) - auxiliary_client: revert _read_main_provider() to simple provider.strip().lower() — the _normalize_aux_provider() call introduced in `5c03f2e7` stripped the custom: prefix, breaking named custom provider resolution - auxiliary_client: flip vision auto-detection order to active provider → OpenRouter → Nous → stop (was OR → Nous → active) - test: update vision priority test to match new order Based on PR #6219-#6222 by xinbenlv.	2026-04-08 16:37:05 -07:00
Teknium	5d2fc6d928	fix: cleanup Qwen OAuth provider gaps - Add HERMES_QWEN_BASE_URL to OPTIONAL_ENV_VARS in config.py (was missing despite being referenced in code) - Remove redundant qwen-oauth entry from _API_KEY_PROVIDER_AUX_MODELS (non-aggregator providers use their main model for aux tasks automatically)	2026-04-08 13:46:30 -07:00
kshitijk4poor	3377017eb4	feat(qwen): add Qwen OAuth provider with portal request support Based on #6079 by @tunamitom with critical fixes and comprehensive tests. Changes from #6079: - Fix: sanitization overwrite bug — Qwen message prep now runs AFTER codex field sanitization, not before (was silently discarding Qwen transforms) - Fix: missing try/except AuthError in runtime_provider.py — stale Qwen credentials now fall through to next provider on auto-detect - Fix: 'qwen' alias conflict — bare 'qwen' stays mapped to 'alibaba' (DashScope); use 'qwen-portal' or 'qwen-cli' for the OAuth provider - Fix: hardcoded ['coder-model'] replaced with live API fetch + curated fallback list (qwen3-coder-plus, qwen3-coder) - Fix: extract _is_qwen_portal() helper + _qwen_portal_headers() to replace 5 inline 'portal.qwen.ai' string checks and share headers between init and credential swap - Fix: add Qwen branch to _apply_client_headers_for_base_url for mid-session credential swaps - Fix: remove suspicious TypeError catch blocks around _prompt_provider_choice - Fix: handle bare string items in content lists (were silently dropped) - Fix: remove redundant dict() copies after deepcopy in message prep - Revert: unrelated ai-gateway test mock removal and model_switch.py comment deletion New tests (30 test functions): - _qwen_cli_auth_path, _read_qwen_cli_tokens (success + 3 error paths) - _save_qwen_cli_tokens (roundtrip, parent creation, permissions) - _qwen_access_token_is_expiring (5 edge cases: fresh, expired, within skew, None, non-numeric) - _refresh_qwen_cli_tokens (success, preserve old refresh, 4 error paths, default expires_in, disk persistence) - resolve_qwen_runtime_credentials (fresh, auto-refresh, force-refresh, missing token, env override) - get_qwen_auth_status (logged in, not logged in) - Runtime provider resolution (direct, pool entry, alias) - _build_api_kwargs (metadata, vl_high_resolution_images, message formatting, max_tokens suppression)	2026-04-08 13:46:30 -07:00
Teknium	c8a5e36be8	feat(prompting): self-optimized GPT/Codex tool-use guidance via automated behavioral benchmarking (#6120 ) Hermes Agent identified and patched its own prompting blind spots through automated self-evaluation — running 64+ tool-use benchmarks across GPT-5.4 and Codex-5.3, diagnosing 5 failure modes, writing targeted prompt patches, and verifying the fix in a closed loop. Failure modes discovered and fixed: - Mental arithmetic (wrong answers: 39,152,053 vs correct 39,151,253) - User profile hallucination ('Windows 11' when running on Linux) - Time guessing without verification - Clarification-seeking instead of acting ('open where?' for port checks) - Hash computation from memory (SHA-256, encodings) - Confusing system RAM with agent's own persistent memory store Two new XML sections added to OPENAI_MODEL_EXECUTION_GUIDANCE: - <mandatory_tool_use>: explicit categories that must always use tools - <act_dont_ask>: default to action on obvious interpretations Results: gpt-5.4: 68.8% → 100% tool compliance (+31.2pp) gpt-5.3-codex: 62.5% → 100% tool compliance (+37.5pp) Regression: 0/8 conversational prompts over-tooled	2026-04-08 04:06:42 -07:00
Teknium	1368caf66f	fix(anthropic): smart thinking block signature management (#6112 ) Anthropic signs thinking blocks against the full turn content. Any upstream mutation (context compression, session truncation, orphan stripping, message merging) invalidates the signature, causing HTTP 400 'Invalid signature in thinking block' — especially in long-lived gateway sessions. Strategy (following clawdbot/OpenClaw pattern): 1. Strip thinking/redacted_thinking from all assistant messages EXCEPT the last one — preserves reasoning continuity on the current tool-use chain while avoiding stale signature errors on older turns. 2. Downgrade unsigned thinking blocks to plain text — Anthropic can't validate them, but the reasoning content is preserved. 3. Strip cache_control from thinking/redacted_thinking blocks to prevent cache markers from interfering with signature validation. 4. Drop thinking blocks from the second message when merging consecutive assistant messages (role alternation enforcement). 5. Error recovery: on HTTP 400 mentioning 'signature' and 'thinking', strip all reasoning_details from the conversation and retry once. This is the safety net for edge cases the proactive stripping misses. Addresses the issue reported in PR #6086 by @mingginwan while preserving reasoning continuity (their PR stripped ALL thinking blocks unconditionally). Files changed: - agent/anthropic_adapter.py: thinking block management in convert_messages_to_anthropic (strip old turns, downgrade unsigned, strip cache_control, merge-time strip) - run_agent.py: one-shot signature error recovery in retry loop - tests/test_anthropic_adapter.py: 10 new tests covering all cases	2026-04-08 03:38:08 -07:00
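Note on 1368caf66f: step 1 of the strategy above (keep thinking blocks only on the last assistant message) could be sketched roughly as follows. This is an illustrative sketch, not the code in agent/anthropic_adapter.py; the function name `strip_stale_thinking` and the assumed message shape (dicts with `role` and a list-valued `content` of typed blocks) are assumptions made for the example.

```python
import copy

# Block types treated as "thinking" content (names taken from the commit message).
THINKING_TYPES = {"thinking", "redacted_thinking"}


def strip_stale_thinking(messages):
    """Return a copy of `messages` with thinking blocks removed from every
    assistant message except the last one. Hypothetical helper."""
    out = copy.deepcopy(messages)
    # Index of the final assistant message, or None if there is none.
    last_assistant = max(
        (i for i, m in enumerate(out) if m.get("role") == "assistant"),
        default=None,
    )
    for i, msg in enumerate(out):
        if msg.get("role") != "assistant" or i == last_assistant:
            continue
        content = msg.get("content")
        if isinstance(content, list):
            # Keep everything that is not a thinking/redacted_thinking block.
            msg["content"] = [
                block for block in content
                if not (isinstance(block, dict) and block.get("type") in THINKING_TYPES)
            ]
    return out
```

The sketch does not handle an assistant message whose content becomes empty after stripping; a real adapter would likely need to deal with that case.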
kshitij	22d1bda185	fix(minimax): correct context lengths, model catalog, thinking guard, aux model, and config base_url Cherry-picked from PR #6046 by kshitijk4poor with dead code stripped. - Context lengths: 204800 → 1M (M1) / 1048576 (M2.5/M2.7) per official docs - Model catalog: add M1 family, remove deprecated M2.1 and highspeed variants - Thinking guard: skip extended thinking for MiniMax (Anthropic-compat endpoint) - Aux model: MiniMax-M2.7-highspeed → MiniMax-M2.7 (same model, half price) - Config base_url: honour model.base_url for API-key providers (fixes China users) - Stripped unused get_minimax_max_output() / _MINIMAX_MAX_OUTPUT (no consumer) Fixes #5777, #4082, #6039. Closes #3895.	2026-04-08 02:20:46 -07:00
Mibayy	ab271ebe10	fix(vision): simplify vision auto-detection to openrouter → nous → active provider Simplify the vision auto-detection chain from 5 backends (openrouter, nous, codex, anthropic, custom) down to 3: 1. OpenRouter (known vision-capable default model) 2. Nous Portal (known vision-capable default model) 3. Active provider + model (whatever the user is running) 4. Stop This is simpler and more predictable. The active provider step uses resolve_provider_client() which handles all provider types including named custom providers (from #5978). Removed the complex preferred-provider promotion logic and API-level fallback — the chain is short enough that it doesn't need them. Based on PR #5376 by Mibay. Closes #5366.	2026-04-08 01:21:54 -07:00
zocomputer	e1befe5077	feat(agent): add jittered retry backoff Adds agent/retry_utils.py with jittered_backoff() — exponential backoff with additive jitter to prevent thundering-herd retry spikes when multiple gateway sessions hit the same rate-limited provider. Replaces fixed exponential backoff at 4 call sites: - run_agent.py: None-choices retry path (5s base, 120s cap) - run_agent.py: API error retry path (2s base, 60s cap) - trajectory_compressor.py: sync + async summarization retries Thread-safe jitter counter with overflow guards ensures unique seeds across concurrent retries. Trimmed from original PR to keep only wired-in functionality. Co-authored-by: martinp09 <martinp09@users.noreply.github.com>	2026-04-08 00:41:36 -07:00
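Note on e1befe5077: a minimal sketch of exponential backoff with additive jitter, using the 5s base and 120s cap mentioned for the None-choices retry path. The signature, the `jitter_ratio` parameter, and the use of `random` are assumptions for illustration; the actual jittered_backoff() in agent/retry_utils.py (including its thread-safe jitter counter) may differ.

```python
import random


def sketch_jittered_backoff(attempt, base_delay=5.0, max_delay=120.0,
                            jitter_ratio=0.5, rng=None):
    """Delay in seconds for a 1-based retry attempt. Hypothetical signature."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    rng = rng or random
    # Exponential growth (base, 2*base, 4*base, ...) capped at max_delay.
    exponential = min(base_delay * (2 ** (attempt - 1)), max_delay)
    # Additive jitter: a random extra of up to jitter_ratio * exponential, so
    # concurrent sessions that fail together do not all retry together.
    jitter = rng.uniform(0, jitter_ratio * exponential)
    return exponential + jitter
```

The jitter is added on top of the capped value so that retries stay spread out even once the cap is reached; the trade-off is that the returned delay can exceed `max_delay` by up to `jitter_ratio * max_delay`.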
Teknium	5c03f2e7cc	fix: provider/model resolution — salvage 4 PRs + MiniMax aux URL fix (#5983 ) Salvaged fixes from community PRs: - fix(model_switch): _read_auth_store → _load_auth_store + fix auth store key lookup (was checking top-level dict instead of store['providers']). OAuth providers now correctly detected in /model picker. Cherry-picked from PR #5911 by Xule Lin (linxule). - fix(ollama): pass num_ctx to override 2048 default context window. Ollama defaults to 2048 context regardless of model capabilities. Now auto-detects from /api/show metadata and injects num_ctx into every request. Config override via model.ollama_num_ctx. Fixes #2708. Cherry-picked from PR #5929 by kshitij (kshitijk4poor). - fix(aux): normalize provider aliases for vision/auxiliary routing. Adds _normalize_aux_provider() with 17 aliases (google→gemini, claude→anthropic, glm→zai, etc). Fixes vision routing failure when provider is set to 'google' instead of 'gemini'. Cherry-picked from PR #5793 by e11i (Elizabeth1979). - fix(aux): rewrite MiniMax /anthropic base URLs to /v1 for OpenAI SDK. MiniMax's inference_base_url ends in /anthropic (Anthropic Messages API), but auxiliary client uses OpenAI SDK which appends /chat/completions → 404 at /anthropic/chat/completions. Generic _to_openai_base_url() helper rewrites terminal /anthropic to /v1 for OpenAI-compatible endpoint. Inspired by PR #5786 by Lempkey. Added debug logging to silent exception blocks across all fixes. Co-authored-by: Hermes Agent <hermes@nousresearch.com>	2026-04-07 22:23:28 -07:00
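Note on 5c03f2e7cc: the MiniMax base URL fix described above (rewrite a terminal /anthropic to /v1 so the OpenAI SDK's appended /chat/completions resolves) might look roughly like this. The function below is a sketch under that description only; the real _to_openai_base_url() helper may handle more cases, and the example hostnames are placeholders.

```python
def to_openai_base_url(base_url: str) -> str:
    """Rewrite a trailing '/anthropic' path segment to '/v1'.

    URLs that do not end in '/anthropic' are returned unchanged.
    Hypothetical stand-in for the helper named in the commit message.
    """
    suffix = "/anthropic"
    trimmed = base_url.rstrip("/")  # tolerate a trailing slash
    if trimmed.endswith(suffix):
        return trimmed[: -len(suffix)] + "/v1"
    return base_url
```

For example, `to_openai_base_url("https://api.example.com/anthropic")` yields `"https://api.example.com/v1"`, while a URL with /anthropic in the middle of the path is left alone.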
Teknium	8d7a98d2ff	feat: use mimo-v2-pro for non-vision auxiliary tasks on Nous free tier (#6018 ) Free-tier Nous Portal users were getting mimo-v2-omni (a multimodal model) for all auxiliary tasks including compression, session search, and web extraction. Now routes non-vision tasks to mimo-v2-pro (a text model) which is better suited for those workloads. - Added _NOUS_FREE_TIER_AUX_MODEL constant for text auxiliary tasks - _try_nous() accepts vision=False param to select the right model - Vision path (_resolve_strict_vision_backend) passes vision=True - All other callers default to vision=False → mimo-v2-pro	2026-04-07 21:41:05 -07:00
Teknium	cbf1f15cfe	fix(auxiliary): resolve named custom providers and 'main' alias in auxiliary routing (#5978 ) * fix(telegram): replace substring caption check with exact line-by-line match Captions in photo bursts and media group albums were silently dropped when a shorter caption happened to be a substring of an existing one (e.g. "Meeting" lost inside "Meeting agenda"). Extract a shared _merge_caption static helper that splits on "\n\n" and uses exact match with whitespace normalisation, then use it in both _enqueue_photo_event and _queue_media_group_event. Adds 13 unit tests covering the fixed bug scenarios. Cherry-picked from PR #2671 by Dilee. * fix: extend caption substring fix to all platforms Move _merge_caption helper from TelegramAdapter to BasePlatformAdapter so all adapters inherit it. Fix the same substring-containment bug in: - gateway/platforms/base.py (photo burst merging) - gateway/run.py (priority photo follow-up merging) - gateway/platforms/feishu.py (media batch merging) The original fix only covered telegram.py. The same bug existed in base.py and run.py (pure substring check) and feishu.py (list membership without whitespace normalization). * fix(auxiliary): resolve named custom providers and 'main' alias in auxiliary routing Two bugs caused auxiliary tasks (vision, compression, etc.) to fail when using named custom providers defined in config.yaml: 1. 'provider: main' was hardcoded to 'custom', which only checks legacy OPENAI_BASE_URL env vars. Now reads _read_main_provider() to resolve to the actual provider (e.g., 'custom:beans', 'openrouter', 'deepseek'). 2. Named custom provider names (e.g., 'beans') fell through to PROVIDER_REGISTRY which doesn't know about config.yaml entries. Now checks _get_named_custom_provider() before the registry fallback. Fixes both resolve_provider_client() and _normalize_vision_provider() so the fix covers all auxiliary tasks (vision, compression, web_extract, session_search, etc.). Adds 13 unit tests. 
Reported by Laura via Discord. --------- Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>	2026-04-07 17:59:47 -07:00
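Note on cbf1f15cfe: the caption fix described above replaces a substring check with an exact match over parts split on "\n\n", after whitespace normalisation. A sketch of that idea follows; the name `merge_caption`, its signature, and its handling of empty input are assumptions, not the actual _merge_caption helper on BasePlatformAdapter.

```python
from typing import Optional


def merge_caption(existing: Optional[str], new: Optional[str]) -> str:
    """Append `new` to `existing` unless an identical caption part is
    already present. Hypothetical helper for illustration."""
    if not new or not new.strip():
        return existing or ""
    if not existing or not existing.strip():
        return new.strip()

    def normalise(text: str) -> str:
        # Collapse runs of whitespace and trim both ends.
        return " ".join(text.split())

    # Compare against whole caption parts, never substrings, so "Meeting"
    # is not swallowed by an existing "Meeting agenda".
    seen = {normalise(part) for part in existing.split("\n\n")}
    if normalise(new) in seen:
        return existing
    return existing + "\n\n" + new.strip()
```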
Teknium	678a87c477	refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites Add three reusable helpers to eliminate pervasive boilerplate: tools/registry.py — tool_error() and tool_result(): Every tool handler returns JSON strings. The pattern json.dumps({"error": msg}, ensure_ascii=False) appeared 106 times, and json.dumps({"success": False, "error": msg}, ...) another 23. Now: tool_error(msg) or tool_error(msg, success=False). tool_result() handles arbitrary result dicts: tool_result(success=True, data=payload) or tool_result(some_dict). hermes_cli/config.py — read_raw_config(): Lightweight YAML reader that returns the raw config dict without load_config()'s deep-merge + migration overhead. Available for callsites that just need a single config value. Migration (129 callsites across 32 files): - tools/: browser_camofox (18), file_tools (10), homeassistant (8), web_tools (7), skill_manager (7), cronjob (11), code_execution (4), delegate (5), send_message (4), tts (4), memory (7), session_search (3), mcp (2), clarify (2), skills_tool (3), todo (1), vision (1), browser (1), process_registry (2), image_gen (1) - plugins/memory/: honcho (9), supermemory (9), hindsight (8), holographic (7), openviking (7), mem0 (7), byterover (6), retaindb (2) - agent/: memory_manager (2), builtin_memory_provider (1)	2026-04-07 13:36:38 -07:00
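Note on 678a87c477: the two registry helpers described above wrap the repeated json.dumps(..., ensure_ascii=False) pattern. A minimal sketch consistent with the call shapes quoted in the commit message (tool_error(msg), tool_error(msg, success=False), tool_result(success=True, data=payload), tool_result(some_dict)) is shown below; the exact signatures and key ordering in tools/registry.py are not confirmed by this log and are assumptions here.

```python
import json


def tool_error(message, **extra):
    """Serialise an error payload as a JSON string (sketch)."""
    payload = dict(extra)            # e.g. success=False
    payload["error"] = str(message)
    return json.dumps(payload, ensure_ascii=False)


def tool_result(result=None, **fields):
    """Serialise a result dict and/or keyword fields as a JSON string (sketch)."""
    payload = dict(result) if result is not None else {}
    payload.update(fields)
    return json.dumps(payload, ensure_ascii=False)
```

With ensure_ascii=False, non-ASCII text in messages is emitted as-is rather than as \u escapes, matching the pattern the commit says was replaced.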
Teknium	ca0459d109	refactor: remove 24 confirmed dead functions — 432 lines of unused code Each function was verified to have exactly 1 reference in the entire codebase (its own definition). Zero calls, zero imports, zero string references anywhere including tests. Removed by category: Superseded wrappers (replaced by newer implementations): - agent/anthropic_adapter.py: run_hermes_oauth_login, refresh_hermes_oauth_token - hermes_cli/callbacks.py: sudo_password_callback (superseded by CLI method) - hermes_cli/setup.py: _set_model_provider, _sync_model_from_disk - tools/file_tools.py: get_file_tools (superseded by registry.register) - tools/cronjob_tools.py: get_cronjob_tool_definitions (same) - tools/terminal_tool.py: _check_dangerous_command (_check_all_guards used) Dead private helpers (lost their callers during refactors): - agent/anthropic_adapter.py: _convert_user_content_part_to_anthropic - agent/display.py: honcho_session_line, write_tty - hermes_cli/providers.py: _build_labels (+ dead _labels_cache var) - hermes_cli/tools_config.py: _prompt_yes_no - hermes_cli/models.py: _extract_model_ids - hermes_cli/uninstall.py: log_error - gateway/platforms/feishu.py: _is_loop_ready - tools/file_operations.py: _read_image (64-line method) - tools/process_registry.py: cleanup_expired - tools/skill_manager_tool.py: check_skill_manage_requirements Dead class methods (zero callers): - run_agent.py: _is_anthropic_url (logic duplicated inline at L618) - run_agent.py: _classify_empty_content_response (68-line method, never wired) - cli.py: reset_conversation (callers all use new_session directly) - cli.py: _clear_current_input (added but never wired in) Other: - gateway/delivery.py: build_delivery_context_for_tool - tools/browser_tool.py: get_active_browser_sessions	2026-04-07 11:41:26 -07:00


378 commits