hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-25 00:51:20 +00:00

Author	SHA1	Message	Date
Andre Kurait	b290297d66	fix(bedrock): resolve context length via static table before custom-endpoint probe ## Problem `get_model_context_length()` in `agent/model_metadata.py` had a resolution order bug that caused every Bedrock model to fall back to the 128K default context length instead of reaching the static Bedrock table (200K for Claude, etc.). The root cause: `bedrock-runtime.<region>.amazonaws.com` is not listed in `_URL_TO_PROVIDER`, so `_is_known_provider_base_url()` returned False. The resolution order then ran the custom-endpoint probe (step 2) before the Bedrock branch (step 4b), which: 1. Treated Bedrock as a custom endpoint (via `_is_custom_endpoint`). 2. Called `fetch_endpoint_model_metadata()` → `GET /models` on the bedrock-runtime URL (Bedrock doesn't serve this shape). 3. Fell through to `return DEFAULT_FALLBACK_CONTEXT` (128K) at the "probe-down" branch — never reaching the Bedrock static table. Result: users on Bedrock saw 128K context for Claude models that actually support 200K on Bedrock, causing premature auto-compression. ## Fix Promote the Bedrock branch from step 4b to step 1b, so it runs before the custom-endpoint probe at step 2. The static table in `bedrock_adapter.py::get_bedrock_context_length()` is the authoritative source for Bedrock (the ListFoundationModels API doesn't expose context window sizes), so there's no reason to probe `/models` first. The original step 4b is replaced with a one-line breadcrumb comment pointing to the new location, to make the resolution-order docstring accurate. ## Changes - `agent/model_metadata.py` - Add step 1b: Bedrock static-table branch (unchanged predicate, moved). - Remove dead step 4b block, replace with breadcrumb comment. - Update resolution-order docstring to include step 1b. - `tests/agent/test_model_metadata.py` - New `TestBedrockContextResolution` class (3 tests): - `test_bedrock_provider_returns_static_table_before_probe`: confirms `provider="bedrock"` hits the static table and does NOT call `fetch_endpoint_model_metadata` (regression guard). - `test_bedrock_url_without_provider_hint`: confirms the `bedrock-runtime.*.amazonaws.com` host match works without an explicit `provider=` hint. - `test_non_bedrock_url_still_probes`: confirms the probe still fires for genuinely-custom endpoints (no over-reach). ## Testing pytest tests/agent/test_model_metadata.py -q # 83 passed in 1.95s (3 new + 80 existing) ## Risk Very low. - Predicate is identical to the original step 4b — no behaviour change for non-Bedrock paths. - Original step 4b was dead code for the user-facing case (always hit the 128K fallback first), so removing it cannot regress behaviour. - Bedrock path now short-circuits before any network I/O — faster too. - `ImportError` fall-through preserved so users without `boto3` installed are unaffected. ## Related - This is a prerequisite for accurate context-window accounting on Bedrock — the fix for #14710 (stale-connection client eviction) depends on correct context sizing to know when to compress. Signed-off-by: Andre Kurait <andrekurait@gmail.com>	2026-04-24 07:26:07 -07:00
NiuNiu Xia	76329196c1	fix(copilot): wire live /models max_prompt_tokens into context-window resolver The Copilot provider resolved context windows via models.dev static data, which does not include account-specific models (e.g. claude-opus-4.6-1m with 1M context). This adds the live Copilot /models API as a higher- priority source for copilot/copilot-acp/github-copilot providers. New helper get_copilot_model_context() in hermes_cli/models.py extracts capabilities.limits.max_prompt_tokens from the cached catalog. Results are cached in-process for 1 hour. In agent/model_metadata.py, step 5a queries the live API before falling through to models.dev (step 5b). This ensures account-specific models get correct context windows while standard models still have a fallback. Part 1 of #7731. Refs: #7272	2026-04-24 05:09:08 -07:00
Teknium	346601ca8d	fix(context): invalidate stale Codex OAuth cache entries >= 400k (#15078 ) PR #14935 added a Codex-aware context resolver but only new lookups hit the live /models probe. Users who had run Hermes on gpt-5.5 / 5.4 BEFORE that PR already had the wrong value (e.g. 1,050,000 from models.dev) persisted in ~/.hermes/context_length_cache.yaml, and the cache-first lookup in get_model_context_length() returns it forever. Symptom (reported in the wild by Ludwig, min heo, Gaoge on current main at `6051fba9d`, which is AFTER #14935): * Startup banner shows context usage against 1M * Compression fires late and then OpenAI hard-rejects with 'context length will be reduced from 1,050,000 to 128,000' around the real 272k boundary. Fix: when the step-1 cache returns a value for an openai-codex lookup, check whether it's >= 400k. Codex OAuth caps every slug at 272k (live probe values) so anything at or above 400k is definitionally a pre-#14935 leftover. Drop that entry from the on-disk cache and fall through to step 5, which runs the live /models probe and repersists the correct value (or 272k from the hardcoded fallback if the probe fails). Non-Codex providers and legitimately-cached Codex entries at 272k are untouched. Changes: - agent/model_metadata.py: * _invalidate_cached_context_length() — drop a single entry from context_length_cache.yaml and rewrite the file. * Step-1 cache check in get_model_context_length() now gates provider=='openai-codex' entries >= 400k through invalidation instead of returning them. Tests (3 new in TestCodexOAuthContextLength): - stale 1.05M Codex entry is dropped from disk AND re-resolved through the live probe to 272k; unrelated cache entries survive. - fresh 272k Codex entry is respected (no probe call, no invalidation). - non-Codex 1M entries (e.g. anthropic/claude-opus-4.6 on OpenRouter) are unaffected — the guard is strictly scoped to openai-codex. Full tests/agent/test_model_metadata.py: 88 passed.	2026-04-24 04:46:07 -07:00
Teknium	f58a16f520	fix(auth): apply verify= to Codex OAuth /models probe (#15049 ) Follow-up to PR #14533 — applies the same _resolve_requests_verify() treatment to the one requests.get() site the PR missed (Codex OAuth chatgpt.com /models probe). Keeps all seven requests.get() callsites in model_metadata.py consistent so HERMES_CA_BUNDLE / REQUESTS_CA_BUNDLE / SSL_CERT_FILE are honored everywhere. Co-authored-by: teknium1 <teknium@hermes-agent>	2026-04-24 03:02:24 -07:00
0xbyt4	8aa37a0cf9	fix(auth): honor SSL CA env vars across httpx + requests callsites - hermes_cli/auth.py: add _default_verify() with macOS Homebrew certifi fallback (mirrors weixin `3a0ec1d93`). Extend env var chain to include REQUESTS_CA_BUNDLE so one env var works across httpx + requests paths. - agent/model_metadata.py: add _resolve_requests_verify() reading HERMES_CA_BUNDLE / REQUESTS_CA_BUNDLE / SSL_CERT_FILE in priority order. Apply explicit verify= to all 6 requests.get callsites. - Tests: 18 new unit tests + autouse platform pin on existing TestResolveVerifyFallback to keep its "returns True" assertions platform-independent. Empirically verified against self-signed HTTPS server: requests honors REQUESTS_CA_BUNDLE only; httpx honors SSL_CERT_FILE only. Hermes now honors all three everywhere. Triggered by Discord reports — Nous OAuth SSL failure on macOS Homebrew Python; custom provider self-signed cert ignored despite REQUESTS_CA_BUNDLE set in env.	2026-04-24 03:00:33 -07:00
Teknium	51f4c9827f	fix(context): resolve real Codex OAuth context windows (272k, not 1M) (#14935 ) On ChatGPT Codex OAuth every gpt-5.x slug actually caps at 272,000 tokens, but Hermes was resolving gpt-5.5 / gpt-5.4 to 1,050,000 (from models.dev) because openai-codex aliases to the openai entry there. At 1.05M the compressor never fires and requests hard-fail with 'context window exceeded' around the real 272k boundary. Verified live against chatgpt.com/backend-api/codex/models: gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.3-codex, gpt-5.2-codex, gpt-5.2, gpt-5.1-codex-max → context_window = 272000 Changes: - agent/model_metadata.py: * _fetch_codex_oauth_context_lengths() — probe the Codex /models endpoint with the OAuth bearer token and read context_window per slug (1h in-memory TTL). * _resolve_codex_oauth_context_length() — prefer the live probe, fall back to hardcoded _CODEX_OAUTH_CONTEXT_FALLBACK (all 272k). * Wire into get_model_context_length() when provider=='openai-codex', running BEFORE the models.dev lookup (which returns 1.05M). Result persists via save_context_length() so subsequent lookups skip the probe entirely. * Fixed the now-wrong comment on the DEFAULT_CONTEXT_LENGTHS gpt-5.5 entry (400k was never right for Codex; it's the catch-all for providers we can't probe live). Tests (4 new in TestCodexOAuthContextLength): - fallback table used when no token is available (no models.dev leakage) - live probe overrides the fallback - probe failure (non-200) falls back to hardcoded 272k - non-codex providers (openrouter, direct openai) unaffected Non-codex context resolution is unchanged — the Codex branch only fires when provider=='openai-codex'.	2026-04-23 22:39:47 -07:00
Teknium	8f5fee3e3e	feat(codex): add gpt-5.5 and wire live model discovery into picker (#14720 ) OpenAI launched GPT-5.5 on Codex today (Apr 23 2026). Adds it to the static catalog and pipes the user's OAuth access token into the openai-codex path of provider_model_ids() so /model mid-session and the gateway picker hit the live ChatGPT codex/models endpoint — new models appear for each user according to what ChatGPT actually lists for their account, without a Hermes release. Verified live: 'gpt-5.5' returns priority 0 (featured) from the endpoint, 400k context per OpenAI's launch article. 'hermes chat --provider openai-codex --model gpt-5.5' completes end-to-end. Changes: - hermes_cli/codex_models.py: add gpt-5.5 to DEFAULT_CODEX_MODELS + forward-compat - agent/model_metadata.py: 400k context length entry - hermes_cli/models.py: resolve codex OAuth token before calling get_codex_model_ids() in provider_model_ids('openai-codex')	2026-04-23 13:32:43 -07:00
kshitij	82a0ed1afb	feat: add Xiaomi MiMo v2.5-pro and v2.5 model support (#14635 ) ## Merged Adds MiMo v2.5-pro and v2.5 support to Xiaomi native provider, OpenCode Go, and setup wizard. ### Changes - Context lengths: added v2.5-pro (1M) and v2.5 (1M), corrected existing MiMo entries to exact values (262144) - Provider lists: xiaomi, opencode-go, setup wizard - Vision: upgraded from mimo-v2-omni to mimo-v2.5 (omnimodal) - Config description updated for XIAOMI_API_KEY - Tests updated for new vision model preference ### Verification - 4322 tests passed, 0 new regressions - Live API tested on Xiaomi portal: basic, reasoning, tool calling, multi-tool, file ops, system prompt, vision — all pass - Self-review found and fixed 2 issues (redundant vision check, stale HuggingFace context length)	2026-04-23 10:06:25 -07:00
wujhsu	276ef49c96	fix(provider): recognize open.bigmodel.cn as Zhipu/ZAI provider Zhipu AI (智谱) serves both international users via api.z.ai and China-based users via open.bigmodel.cn. The domestic endpoint was not mapped in _URL_TO_PROVIDER, causing Hermes to treat it as an unknown custom endpoint and fall back to the default 128K context length instead of resolving the correct 200K+ context via models.dev or the hardcoded GLM defaults. This affects users of both the standard API (https://open.bigmodel.cn/api/paas/v4) and the Coding Plan (https://open.bigmodel.cn/api/coding/paas/v4).	2026-04-22 17:35:55 -07:00
Clifford Garwood	27621ef836	feat: add ctx_size to context length keys for Lemonade server support - Adds 'ctx_size' field to _CONTEXT_LENGTH_KEYS tuple - Enables hermes agent to correctly detect context size from custom LLMs running on Lemonade server that use this field name instead of the standard keys (max_seq_len, n_ctx_train, n_ctx)	2026-04-22 17:25:04 -07:00
Feranmi	66d2d7090e	fix(model_metadata): add gemma-4 and gemma4 context length entries Fixes #12976 The generic "gemma": 8192 fallback was incorrectly matching gemma4:31b-cloud before the more specific Gemma 4 entries could match, causing Hermes to assign only 8K context instead of 262K. Added "gemma-4" and "gemma4" entries before the fallback to correctly handle Gemma 4 model naming conventions.	2026-04-22 16:33:25 -07:00
Teknium	c96a548bde	feat(models): add xiaomi/mimo-v2.5-pro and mimo-v2.5 to openrouter + nous (#14184 ) Replace xiaomi/mimo-v2-pro with xiaomi/mimo-v2.5-pro and xiaomi/mimo-v2.5 in the OpenRouter fallback catalog and the nous provider model list. Add matching DEFAULT_CONTEXT_LENGTHS entries (1M tokens each).	2026-04-22 16:12:39 -07:00
ismell0992-afk	6513138f26	fix(agent): recognize Tailscale CGNAT (100.64.0.0/10) as local for Ollama timeouts `is_local_endpoint()` leaned on `ipaddress.is_private`, which classifies RFC-1918 ranges and link-local as private but deliberately excludes the RFC 6598 CGNAT block (100.64.0.0/10) — the range Tailscale uses for its mesh IPs. As a result, Ollama reached over Tailscale (e.g. `http://100.77.243.5:11434`) was treated as remote and missed the automatic stream-read / stale-stream timeout bumps, so cold model load plus long prefill would trip the 300 s watchdog before the first token. Add a module-level `_TAILSCALE_CGNAT = ipaddress.IPv4Network("100.64.0.0/10")` (built once) and extend `is_local_endpoint()` to match the block both via the parsed-`IPv4Address` path and the existing bare-string fallback (for symmetry with the 10/172/192 checks). Also hoist the previously function-local `import ipaddress` to module scope now that it's used by the constant. Extend `TestIsLocalEndpoint` with a CGNAT positive set (lower bound, representative host, MagicDNS anchor, upper bound) and a near-miss negative set (just below 100.64.0.0, just above 100.127.255.255, well outside the block, and first-octet-wrong).	2026-04-22 14:46:10 -07:00
hengm3467	c6b1ef4e58	feat: add Step Plan provider support (salvage #6005 ) Adds a first-class 'stepfun' API-key provider surfaced as Step Plan: - Support Step Plan setup for both International and China regions - Discover Step Plan models live from /step_plan/v1/models, with a small coding-focused fallback catalog when discovery is unavailable - Thread StepFun through provider metadata, setup persistence, status and doctor output, auxiliary routing, and model normalization - Add tests for provider resolution, model validation, metadata mapping, and StepFun region/model persistence Based on #6005 by @hengm3467. Co-authored-by: hengm3467 <100685635+hengm3467@users.noreply.github.com>	2026-04-22 02:59:58 -07:00
Teknium	62cbeb6367	test: stop testing mutable data — convert change-detectors to invariants (#13363 ) Catalog snapshots, config version literals, and enumeration counts are data that changes as designed. Tests that assert on those values add no behavioral coverage — they just break CI on every routine update and cost engineering time to 'fix.' Replace with invariants where one exists, delete where none does. Deleted (pure snapshots): - TestMinimaxModelCatalog (3 tests): 'MiniMax-M2.7 in models' et al - TestGeminiModelCatalog: 'gemini-2.5-pro in models', 'gemini-3.x in models' - test_browser_camofox_state::test_config_version_matches_current_schema (docstring literally said it would break on unrelated bumps) Relaxed (keep plumbing check, drop snapshot): - Xiaomi / Arcee / Kimi moonshot / Kimi coding / HuggingFace static lists: now assert 'provider exists and has >= 1 entry' instead of specific names - HuggingFace main/models.py consistency test: drop 'len >= 6' floor Dynamicized (follow source, not a literal): - 3x test_config.py migration tests: raw['_config_version'] == DEFAULT_CONFIG['_config_version'] instead of hardcoded 21 Fixed stale tests against intentional behavior changes: - test_insights::test_gateway_format_hides_cost: name matches new behavior (no dollar figures); remove contradicting '$' in text assertion - test_config::prefers_api_then_url_then_base_url: flipped per PR #9332; rename + update to base_url > url > api - test_anthropic_adapter: relax assert_called_once() (xdist-flaky) to assert called — contract is 'credential flowed through' - test_interrupt_propagation: add provider/model/_base_url to bare-agent fixture so the stale-timeout code path resolves Fixed stale integration tests against opt-in plugin gate: - transform_tool_result + transform_terminal_output: write plugins.enabled allow-list to config.yaml and reset the plugin manager singleton Source fix (real consistency invariant): - agent/model_metadata.py: add moonshotai/Kimi-K2.6 context length (262144, same as K2.5). test_model_metadata_has_context_lengths was correctly catching the gap. Policy: - AGENTS.md Testing section: new subsection 'Don't write change-detector tests' with do/don't examples. Reviewers should reject catalog-snapshot assertions in new tests. Covers every test that failed on the last completed main CI run (24703345583) except test_modal_sandbox_fixes::test_terminal_tool_present + test_terminal_and_file_toolsets_resolve_all_tools, which now pass both alone and with the full tests/tools/ directory (xdist ordering flake that resolved itself).	2026-04-20 23:20:33 -07:00
Teknium	dbb7e00e7e	fix: sweep remaining provider-URL substring checks across codebase Completes the hostname-hardening sweep — every substring check against a provider host in live-routing code is now hostname-based. This closes the same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen, ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI, Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI, and Anthropic. New helper: - utils.base_url_host_matches(base_url, domain) — safe counterpart to 'domain in base_url'. Accepts hostname equality and subdomain matches; rejects path segments, host suffixes, and prefix collisions. Call sites converted (real-code only; tests, optional-skills, red-teaming scripts untouched): run_agent.py (10 sites): - AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check) - header cascade for openrouter / copilot / kimi / qwen / chatgpt - interleaved-thinking trigger (openrouter + claude) - _is_openrouter_url(), _is_qwen_portal() - is_native_anthropic check - github-models-vs-copilot detection (3 sites) - reasoning-capable route gate (nousresearch, vercel, github) - codex-backend detection in API kwargs build - fallback api_mode Bedrock detection agent/auxiliary_client.py (7 sites): - extra-headers cascades in 4 distinct client-construction paths (resolve custom, resolve auto, OpenRouter-fallback-to-custom, _async_client_from_sync, resolve_provider_client explicit-custom, resolve_auto_with_codex) - _is_openrouter_client() base_url sniff agent/usage_pricing.py: - resolve_billing_route openrouter branch agent/model_metadata.py: - _is_openrouter_base_url(), Bedrock context-length lookup hermes_cli/providers.py: - determine_api_mode Bedrock heuristic hermes_cli/runtime_provider.py: - _is_openrouter_url flag for API-key preference (issues #420, #560) hermes_cli/doctor.py: - Kimi User-Agent header for /models probes tools/delegate_tool.py: - subagent Codex endpoint detection trajectory_compressor.py: - _detect_provider() cascade (8 providers: openrouter, nous, codex, zai, kimi-coding, arcee, minimax-cn, minimax) cli.py, gateway/run.py: - /model-switch cache-enabled hint (openrouter + claude) Bedrock detection tightened from 'bedrock-runtime in url' to 'hostname starts with bedrock-runtime. AND host is under amazonaws.com'. ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'. Tests: - tests/test_base_url_hostname.py extended with a base_url_host_matches suite (exact match, subdomain, path-segment rejection, host-suffix rejection, host-prefix rejection, empty-input, case-insensitivity, trailing dot). Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock, gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback, fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution, delegate, credential_pool, context_compressor, plus the 4 hostname test modules). 26-assertion E2E call-site verification across 6 modules passes.	2026-04-20 22:14:29 -07:00
Teknium	cecf84daf7	fix: extend hostname-match provider detection across remaining call sites Aslaaen's fix in the original PR covered _detect_api_mode_for_url and the two openai/xai sites in run_agent.py. This finishes the sweep: the same substring-match false-positive class (e.g. https://api.openai.com.evil/v1, https://proxy/api.openai.com/v1, https://api.anthropic.com.example/v1) existed in eight more call sites, and the hostname helper was duplicated in two modules. - utils: add shared base_url_hostname() (single source of truth). - hermes_cli/runtime_provider, run_agent: drop local duplicates, import from utils. Reuse the cached AIAgent._base_url_hostname attribute everywhere it's already populated. - agent/auxiliary_client: switch codex-wrap auto-detect, max_completion_tokens gate (auxiliary_max_tokens_param), and custom-endpoint max_tokens kwarg selection to hostname equality. - run_agent: native-anthropic check in the Claude-style model branch and in the AIAgent init provider-auto-detect branch. - agent/model_metadata: Anthropic /v1/models context-length lookup. - hermes_cli/providers.determine_api_mode: anthropic / openai URL heuristics for custom/unknown providers (the /anthropic path-suffix convention for third-party gateways is preserved). - tools/delegate_tool: anthropic detection for delegated subagent runtimes. - hermes_cli/setup, hermes_cli/tools_config: setup-wizard vision-endpoint native-OpenAI detection (paired with deduping the repeated check into a single is_native_openai boolean per branch). Tests: - tests/test_base_url_hostname.py covers the helper directly (path-containing-host, host-suffix, trailing dot, port, case). - tests/hermes_cli/test_determine_api_mode_hostname.py adds the same regression class for determine_api_mode, plus a test that the /anthropic third-party gateway convention still wins. Also: add asslaenn5@gmail.com → Aslaaen to scripts/release.py AUTHOR_MAP.	2026-04-20 22:14:29 -07:00
Tanner Fokkens	cde7283821	fix: forward auth when probing local model metadata Pass the user's configured api_key through local-server detection and context-length probes (detect_local_server_type, _query_local_context_length, query_ollama_num_ctx) and use LM Studio's native /api/v1/models endpoint in fetch_endpoint_model_metadata when a loaded instance is present — so the probed context length is the actual runtime value the user loaded the model at, not just the model's theoretical max. Helps local-LLM users whose auto-detected context length was wrong, causing compression failures and context-overrun crashes.	2026-04-20 20:51:56 -07:00
kshitijk4poor	bc2559c44d	fix: remove codex spark model support Drop gpt-5.3-codex-spark from Codex forward-compat synthesis, provider catalogs, and context metadata now that the API no longer supports it.	2026-04-20 04:51:44 -07:00
Teknium	c6fd2619f7	fix(gemini-cli): surface MODEL_CAPACITY_EXHAUSTED cleanly + drop retired gemma-4-26b (#11833 ) Google-side 429 Code Assist errors now flow through Hermes' normal rate-limit path (status_code on the exception, Retry-After preserved via error.response) instead of being opaque RuntimeErrors. User sees a one-line capacity message instead of a 500-char JSON dump. Changes - CodeAssistError grows status_code / response / retry_after / details attrs. _extract_status_code in error_classifier picks up status_code and classifies 429 as FailoverReason.rate_limit, so fallback_providers triggers the same way it does for SDK errors. run_agent.py line ~10428 already walks error.response.headers for Retry-After — preserving the response means that path just works. - _gemini_http_error parses the Google error envelope (error.status + error.details[].reason from google.rpc.ErrorInfo, retryDelay from google.rpc.RetryInfo). MODEL_CAPACITY_EXHAUSTED / RESOURCE_EXHAUSTED / 404 model-not-found each produce a human-readable message; unknown shapes fall back to the previous raw-body format. - Drop gemma-4-26b-it from hermes_cli/models.py, hermes_cli/setup.py, and agent/model_metadata.py — Google returned 404 for it today in local repro. Kept gemma-4-31b-it (capacity-constrained but not retired). Validation \| \| Before \| After \| \|---------------------------\|--------------------------------\|-------------------------------------------\| \| Error message \| 'Code Assist returned HTTP 429: {500 chars JSON}' \| 'Gemini capacity exhausted for gemini-2.5-pro (Google-side throttle...)' \| \| status_code on error \| None (opaque RuntimeError) \| 429 \| \| Classifier reason \| unknown (string-match fallback) \| FailoverReason.rate_limit \| \| Retry-After honored \| ignored \| extracted from RetryInfo or header \| \| gemma-4-26b-it picker \| advertised (404s on Google) \| removed \| Unit + E2E tests cover non-streaming 429, streaming 429, 404 model-not-found, Retry-After header fallback, malformed body, and classifier integration. Targeted suites: tests/agent/test_gemini_cloudcode.py (81 tests), full tests/hermes_cli (2203 tests) green. Co-authored-by: teknium1 <teknium@nousresearch.com>	2026-04-17 15:34:12 -07:00
Teknium	f362083c64	fix(providers): complete NVIDIA NIM parity with other providers Follow-up on the native NVIDIA NIM provider salvage. The original PR wired PROVIDER_REGISTRY + HERMES_OVERLAYS correctly but missed several touchpoints required for full parity with other OpenAI-compatible providers (xai, huggingface, deepseek, zai). Gaps closed: - hermes_cli/main.py: - Add 'nvidia' to the _model_flow_api_key_provider dispatch tuple so selecting 'NVIDIA NIM' in `hermes model` actually runs the api-key provider flow (previously fell through silently). - Add 'nvidia' to `hermes chat --provider` argparse choices so the documented test command (`hermes chat --provider nvidia --model ...`) parses successfully. - hermes_cli/config.py: Register NVIDIA_API_KEY and NVIDIA_BASE_URL in OPTIONAL_ENV_VARS so setup wizard can prompt for them and they're auto-added to the subprocess env blocklist. - hermes_cli/doctor.py: Add NVIDIA NIM row to `_apikey_providers` so `hermes doctor` probes https://integrate.api.nvidia.com/v1/models. - hermes_cli/dump.py: Add NVIDIA_API_KEY → 'nvidia' mapping for `hermes dump` credential masking. - tests/tools/test_local_env_blocklist.py: Extend registry_vars fixture with NVIDIA_API_KEY to verify it's blocked from leaking into subprocesses. - agent/model_metadata.py: Add 'nemotron' → 131072 context-length entry so all Nemotron variants get 128K context via substring match (rather than falling back to MINIMUM_CONTEXT_LENGTH). - hermes_cli/models.py: Fix hallucinated model ID 'nvidia/nemotron-3-nano-8b-a4b' → 'nvidia/nemotron-3-nano-30b-a3b' (verified against live integrate.api.nvidia.com/v1/models catalog). Expand curated list from 5 to 9 agentic models mapping to OpenRouter defaults per provider-guide convention: add qwen3.5-397b-a17b, deepseek-v3.2, llama-3.3-nemotron-super-49b-v1.5, gpt-oss-120b. - cli-config.yaml.example: Document 'nvidia' provider option. - scripts/release.py: Map asurla@nvidia.com → anniesurla in AUTHOR_MAP for CI attribution. E2E verified: `hermes chat --provider nvidia ...` now reaches NVIDIA's endpoint (returns 401 with bogus key instead of argparse error); `hermes doctor` detects NVIDIA NIM when NVIDIA_API_KEY is set.	2026-04-17 13:47:46 -07:00
asurla	3b569ff576	feat(providers): add native NVIDIA NIM provider Adds NVIDIA NIM as a first-class provider: ProviderConfig in auth.py, HermesOverlay in providers.py, curated models (Nemotron plus other open source models hosted on build.nvidia.com), URL mapping in model_metadata.py, aliases (nim, nvidia-nim, build-nvidia, nemotron), and env var tests. Docs updated: providers page, quickstart table, fallback providers table, and README provider list.	2026-04-17 13:47:46 -07:00
trevthefoolish	0517ac3e93	fix(agent): complete Claude Opus 4.7 API migration Claude Opus 4.7 introduced several breaking API changes that the current codebase partially handled but not completely. This patch finishes the migration per the official migration guide at https://platform.claude.com/docs/en/about-claude/models/migration-guide Fixes NousResearch/hermes-agent#11137 Breaking-change coverage: 1. Adaptive thinking + output_config.effort — 4.7 is now recognized by _supports_adaptive_thinking() (extends previous 4.6-only gate). 2. Sampling parameter stripping — 4.7 returns 400 for any non-default temperature / top_p / top_k. build_anthropic_kwargs drops them as a safety net; the OpenAI-protocol auxiliary path (_build_call_kwargs) and AnthropicCompletionsAdapter.create() both early-exit before setting temperature for 4.7+ models. This keeps flush_memories and structured-JSON aux paths that hardcode temperature from 400ing when the aux model is flipped to 4.7. 3. thinking.display = "summarized" — 4.7 defaults display to "omitted", which silently hides reasoning text from Hermes's CLI activity feed during long tool runs. Restoring "summarized" preserves 4.6 UX. 4. Effort level mapping — xhigh now maps to xhigh (was xhigh→max, which silently over-efforted every coding/agentic request). max is now a distinct ceiling per Anthropic's 5-level effort model. 5. New stop_reason values — refusal and model_context_window_exceeded were silently collapsed to "stop" (end_turn) by the adapter's stop_reason_map. Now mapped to "content_filter" and "length" respectively, matching upstream finish-reason handling already in bedrock_adapter. 6. Model catalogs — claude-opus-4-7 added to the Anthropic provider list, anthropic/claude-opus-4.7 added at top of OpenRouter fallback catalog (recommended), claude-opus-4-7 added to model_metadata DEFAULT_CONTEXT_LENGTHS (1M, matching 4.6 per migration guide). 7. Prefill docstrings — run_agent.AIAgent and BatchRunner now document that Anthropic Sonnet/Opus 4.6+ reject a trailing assistant-role prefill (400). 8. Tests — 4 new tests in test_anthropic_adapter covering display default, xhigh preservation, max on 4.7, refusal / context-overflow stop_reason mapping, plus the sampling-param predicate. test_model_metadata accepts 4.7 at 1M context. Tested on macOS 15.5 (darwin). 119 tests pass in tests/agent/test_anthropic_adapter.py, 1320 pass in tests/agent/.	2026-04-16 10:48:20 -07:00
kshitijk4poor	1b61ec470b	feat: add Ollama Cloud as built-in provider Add ollama-cloud as a first-class provider with full parity to existing API-key providers (gemini, zai, minimax, etc.): - PROVIDER_REGISTRY entry with OLLAMA_API_KEY env var - Provider aliases: ollama -> custom (local), ollama_cloud -> ollama-cloud - models.dev integration for accurate context lengths - URL-to-provider mapping (ollama.com -> ollama-cloud) - Passthrough model normalization (preserves Ollama model:tag format) - Default auxiliary model (nemotron-3-nano:30b) - HermesOverlay in providers.py - CLI --provider choices, CANONICAL_PROVIDERS entry - Dynamic model discovery with disk caching (1hr TTL) - 37 provider-specific tests Cherry-picked from PR #6038 by kshitijk4poor. Closes #3926	2026-04-16 02:22:09 -07:00
JiaDe WU	0cb8c51fa5	feat: native AWS Bedrock provider via Converse API Salvaged from PR #7920 by JiaDe-Wu — cherry-picked Bedrock-specific additions onto current main, skipping stale-branch reverts (293 commits behind). Dual-path architecture: - Claude models → AnthropicBedrock SDK (prompt caching, thinking budgets) - Non-Claude models → Converse API via boto3 (Nova, DeepSeek, Llama, Mistral) Includes: - Core adapter (agent/bedrock_adapter.py, 1098 lines) - Full provider registration (auth, models, providers, config, runtime, main) - IAM credential chain + Bedrock API Key auth modes - Dynamic model discovery via ListFoundationModels + ListInferenceProfiles - Streaming with delta callbacks, error classification, guardrails - hermes doctor + hermes auth integration - /usage pricing for 7 Bedrock models - 130 automated tests (79 unit + 28 integration + follow-up fixes) - Documentation (website/docs/guides/aws-bedrock.md) - boto3 optional dependency (pip install hermes-agent[bedrock]) Co-authored-by: JiaDe WU <40445668+JiaDe-Wu@users.noreply.github.com>	2026-04-15 16:17:17 -07:00
Julien Talbot	3b50821555	feat(xai): add xAI/Grok to provider prefix stripping Add 'xai', 'x-ai', 'x.ai', 'grok' to _PROVIDER_PREFIXES so that colon-prefixed model names (e.g. xai:grok-4.20) are stripped correctly for context length lookups. Cherry-picked from PR #9184 by @Julientalbot.	2026-04-14 16:43:42 -07:00
Teknium	943c01536f	feat: add openrouter/elephant-alpha to curated model lists (#9378 ) * Add hermes debug share instructions to all issue templates - bug_report.yml: Add required Debug Report section with hermes debug share and /debug instructions, make OS/Python/Hermes version optional (covered by debug report), demote old logs field to optional supplementary - setup_help.yml: Replace hermes doctor reference with hermes debug share, add Debug Report section with fallback chain (debug share -> --local -> doctor) - feature_request.yml: Add optional Debug Report section for environment context All templates now guide users to run hermes debug share (or /debug in chat) and paste the resulting paste.rs links, giving maintainers system info, config, and recent logs in one step. * feat: add openrouter/elephant-alpha to curated model lists - Add to OPENROUTER_MODELS (free, positioned above GPT models) - Add to _PROVIDER_MODELS["nous"] mirror list - Add 256K context window fallback in model_metadata.py	2026-04-13 21:16:14 -07:00
Teknium	d15efc9c1b	fix: correct GPT-5 family context lengths in fallback defaults (#9309 ) The generic 'gpt-5' fallback was set to 128,000 — which is the max OUTPUT tokens, not the context window. GPT-5 base and most variants (codex, mini) have 400,000 context. This caused /model to report 128k for models like gpt-5.3-codex when models.dev was unavailable. Added specific entries for GPT-5 variants with different context sizes: - gpt-5.4, gpt-5.4-pro: 1,050,000 (1.05M) - gpt-5.4-mini, gpt-5.4-nano: 400,000 - gpt-5.3-codex-spark: 128,000 (reduced) - gpt-5.1-chat: 128,000 (chat variant) - gpt-5 (catch-all): 400,000 Sources: https://developers.openai.com/api/docs/models	2026-04-13 19:22:23 -07:00
arthurbr11	0a4cf5b3e1	feat(providers): add Arcee AI as direct API provider Adds Arcee AI as a standard direct provider (ARCEEAI_API_KEY) with Trinity models: trinity-large-thinking, trinity-large-preview, trinity-mini. Standard OpenAI-compatible provider checklist: auth.py, config.py, models.py, main.py, providers.py, doctor.py, model_normalize.py, model_metadata.py, setup.py, trajectory_compressor.py. Based on PR #9274 by arthurbr11, simplified to a standard direct provider without dual-endpoint OpenRouter routing.	2026-04-13 18:40:06 -07:00
Teknium	8d023e43ed	refactor: remove dead code — 1,784 lines across 77 files (#9180 ) Deep scan with vulture, pyflakes, and manual cross-referencing identified: - 41 dead functions/methods (zero callers in production) - 7 production-dead functions (only test callers, tests deleted) - 5 dead constants/variables - ~35 unused imports across agent/, hermes_cli/, tools/, gateway/ Categories of dead code removed: - Refactoring leftovers: _set_default_model, _setup_copilot_reasoning_selection, rebuild_lookups, clear_session_context, get_logs_dir, clear_session - Unused API surface: search_models_dev, get_pricing, skills_categories, get_read_files_summary, clear_read_tracker, menu_labels, get_spinner_list - Dead compatibility wrappers: schedule_cronjob, list_cronjobs, remove_cronjob - Stale debug helpers: get_debug_session_info copies in 4 tool files (centralized version in debug_helpers.py already exists) - Dead gateway methods: send_emote, send_notice (matrix), send_reaction (bluebubbles), _normalize_inbound_text (feishu), fetch_room_history (matrix), _start_typing_indicator (signal), parse_feishu_post_content - Dead constants: NOUS_API_BASE_URL, SKILLS_TOOL_DESCRIPTION, FILE_TOOLS, VALID_ASPECT_RATIOS, MEMORY_DIR - Unused UI code: _interactive_provider_selection, _interactive_model_selection (superseded by prompt_toolkit picker) Test suite verified: 609 tests covering affected files all pass. Tests for removed functions deleted. Tests using removed utilities (clear_read_tracker, MEMORY_DIR) updated to use internal APIs directly.	2026-04-13 16:32:04 -07:00
hcshen0111	2b3aa36242	feat(providers): add kimi-coding-cn provider for mainland China users Cherry-picked from PR #7637 by hcshen0111. Adds kimi-coding-cn provider with dedicated KIMI_CN_API_KEY env var and api.moonshot.cn/v1 endpoint for China-region Moonshot users.	2026-04-13 11:20:37 -07:00
ismell0992-afk	3e99964789	fix(agent): prefer Ollama Modelfile num_ctx over GGUF training max _query_local_context_length was checking model_info.context_length (the GGUF training max) before num_ctx (the Modelfile runtime override), inverse to query_ollama_num_ctx. The two helpers therefore disagreed on the same model: hermes-brain:qwen3-14b-ctx32k # Modelfile: num_ctx 32768 underlying qwen3:14b GGUF # qwen3.context_length: 40960 query_ollama_num_ctx correctly returned 32768 (the value Ollama will actually allocate KV cache for). _query_local_context_length returned 40960, which let ContextCompressor grow conversations past 32768 before triggering compression — at which point Ollama silently truncated the prefix, corrupting context. Swap the order so num_ctx is checked first, matching query_ollama_num_ctx. Adds a parametrized test that seeds both values and asserts num_ctx wins. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-13 04:24:07 -07:00
Teknium	5c2ecdec49	fix: use ceiling division for token estimation, deduplicate inline formula Switch estimate_tokens_rough(), estimate_messages_tokens_rough(), and estimate_request_tokens_rough() from floor division (len // 4) to ceiling division ((len + 3) // 4). Short texts (1-3 chars) previously estimated as 0 tokens, causing the compressor and pre-flight checks to systematically undercount when many short tool results are present. Also replaced the inline duplicate formula in run_conversation() (total_chars // 4) with a call to the shared estimate_messages_tokens_rough() function. Updated 4 tests that hardcoded floor-division expected values. Related: issue #6217, PR #6629	2026-04-11 16:33:40 -07:00
Teknium	c8aff74632	fix: prevent agent from stopping mid-task — compression floor, budget overhaul, activity tracking Three root causes of the 'agent stops mid-task' gateway bug: 1. Compression threshold floor (64K tokens minimum) - The 50% threshold on a 100K-context model fired at 50K tokens, causing premature compression that made models lose track of multi-step plans. Now threshold_tokens = max(50% * context, 64K). - Models with <64K context are rejected at startup with a clear error. 2. Budget warning removal — grace call instead - Removed the 70%/90% iteration budget warnings entirely. These injected '[BUDGET WARNING: Provide your final response NOW]' into tool results, causing models to abandon complex tasks prematurely. - Now: no warnings during normal execution. When the budget is actually exhausted (90/90), inject a user message asking the model to summarise, allow one grace API call, and only then fall back to _handle_max_iterations. 3. Activity touches during long terminal execution - _wait_for_process polls every 0.2s but never reported activity. The gateway's inactivity timeout (default 1800s) would fire during long-running commands that appeared 'idle.' - Now: thread-local activity callback fires every 10s during the poll loop, keeping the gateway's activity tracker alive. - Agent wires _touch_activity into the callback before each tool call. Also: docs update noting 64K minimum context requirement. Closes #7915 (root cause was agent-loop termination, not Weixin delivery limits).	2026-04-11 16:18:57 -07:00
Teknium	8c3935ebe8	fix: is_local_endpoint misses Docker/Podman DNS names (#7950 ) * fix(tools): neutralize shell injection in _write_to_sandbox via path quoting _write_to_sandbox interpolated storage_dir and remote_path directly into a shell command passed to env.execute(). Paths containing shell metacharacters (spaces, semicolons, $(), backticks) could trigger arbitrary command execution inside the sandbox. Fix: wrap both paths with shlex.quote(). Clean paths (alphanumeric + slashes/hyphens/dots) are left unmodified by shlex.quote, so existing behavior is unchanged. Paths with unsafe characters get single-quoted. Tests added for spaces, $(command) substitution, and semicolon injection. * fix: is_local_endpoint misses Docker/Podman DNS names host.docker.internal, host.containers.internal, gateway.docker.internal, and host.lima.internal are well-known DNS names that container runtimes use to resolve the host machine. Users running Ollama on the host with the agent in Docker/Podman hit the default 120s stream timeout instead of the bumped 1800s because these hostnames weren't recognized as local. Add _CONTAINER_LOCAL_SUFFIXES tuple and suffix check in is_local_endpoint(). Tests cover all three runtime families plus a negative case for domains that merely contain the suffix as a substring.	2026-04-11 14:46:18 -07:00
Teknium	d4bb44d4b9	docs: add Xiaomi MiMo to all provider docs + fix MiMo-V2-Flash ctx len - environment-variables.md: XIAOMI_API_KEY, XIAOMI_BASE_URL, provider list - cli-commands.md: --provider choices - integrations/providers.md: provider table, Chinese providers section, config example, base URL list, choosing table, fallback providers list - fallback-providers.md: supported providers table, auto-detection chain - Fix XiaomiMiMo/MiMo-V2-Flash context length 32768 → 256000 (OpenRouter entry)	2026-04-11 11:17:52 -07:00
kshitijk4poor	6693e2a497	feat(xiaomi): add Xiaomi MiMo as first-class provider Cherry-picked from PR #7702 by kshitijk4poor. Adds Xiaomi MiMo as a direct provider (XIAOMI_API_KEY) with models: - mimo-v2-pro (1M context), mimo-v2-omni (256K, multimodal), mimo-v2-flash (256K, cheapest) Standard OpenAI-compatible provider checklist: auth.py, config.py, models.py, main.py, providers.py, doctor.py, model_normalize.py, model_metadata.py, models_dev.py, auxiliary_client.py, .env.example, cli-config.yaml.example. Follow-up: vision tasks use mimo-v2-omni (multimodal) instead of the user's main model. Non-vision aux uses the user's selected model. Added _PROVIDER_VISION_MODELS dict for provider-specific vision model overrides. On failure, falls back to aggregators (gemini flash) via existing fallback chain. Corrects pre-existing context lengths: mimo-v2-pro 1048576→1000000, mimo-v2-omni 1048576→256000, adds mimo-v2-flash 256000. 36 tests covering registry, aliases, auto-detect, credentials, models.dev, normalization, URL mapping, providers module, doctor, aux client, vision model override, and agent init.	2026-04-11 11:17:52 -07:00
kshitijk4poor	af9caec44f	fix(qwen): correct context lengths for qwen3-coder models and send max_tokens to portal Based on PR #7285 by @kshitijk4poor. Two bugs affecting Qwen OAuth users: 1. Wrong context window — qwen3-coder-plus showed 128K instead of 1M. Added specific entries before the generic qwen catch-all: - qwen3-coder-plus: 1,000,000 (corrected from PR's 1,048,576 per official Alibaba Cloud docs and OpenRouter) - qwen3-coder: 262,144 2. Random stopping — max_tokens was suppressed for Qwen Portal, so the server applied its own low default. Reasoning models exhaust that on thinking tokens. Now: honor explicit max_tokens, default to 65536 when unset. Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-04-11 03:29:31 -07:00
kshitijk4poor	d442f25a2f	fix: align MiniMax provider with official API docs Aligns MiniMax provider with official API documentation. Fixes 6 bugs: transport mismatch (openai_chat -> anthropic_messages), credential leak in switch_model(), prompt caching sent to non-Anthropic endpoints, dot-to-hyphen model name corruption, trajectory compressor URL routing, and stale doctor health check. Also corrects context window (204,800), thinking support (manual mode), max output (131,072), and model catalog (M2 family only on /anthropic). Source: https://platform.minimax.io/docs/api-reference/text-anthropic-api Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>	2026-04-11 01:04:41 -07:00
Julien Talbot	8bcb8b8e87	feat(providers): add native xAI provider Adds xAI as a first-class provider: ProviderConfig in auth.py, HermesOverlay in providers.py, 11 curated Grok models, URL mapping in model_metadata.py, aliases (x-ai, x.ai), and env var tests. Uses standard OpenAI-compatible chat completions. Closes #7050	2026-04-10 13:40:38 -07:00
Julien Talbot	b577697189	fix(model_metadata): add xAI Grok context length fallbacks xAI /v1/models does not return context_length metadata, so Hermes probes down to the 128k default whenever a user configures a custom provider pointing at https://api.x.ai/v1. This forces every xAI user to manually override model.context_length in config.yaml (2M for Grok 4.20 / 4.1-fast / 4-fast) or lose most of the usable context window. Add DEFAULT_CONTEXT_LENGTHS entries for the Grok family so the fallback lookup returns the correct value via substring matching. Values sourced from models.dev (2026-04) and cross-checked against the xAI /v1/models listing: - grok-4.20-* 2,000,000 (reasoning, non-reasoning, multi-agent) - grok-4-1-fast-* 2,000,000 - grok-4-fast-* 2,000,000 - grok-4 / grok-4-0709 256,000 - grok-code-fast-1 256,000 - grok-3* 131,072 - grok-2 / latest 131,072 - grok-2-vision* 8,192 - grok (catch-all) 131,072 Keys are ordered longest-first so that specific variants match before the catch-all, consistent with the existing Claude/Gemma/MiniMax entries. Add TestDefaultContextLengths.test_grok_models_context_lengths and test_grok_substring_matching to pin the values and verify the full lookup path. All 77 tests in test_model_metadata.py pass.	2026-04-10 03:04:19 -07:00
KUSH42	34d06a9802	fix(compaction): don't halve context_length on output-cap-too-large errors When the API returns "max_tokens too large given prompt" (input tokens are within the context window, but input + requested output > window), the old code incorrectly routed through the same handler as "prompt too long" errors, calling get_next_probe_tier() and permanently halving context_length. This made things worse: the window was fine, only the requested output size needed trimming for that one call. Two distinct error classes now handled separately: Prompt too long — input itself exceeds context window. Fix: compress history + halve context_length (existing behaviour, unchanged). Output cap too large — input OK, but input + max_tokens > window. Fix: parse available_tokens from the error message, set a one-shot _ephemeral_max_output_tokens override for the retry, and leave context_length completely untouched. Changes: - agent/model_metadata.py: add parse_available_output_tokens_from_error() that detects Anthropic's "available_tokens: N" error format and returns the available output budget, or None for all other error types. - run_agent.py: call the new parser first in the is_context_length_error block; if it fires, set _ephemeral_max_output_tokens (with a 64-token safety margin) and break to retry without touching context_length. _build_api_kwargs consumes the ephemeral value exactly once then clears it so subsequent calls use self.max_tokens normally. - agent/anthropic_adapter.py: expand build_anthropic_kwargs docstring to clearly document the max_tokens (output cap) vs context_length (total window) distinction, which is a persistent source of confusion due to the OpenAI-inherited "max_tokens" name. - cli-config.yaml.example: add inline comments explaining both keys side by side where users are most likely to look. - website/docs/integrations/providers.md: add a callout box at the top of "Context Length Detection" and clarify the troubleshooting entry. - tests/test_ctx_halving_fix.py: 24 tests across four classes covering the parser, build_anthropic_kwargs clamping, ephemeral one-shot consumption, and the invariant that context_length is never mutated on output-cap errors.	2026-04-09 11:27:41 -07:00
Hunter B	894e8c8a8f	fix: resolve opencode.ai context window to 1M and clean up display formatting Two issues resolved: 1. Add opencode.ai to _URL_TO_PROVIDER mapping so base_url routes through models.dev lookup (which has mimo-v2-pro at 1M context) instead of falling back to probing /models (404) and defaulting to 128K. 2. Fix _format_context_length to round cleanly: 1048576 → '1M' instead of '1.048576M'. Applies same rounding logic to K values.	2026-04-09 01:43:22 -07:00
Teknium	7156f8d866	fix: CI test failures — metadata key, cli console, docker env, vision order (#6294 ) Fixes 9 test failures on current main, incorporating ideas from PR stack #6219-#6222 by xinbenlv with corrections: - model_metadata: sync HF context length key casing (minimaxai/minimax-m2.5 → MiniMaxAI/MiniMax-M2.5) - cli.py: route quick command error output through self.console instead of creating a new ChatConsole() instance - docker.py: explicit docker_forward_env entries now bypass the Hermes secret blocklist (intentional opt-in wins over generic filter) - auxiliary_client: revert _read_main_provider() to simple provider.strip().lower() — the _normalize_aux_provider() call introduced in `5c03f2e7` stripped the custom: prefix, breaking named custom provider resolution - auxiliary_client: flip vision auto-detection order to active provider → OpenRouter → Nous → stop (was OR → Nous → active) - test: update vision priority test to match new order Based on PR #6219-#6222 by xinbenlv.	2026-04-08 16:37:05 -07:00
kshitijk4poor	3377017eb4	feat(qwen): add Qwen OAuth provider with portal request support Based on #6079 by @tunamitom with critical fixes and comprehensive tests. Changes from #6079: - Fix: sanitization overwrite bug — Qwen message prep now runs AFTER codex field sanitization, not before (was silently discarding Qwen transforms) - Fix: missing try/except AuthError in runtime_provider.py — stale Qwen credentials now fall through to next provider on auto-detect - Fix: 'qwen' alias conflict — bare 'qwen' stays mapped to 'alibaba' (DashScope); use 'qwen-portal' or 'qwen-cli' for the OAuth provider - Fix: hardcoded ['coder-model'] replaced with live API fetch + curated fallback list (qwen3-coder-plus, qwen3-coder) - Fix: extract _is_qwen_portal() helper + _qwen_portal_headers() to replace 5 inline 'portal.qwen.ai' string checks and share headers between init and credential swap - Fix: add Qwen branch to _apply_client_headers_for_base_url for mid-session credential swaps - Fix: remove suspicious TypeError catch blocks around _prompt_provider_choice - Fix: handle bare string items in content lists (were silently dropped) - Fix: remove redundant dict() copies after deepcopy in message prep - Revert: unrelated ai-gateway test mock removal and model_switch.py comment deletion New tests (30 test functions): - _qwen_cli_auth_path, _read_qwen_cli_tokens (success + 3 error paths) - _save_qwen_cli_tokens (roundtrip, parent creation, permissions) - _qwen_access_token_is_expiring (5 edge cases: fresh, expired, within skew, None, non-numeric) - _refresh_qwen_cli_tokens (success, preserve old refresh, 4 error paths, default expires_in, disk persistence) - resolve_qwen_runtime_credentials (fresh, auto-refresh, force-refresh, missing token, env override) - get_qwen_auth_status (logged in, not logged in) - Runtime provider resolution (direct, pool entry, alias) - _build_api_kwargs (metadata, vl_high_resolution_images, message formatting, max_tokens suppression)	2026-04-08 13:46:30 -07:00
kshitij	22d1bda185	fix(minimax): correct context lengths, model catalog, thinking guard, aux model, and config base_url Cherry-picked from PR #6046 by kshitijk4poor with dead code stripped. - Context lengths: 204800 → 1M (M1) / 1048576 (M2.5/M2.7) per official docs - Model catalog: add M1 family, remove deprecated M2.1 and highspeed variants - Thinking guard: skip extended thinking for MiniMax (Anthropic-compat endpoint) - Aux model: MiniMax-M2.7-highspeed → MiniMax-M2.7 (same model, half price) - Config base_url: honour model.base_url for API-key providers (fixes China users) - Stripped unused get_minimax_max_output() / _MINIMAX_MAX_OUTPUT (no consumer) Fixes #5777, #4082, #6039. Closes #3895.	2026-04-08 02:20:46 -07:00
Teknium	5c03f2e7cc	fix: provider/model resolution — salvage 4 PRs + MiniMax aux URL fix (#5983 ) Salvaged fixes from community PRs: - fix(model_switch): _read_auth_store → _load_auth_store + fix auth store key lookup (was checking top-level dict instead of store['providers']). OAuth providers now correctly detected in /model picker. Cherry-picked from PR #5911 by Xule Lin (linxule). - fix(ollama): pass num_ctx to override 2048 default context window. Ollama defaults to 2048 context regardless of model capabilities. Now auto-detects from /api/show metadata and injects num_ctx into every request. Config override via model.ollama_num_ctx. Fixes #2708. Cherry-picked from PR #5929 by kshitij (kshitijk4poor). - fix(aux): normalize provider aliases for vision/auxiliary routing. Adds _normalize_aux_provider() with 17 aliases (google→gemini, claude→anthropic, glm→zai, etc). Fixes vision routing failure when provider is set to 'google' instead of 'gemini'. Cherry-picked from PR #5793 by e11i (Elizabeth1979). - fix(aux): rewrite MiniMax /anthropic base URLs to /v1 for OpenAI SDK. MiniMax's inference_base_url ends in /anthropic (Anthropic Messages API), but auxiliary client uses OpenAI SDK which appends /chat/completions → 404 at /anthropic/chat/completions. Generic _to_openai_base_url() helper rewrites terminal /anthropic to /v1 for OpenAI-compatible endpoint. Inspired by PR #5786 by Lempkey. Added debug logging to silent exception blocks across all fixes. Co-authored-by: Hermes Agent <hermes@nousresearch.com>	2026-04-07 22:23:28 -07:00
Teknium	187e90e425	refactor: replace inline HERMES_HOME re-implementations with get_hermes_home() 16 callsites across 14 files were re-deriving the hermes home path via os.environ.get('HERMES_HOME', ...) instead of using the canonical get_hermes_home() from hermes_constants. This breaks profiles — each profile has its own HERMES_HOME, and the inline fallback defaults to ~/.hermes regardless. Fixed by importing and calling get_hermes_home() at each site. For files already inside the hermes process (agent/, hermes_cli/, tools/, gateway/, plugins/), this is always safe. Files that run outside the process context (mcp_serve.py, mcp_oauth.py) already had correct try/except ImportError fallbacks and were left alone. Skipped: hermes_constants.py (IS the implementation), env_loader.py (bootstrap), profiles.py (intentionally manipulates the env var), standalone scripts (optional-skills/, skills/), and tests.	2026-04-07 10:40:34 -07:00
Teknium	cc7136b1ac	fix: update Gemini model catalog + wire models.dev as live model source Follow-up for salvaged PR #5494: - Update model catalog to Gemini 3.x + Gemma 4 (drop deprecated 2.0) - Add list_agentic_models() to models_dev.py with noise filter - Wire models.dev into _model_flow_api_key_provider as primary source (static curated list serves as offline fallback) - Add gemini -> google mapping in PROVIDER_TO_MODELS_DEV - Fix Gemma 4 context lengths to 256K (models.dev values) - Update auxiliary model to gemini-3-flash-preview - Expand tests: 3.x catalog, context lengths, models.dev integration	2026-04-06 10:28:03 -07:00
Teknium	6dfab35501	feat(providers): add Google AI Studio (Gemini) as a first-class provider Cherry-picked from PR #5494 by kshitijk4poor. Adds native Gemini support via Google's OpenAI-compatible endpoint. Zero new dependencies.	2026-04-06 10:28:03 -07:00

1 2

87 commits