hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-18 04:41:56 +00:00

Author	SHA1	Message	Date
rob-maron	2863e9484a	Use nous portal as model metadata authority (#24502) * nous portal metadata resolver * minor fixes	2026-05-12 11:59:31 -07:00
rob-maron	32abe742fa	fix comment	2026-05-11 21:30:29 -07:00
rob-maron	f0c2964f0b	remove comments	2026-05-11 21:30:29 -07:00
rob-maron	057fc7b073	fix guard	2026-05-11 21:30:29 -07:00
rob-maron	528bba6734	fix kimi	2026-05-11 21:30:29 -07:00
nicoechaniz	e2b713cced	fix(model-metadata): skip OpenRouter for known providers, add kimi/moonshot to PROVIDER_TO_MODELS_DEV Based on PR #23950 by @nicoechaniz. - Add "kimi" and "moonshot" to PROVIDER_TO_MODELS_DEV → kimi-for-coding - Gate OpenRouter metadata step behind "if not effective_provider": known providers should not be overridden by community-maintained OR data - Keep the targeted Kimi-family 32k guard as a secondary safety net inside the OR gate (for unknown providers with Kimi models) Co-authored-by: nicoechaniz <nicoechaniz@altermundi.net>	2026-05-11 13:16:07 -07:00
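A minimal sketch of the OpenRouter gating in this entry. It assumes every non-OpenRouter source can be collapsed into one hardcoded-defaults dict; `resolve_with_openrouter_gate` and its parameters are hypothetical and do not reproduce Hermes's real `get_model_context_length()` signature.

```python
from typing import Optional

# Markers that identify Kimi-family models (taken from the commit text).
KIMI_MARKERS = ("kimi-", "moonshot")
# The exact value OpenRouter reported for Kimi-family models.
SUSPECT_KIMI_CONTEXT = 32768


def resolve_with_openrouter_gate(
    model: str,
    effective_provider: Optional[str],
    openrouter_meta: dict[str, int],
    hardcoded_defaults: dict[str, int],
    fallback: int = 256_000,
) -> int:
    """Hypothetical resolver: OpenRouter metadata is used only when no provider is known."""
    name = model.lower()
    if not effective_provider:
        ctx = openrouter_meta.get(model)
        is_kimi = any(marker in name for marker in KIMI_MARKERS)
        # Secondary safety net: reject exactly 32768 for Kimi-family names.
        if ctx is not None and not (is_kimi and ctx == SUSPECT_KIMI_CONTEXT):
            return ctx
    # Known providers, and rejected OpenRouter values, fall through to the
    # hardcoded defaults (longest matching key first).
    for key in sorted(hardcoded_defaults, key=len, reverse=True):
        if key in name:
            return hardcoded_defaults[key]
    return fallback
```

A known provider never reaches the community-maintained OpenRouter data, and the 32768 guard only matters for the unknown-provider path.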
kshitijk4poor	91eef6255e	fix: correct context-length resolution for kimi-k2.6 on Ollama Cloud and Kimi Coding Kimi-k2.6 (which supports 262K context) was incorrectly resolved as 32K, tripping the 64K minimum-context guard and preventing use of the model on Ollama Cloud and Kimi Coding / Moonshot providers. Three fixes in the context-length resolution chain: 1. Ollama Cloud native /api/show query: new _query_ollama_api_show() queries the Ollama native API for authoritative GGUF model_info context_length. For hosted Ollama, prefers model_info over num_ctx since users can't set their own num_ctx on Cloud. Added at step 5e in get_model_context_length(), before the models.dev fallback. 2. models.dev :cloud/-cloud suffix fallback: lookup_models_dev_context() now also tries appending :cloud and -cloud suffixes when the bare model name doesn't match. models.dev stores 'kimi-k2.6:cloud' but users and the live API use bare 'kimi-k2.6'. 3. Kimi-family 32K guard: after the OpenRouter metadata step, reject exactly 32768 for Kimi-named models (kimi-, moonshot) and fall through to hardcoded defaults ('kimi': 262144). OpenRouter reports 32768 for moonshotai/kimi-k2.6 but the model actually supports 262K. Narrow filter — only 32768, only Kimi-family — becomes dead code when OpenRouter updates its metadata.	2026-05-11 13:16:07 -07:00
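Fixes 1 and 2 can be sketched as two small helpers. Both names are hypothetical, and the models.dev catalog is modelled as a plain dict. The Ollama helper assumes the `/api/show` response shape (a `model_info` mapping with `<arch>.context_length` keys and a text `parameters` block). Preferring `num_ctx` for local Ollama is an assumption; the commit only states the hosted preference.

```python
from typing import Optional


def lookup_with_cloud_suffix(model: str, catalog: dict[str, int]) -> Optional[int]:
    """Try the bare name first, then the ':cloud' and '-cloud' variants."""
    for candidate in (model, f"{model}:cloud", f"{model}-cloud"):
        if candidate in catalog:
            return catalog[candidate]
    return None


def context_from_api_show(payload: dict, hosted: bool) -> Optional[int]:
    """Pick a context length from an Ollama /api/show response body."""
    # GGUF metadata keys look like '<arch>.context_length'.
    info_ctx = next(
        (
            value
            for key, value in (payload.get("model_info") or {}).items()
            if key.endswith(".context_length") and isinstance(value, int)
        ),
        None,
    )
    # 'parameters' is a newline-separated text block such as 'num_ctx 32768'.
    num_ctx = None
    for line in (payload.get("parameters") or "").splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "num_ctx" and parts[1].isdigit():
            num_ctx = int(parts[1])
    # Hosted Ollama: users cannot set num_ctx, so GGUF metadata is authoritative.
    if hosted:
        return info_ctx if info_ctx is not None else num_ctx
    # Local Ollama (assumption): a user-set num_ctx takes precedence.
    return num_ctx if num_ctx is not None else info_ctx
```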
kshitij	2ec8d2b42f	chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937) Replace the tuple literal with a set literal in every literal-tuple membership test. Set lookup is O(1) vs O(n) for tuple — consistent micro-optimization across the codebase. 608 instances fixed via `ruff --fix --unsafe-fixes`, 0 remaining. 133 files, +626/-626 (net zero).	2026-05-11 11:13:25 -07:00
Teknium	d6e1fadbf5	fix(xai): omit reasoning.effort for grok models that reject it (#23435) xAI's Responses API returns HTTP 400 ("Model X does not support parameter reasoningEffort") for grok-4, grok-4-0709, grok-4-fast-, grok-4-1-fast-, grok-3, grok-4.20-0309-, and grok-code-fast-1 — even though those models reason natively. Hermes was unconditionally sending `reasoning: {effort: 'medium'}` to xAI for every Grok model, breaking direct `--provider xai` for the entire grok-4 line. Add a substring allowlist predicate (verified live against api.x.ai 2026-05-10) covering the only Grok families that accept the effort dial: grok-3-mini, grok-4.20-multi-agent, grok-4.3. The Responses transport omits the `reasoning` key entirely for everything else while still including `reasoning.encrypted_content` so we capture native reasoning tokens. Verified end-to-end: `hermes chat -q hi --provider xai --model grok-4-0709` went from HTTP 400 to a successful reply.	2026-05-10 15:21:30 -07:00
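The substring allowlist can be sketched as below. The marker list comes from the commit. The kwargs shape, an `include` list plus an optional `reasoning` dict in the style of a Responses API request, is an assumption, and `xai_reasoning_kwargs` is a hypothetical helper rather than Hermes's transport code.

```python
# Substring allowlist from the commit: the only Grok families that accept the effort dial.
EFFORT_CAPABLE_GROK = ("grok-3-mini", "grok-4.20-multi-agent", "grok-4.3")


def grok_accepts_reasoning_effort(model: str) -> bool:
    name = model.lower()
    return any(marker in name for marker in EFFORT_CAPABLE_GROK)


def xai_reasoning_kwargs(model: str, effort: str = "medium") -> dict:
    """Hypothetical helper: always request encrypted reasoning, add effort only when allowed."""
    kwargs: dict = {"include": ["reasoning.encrypted_content"]}
    if grok_accepts_reasoning_effort(model):
        kwargs["reasoning"] = {"effort": effort}
    return kwargs
```

An allowlist fails closed. A new Grok slug gets no `reasoning` key until someone verifies that it accepts one, which avoids a fresh HTTP 400.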
kshitijk4poor	44cdf555a8	fix(codex-spark): defensive 128k entry in DEFAULT_CONTEXT_LENGTHS + clarify validation test docstring Two follow-ups from self-review: 1. Add gpt-5.3-codex-spark to DEFAULT_CONTEXT_LENGTHS at 128k. The primary resolution path for Spark goes through provider='openai-codex' → _CODEX_OAUTH_CONTEXT_FALLBACK (already correct). But if any future code path resolves Spark's context with a different provider (custom proxy, generic fallthrough), the longest-substring-first lookup in step 8 would match 'gpt-5' and report 400k, which is wrong by ~3x. Adding the explicit override is a cheap defensive correctness fix matching how gpt-5.4-mini and gpt-5.4-nano already shadow the generic gpt-5 entry. 2. Update test_openai_codex_model_validation_fallback.py docstring. The bug it was originally written for (gpt-5.3-codex-spark missing from listing) is now resolved by this PR's catalog restoration. The test still validly exercises the soft-accept code path for any future entitlement-gated Codex slug that ships before Hermes catalogs it, but the framing was stale — clarified.	2026-05-09 23:17:25 -07:00
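Why the explicit Spark entry matters under longest-substring-first matching: the more specific key shadows the generic `gpt-5` key that it contains. The table is an illustrative two-entry subset using the values quoted above, and `lookup_default_context` is a hypothetical name.

```python
from typing import Optional

# Illustrative subset; both values are quoted in the commit message.
DEFAULT_CONTEXT_LENGTHS = {
    "gpt-5": 400_000,
    "gpt-5.3-codex-spark": 128_000,
}


def lookup_default_context(model: str, table: dict[str, int]) -> Optional[int]:
    """Longest-substring-first: a specific key shadows a generic one it contains."""
    name = model.lower()
    for key in sorted(table, key=len, reverse=True):
        if key in name:
            return table[key]
    return None
```

Without the Spark entry, the same lookup would return the generic 400k for `gpt-5.3-codex-spark`, which is the roughly 3x error the commit guards against.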
kshitij	9ee9a4297d	docs(codex-spark): document ChatGPT Pro entitlement gating PR #12994 stripped gpt-5.3-codex-spark on the assumption that it was unsupported. It's actually research-preview, ChatGPT-Pro-only, exposed via the Codex OAuth backend at chatgpt.com/backend-api/codex/models — not via the public OpenAI API. Add explanatory comments in: - DEFAULT_CODEX_MODELS / _FORWARD_COMPAT_TEMPLATE_MODELS (codex_models.py) - _CODEX_OAUTH_CONTEXT_FALLBACK (model_metadata.py) - list_authenticated_providers' live-discovery branch (model_switch.py) so future maintainers don't strip the entry again. Also documents the intentional asymmetry that Spark stays out of the "openai" provider catalog (it isn't on the public API) and why the supported_in_api filter is not applied for the openai-codex route.	2026-05-09 23:17:25 -07:00
olegdater	c6dc295a35	fix(model-metadata): set codex-spark fallback context to 128k	2026-05-09 23:17:25 -07:00
olegdater	2a6f3deb50	fix(model-metadata): restore gpt-5.3-codex-spark fallback context	2026-05-09 23:17:25 -07:00
Teknium	1c9ffb177c	fix(model-metadata): align hy3-preview static fallback + delete change-detector test (#22805) Two co-located fixes: 1. agent/model_metadata.py: bump hy3-preview static fallback from 256000 to 262144 (256 * 1024) to match OpenRouter live metadata so cache and offline both agree (issue #22268). 2. tests/hermes_cli/test_tencent_tokenhub_provider.py: replace the exact-value change-detector (assert ctx == 256000) with an invariant assertion (registered + >= 4096). Per AGENTS.md 'Don't write change-detector tests': pinning the upstream-controlled context length is exactly the test class the rule forbids — it breaks every time the provider bumps the published value, with zero behavioral coverage gained. Salvage of #22574 with a redirect on the test approach. The contributor's diff bumped the integer and added a SECOND change-detector pinning DEFAULT_CONTEXT_LENGTHS[hy3-preview] == 262144, which would re-break on the next published bump. We instead delete the change-detector entirely and assert the relationship. Closes #22268.	2026-05-09 13:37:19 -07:00
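The difference between a change-detector and an invariant test, sketched against a stand-in table. The real test lives in `test_tencent_tokenhub_provider.py` and its exact body is not shown in the log. The dict and the test name here are hypothetical.

```python
# Illustrative stand-in for the real table; only the registration matters here.
DEFAULT_CONTEXT_LENGTHS = {"hy3-preview": 262144}


def test_hy3_preview_context_is_registered() -> None:
    # Invariant: the model is registered with a usable window. No exact value
    # is pinned, so an upstream bump to the published length cannot break it.
    ctx = DEFAULT_CONTEXT_LENGTHS.get("hy3-preview")
    assert ctx is not None
    assert ctx >= 4096
```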
Teknium	cbce5e93fc	codebase: add encoding='utf-8' to all bare open() calls (PLW1514) Closes the last Python-on-Windows UTF-8 exposure by making every text-mode open() call explicit about its encoding. Before: on Windows, bare open(path, 'r') defaults to the system locale encoding (cp1252 on US-locale installs). That means reading any config/yaml/markdown/json file with non-ASCII content either crashes with UnicodeDecodeError or silently mis-decodes bytes. After: all 89 affected call sites in production code now pass encoding='utf-8' explicitly. Works identically on every platform and every locale, no surprise behavior. Mechanical sweep via: ruff check --preview --extend-select PLW1514 --unsafe-fixes --fix --exclude 'tests,venv,.venv,node_modules,website,optional-skills, skills,tinker-atropos,plugins' . All 89 fixes have the same shape: open(x) or open(x, mode) became open(x, encoding='utf-8') or open(x, mode, encoding='utf-8'). Nothing else changed. Every modified file still parses and the Windows/sandbox test suite is still green (85 passed, 14 skipped, 0 failed across tests/tools/test_code_execution_windows_env.py + tests/tools/test_code_execution_modes.py + tests/tools/test_env_passthrough.py + tests/test_hermes_bootstrap.py). Scope notes: - tests/ excluded: test fixtures can use locale encoding intentionally (exercising edge cases). If we want to tighten tests later that's a separate PR. - plugins/ excluded: plugin-specific conventions may differ; plugin authors own their code. - optional-skills/ and skills/ excluded: skill scripts are user-authored and we don't want to mass-edit them. - website/ and tinker-atropos/ excluded: vendored / generated content. 46 files touched, 89 +/- lines (symmetric replacement). No behavior change on POSIX or on Windows when the file is ASCII; bug fix on Windows when the file contains non-ASCII.	2026-05-08 14:27:40 -07:00
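The shape of each mechanical fix, shown as a small round-trip helper. `read_text`, `write_text` and `roundtrip` are hypothetical names for illustration; the only change the sweep made was adding `encoding='utf-8'` to `open()`.

```python
import os
import tempfile


def read_text(path: str) -> str:
    # Explicit encoding: same result on every platform and locale, instead of
    # the locale codec (e.g. cp1252 on US-locale Windows).
    with open(path, "r", encoding="utf-8") as fh:
        return fh.read()


def write_text(path: str, text: str) -> None:
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)


def roundtrip(text: str) -> str:
    """Write then read back non-ASCII text through a temporary file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "note.md")
        write_text(path, text)
        return read_text(path)
```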
Teknium	850413f120	feat(computer-use): cua-driver backend, universal any-model schema Background macOS desktop control via cua-driver MCP — does NOT steal the user's cursor or keyboard focus, works with any tool-capable model. Replaces the Anthropic-native `computer_20251124` approach from the abandoned #4562 with a generic OpenAI function-calling schema plus SOM (set-of-mark) captures so Claude, GPT, Gemini, and open models can all drive the desktop via numbered element indices. - `tools/computer_use/` package — swappable ComputerUseBackend ABC + CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary). - Universal `computer_use` tool with one schema for all providers. Actions: capture (som/vision/ax), click, double_click, right_click, middle_click, drag, scroll, type, key, wait, list_apps, focus_app. - Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style `content: [text, image_url]` parts) that flows through handle_function_call into the tool message. Anthropic adapter converts into native `tool_result` image blocks; OpenAI-compatible providers get the parts list directly. - Image eviction in convert_messages_to_anthropic: only the 3 most recent screenshots carry real image data; older ones become text placeholders to cap per-turn token cost. - Context compressor image pruning: old multimodal tool results have their image parts stripped instead of being skipped. - Image-aware token estimation: each image counts as a flat 1500 tokens instead of its base64 char length (~1MB would have registered as ~250K tokens before). - COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset is active. - Session DB persistence strips base64 from multimodal tool messages. - Trajectory saver normalises multimodal messages to text-only. - `hermes tools` post-setup installs cua-driver via the upstream script and prints permission-grant instructions. 
- CLI approval callback wired so destructive computer_use actions go through the same prompt_toolkit approval dialog as terminal commands. - Hard safety guards at the tool level: blocked type patterns (curl|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash, force delete, lock screen, log out). - Skill `apple/macos-computer-use/SKILL.md` — universal (model-agnostic) workflow guide. - Docs: `user-guide/features/computer-use.md` plus reference catalog entries. 44 new tests in tests/tools/test_computer_use.py covering schema shape (universal, not Anthropic-native), dispatch routing, safety guards, multimodal envelope, Anthropic adapter conversion, screenshot eviction, context compressor pruning, image-aware token estimation, run_agent helpers, and universality guarantees. 469/469 pass across tests/tools/test_computer_use.py + the affected agent/ test suites. - `model_tools.py` provider-gating: the tool is available to every provider. Providers without multi-part tool message support will see text-only tool results (graceful degradation via `text_summary`). - Anthropic server-side `clear_tool_uses_20250919` — deferred; client-side eviction + compressor pruning cover the same cost ceiling without a beta header. - macOS only. cua-driver uses private SkyLight SPIs (SLEventPostToPid, SLPSPostEventRecordTo, _AXObserverAddNotificationAndCheckRemote) that can break on any macOS update. Pin with HERMES_CUA_DRIVER_VERSION. - Requires Accessibility + Screen Recording permissions — the post-setup prints the Settings path. Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic-native schema). Credit @0xbyt4 for the original #3816 groundwork whose context/eviction/token design is preserved here in generic form.	2026-05-08 11:07:38 -07:00
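A sketch of two ideas in this entry: keeping image data only for the 3 most recent screenshots, and charging a flat 1500 tokens per image. It works on OpenAI-style `content` part lists rather than inside `convert_messages_to_anthropic`. The placeholder text and the 4-characters-per-token heuristic for text are assumptions, and all function names are hypothetical.

```python
import copy

IMAGE_TOKEN_COST = 1500   # flat per-image estimate from the commit
KEEP_RECENT_IMAGES = 3    # only the newest screenshots keep real image data


def _is_image(part) -> bool:
    return isinstance(part, dict) and part.get("type") == "image_url"


def evict_old_images(messages: list[dict], keep: int = KEEP_RECENT_IMAGES) -> list[dict]:
    """Return a copy in which only the `keep` most recent images keep their data."""
    out = copy.deepcopy(messages)
    seen = 0
    for msg in reversed(out):  # walk newest to oldest
        content = msg.get("content")
        if not isinstance(content, list):
            continue
        for i in range(len(content) - 1, -1, -1):
            if _is_image(content[i]):
                seen += 1
                if seen > keep:
                    content[i] = {"type": "text", "text": "[screenshot omitted]"}
    return out


def estimate_tokens(messages: list[dict], chars_per_token: int = 4) -> int:
    """Count images at a flat cost instead of by base64 length."""
    total = 0
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, str):
            total += len(content) // chars_per_token
            continue
        for part in content or []:
            if _is_image(part):
                total += IMAGE_TOKEN_COST
            elif isinstance(part, dict):
                total += len(part.get("text", "")) // chars_per_token
    return total
```

Counting by base64 length instead would turn a roughly 1MB screenshot into about 250K tokens, the overestimate the commit describes.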
kshitijk4poor	20a4f79ed1	feat: provider modules — ProviderProfile ABC, 33 providers, fetch_models, transport single-path Introduces providers/ package — single source of truth for every inference provider. Adding a simple api-key provider now requires one providers/<name>.py file with zero edits anywhere else. What this PR ships: - providers/ package (ProviderProfile ABC + 33 profiles across 4 api_modes) - ProviderProfile declarative fields: name, api_mode, aliases, display_name, env_vars, base_url, models_url, auth_type, fallback_models, hostname, default_headers, fixed_temperature, default_max_tokens, default_aux_model - 4 overridable hooks: prepare_messages, build_extra_body, build_api_kwargs_extras, fetch_models - chat_completions.build_kwargs: profile path via _build_kwargs_from_profile, legacy flag path retained for lmstudio/tencent-tokenhub (which have session-aware reasoning probing that doesn't map cleanly to hooks yet) - run_agent.py: profile path for all registered providers; legacy path variable scoping fixed (all flags defined before branching) - Auto-wires: auth.PROVIDER_REGISTRY, models.CANONICAL_PROVIDERS, doctor health checks, config.OPTIONAL_ENV_VARS, model_metadata._URL_TO_PROVIDER - GeminiProfile: thinking_config translation (native + openai-compat nested) - New tests/providers/ (79 tests covering profile declarations, transport parity, hook overrides, e2e kwargs assembly) Deltas vs original PR (salvaged onto current main): - Added profiles: alibaba-coding-plan, azure-foundry, minimax-oauth (were added to main since original PR) - Skipped profiles: lmstudio, tencent-tokenhub stay on legacy path (their reasoning_effort probing has no clean hook equivalent yet) - Removed lmstudio alias from custom profile (it's a separate provider now) - Skipped openrouter/custom from PROVIDER_REGISTRY auto-extension (resolve_provider special-cases them; adding breaks runtime resolution) - runtime_provider: profile.api_mode only as fallback when URL detection finds nothing (was breaking minimax /v1 override) - Preserved main's legacy-path improvements: deepseek reasoning_content preserve, gemini Gemma skip, OpenRouter response caching, Anthropic 1M beta recovery, etc. - Kept agent/copilot_acp_client.py in place (rejected PR's relocation — main has 7 fixes landed since; relocation would revert them) - _API_KEY_PROVIDER_AUX_MODELS alias kept for backward compat with existing test imports Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com> Closes #14418	2026-05-05 13:40:01 -07:00
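A trimmed-down sketch of the declarative-profile idea: class-level fields plus pass-through hooks that a provider overrides only when it needs to. The field subset matches names in the commit. The `api_mode` value, the `register` decorator and the `REGISTRY` dict are hypothetical stand-ins for the real auto-wiring into `auth.PROVIDER_REGISTRY` and friends.

```python
from abc import ABC
from typing import ClassVar, Optional


class ProviderProfile(ABC):
    """Hypothetical, trimmed-down declarative provider profile."""

    # Declarative fields (a subset of those listed in the commit message).
    name: ClassVar[str] = ""
    api_mode: ClassVar[str] = "chat_completions"  # assumed value
    aliases: ClassVar[tuple[str, ...]] = ()
    env_vars: ClassVar[tuple[str, ...]] = ()
    base_url: ClassVar[Optional[str]] = None
    fixed_temperature: ClassVar[Optional[float]] = None

    # Overridable hooks with pass-through defaults.
    def prepare_messages(self, messages: list[dict]) -> list[dict]:
        return messages

    def build_extra_body(self) -> dict:
        return {}

    def build_api_kwargs_extras(self) -> dict:
        extras: dict = {}
        if self.fixed_temperature is not None:
            extras["temperature"] = self.fixed_temperature
        return extras


# Hypothetical registry keyed by name and every alias.
REGISTRY: dict[str, ProviderProfile] = {}


def register(profile_cls: type[ProviderProfile]) -> type[ProviderProfile]:
    profile = profile_cls()
    for key in (profile.name, *profile.aliases):
        REGISTRY[key] = profile
    return profile_cls


@register
class ExampleProfile(ProviderProfile):
    name = "example"
    aliases = ("ex",)
    env_vars = ("EXAMPLE_API_KEY",)
    base_url = "https://api.example.invalid/v1"
    fixed_temperature = 1.0
```

Under this design, adding a simple API-key provider means one subclass with field values and no edits elsewhere.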
Rob Moen	0dd373ec43	fix(context): honor model.context_length for Ollama num_ctx and all display paths When a user sets model.context_length in config.yaml, the value was only used for Hermes' internal compression decisions (context_compressor) but NOT for Ollama's num_ctx parameter. Ollama auto-detects context from GGUF metadata (often 256K+) and allocates that much VRAM regardless of the user's config — causing OOM on smaller GPUs like the P100 (16GB). Root cause: two separate context values existed independently: - context_compressor.context_length = config value (e.g. 65536) ✓ - _ollama_num_ctx = GGUF metadata value (e.g. 256000) ✗ ignored config Changes: 1. Cap Ollama num_ctx to config context_length (run_agent.py) When model.context_length is explicitly set and no explicit ollama_num_ctx override exists, cap the auto-detected GGUF value to the user's context_length. This is the core fix — it prevents Ollama from allocating more VRAM than the user budgeted. 2. Pass config_context_length through all secondary call sites Several paths called get_model_context_length() without the config override, falling through to the 256K default fallback: - cli.py: @-reference expansion and /model switch display - gateway/run.py: @-reference expansion and /model switch display - tui_gateway/server.py: @-reference expansion - hermes_cli/model_switch.py: resolve_display_context_length() 3. Normalize root-level context_length in config (hermes_cli/config.py) _normalize_root_model_keys() now migrates root-level context_length into the model section, matching existing behavior for provider and base_url. Users who wrote `context_length: 65536` at the YAML root instead of under `model:` had it silently ignored. 4. Fix misleading comments (agent/model_metadata.py) DEFAULT_FALLBACK_CONTEXT is 256K (CONTEXT_PROBE_TIERS[0]), not 128K as two comments stated. Tests: 3 new tests for root-level context_length normalization. All existing context_length tests pass (96 tests).	2026-04-30 04:31:23 -07:00
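Changes 1 and 3 can be sketched as two small functions. Both names are hypothetical (the real normalizer is `_normalize_root_model_keys()`, whose body is not shown). The sketch assumes `model:` is a mapping and that a key already present under `model:` wins over its root-level duplicate.

```python
from typing import Optional


def effective_ollama_num_ctx(
    gguf_context: Optional[int],
    config_context_length: Optional[int],
    explicit_num_ctx: Optional[int] = None,
) -> Optional[int]:
    """Cap the auto-detected GGUF context to the user's model.context_length."""
    if explicit_num_ctx is not None:
        return explicit_num_ctx  # an explicit ollama_num_ctx override wins
    if gguf_context is not None and config_context_length:
        return min(gguf_context, config_context_length)
    return gguf_context


def normalize_root_model_keys_sketch(config: dict) -> dict:
    """Move root-level provider/base_url/context_length under `model:`."""
    cfg = dict(config)
    model = dict(cfg.get("model") or {})  # assumes `model:` is a mapping
    for key in ("provider", "base_url", "context_length"):
        if key in cfg:
            model.setdefault(key, cfg.pop(key))  # an existing model.<key> wins
    cfg["model"] = model
    return cfg
```

With `context_length: 65536` at the YAML root and a 256000-token GGUF, the two steps together yield `num_ctx` 65536 instead of 256000.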
Adam Manning	0b2f1bb27b	feat(agent): wire MiniMax-M2.7 for minimax-oauth provider Wire MiniMax-M2.7 and MiniMax-M2.7-highspeed into the model catalog, CLI model picker, and agent auxiliary/metadata subsystems. Changes: - hermes_cli/models.py: - Add 'minimax-oauth' to _PROVIDER_MODELS with MiniMax-M2.7 and MiniMax-M2.7-highspeed - Add ProviderEntry('minimax-oauth', 'MiniMax (OAuth)', ...) to CANONICAL_PROVIDERS near existing minimax entries - Add aliases: minimax-portal, minimax-global, minimax_oauth in _PROVIDER_ALIASES - hermes_cli/main.py: - Add 'minimax-oauth' to provider_labels dict - Insert 'minimax-oauth' into providers list in select_provider_and_model() near the other minimax entries - Add 'minimax-oauth' to --provider argparse choices - Add _model_flow_minimax_oauth() function: ensures login via _login_minimax_oauth(), resolves runtime credentials, prompts for model selection, saves model choice and config - Add dispatch elif branch for selected_provider == 'minimax-oauth' - agent/auxiliary_client.py: - Add 'minimax-oauth': 'MiniMax-M2.7-highspeed' to _API_KEY_PROVIDER_AUX_MODELS - Add 'minimax-oauth' to _ANTHROPIC_COMPAT_PROVIDERS set - agent/model_metadata.py: - Add 'minimax-oauth' to _PROVIDER_PREFIXES frozenset - MiniMax-M2.7 context length (200_000) already covered by the existing 'minimax' substring match in DEFAULT_CONTEXT_LENGTHS	2026-04-29 09:53:42 -07:00
Rugved Somwanshi	01ad0aacaf	fix(tui): show correct context length	2026-04-28 12:27:36 -07:00
Rugved Somwanshi	214ca943ac	feat(agent): add lmstudio integration	2026-04-28 12:27:36 -07:00
simonweng	a6a6cf047d	feat(providers): add tencent-tokenhub provider support Registers tencent-tokenhub (https://tokenhub.tencentmaas.com/v1) as a new API-key provider with model tencent/hy3-preview (256K context). - PROVIDER_REGISTRY entry + TOKENHUB_API_KEY / TOKENHUB_BASE_URL env vars - Aliases: tencent, tokenhub, tencent-cloud, tencentmaas - openai_chat transport with is_tokenhub branch for top-level reasoning_effort (Hy3 is a reasoning model) - tencent/hy3-preview:free added to OpenRouter curated list - 60+ tests (provider registry, aliases, runtime resolution, credentials, model catalog, URL mapping, context length) - Docs: integrations/providers.md, environment-variables.md, model-catalog.json Author: simonweng <simonweng@tencent.com> Salvaged from PR #16860 onto current main (resolved conflicts with #16935 Azure Anthropic env-var hint tests and the --provider choices= list removal in chat_parser).	2026-04-28 03:45:52 -07:00
Teknium	e63364b8df	revert: computer-use cua-driver (PR #16919) (#16927) Reverts PR #16919 (commits `dad10a78d`, `413ee1a28`, `b4a8031b2`, `afb958829`) which was merged prematurely. Restoring the pre-merge state so #14817 and #15328 can be revisited as standing PRs. Reverted commits: - `afb958829` fix(computer-use): harden image-rejection fallback + AUTHOR_MAP - `b4a8031b2` fix(computer-use): unwrap _multimodal tool results - `413ee1a28` feat(computer-use): background focus-safe backend - `dad10a78d` feat(computer-use): cua-driver backend, universal any-model schema Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 01:57:21 -07:00
Teknium	dad10a78d0	feat(computer-use): cua-driver backend, universal any-model schema Background macOS desktop control via cua-driver MCP — does NOT steal the user's cursor or keyboard focus, works with any tool-capable model. Replaces the Anthropic-native `computer_20251124` approach from the abandoned #4562 with a generic OpenAI function-calling schema plus SOM (set-of-mark) captures so Claude, GPT, Gemini, and open models can all drive the desktop via numbered element indices. - `tools/computer_use/` package — swappable ComputerUseBackend ABC + CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary). - Universal `computer_use` tool with one schema for all providers. Actions: capture (som/vision/ax), click, double_click, right_click, middle_click, drag, scroll, type, key, wait, list_apps, focus_app. - Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style `content: [text, image_url]` parts) that flows through handle_function_call into the tool message. Anthropic adapter converts into native `tool_result` image blocks; OpenAI-compatible providers get the parts list directly. - Image eviction in convert_messages_to_anthropic: only the 3 most recent screenshots carry real image data; older ones become text placeholders to cap per-turn token cost. - Context compressor image pruning: old multimodal tool results have their image parts stripped instead of being skipped. - Image-aware token estimation: each image counts as a flat 1500 tokens instead of its base64 char length (~1MB would have registered as ~250K tokens before). - COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset is active. - Session DB persistence strips base64 from multimodal tool messages. - Trajectory saver normalises multimodal messages to text-only. - `hermes tools` post-setup installs cua-driver via the upstream script and prints permission-grant instructions. 
- CLI approval callback wired so destructive computer_use actions go through the same prompt_toolkit approval dialog as terminal commands. - Hard safety guards at the tool level: blocked type patterns (curl|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash, force delete, lock screen, log out). - Skill `apple/macos-computer-use/SKILL.md` — universal (model-agnostic) workflow guide. - Docs: `user-guide/features/computer-use.md` plus reference catalog entries. 44 new tests in tests/tools/test_computer_use.py covering schema shape (universal, not Anthropic-native), dispatch routing, safety guards, multimodal envelope, Anthropic adapter conversion, screenshot eviction, context compressor pruning, image-aware token estimation, run_agent helpers, and universality guarantees. 469/469 pass across tests/tools/test_computer_use.py + the affected agent/ test suites. - `model_tools.py` provider-gating: the tool is available to every provider. Providers without multi-part tool message support will see text-only tool results (graceful degradation via `text_summary`). - Anthropic server-side `clear_tool_uses_20250919` — deferred; client-side eviction + compressor pruning cover the same cost ceiling without a beta header. - macOS only. cua-driver uses private SkyLight SPIs (SLEventPostToPid, SLPSPostEventRecordTo, _AXObserverAddNotificationAndCheckRemote) that can break on any macOS update. Pin with HERMES_CUA_DRIVER_VERSION. - Requires Accessibility + Screen Recording permissions — the post-setup prints the Settings path. Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic-native schema). Credit @0xbyt4 for the original #3816 groundwork whose context/eviction/token design is preserved here in generic form.	2026-04-28 01:46:36 -07:00
Isaac Huang	c53fcb0173	feat(providers): add GMI Cloud as a first-class API-key provider (#11955) Add GMI Cloud (api.gmi-serving.com) as a full first-class API-key provider with built-in auth, aliases, model catalog, CLI entry points, auxiliary client routing, context length resolution, doctor checks, env var tracking, and docs. - auth.py: ProviderConfig for 'gmi' (api_key, GMI_API_KEY / GMI_BASE_URL) - providers.py: HermesOverlay with extra_env_vars for models.dev detection - models.py: curated slash-form model catalog; live /v1/models fetch - main.py: 'gmi' in _named_custom_provider_map and --provider choices - model_metadata.py: _URL_TO_PROVIDER, _PROVIDER_PREFIXES, dedicated context-length probe block (GMI's /models has authoritative data) - auxiliary_client.py: alias entries; _compat_model fix for slash-form models on cached aggregator-style clients; gmi aux default model - doctor.py: GMI in provider connectivity checks - config.py: GMI_API_KEY / GMI_BASE_URL in OPTIONAL_ENV_VARS - conftest.py: explicit GMI_BASE_URL clearing (not caught by _API_KEY suffix) - docs: providers.md, environment-variables.md, fallback-providers.md, configuration.md, quickstart.md (expands provider table) Co-authored-by: Isaac Huang <isaachuang@Isaacs-MacBook-Pro.local>	2026-04-27 11:17:59 -07:00
Teknium	438db0c7b0	fix(cli): /model picker honors provider-specific context caps (#16030) `_apply_model_switch_result` (the interactive `/model` picker's confirmation path) printed `ModelInfo.context_window` straight from models.dev, which reports the vendor-wide value (1.05M for gpt-5.5 on openai). ChatGPT Codex OAuth caps the same slug at 272K, so the picker showed 1M while the runtime (compressor, gateway `/model`, typed `/model <name>`) correctly used 272K — the classic 'sometimes 1M, sometimes 272K' mismatch on a single model. Both display paths now go through `resolve_display_context_length()`, matching the fix that `_handle_model_switch` received earlier. Also bump the stale last-resort fallback in DEFAULT_CONTEXT_LENGTHS (`gpt-5.5: 400000 -> 1050000`) to match the real OpenAI API value; the 272K Codex cap is already enforced via the Codex-OAuth branch, so the fallback now reflects what every non-Codex probe-miss should see. Tests: adds `test_apply_model_switch_result_context.py` with three scenarios (Codex cap wins, OpenRouter shows 1.05M, resolver-empty falls back to ModelInfo). Updates the existing non-Codex fallback test to assert 1.05M (the correct value). ## Validation (context shown, before / after): picker -> gpt-5.5 on Codex: 1,050,000 / 272,000; picker -> gpt-5.5 on OpenAI: 1,050,000 / 1,050,000; picker -> gpt-5.5 on OpenRouter: 1,050,000 / 1,050,000; typed /model gpt-5.5 on Codex: 272,000 / 272,000.	2026-04-26 05:43:31 -07:00
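A sketch of routing the picker through one shared resolver, so that a provider cap beats the vendor-wide models.dev value and `ModelInfo` is used only when the resolver comes back empty. The tables hold only the values quoted above. `display_context_sketch` and `picker_context` are hypothetical and are not the real `resolve_display_context_length()` signature.

```python
from typing import Callable, Optional

# Values quoted in the commit message; the tables themselves are illustrative.
CODEX_OAUTH_CAPS = {"gpt-5.5": 272_000}
VENDOR_WINDOWS = {"gpt-5.5": 1_050_000}


def display_context_sketch(model: str, provider: str) -> Optional[int]:
    """Hypothetical shared resolver: a provider-specific cap beats the vendor value."""
    if provider == "openai-codex":
        return CODEX_OAUTH_CAPS.get(model)
    return VENDOR_WINDOWS.get(model)


def picker_context(
    model: str,
    provider: str,
    model_info_window: Optional[int],
    resolver: Callable[[str, str], Optional[int]] = display_context_sketch,
) -> Optional[int]:
    # Every display path goes through the same resolver; ModelInfo is only a fallback.
    resolved = resolver(model, provider)
    return resolved if resolved else model_info_window
```

The three calls in the test mirror the commit's three scenarios: Codex cap wins, OpenRouter shows 1.05M, and an empty resolver falls back to ModelInfo.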
zkl	2ccdadcca6	fix(deepseek): bump V4 family context window to 1M tokens #14934 added deepseek-v4-pro / deepseek-v4-flash to the DeepSeek native provider but the context-window lookup still falls back to the existing "deepseek" substring entry (128K). DeepSeek V4 ships with a 1M context window, so any caller relying on get_model_context_length() for pre-flight token budgeting (compression, context warnings) under-counts by ~8x. Add explicit lowercase entries for the four DeepSeek model ids that ship 1M context: - deepseek-v4-pro - deepseek-v4-flash - deepseek-chat (legacy alias, server-side maps to v4-flash non-thinking) - deepseek-reasoner (legacy alias, server-side maps to v4-flash thinking) Longest-key-first substring matching means these explicit entries also cover the vendor-prefixed forms (deepseek/deepseek-v4-pro on OpenRouter and Nous Portal) without regressing the existing 128K fallback for older / unknown DeepSeek model ids on custom endpoints. Source: https://api-docs.deepseek.com/zh-cn/quick_start/pricing	2026-04-26 05:32:54 -07:00
Teknium	125de02056	fix(context): honor custom_providers context_length on /model switch + bump probe tier to 256K (#15844) Fixes #15779. Custom-provider per-model context_length (`custom_providers[].models.<id>.context_length`) is now honored across every resolution path, not just agent startup. Also adds 256K as the top probe tier and default fallback. ## What changed New helper `hermes_cli.config.get_custom_provider_context_length()` — single source of truth for the per-model override lookup, with trailing-slash-insensitive base-url matching. `agent.model_metadata.get_model_context_length()` gains an optional `custom_providers=` kwarg (step 0b — runs after explicit `config_context_length` but before every other probe). Wired through five call sites that previously either duplicated the lookup or ignored it entirely: - `run_agent.py` startup — refactored to use the new helper (dedups legacy inline loop, keeps invalid-value warning) - `AIAgent.switch_model()` — re-reads custom_providers from live config on every /model switch - `hermes_cli.model_switch.resolve_display_context_length()` — new `custom_providers=` kwarg - `gateway/run.py` /model confirmation (picker callback + text path) - `gateway/run.py` `_format_session_info` (/info) ## Context probe tiers `CONTEXT_PROBE_TIERS = [256_000, 128_000, 64_000, 32_000, 16_000, 8_000]` — was `[128_000, ...]`. `DEFAULT_FALLBACK_CONTEXT` follows tier[0], so unknown models now default to 256K. The stale `128000` literal in the OpenRouter metadata-miss path is replaced with `DEFAULT_FALLBACK_CONTEXT` for consistency. ## Repro (from #15779)
```yaml
custom_providers:
  - name: my-custom-endpoint
    base_url: https://example.invalid/v1
    model: gpt-5.5
    models:
      gpt-5.5:
        context_length: 1050000
```
`/model gpt-5.5 --provider custom:my-custom-endpoint` → previously "Context: 128,000", now "Context: 1,050,000".
## Tests - `tests/hermes_cli/test_custom_provider_context_length.py` — new file, 19 tests covering the helper, step-0b integration, and the 256K tier invariants - `tests/hermes_cli/test_model_switch_context_display.py` — added regression tests for #15779 through the display resolver - `tests/gateway/test_session_info.py` — updated default-fallback assertion (128K → 256K) - `tests/agent/test_model_metadata.py` — updated tier assertions for the new top tier	2026-04-25 18:47:53 -07:00
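A sketch of the step-0b ordering and the trailing-slash-insensitive lookup. The probe tiers are copied from the commit. `custom_provider_context_length` and `resolve_context_length` are hypothetical stand-ins whose signatures are assumptions, and every probe after step 0b is elided.

```python
from typing import Optional

CONTEXT_PROBE_TIERS = [256_000, 128_000, 64_000, 32_000, 16_000, 8_000]
DEFAULT_FALLBACK_CONTEXT = CONTEXT_PROBE_TIERS[0]  # follows tier[0]


def custom_provider_context_length(
    custom_providers: list[dict], base_url: str, model: str
) -> Optional[int]:
    """Per-model override lookup with trailing-slash-insensitive base_url matching."""
    wanted = base_url.rstrip("/")
    for entry in custom_providers:
        if str(entry.get("base_url", "")).rstrip("/") != wanted:
            continue
        models = entry.get("models") or {}
        value = (models.get(model) or {}).get("context_length")
        if isinstance(value, int) and value > 0:
            return value
    return None


def resolve_context_length(
    model: str,
    base_url: str,
    config_context_length: Optional[int] = None,
    custom_providers: Optional[list[dict]] = None,
) -> int:
    if config_context_length:  # step 0: explicit model.context_length wins
        return config_context_length
    override = custom_provider_context_length(custom_providers or [], base_url, model)
    if override:  # step 0b: custom_providers[].models.<id>.context_length
        return override
    # Later probes (endpoint metadata, models.dev, defaults) are elided here.
    return DEFAULT_FALLBACK_CONTEXT
```

Given the repro config above, a base URL with or without a trailing slash resolves `gpt-5.5` to 1,050,000.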
Andre Kurait	b290297d66	fix(bedrock): resolve context length via static table before custom-endpoint probe ## Problem `get_model_context_length()` in `agent/model_metadata.py` had a resolution order bug that caused every Bedrock model to fall back to the 128K default context length instead of reaching the static Bedrock table (200K for Claude, etc.). The root cause: `bedrock-runtime.<region>.amazonaws.com` is not listed in `_URL_TO_PROVIDER`, so `_is_known_provider_base_url()` returned False. The resolution order then ran the custom-endpoint probe (step 2) before the Bedrock branch (step 4b), which: 1. Treated Bedrock as a custom endpoint (via `_is_custom_endpoint`). 2. Called `fetch_endpoint_model_metadata()` → `GET /models` on the bedrock-runtime URL (Bedrock doesn't serve this shape). 3. Fell through to `return DEFAULT_FALLBACK_CONTEXT` (128K) at the "probe-down" branch — never reaching the Bedrock static table. Result: users on Bedrock saw 128K context for Claude models that actually support 200K on Bedrock, causing premature auto-compression. ## Fix Promote the Bedrock branch from step 4b to step 1b, so it runs before the custom-endpoint probe at step 2. The static table in `bedrock_adapter.py::get_bedrock_context_length()` is the authoritative source for Bedrock (the ListFoundationModels API doesn't expose context window sizes), so there's no reason to probe `/models` first. The original step 4b is replaced with a one-line breadcrumb comment pointing to the new location, to make the resolution-order docstring accurate. ## Changes - `agent/model_metadata.py` - Add step 1b: Bedrock static-table branch (unchanged predicate, moved). - Remove dead step 4b block, replace with breadcrumb comment. - Update resolution-order docstring to include step 1b. 
- `tests/agent/test_model_metadata.py` - New `TestBedrockContextResolution` class (3 tests): - `test_bedrock_provider_returns_static_table_before_probe`: confirms `provider="bedrock"` hits the static table and does NOT call `fetch_endpoint_model_metadata` (regression guard). - `test_bedrock_url_without_provider_hint`: confirms the `bedrock-runtime.*.amazonaws.com` host match works without an explicit `provider=` hint. - `test_non_bedrock_url_still_probes`: confirms the probe still fires for genuinely-custom endpoints (no over-reach). ## Testing pytest tests/agent/test_model_metadata.py -q # 83 passed in 1.95s (3 new + 80 existing) ## Risk Very low. - Predicate is identical to the original step 4b — no behaviour change for non-Bedrock paths. - Original step 4b was dead code for the user-facing case (always hit the 128K fallback first), so removing it cannot regress behaviour. - Bedrock path now short-circuits before any network I/O — faster too. - `ImportError` fall-through preserved so users without `boto3` installed are unaffected. ## Related - This is a prerequisite for accurate context-window accounting on Bedrock — the fix for #14710 (stale-connection client eviction) depends on correct context sizing to know when to compress. Signed-off-by: Andre Kurait <andrekurait@gmail.com>	2026-04-24 07:26:07 -07:00
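The reordering described in this commit amounts to answering Bedrock from the static table before any custom-endpoint probe runs. The following is a minimal Python sketch under stated assumptions. `BEDROCK_STATIC`, `is_bedrock`, `resolve_context_length`, the injected `probe_models` callable, and the numeric values are hypothetical stand-ins, not the code in `agent/model_metadata.py`.

```python
from typing import Callable, Optional
from urllib.parse import urlparse

DEFAULT_FALLBACK_CONTEXT = 128_000  # hypothetical stand-in for the 128K default

# Hypothetical excerpt standing in for get_bedrock_context_length()'s static table.
BEDROCK_STATIC = {"anthropic.claude": 200_000}


def is_bedrock(provider: Optional[str], base_url: str) -> bool:
    # Either an explicit provider hint or a bedrock-runtime.*.amazonaws.com host.
    host = (urlparse(base_url).hostname or "").rstrip(".")
    return provider == "bedrock" or (
        host.startswith("bedrock-runtime.") and host.endswith(".amazonaws.com")
    )


def resolve_context_length(
    model: str,
    base_url: str,
    provider: Optional[str],
    probe_models: Callable[[str], Optional[int]],
) -> int:
    # Step 1b: Bedrock is answered from the static table, with no network I/O.
    if is_bedrock(provider, base_url):
        for prefix, length in BEDROCK_STATIC.items():
            if model.startswith(prefix):
                return length
        return DEFAULT_FALLBACK_CONTEXT  # assumption: never probe /models on Bedrock
    # Step 2: custom-endpoint probe (GET /models); None stands for "probe down".
    probed = probe_models(base_url)
    return probed if probed is not None else DEFAULT_FALLBACK_CONTEXT
```

The probe is injected as a callable so the ordering can be checked without a network: a Bedrock lookup should return before the probe is ever called.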
NiuNiu Xia	76329196c1	fix(copilot): wire live /models max_prompt_tokens into context-window resolver The Copilot provider resolved context windows via models.dev static data, which does not include account-specific models (e.g. claude-opus-4.6-1m with 1M context). This adds the live Copilot /models API as a higher-priority source for copilot/copilot-acp/github-copilot providers. New helper get_copilot_model_context() in hermes_cli/models.py extracts capabilities.limits.max_prompt_tokens from the cached catalog. Results are cached in-process for 1 hour. In agent/model_metadata.py, step 5a queries the live API before falling through to models.dev (step 5b). This ensures account-specific models get correct context windows while standard models still have a fallback. Part 1 of #7731. Refs: #7272	2026-04-24 05:09:08 -07:00
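A minimal sketch of reading `capabilities.limits.max_prompt_tokens` from a Copilot `/models` catalog behind a one-hour in-process cache. The `{"data": [{"id": ...}]}` envelope, the `CopilotContextCache` class, and the injected `fetch_catalog` and `clock` callables are assumptions for illustration. Only the field path and the one-hour lifetime come from the commit message.

```python
import time
from typing import Any, Callable, Dict, Optional

CACHE_TTL_SECONDS = 3600  # one hour, per the commit message


def extract_prompt_limits(catalog: Dict[str, Any]) -> Dict[str, int]:
    """Map model id to capabilities.limits.max_prompt_tokens (assumed envelope)."""
    limits: Dict[str, int] = {}
    for entry in catalog.get("data", []):
        model_id = entry.get("id")
        value = entry.get("capabilities", {}).get("limits", {}).get("max_prompt_tokens")
        if model_id and isinstance(value, int) and value > 0:
            limits[model_id] = value
    return limits


class CopilotContextCache:
    """Hypothetical helper: refetches the catalog at most once per TTL window."""

    def __init__(
        self,
        fetch_catalog: Callable[[], Dict[str, Any]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_catalog
        self._clock = clock
        self._limits: Dict[str, int] = {}
        self._fetched_at: Optional[float] = None

    def get(self, model: str) -> Optional[int]:
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= CACHE_TTL_SECONDS:
            self._limits = extract_prompt_limits(self._fetch())
            self._fetched_at = now
        # None means "not in the live catalog": the caller falls through to models.dev.
        return self._limits.get(model)
```

Injecting the clock keeps the TTL behavior testable without sleeping.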
Teknium	346601ca8d	fix(context): invalidate stale Codex OAuth cache entries >= 400k (#15078) PR #14935 added a Codex-aware context resolver but only new lookups hit the live /models probe. Users who had run Hermes on gpt-5.5 / 5.4 BEFORE that PR already had the wrong value (e.g. 1,050,000 from models.dev) persisted in ~/.hermes/context_length_cache.yaml, and the cache-first lookup in get_model_context_length() returns it forever. Symptom (reported in the wild by Ludwig, min heo, Gaoge on current main at `6051fba9d`, which is AFTER #14935): * Startup banner shows context usage against 1M * Compression fires late and then OpenAI hard-rejects with 'context length will be reduced from 1,050,000 to 128,000' around the real 272k boundary. Fix: when the step-1 cache returns a value for an openai-codex lookup, check whether it's >= 400k. Codex OAuth caps every slug at 272k (live probe values) so anything at or above 400k is definitionally a pre-#14935 leftover. Drop that entry from the on-disk cache and fall through to step 5, which runs the live /models probe and repersists the correct value (or 272k from the hardcoded fallback if the probe fails). Non-Codex providers and legitimately-cached Codex entries at 272k are untouched. Changes: - agent/model_metadata.py: * _invalidate_cached_context_length() — drop a single entry from context_length_cache.yaml and rewrite the file. * Step-1 cache check in get_model_context_length() now gates provider=='openai-codex' entries >= 400k through invalidation instead of returning them. Tests (3 new in TestCodexOAuthContextLength): - stale 1.05M Codex entry is dropped from disk AND re-resolved through the live probe to 272k; unrelated cache entries survive. - fresh 272k Codex entry is respected (no probe call, no invalidation). - non-Codex 1M entries (e.g. anthropic/claude-opus-4.6 on OpenRouter) are unaffected — the guard is strictly scoped to openai-codex. Full tests/agent/test_model_metadata.py: 88 passed.	2026-04-24 04:46:07 -07:00
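The invalidation rule can be sketched as a guard inside the step-1 cache lookup. This assumes a hypothetical in-memory dict keyed by `(provider, model)` and a hypothetical function name; the real cache lives in `~/.hermes/context_length_cache.yaml`, and its key format is not shown in this log.

```python
from typing import Dict, Optional, Tuple

CODEX_STALE_THRESHOLD = 400_000  # Codex OAuth caps at 272k, so >= 400k is a leftover


def cached_context_length(
    cache: Dict[Tuple[str, str], int], provider: str, model: str
) -> Optional[int]:
    """Step-1 cache check; returning None means 'fall through to later steps'."""
    key = (provider, model)
    value = cache.get(key)
    if value is None:
        return None
    if provider == "openai-codex" and value >= CODEX_STALE_THRESHOLD:
        # Drop only this entry; the real code also rewrites the YAML file on disk.
        del cache[key]
        return None
    return value
```

Scoping the guard to `provider == "openai-codex"` is what keeps a legitimate 1M entry for another provider untouched.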
Teknium	f58a16f520	fix(auth): apply verify= to Codex OAuth /models probe (#15049) Follow-up to PR #14533 — applies the same _resolve_requests_verify() treatment to the one requests.get() site the PR missed (Codex OAuth chatgpt.com /models probe). Keeps all seven requests.get() callsites in model_metadata.py consistent so HERMES_CA_BUNDLE / REQUESTS_CA_BUNDLE / SSL_CERT_FILE are honored everywhere. Co-authored-by: teknium1 <teknium@hermes-agent>	2026-04-24 03:02:24 -07:00
0xbyt4	8aa37a0cf9	fix(auth): honor SSL CA env vars across httpx + requests callsites - hermes_cli/auth.py: add _default_verify() with macOS Homebrew certifi fallback (mirrors weixin `3a0ec1d93`). Extend env var chain to include REQUESTS_CA_BUNDLE so one env var works across httpx + requests paths. - agent/model_metadata.py: add _resolve_requests_verify() reading HERMES_CA_BUNDLE / REQUESTS_CA_BUNDLE / SSL_CERT_FILE in priority order. Apply explicit verify= to all 6 requests.get callsites. - Tests: 18 new unit tests + autouse platform pin on existing TestResolveVerifyFallback to keep its "returns True" assertions platform-independent. Empirically verified against self-signed HTTPS server: requests honors REQUESTS_CA_BUNDLE only; httpx honors SSL_CERT_FILE only. Hermes now honors all three everywhere. Triggered by Discord reports — Nous OAuth SSL failure on macOS Homebrew Python; custom provider self-signed cert ignored despite REQUESTS_CA_BUNDLE set in env.	2026-04-24 03:00:33 -07:00
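A sketch of a `verify=` resolver that honors the three variables in the priority order stated in this commit. The function name echoes `_resolve_requests_verify()`, but the body is an assumption. For example, the message does not say whether the real helper checks that the file exists, so this sketch does not.

```python
import os
from typing import Mapping, Optional, Union

# Priority order from the commit message: first non-empty variable wins.
_CA_ENV_VARS = ("HERMES_CA_BUNDLE", "REQUESTS_CA_BUNDLE", "SSL_CERT_FILE")


def resolve_requests_verify(env: Optional[Mapping[str, str]] = None) -> Union[str, bool]:
    """Return a CA bundle path for requests' verify=, or True for the default store."""
    env = os.environ if env is None else env
    for name in _CA_ENV_VARS:
        path = env.get(name, "").strip()
        if path:
            return path
    return True
```

The result can be passed directly as `requests.get(url, verify=resolve_requests_verify(), timeout=10)`, since `verify` accepts either a boolean or a path to a CA bundle.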
Teknium	51f4c9827f	fix(context): resolve real Codex OAuth context windows (272k, not 1M) (#14935) On ChatGPT Codex OAuth every gpt-5.x slug actually caps at 272,000 tokens, but Hermes was resolving gpt-5.5 / gpt-5.4 to 1,050,000 (from models.dev) because openai-codex aliases to the openai entry there. At 1.05M the compressor never fires and requests hard-fail with 'context window exceeded' around the real 272k boundary. Verified live against chatgpt.com/backend-api/codex/models: gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.3-codex, gpt-5.2-codex, gpt-5.2, gpt-5.1-codex-max → context_window = 272000 Changes: - agent/model_metadata.py: * _fetch_codex_oauth_context_lengths() — probe the Codex /models endpoint with the OAuth bearer token and read context_window per slug (1h in-memory TTL). * _resolve_codex_oauth_context_length() — prefer the live probe, fall back to hardcoded _CODEX_OAUTH_CONTEXT_FALLBACK (all 272k). * Wire into get_model_context_length() when provider=='openai-codex', running BEFORE the models.dev lookup (which returns 1.05M). Result persists via save_context_length() so subsequent lookups skip the probe entirely. * Fixed the now-wrong comment on the DEFAULT_CONTEXT_LENGTHS gpt-5.5 entry (400k was never right for Codex; it's the catch-all for providers we can't probe live). Tests (4 new in TestCodexOAuthContextLength): - fallback table used when no token is available (no models.dev leakage) - live probe overrides the fallback - probe failure (non-200) falls back to hardcoded 272k - non-codex providers (openrouter, direct openai) unaffected Non-codex context resolution is unchanged — the Codex branch only fires when provider=='openai-codex'.	2026-04-23 22:39:47 -07:00
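A sketch of the "prefer the live probe, fall back to 272k" rule. The `probe` callable (returning a slug-to-`context_window` dict, or `None` on a non-200 response) and the single fallback constant are simplifications for illustration; the commit describes a per-slug `_CODEX_OAUTH_CONTEXT_FALLBACK` table whose entries are all 272k.

```python
from typing import Callable, Dict, Optional

# From the commit message: every gpt-5.x slug caps at 272,000 tokens on Codex OAuth.
CODEX_FALLBACK_CONTEXT = 272_000


def resolve_codex_context_length(
    model: str,
    access_token: Optional[str],
    probe: Callable[[str], Optional[Dict[str, int]]],
) -> int:
    """Prefer the live /models probe; never consult models.dev for openai-codex."""
    if access_token:
        live = probe(access_token)  # hypothetical: None on non-200 or network error
        if live and model in live:
            return live[model]
    # No token, failed probe, or unknown slug: use the hardcoded Codex value.
    return CODEX_FALLBACK_CONTEXT
```

Because the fallback is Codex-specific rather than the models.dev value, a failed probe can no longer leak the 1.05M figure.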
Teknium	8f5fee3e3e	feat(codex): add gpt-5.5 and wire live model discovery into picker (#14720) OpenAI launched GPT-5.5 on Codex today (Apr 23 2026). Adds it to the static catalog and pipes the user's OAuth access token into the openai-codex path of provider_model_ids() so /model mid-session and the gateway picker hit the live ChatGPT codex/models endpoint — new models appear for each user according to what ChatGPT actually lists for their account, without a Hermes release. Verified live: 'gpt-5.5' returns priority 0 (featured) from the endpoint, 400k context per OpenAI's launch article. 'hermes chat --provider openai-codex --model gpt-5.5' completes end-to-end. Changes: - hermes_cli/codex_models.py: add gpt-5.5 to DEFAULT_CODEX_MODELS + forward-compat - agent/model_metadata.py: 400k context length entry - hermes_cli/models.py: resolve codex OAuth token before calling get_codex_model_ids() in provider_model_ids('openai-codex')	2026-04-23 13:32:43 -07:00
kshitij	82a0ed1afb	feat: add Xiaomi MiMo v2.5-pro and v2.5 model support (#14635) ## Merged Adds MiMo v2.5-pro and v2.5 support to Xiaomi native provider, OpenCode Go, and setup wizard. ### Changes - Context lengths: added v2.5-pro (1M) and v2.5 (1M), corrected existing MiMo entries to exact values (262144) - Provider lists: xiaomi, opencode-go, setup wizard - Vision: upgraded from mimo-v2-omni to mimo-v2.5 (omnimodal) - Config description updated for XIAOMI_API_KEY - Tests updated for new vision model preference ### Verification - 4322 tests passed, 0 new regressions - Live API tested on Xiaomi portal: basic, reasoning, tool calling, multi-tool, file ops, system prompt, vision — all pass - Self-review found and fixed 2 issues (redundant vision check, stale HuggingFace context length)	2026-04-23 10:06:25 -07:00
wujhsu	276ef49c96	fix(provider): recognize open.bigmodel.cn as Zhipu/ZAI provider Zhipu AI (智谱) serves both international users via api.z.ai and China-based users via open.bigmodel.cn. The domestic endpoint was not mapped in _URL_TO_PROVIDER, causing Hermes to treat it as an unknown custom endpoint and fall back to the default 128K context length instead of resolving the correct 200K+ context via models.dev or the hardcoded GLM defaults. This affects users of both the standard API (https://open.bigmodel.cn/api/paas/v4) and the Coding Plan (https://open.bigmodel.cn/api/coding/paas/v4).	2026-04-22 17:35:55 -07:00
Clifford Garwood	27621ef836	feat: add ctx_size to context length keys for Lemonade server support - Adds 'ctx_size' field to _CONTEXT_LENGTH_KEYS tuple - Enables hermes agent to correctly detect context size from custom LLMs running on Lemonade server that use this field name instead of the standard keys (max_seq_len, n_ctx_train, n_ctx)	2026-04-22 17:25:04 -07:00
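A sketch of scanning an endpoint's model metadata for the first recognized context-length key. The key tuple and function name are hypothetical: the tuple lists only the four names mentioned in this commit, and the real `_CONTEXT_LENGTH_KEYS` order and contents may differ.

```python
from typing import Any, Mapping, Optional

# Hypothetical key order; only these four names appear in the commit message.
CONTEXT_LENGTH_KEYS = ("max_seq_len", "n_ctx_train", "n_ctx", "ctx_size")


def context_length_from_metadata(meta: Mapping[str, Any]) -> Optional[int]:
    """Return the first positive integer found under a known context-length key."""
    for key in CONTEXT_LENGTH_KEYS:
        value = meta.get(key)
        try:
            length = int(value)
        except (TypeError, ValueError):
            continue  # key missing (None) or not numeric
        if length > 0:
            return length
    return None
```

A server that reports only `ctx_size` (as Lemonade does, per the commit) is recognized once the key is in the tuple; earlier keys still take precedence when several are present.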
Feranmi	66d2d7090e	fix(model_metadata): add gemma-4 and gemma4 context length entries Fixes #12976 The generic "gemma": 8192 fallback was incorrectly matching gemma4:31b-cloud before the more specific Gemma 4 entries could match, causing Hermes to assign only 8K context instead of 262K. Added "gemma-4" and "gemma4" entries before the fallback to correctly handle Gemma 4 model naming conventions.	2026-04-22 16:33:25 -07:00
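The bug is an ordering problem in first-match substring lookup. A minimal sketch, assuming entries are scanned in dict insertion order and the first key contained in the model name wins; the table and function name are a hypothetical excerpt, and the values are illustrative (8K for the generic entry, 262K for Gemma 4, as stated in the commit).

```python
from typing import Optional

# Python dicts preserve insertion order; the first substring hit wins.
DEFAULT_CONTEXT_LENGTHS = {
    "gemma-4": 262_144,
    "gemma4": 262_144,
    "gemma": 8_192,  # generic fallback must come after the specific entries
}


def lookup_default_context(model: str) -> Optional[int]:
    name = model.lower()
    for key, length in DEFAULT_CONTEXT_LENGTHS.items():
        if key in name:
            return length
    return None
```

Sorting keys by length (longest first) before matching is one alternative that removes the dependence on insertion order; this sketch keeps explicit ordering to mirror the fix in the commit.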
Teknium	c96a548bde	feat(models): add xiaomi/mimo-v2.5-pro and mimo-v2.5 to openrouter + nous (#14184) Replace xiaomi/mimo-v2-pro with xiaomi/mimo-v2.5-pro and xiaomi/mimo-v2.5 in the OpenRouter fallback catalog and the nous provider model list. Add matching DEFAULT_CONTEXT_LENGTHS entries (1M tokens each).	2026-04-22 16:12:39 -07:00
ismell0992-afk	6513138f26	fix(agent): recognize Tailscale CGNAT (100.64.0.0/10) as local for Ollama timeouts `is_local_endpoint()` leaned on `ipaddress.is_private`, which classifies RFC-1918 ranges and link-local as private but deliberately excludes the RFC 6598 CGNAT block (100.64.0.0/10) — the range Tailscale uses for its mesh IPs. As a result, Ollama reached over Tailscale (e.g. `http://100.77.243.5:11434`) was treated as remote and missed the automatic stream-read / stale-stream timeout bumps, so cold model load plus long prefill would trip the 300 s watchdog before the first token. Add a module-level `_TAILSCALE_CGNAT = ipaddress.IPv4Network("100.64.0.0/10")` (built once) and extend `is_local_endpoint()` to match the block both via the parsed-`IPv4Address` path and the existing bare-string fallback (for symmetry with the 10/172/192 checks). Also hoist the previously function-local `import ipaddress` to module scope now that it's used by the constant. Extend `TestIsLocalEndpoint` with a CGNAT positive set (lower bound, representative host, MagicDNS anchor, upper bound) and a near-miss negative set (just below 100.64.0.0, just above 100.127.255.255, well outside the block, and first-octet-wrong).	2026-04-22 14:46:10 -07:00
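A sketch of the CGNAT extension using only the standard `ipaddress` module. It takes a bare host rather than a URL, and the name `is_local_host` is hypothetical. Note that `ipaddress` reports 100.64.0.0/10 as neither private nor global, which is why the explicit network membership check is needed.

```python
import ipaddress

# RFC 6598 shared address space, used by Tailscale for mesh IPs; built once.
_TAILSCALE_CGNAT = ipaddress.IPv4Network("100.64.0.0/10")


def is_local_host(host: str) -> bool:
    """Treat loopback, RFC 1918, link-local, and Tailscale CGNAT hosts as local."""
    if host.lower() == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # other hostnames are not resolved in this sketch
    if addr.is_loopback or addr.is_private or addr.is_link_local:
        return True
    return isinstance(addr, ipaddress.IPv4Address) and addr in _TAILSCALE_CGNAT
```

The boundary cases mirror the commit's test sets: 100.64.0.0 and 100.127.255.255 are inside the block, while 100.63.255.255 and 100.128.0.0 are not.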
hengm3467	c6b1ef4e58	feat: add Step Plan provider support (salvage #6005) Adds a first-class 'stepfun' API-key provider surfaced as Step Plan: - Support Step Plan setup for both International and China regions - Discover Step Plan models live from /step_plan/v1/models, with a small coding-focused fallback catalog when discovery is unavailable - Thread StepFun through provider metadata, setup persistence, status and doctor output, auxiliary routing, and model normalization - Add tests for provider resolution, model validation, metadata mapping, and StepFun region/model persistence Based on #6005 by @hengm3467. Co-authored-by: hengm3467 <100685635+hengm3467@users.noreply.github.com>	2026-04-22 02:59:58 -07:00
Teknium	62cbeb6367	test: stop testing mutable data — convert change-detectors to invariants (#13363) Catalog snapshots, config version literals, and enumeration counts are data that changes as designed. Tests that assert on those values add no behavioral coverage — they just break CI on every routine update and cost engineering time to 'fix.' Replace with invariants where one exists, delete where none does. Deleted (pure snapshots): - TestMinimaxModelCatalog (3 tests): 'MiniMax-M2.7 in models' et al - TestGeminiModelCatalog: 'gemini-2.5-pro in models', 'gemini-3.x in models' - test_browser_camofox_state::test_config_version_matches_current_schema (docstring literally said it would break on unrelated bumps) Relaxed (keep plumbing check, drop snapshot): - Xiaomi / Arcee / Kimi moonshot / Kimi coding / HuggingFace static lists: now assert 'provider exists and has >= 1 entry' instead of specific names - HuggingFace main/models.py consistency test: drop 'len >= 6' floor Dynamicized (follow source, not a literal): - 3x test_config.py migration tests: raw['_config_version'] == DEFAULT_CONFIG['_config_version'] instead of hardcoded 21 Fixed stale tests against intentional behavior changes: - test_insights::test_gateway_format_hides_cost: name matches new behavior (no dollar figures); remove contradicting '$' in text assertion - test_config::prefers_api_then_url_then_base_url: flipped per PR #9332; rename + update to base_url > url > api - test_anthropic_adapter: relax assert_called_once() (xdist-flaky) to assert called — contract is 'credential flowed through' - test_interrupt_propagation: add provider/model/_base_url to bare-agent fixture so the stale-timeout code path resolves Fixed stale integration tests against opt-in plugin gate: - transform_tool_result + transform_terminal_output: write plugins.enabled allow-list to config.yaml and reset the plugin manager singleton Source fix (real consistency invariant): - agent/model_metadata.py: add moonshotai/Kimi-K2.6 context 
length (262144, same as K2.5). test_model_metadata_has_context_lengths was correctly catching the gap. Policy: - AGENTS.md Testing section: new subsection 'Don't write change-detector tests' with do/don't examples. Reviewers should reject catalog-snapshot assertions in new tests. Covers every test that failed on the last completed main CI run (24703345583) except test_modal_sandbox_fixes::test_terminal_tool_present + test_terminal_and_file_toolsets_resolve_all_tools, which now pass both alone and with the full tests/tools/ directory (xdist ordering flake that resolved itself).	2026-04-20 23:20:33 -07:00
Teknium	dbb7e00e7e	fix: sweep remaining provider-URL substring checks across codebase Completes the hostname-hardening sweep — every substring check against a provider host in live-routing code is now hostname-based. This closes the same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen, ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI, Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI, and Anthropic. New helper: - utils.base_url_host_matches(base_url, domain) — safe counterpart to 'domain in base_url'. Accepts hostname equality and subdomain matches; rejects path segments, host suffixes, and prefix collisions. Call sites converted (real-code only; tests, optional-skills, red-teaming scripts untouched): run_agent.py (10 sites): - AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check) - header cascade for openrouter / copilot / kimi / qwen / chatgpt - interleaved-thinking trigger (openrouter + claude) - _is_openrouter_url(), _is_qwen_portal() - is_native_anthropic check - github-models-vs-copilot detection (3 sites) - reasoning-capable route gate (nousresearch, vercel, github) - codex-backend detection in API kwargs build - fallback api_mode Bedrock detection agent/auxiliary_client.py (7 sites): - extra-headers cascades in 4 distinct client-construction paths (resolve custom, resolve auto, OpenRouter-fallback-to-custom, _async_client_from_sync, resolve_provider_client explicit-custom, resolve_auto_with_codex) - _is_openrouter_client() base_url sniff agent/usage_pricing.py: - resolve_billing_route openrouter branch agent/model_metadata.py: - _is_openrouter_base_url(), Bedrock context-length lookup hermes_cli/providers.py: - determine_api_mode Bedrock heuristic hermes_cli/runtime_provider.py: - _is_openrouter_url flag for API-key preference (issues #420, #560) hermes_cli/doctor.py: - Kimi User-Agent header for /models probes tools/delegate_tool.py: - subagent Codex endpoint detection 
trajectory_compressor.py: - _detect_provider() cascade (8 providers: openrouter, nous, codex, zai, kimi-coding, arcee, minimax-cn, minimax) cli.py, gateway/run.py: - /model-switch cache-enabled hint (openrouter + claude) Bedrock detection tightened from 'bedrock-runtime in url' to 'hostname starts with bedrock-runtime. AND host is under amazonaws.com'. ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'. Tests: - tests/test_base_url_hostname.py extended with a base_url_host_matches suite (exact match, subdomain, path-segment rejection, host-suffix rejection, host-prefix rejection, empty-input, case-insensitivity, trailing dot). Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock, gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback, fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution, delegate, credential_pool, context_compressor, plus the 4 hostname test modules). 26-assertion E2E call-site verification across 6 modules passes.	2026-04-20 22:14:29 -07:00
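A sketch of a hostname-based matcher with the properties listed in this commit (subdomain matches accepted; path segments, host suffixes, and prefix collisions rejected; case and trailing dot normalized), plus the tightened Bedrock check. It is an illustration built on `urllib.parse`, not the actual `utils.base_url_host_matches` implementation; `_hostname` and `is_bedrock_url` are hypothetical helpers.

```python
from urllib.parse import urlparse


def _hostname(base_url: str) -> str:
    # urlparse lowercases .hostname and drops the port; strip a trailing root dot.
    return (urlparse(base_url).hostname or "").rstrip(".")


def base_url_host_matches(base_url: str, domain: str) -> bool:
    """True when base_url's host is `domain` or a subdomain of it."""
    if not base_url or not domain:
        return False
    host = _hostname(base_url)
    domain = domain.lower().rstrip(".")
    return bool(host) and (host == domain or host.endswith("." + domain))


def is_bedrock_url(base_url: str) -> bool:
    """Tightened check: a bedrock-runtime.* host that lives under amazonaws.com."""
    return _hostname(base_url).startswith("bedrock-runtime.") and base_url_host_matches(
        base_url, "amazonaws.com"
    )
```

Requiring a `.` before the domain in the suffix check is what rejects prefix collisions such as `notopenai.com` matching `openai.com`.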
Teknium	cecf84daf7	fix: extend hostname-match provider detection across remaining call sites Aslaaen's fix in the original PR covered _detect_api_mode_for_url and the two openai/xai sites in run_agent.py. This finishes the sweep: the same substring-match false-positive class (e.g. https://api.openai.com.evil/v1, https://proxy/api.openai.com/v1, https://api.anthropic.com.example/v1) existed in eight more call sites, and the hostname helper was duplicated in two modules. - utils: add shared base_url_hostname() (single source of truth). - hermes_cli/runtime_provider, run_agent: drop local duplicates, import from utils. Reuse the cached AIAgent._base_url_hostname attribute everywhere it's already populated. - agent/auxiliary_client: switch codex-wrap auto-detect, max_completion_tokens gate (auxiliary_max_tokens_param), and custom-endpoint max_tokens kwarg selection to hostname equality. - run_agent: native-anthropic check in the Claude-style model branch and in the AIAgent init provider-auto-detect branch. - agent/model_metadata: Anthropic /v1/models context-length lookup. - hermes_cli/providers.determine_api_mode: anthropic / openai URL heuristics for custom/unknown providers (the /anthropic path-suffix convention for third-party gateways is preserved). - tools/delegate_tool: anthropic detection for delegated subagent runtimes. - hermes_cli/setup, hermes_cli/tools_config: setup-wizard vision-endpoint native-OpenAI detection (paired with deduping the repeated check into a single is_native_openai boolean per branch). Tests: - tests/test_base_url_hostname.py covers the helper directly (path-containing-host, host-suffix, trailing dot, port, case). - tests/hermes_cli/test_determine_api_mode_hostname.py adds the same regression class for determine_api_mode, plus a test that the /anthropic third-party gateway convention still wins. Also: add asslaenn5@gmail.com → Aslaaen to scripts/release.py AUTHOR_MAP.	2026-04-20 22:14:29 -07:00
Tanner Fokkens	cde7283821	fix: forward auth when probing local model metadata Pass the user's configured api_key through local-server detection and context-length probes (detect_local_server_type, _query_local_context_length, query_ollama_num_ctx) and use LM Studio's native /api/v1/models endpoint in fetch_endpoint_model_metadata when a loaded instance is present — so the probed context length is the actual runtime value the user loaded the model at, not just the model's theoretical max. Helps local-LLM users whose auto-detected context length was wrong, causing compression failures and context-overrun crashes.	2026-04-20 20:51:56 -07:00
kshitijk4poor	bc2559c44d	fix: remove codex spark model support Drop gpt-5.3-codex-spark from Codex forward-compat synthesis, provider catalogs, and context metadata now that the API no longer supports it.	2026-04-20 04:51:44 -07:00
Teknium	c6fd2619f7	fix(gemini-cli): surface MODEL_CAPACITY_EXHAUSTED cleanly + drop retired gemma-4-26b (#11833) Google-side 429 Code Assist errors now flow through Hermes' normal rate-limit path (status_code on the exception, Retry-After preserved via error.response) instead of being opaque RuntimeErrors. User sees a one-line capacity message instead of a 500-char JSON dump. Changes - CodeAssistError grows status_code / response / retry_after / details attrs. _extract_status_code in error_classifier picks up status_code and classifies 429 as FailoverReason.rate_limit, so fallback_providers triggers the same way it does for SDK errors. run_agent.py line ~10428 already walks error.response.headers for Retry-After — preserving the response means that path just works. - _gemini_http_error parses the Google error envelope (error.status + error.details[].reason from google.rpc.ErrorInfo, retryDelay from google.rpc.RetryInfo). MODEL_CAPACITY_EXHAUSTED / RESOURCE_EXHAUSTED / 404 model-not-found each produce a human-readable message; unknown shapes fall back to the previous raw-body format. - Drop gemma-4-26b-it from hermes_cli/models.py, hermes_cli/setup.py, and agent/model_metadata.py — Google returned 404 for it today in local repro. Kept gemma-4-31b-it (capacity-constrained but not retired).
Validation

| | Before | After |
|---|---|---|
| Error message | 'Code Assist returned HTTP 429: {500 chars JSON}' | 'Gemini capacity exhausted for gemini-2.5-pro (Google-side throttle...)' |
| status_code on error | None (opaque RuntimeError) | 429 |
| Classifier reason | unknown (string-match fallback) | FailoverReason.rate_limit |
| Retry-After honored | ignored | extracted from RetryInfo or header |
| gemma-4-26b-it picker | advertised (404s on Google) | removed |

Unit + E2E tests cover non-streaming 429, streaming 429, 404 model-not-found, Retry-After header fallback, malformed body, and classifier integration. Targeted suites: tests/agent/test_gemini_cloudcode.py (81 tests), full tests/hermes_cli (2203 tests) green. Co-authored-by: teknium1 <teknium@nousresearch.com>	2026-04-17 15:34:12 -07:00
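A sketch of pulling the status, `ErrorInfo.reason`, and `RetryInfo.retryDelay` out of a Google error envelope, assuming the standard `google.rpc.Status` JSON shape (`error.status` plus `error.details[]` entries tagged with `@type`). The function name `parse_google_error` is hypothetical, and the delay parser only handles the plain `"<seconds>s"` Duration form.

```python
import re
from typing import Any, Dict, Optional, Tuple


def parse_google_error(
    body: Dict[str, Any],
) -> Tuple[Optional[str], Optional[str], Optional[float]]:
    """Return (status, ErrorInfo reason, RetryInfo delay in seconds)."""
    error = body.get("error") or {}
    status = error.get("status")
    reason: Optional[str] = None
    retry_after: Optional[float] = None
    for detail in error.get("details") or []:
        kind = detail.get("@type", "")
        if kind.endswith("google.rpc.ErrorInfo"):
            reason = detail.get("reason")
        elif kind.endswith("google.rpc.RetryInfo"):
            # retryDelay is a JSON-encoded Duration such as "30s" or "1.5s".
            match = re.fullmatch(r"(\d+(?:\.\d+)?)s", str(detail.get("retryDelay", "")))
            if match:
                retry_after = float(match.group(1))
    return status, reason, retry_after
```

A caller could map `reason == "MODEL_CAPACITY_EXHAUSTED"` to the one-line capacity message and pass `retry_after` along when no `Retry-After` header is present.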
Teknium	f362083c64	fix(providers): complete NVIDIA NIM parity with other providers Follow-up on the native NVIDIA NIM provider salvage. The original PR wired PROVIDER_REGISTRY + HERMES_OVERLAYS correctly but missed several touchpoints required for full parity with other OpenAI-compatible providers (xai, huggingface, deepseek, zai). Gaps closed: - hermes_cli/main.py: - Add 'nvidia' to the _model_flow_api_key_provider dispatch tuple so selecting 'NVIDIA NIM' in `hermes model` actually runs the api-key provider flow (previously fell through silently). - Add 'nvidia' to `hermes chat --provider` argparse choices so the documented test command (`hermes chat --provider nvidia --model ...`) parses successfully. - hermes_cli/config.py: Register NVIDIA_API_KEY and NVIDIA_BASE_URL in OPTIONAL_ENV_VARS so setup wizard can prompt for them and they're auto-added to the subprocess env blocklist. - hermes_cli/doctor.py: Add NVIDIA NIM row to `_apikey_providers` so `hermes doctor` probes https://integrate.api.nvidia.com/v1/models. - hermes_cli/dump.py: Add NVIDIA_API_KEY → 'nvidia' mapping for `hermes dump` credential masking. - tests/tools/test_local_env_blocklist.py: Extend registry_vars fixture with NVIDIA_API_KEY to verify it's blocked from leaking into subprocesses. - agent/model_metadata.py: Add 'nemotron' → 131072 context-length entry so all Nemotron variants get 128K context via substring match (rather than falling back to MINIMUM_CONTEXT_LENGTH). - hermes_cli/models.py: Fix hallucinated model ID 'nvidia/nemotron-3-nano-8b-a4b' → 'nvidia/nemotron-3-nano-30b-a3b' (verified against live integrate.api.nvidia.com/v1/models catalog). Expand curated list from 5 to 9 agentic models mapping to OpenRouter defaults per provider-guide convention: add qwen3.5-397b-a17b, deepseek-v3.2, llama-3.3-nemotron-super-49b-v1.5, gpt-oss-120b. - cli-config.yaml.example: Document 'nvidia' provider option. - scripts/release.py: Map asurla@nvidia.com → anniesurla in AUTHOR_MAP for CI attribution. 
E2E verified: `hermes chat --provider nvidia ...` now reaches NVIDIA's endpoint (returns 401 with bogus key instead of argparse error); `hermes doctor` detects NVIDIA NIM when NVIDIA_API_KEY is set.	2026-04-17 13:47:46 -07:00
asurla	3b569ff576	feat(providers): add native NVIDIA NIM provider Adds NVIDIA NIM as a first-class provider: ProviderConfig in auth.py, HermesOverlay in providers.py, curated models (Nemotron plus other open source models hosted on build.nvidia.com), URL mapping in model_metadata.py, aliases (nim, nvidia-nim, build-nvidia, nemotron), and env var tests. Docs updated: providers page, quickstart table, fallback providers table, and README provider list.	2026-04-17 13:47:46 -07:00


115 commits