The google-gemini-cli (Cloud Code Assist) and gemini (native API) model
pickers only offered gemini-2.5-*, so users picking Gemini 3 had to type
a custom model name — usually wrong (e.g. "gemini-3.1-pro"), producing
a 404 from cloudcode-pa.googleapis.com.
Replace the 2.5-* entries with the actual Code Assist / Gemini API
preview IDs: gemini-3.1-pro-preview, gemini-3-pro-preview,
gemini-3-flash-preview (and gemini-3.1-flash-lite-preview on native).
Update the hardcoded fallback in hermes_cli/main.py to match.
Copilot's menu retains gemini-2.5-pro — that catalog is Microsoft's.
- add a native Gemini adapter over generateContent/streamGenerateContent
- switch the built-in gemini provider off the OpenAI-compatible endpoint
- preserve thought signatures and native functionResponse replay
- route auxiliary Gemini clients through the same adapter
- add focused unit coverage plus native-provider integration checks
Google-side 429 Code Assist errors now flow through Hermes' normal rate-limit
path (status_code on the exception, Retry-After preserved via error.response)
instead of being opaque RuntimeErrors. User sees a one-line capacity message
instead of a 500-char JSON dump.
Changes
- CodeAssistError grows status_code / response / retry_after / details attrs.
_extract_status_code in error_classifier picks up status_code and classifies
429 as FailoverReason.rate_limit, so fallback_providers triggers the same
way it does for SDK errors. run_agent.py line ~10428 already walks
error.response.headers for Retry-After — preserving the response means that
path just works.
- _gemini_http_error parses the Google error envelope (error.status +
error.details[].reason from google.rpc.ErrorInfo, retryDelay from
google.rpc.RetryInfo). MODEL_CAPACITY_EXHAUSTED / RESOURCE_EXHAUSTED / 404
model-not-found each produce a human-readable message; unknown shapes fall
back to the previous raw-body format.
- Drop gemma-4-26b-it from hermes_cli/models.py, hermes_cli/setup.py, and
agent/model_metadata.py — Google returned 404 for it today in local repro.
Kept gemma-4-31b-it (capacity-constrained but not retired).
Validation
| | Before | After |
|---------------------------|--------------------------------|-------------------------------------------|
| Error message | 'Code Assist returned HTTP 429: {500 chars JSON}' | 'Gemini capacity exhausted for gemini-2.5-pro (Google-side throttle...)' |
| status_code on error | None (opaque RuntimeError) | 429 |
| Classifier reason | unknown (string-match fallback) | FailoverReason.rate_limit |
| Retry-After honored | ignored | extracted from RetryInfo or header |
| gemma-4-26b-it picker | advertised (404s on Google) | removed |
Unit + E2E tests cover non-streaming 429, streaming 429, 404 model-not-found,
Retry-After header fallback, malformed body, and classifier integration.
Targeted suites: tests/agent/test_gemini_cloudcode.py (81 tests), full
tests/hermes_cli (2203 tests) green.
Co-authored-by: teknium1 <teknium@nousresearch.com>
Follow-up on the native NVIDIA NIM provider salvage. The original PR wired
PROVIDER_REGISTRY + HERMES_OVERLAYS correctly but missed several touchpoints
required for full parity with other OpenAI-compatible providers (xai,
huggingface, deepseek, zai).
Gaps closed:
- hermes_cli/main.py:
- Add 'nvidia' to the _model_flow_api_key_provider dispatch tuple so
selecting 'NVIDIA NIM' in `hermes model` actually runs the api-key
provider flow (previously fell through silently).
- Add 'nvidia' to `hermes chat --provider` argparse choices so the
documented test command (`hermes chat --provider nvidia --model ...`)
parses successfully.
- hermes_cli/config.py: Register NVIDIA_API_KEY and NVIDIA_BASE_URL in
OPTIONAL_ENV_VARS so setup wizard can prompt for them and they're
auto-added to the subprocess env blocklist.
- hermes_cli/doctor.py: Add NVIDIA NIM row to `_apikey_providers` so
`hermes doctor` probes https://integrate.api.nvidia.com/v1/models.
- hermes_cli/dump.py: Add NVIDIA_API_KEY → 'nvidia' mapping for
`hermes dump` credential masking.
- tests/tools/test_local_env_blocklist.py: Extend registry_vars fixture
with NVIDIA_API_KEY to verify it's blocked from leaking into subprocesses.
- agent/model_metadata.py: Add 'nemotron' → 131072 context-length entry
so all Nemotron variants get 128K context via substring match (rather
than falling back to MINIMUM_CONTEXT_LENGTH).
- hermes_cli/models.py: Fix hallucinated model ID
'nvidia/nemotron-3-nano-8b-a4b' → 'nvidia/nemotron-3-nano-30b-a3b'
(verified against live integrate.api.nvidia.com/v1/models catalog).
Expand curated list from 5 to 9 agentic models mapping to OpenRouter
defaults per provider-guide convention: add qwen3.5-397b-a17b,
deepseek-v3.2, llama-3.3-nemotron-super-49b-v1.5, gpt-oss-120b.
- cli-config.yaml.example: Document 'nvidia' provider option.
- scripts/release.py: Map asurla@nvidia.com → anniesurla in AUTHOR_MAP
for CI attribution.
E2E verified: `hermes chat --provider nvidia ...` now reaches NVIDIA's
endpoint (returns 401 with bogus key instead of argparse error);
`hermes doctor` detects NVIDIA NIM when NVIDIA_API_KEY is set.
Adds NVIDIA NIM as a first-class provider: ProviderConfig in
auth.py, HermesOverlay in providers.py, curated models
(Nemotron plus other open source models hosted on
build.nvidia.com), URL mapping in model_metadata.py, aliases
(nim, nvidia-nim, build-nvidia, nemotron), and env var tests.
Docs updated: providers page, quickstart table, fallback
providers table, and README provider list.
Move moonshotai/kimi-k2.5 to position #1 in every model picker list:
- OPENROUTER_MODELS (with 'recommended' tag)
- _PROVIDER_MODELS: nous, kimi-coding, opencode-zen, opencode-go, alibaba, huggingface
- _model_flow_kimi() Coding Plan model list in main.py
kimi-coding-cn and moonshot lists already had kimi-k2.5 first.
The Copilot API returns HTTP 400 "model_not_supported" when it receives a
model ID it doesn't recognize (vendor-prefixed like
`anthropic/claude-sonnet-4.6` or dash-notation like `claude-sonnet-4-6`).
Two bugs combined to leave both formats unhandled:
1. `_COPILOT_MODEL_ALIASES` in hermes_cli/models.py only covered bare
dot-notation and vendor-prefixed dot-notation. Hermes' default Claude
IDs elsewhere use hyphens (anthropic native format), and users with an
aggregator-style config who switch `model.provider` to `copilot`
inherit `anthropic/claude-X-4.6` — neither case was in the table.
2. The Copilot branch of `normalize_model_for_provider()` only stripped
the vendor prefix when it matched the target provider (`copilot/`) or
was the special-cased `openai/` for openai-codex. Every other vendor
prefix survived to the Copilot request unchanged.
Fix:
- Add dash-notation aliases (`claude-{opus,sonnet,haiku}-4-{5,6}` and the
`anthropic/`-prefixed variants) to the alias table.
- Rewire the Copilot / Copilot-ACP branch of
`normalize_model_for_provider()` to delegate to the existing
`normalize_copilot_model_id()`. That function already does alias
lookups, catalog-aware resolution, and vendor-prefix fallback — it was
being bypassed for the generic normalisation entry point.
Because `switch_model()` already calls `normalize_model_for_provider()`
for every `/model` switch (line 685 in model_switch.py), this single fix
covers the CLI startup path (cli.py), the `/model` slash command path,
and the gateway load-from-config path.
Closes#6879
Credits dsr-restyn (#6743) who independently diagnosed the dash-notation
case; their aliases are folded into this consolidated fix alongside the
vendor-prefix stripping repair.
Mirrors OpenRouter which already lists anthropic/claude-opus-4.7 as
recommended. Surfaces the model in the `hermes model` picker and the
gateway /model flow for Nous Portal users.
Context length (1M) is already covered by the existing claude-opus-4.7
entry in agent/model_metadata.py DEFAULT_CONTEXT_LENGTHS.
All 61 TUI-related tests green across 3 consecutive xdist runs.
tests/tui_gateway/test_protocol.py:
- rename `get_messages` → `get_messages_as_conversation` on mock DB (method
was renamed in the real backend, test was still stubbing the old name)
- update tool-message shape expectation: `{role, name, context}` matches
current `_history_to_messages` output, not the legacy `{role, text}`
tests/hermes_cli/test_tui_resume_flow.py:
- `cmd_chat` grew a first-run provider-gate that bailed to "Run: hermes
setup" before `_launch_tui` was ever reached; 3 tests stubbed
`_resolve_last_session` + `_launch_tui` but not the gate
- factored a `main_mod` fixture that stubs `_has_any_provider_configured`,
reused by all three tests
tests/test_tui_gateway_server.py:
- `test_config_set_personality_resets_history_and_returns_info` was flaky
under xdist because the real `_write_config_key` touches
`~/.hermes/config.yaml`, racing with any other worker that writes
config. Stub it in the test.
Claude Opus 4.7 introduced several breaking API changes that the current
codebase partially handled but not completely. This patch finishes the
migration per the official migration guide at
https://platform.claude.com/docs/en/about-claude/models/migration-guideFixesNousResearch/hermes-agent#11137
Breaking-change coverage:
1. Adaptive thinking + output_config.effort — 4.7 is now recognized by
_supports_adaptive_thinking() (extends previous 4.6-only gate).
2. Sampling parameter stripping — 4.7 returns 400 for any non-default
temperature / top_p / top_k. build_anthropic_kwargs drops them as a
safety net; the OpenAI-protocol auxiliary path (_build_call_kwargs)
and AnthropicCompletionsAdapter.create() both early-exit before
setting temperature for 4.7+ models. This keeps flush_memories and
structured-JSON aux paths that hardcode temperature from 400ing
when the aux model is flipped to 4.7.
3. thinking.display = "summarized" — 4.7 defaults display to "omitted",
which silently hides reasoning text from Hermes's CLI activity feed
during long tool runs. Restoring "summarized" preserves 4.6 UX.
4. Effort level mapping — xhigh now maps to xhigh (was xhigh→max, which
silently over-efforted every coding/agentic request). max is now a
distinct ceiling per Anthropic's 5-level effort model.
5. New stop_reason values — refusal and model_context_window_exceeded
were silently collapsed to "stop" (end_turn) by the adapter's
stop_reason_map. Now mapped to "content_filter" and "length"
respectively, matching upstream finish-reason handling already in
bedrock_adapter.
6. Model catalogs — claude-opus-4-7 added to the Anthropic provider
list, anthropic/claude-opus-4.7 added at top of OpenRouter fallback
catalog (recommended), claude-opus-4-7 added to model_metadata
DEFAULT_CONTEXT_LENGTHS (1M, matching 4.6 per migration guide).
7. Prefill docstrings — run_agent.AIAgent and BatchRunner now document
that Anthropic Sonnet/Opus 4.6+ reject a trailing assistant-role
prefill (400).
8. Tests — 4 new tests in test_anthropic_adapter covering display
default, xhigh preservation, max on 4.7, refusal / context-overflow
stop_reason mapping, plus the sampling-param predicate. test_model_metadata
accepts 4.7 at 1M context.
Tested on macOS 15.5 (darwin). 119 tests pass in
tests/agent/test_anthropic_adapter.py, 1320 pass in tests/agent/.
provider_model_ids() and list_authenticated_providers() had no case for
"ollama-cloud", so the /model slash command showed 0 models despite
fetch_ollama_cloud_models() being fully implemented. The CLI subcommand
worked because it called fetch_ollama_cloud_models() directly.
- Add ollama-cloud case to provider_model_ids() in models.py
- Populate curated dict for ollama-cloud in list_authenticated_providers()
- Add tests for both code paths
Group A (3 tests): 'No LLM provider configured' RuntimeError
- test_user_message_surrogates_sanitized, test_counters_initialized_in_init,
test_openai_prompt_tokens_unchanged
- Root cause: AIAgent.__init__ now requires base_url alongside api_key to
skip resolve_provider_client() (which returns None when API keys are
blanked in CI). Added base_url='http://localhost:1234/v1' to test
agent construction.
Group B (5 tests): Discord slash command auto-registration
- test_auto_registers_missing_gateway_commands, test_auto_registered_command_*,
test_register_skill_group_*
- Root cause: xdist workers that loaded a discord mock WITHOUT
app_commands.Command/Group caused _register_slash_commands() to fail
silently. Added comprehensive shared discord mock in
tests/gateway/conftest.py (same pattern as existing telegram mock).
Group C (5 errors): Discord reply mode 'NoneType has no DMChannel'
- All TestReplyToText tests
- Root cause: FakeDMChannel was not a subclass of real discord.DMChannel,
so isinstance() checks in _handle_message failed when running in full
suite (real discord installed). Made FakeDMChannel inherit from
discord.DMChannel when available. Removed fragile monkeypatch approach.
Group D (2 tests): detect_provider_for_model wrong provider
- test_openrouter_slug_match (got 'ai-gateway'), test_bare_name_gets_
openrouter_slug (got 'copilot')
- Root cause: ai-gateway, copilot, and kilocode are multi-vendor
aggregators that list other providers' models (OpenRouter-style slugs).
They were being matched in Step 1 before OpenRouter. Added all three
to _AGGREGATORS set so they're skipped like nous/openrouter.
Group E (1 test): model_flow_custom StopIteration
- test_model_flow_custom_saves_verified_v1_base_url
- Root cause: 'Display name' prompt was added after the test was written.
The input iterator had 5 answers but the flow now asks 6 questions.
Added 6th empty string answer.
Group F (1 test): Telegram proxy env assertion
- test_uses_proxy_env_for_primary_and_fallback_transports
- Root cause: _resolve_proxy_url() now checks TELEGRAM_PROXY first
(via resolve_proxy_url('TELEGRAM_PROXY')). Test didn't clear this
env var, allowing potential leakage from other tests in xdist workers.
Added TELEGRAM_PROXY to the cleanup list.
copilot_model_api_mode() called normalize_copilot_model_id() which
fetched the GitHub model catalog via HTTP, then the secondary endpoint
check fetched it again because the catalog was never passed through.
Fix: fetch the catalog once at the top of copilot_model_api_mode()
and pass it to normalize_copilot_model_id(). The secondary check
then sees a non-None catalog and skips the redundant fetch.
For a Claude model switch on Copilot this eliminates one 5-second-
timeout HTTP call from the interactive /model path.
Surfaced during review of PR #10533.
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
detect_provider_for_model() silently remapped models to OpenRouter when
the direct provider's credentials weren't found via env vars. Three bugs:
1. Credential check only looked at env vars from PROVIDER_REGISTRY,
missing credential pool entries, auth store, and OAuth tokens
2. When env var check failed, silently returned ('openrouter', slug)
instead of the direct provider the model actually belongs to
3. Users with valid credentials via non-env-var mechanisms (pool,
OAuth, Claude Code tokens) got silently rerouted
Fix:
- Expand credential check to also query credential pool and auth store
- Always return the direct provider match regardless of credential
status -- let client init handle missing creds with a clear error
rather than silently routing through the wrong provider
Same philosophy as the provider-required fix: don't guess, don't
silently reroute, error clearly when something is missing.
Closes#10300
Route kimi-coding-cn through _resolve_kimi_base_url() in both
get_api_key_provider_status() and resolve_api_key_provider_credentials()
so CN users with sk-kimi- prefixed keys get auto-detected to the Kimi
Coding Plan endpoint, matching the existing behavior for kimi-coding.
Also update the kimi-coding display label to accurately reflect the
dual-endpoint setup (Kimi Coding Plan + Moonshot API).
Salvaged from PR #10525 by kkikione999.
- Add glm-5v-turbo to OpenRouter, Nous, and native Z.AI model lists
- Add glm-5v context length entry (200K tokens) to model metadata
- Update Z.AI endpoint probe to try multiple candidate models per
endpoint (glm-5.1, glm-5v-turbo, glm-4.7) — fixes detection for
newer coding plan accounts that lack older models
- Add zai to _PROVIDER_VISION_MODELS so auxiliary vision tasks
(vision_analyze, browser screenshots) route through 5v
Fixes#9888
* feat(skills): add fitness-nutrition skill to optional-skills
Cherry-picked from PR #9177 by @haileymarshall.
Adds a fitness and nutrition skill for gym-goers and health-conscious users:
- Exercise search via wger API (690+ exercises, free, no auth)
- Nutrition lookup via USDA FoodData Central (380K+ foods, DEMO_KEY fallback)
- Offline body composition calculators (BMI, TDEE, 1RM, macros, body fat %)
- Pure stdlib Python, no pip dependencies
Changes from original PR:
- Moved from skills/ to optional-skills/health/ (correct location)
- Fixed BMR formula in FORMULAS.md (removed confusing -5+10, now just +5)
- Fixed author attribution to match PR submitter
- Marked USDA_API_KEY as optional (DEMO_KEY works without signup)
Also adds optional env var support to the skill readiness checker:
- New 'optional: true' field in required_environment_variables entries
- Optional vars are preserved in metadata but don't block skill readiness
- Optional vars skip the CLI capture prompt flow
- Skills with only optional missing vars show as 'available' not 'setup_needed'
* fix: auto-correct close model name matches in /model validation
When a user types a model name with a minor typo (e.g. gpt5.3-codex instead
of gpt-5.3-codex), the validation now auto-corrects to the closest match
instead of accepting the wrong name with a warning.
Uses difflib get_close_matches with cutoff=0.9 to avoid false corrections
(e.g. gpt-5.3 should not silently become gpt-5.4). Applied consistently
across all three validation paths: codex provider, custom endpoints, and
generic API-probed providers.
The validate_requested_model() return dict gains an optional corrected_model
key that switch_model() applies before building the result.
Reported by Discord user — /model gpt5.3-codex was accepted with a warning
but would fail at the API level.
---------
Co-authored-by: haileymarshall <haileymarshall@users.noreply.github.com>
* Add hermes debug share instructions to all issue templates
- bug_report.yml: Add required Debug Report section with hermes debug share
and /debug instructions, make OS/Python/Hermes version optional (covered
by debug report), demote old logs field to optional supplementary
- setup_help.yml: Replace hermes doctor reference with hermes debug share,
add Debug Report section with fallback chain (debug share -> --local -> doctor)
- feature_request.yml: Add optional Debug Report section for environment context
All templates now guide users to run hermes debug share (or /debug in chat)
and paste the resulting paste.rs links, giving maintainers system info,
config, and recent logs in one step.
* feat: add openrouter/elephant-alpha to curated model lists
- Add to OPENROUTER_MODELS (free, positioned above GPT models)
- Add to _PROVIDER_MODELS["nous"] mirror list
- Add 256K context window fallback in model_metadata.py
Remove the two-tier (top/extended) provider picker that hid most
providers behind a 'More providers...' submenu. All providers now
appear in a single flat list.
- Remove tier field from ProviderEntry namedtuple
- Remove tier values from all CANONICAL_PROVIDERS entries
- Flatten the hermes model picker (no more 'More...' submenu)
- Move 'Custom endpoint' to the bottom of the main list
Adds Arcee AI as a standard direct provider (ARCEEAI_API_KEY) with
Trinity models: trinity-large-thinking, trinity-large-preview, trinity-mini.
Standard OpenAI-compatible provider checklist: auth.py, config.py,
models.py, main.py, providers.py, doctor.py, model_normalize.py,
model_metadata.py, setup.py, trajectory_compressor.py.
Based on PR #9274 by arthurbr11, simplified to a standard direct
provider without dual-endpoint OpenRouter routing.
Three separate hardcoded provider lists (/model, /provider, hermes model)
diverged over time, causing providers to be missing from some commands.
- Create CANONICAL_PROVIDERS in hermes_cli/models.py as the single source
of truth for all provider identity, labels, and TUI ordering
- Derive _PROVIDER_LABELS and list_available_providers() from canonical list
- Add step 2b in list_authenticated_providers() to cross-check canonical
list — catches providers with credentials that weren't found via
PROVIDER_TO_MODELS_DEV or HERMES_OVERLAYS mappings
- Derive hermes model TUI provider menus from canonical list
- Add deepseek and xai as first-class providers (were missing from TUI)
- Add grok/x-ai/x.ai aliases for xai provider
Fixes: /model command not showing all providers that hermes model shows
Cherry-picked from PR #7637 by hcshen0111.
Adds kimi-coding-cn provider with dedicated KIMI_CN_API_KEY env var
and api.moonshot.cn/v1 endpoint for China-region Moonshot users.
OpenCode Zen was in _DOT_TO_HYPHEN_PROVIDERS, causing all dotted model
names (minimax-m2.5-free, gpt-5.4, glm-5.1) to be mangled. The fix:
Layer 1 (model_normalize.py): Remove opencode-zen from the blanket
dot-to-hyphen set. Add an explicit block that preserves dots for
non-Claude models while keeping Claude hyphenated (Zen's Claude
endpoint uses anthropic_messages mode which expects hyphens).
Layer 2 (run_agent.py _anthropic_preserve_dots): Add opencode-zen and
zai to the provider allowlist. Broaden URL check from opencode.ai/zen/go
to opencode.ai/zen/ to cover both Go and Zen endpoints. Add bigmodel.cn
for ZAI URL detection.
Also adds glm-5.1 to ZAI model lists in models.py and setup.py.
Closes#7710
Salvaged from contributions by:
- konsisumer (PR #7739, #7719)
- DomGrieco (PR #8708)
- Esashiero (PR #7296)
- sharziki (PR #7497)
- XiaoYingGee (PR #8750)
- APTX4869-maker (PR #8752)
- kagura-agent (PR #7157)
Users who set up Nous auth without explicitly selecting a model via
`hermes model` were silently falling back to anthropic/claude-opus-4.6
(the first entry in _PROVIDER_MODELS['nous']), causing unexpected
charges on their Nous plan. Move xiaomi/mimo-v2-pro to the first
position so unconfigured users default to a free model instead.
When a user configures a provider (e.g. `hermes auth add openai-codex`)
but never selects a model via `hermes model`, the gateway and CLI would
pass an empty model string to the API, causing:
'Codex Responses request model must be a non-empty string'
Now both gateway (_resolve_session_agent_runtime) and CLI
(_ensure_runtime_credentials) detect an empty model and fill it from
the provider's first catalog entry in _PROVIDER_MODELS. This covers
all providers that have a static model list (openai-codex, anthropic,
gemini, copilot, etc.).
The fix is conservative: it only triggers when model is truly empty
and a known provider was resolved. Explicit model choices are never
overridden.
When /model is called with no arguments in the interactive CLI, open a
two-step prompt_toolkit modal instead of the previous text-only listing:
1. Provider selection — curses_single_select with all authenticated providers
2. Model selection — live API fetch with curated fallback
Also fixes:
- OpenAI Codex model normalization (openai/gpt-5.4 → gpt-5.4)
- Dedicated Codex validation path using provider_model_ids()
Preserves curses_radiolist (used by setup, tools, plugins) alongside the
new curses_single_select. Retains tool elapsed timer in spinner.
Cherry-picked from PR #7438 by MestreY0d4-Uninter.
Cherry-picked from PR #7702 by kshitijk4poor.
Adds Xiaomi MiMo as a direct provider (XIAOMI_API_KEY) with models:
- mimo-v2-pro (1M context), mimo-v2-omni (256K, multimodal), mimo-v2-flash (256K, cheapest)
Standard OpenAI-compatible provider checklist: auth.py, config.py, models.py,
main.py, providers.py, doctor.py, model_normalize.py, model_metadata.py,
models_dev.py, auxiliary_client.py, .env.example, cli-config.yaml.example.
Follow-up: vision tasks use mimo-v2-omni (multimodal) instead of the user's
main model. Non-vision aux uses the user's selected model. Added
_PROVIDER_VISION_MODELS dict for provider-specific vision model overrides.
On failure, falls back to aggregators (gemini flash) via existing fallback chain.
Corrects pre-existing context lengths: mimo-v2-pro 1048576→1000000,
mimo-v2-omni 1048576→256000, adds mimo-v2-flash 256000.
36 tests covering registry, aliases, auto-detect, credentials, models.dev,
normalization, URL mapping, providers module, doctor, aux client, vision
model override, and agent init.