* docs: deep audit — fix stale config keys, missing commands, and registry drift Cross-checked ~80 high-impact docs pages (getting-started, reference, top-level user-guide, user-guide/features) against the live registries: hermes_cli/commands.py COMMAND_REGISTRY (slash commands) hermes_cli/auth.py PROVIDER_REGISTRY (providers) hermes_cli/config.py DEFAULT_CONFIG (config keys) toolsets.py TOOLSETS (toolsets) tools/registry.py get_all_tool_names() (tools) python -m hermes_cli.main <subcmd> --help (CLI args) reference/ - cli-commands.md: drop duplicate hermes fallback row + duplicate section, add stepfun/lmstudio to --provider enum, expand auth/mcp/curator subcommand lists to match --help output (status/logout/spotify, login, archive/prune/ list-archived). - slash-commands.md: add missing /sessions and /reload-skills entries + correct the cross-platform Notes line. - tools-reference.md: drop bogus '68 tools' headline, drop fictional 'browser-cdp toolset' (these tools live in 'browser' and are runtime-gated), add missing 'kanban' and 'video' toolset sections, fix MCP example to use the real mcp_<server>_<tool> prefix. - toolsets-reference.md: list browser_cdp/browser_dialog inside the 'browser' row, add missing 'kanban' and 'video' toolset rows, drop the stale '38 tools' count for hermes-cli. - profile-commands.md: add missing install/update/info subcommands, document fish completion. - environment-variables.md: dedupe GMI_API_KEY/GMI_BASE_URL rows (kept the one with the correct gmi-serving.com default). - faq.md: Anthropic/Google/OpenAI examples — direct providers exist (not just via OpenRouter), refresh the OpenAI model list. getting-started/ - installation.md: PortableGit (not MinGit) is what the Windows installer fetches; document the 32-bit MinGit fallback. - installation.md / termux.md: installer prefers .[termux-all] then falls back to .[termux]. - nix-setup.md: Python 3.12 (not 3.11), Node.js 22 (not 20); fix invalid 'nix flake update --flake' invocation. - updating.md: 'hermes backup restore --state pre-update' doesn't exist — point at the snapshot/quick-snapshot flow; correct config key 'updates.pre_update_backup' (was 'update.backup'). user-guide/ - configuration.md: api_max_retries default 3 (not 2); display.runtime_footer is the real key (not display.runtime_metadata_footer); checkpoints defaults enabled=false / max_snapshots=20 (not true / 50). - configuring-models.md: 'hermes model list' / 'hermes model set ...' don't exist — hermes model is interactive only. - tui.md: busy_indicator -> tui_status_indicator with values kaomoji|emoji|unicode|ascii (not kawaii|minimal|dots|wings|none). - security.md: SSH backend keys (TERMINAL_SSH_HOST/USER/KEY) live in .env, not config.yaml. - windows-wsl-quickstart.md: there is no 'hermes api' subcommand — the OpenAI-compatible API server runs inside hermes gateway. user-guide/features/ - computer-use.md: approvals.mode (not security.approval_level); fix broken ./browser-use.md link to ./browser.md. - fallback-providers.md: top-level fallback_providers (not model.fallback_providers); the picker is subcommand-based, not modal. - api-server.md: API_SERVER_* are env vars — write to per-profile .env, not 'hermes config set' which targets YAML. - web-search.md: drop web_crawl as a registered tool (it isn't); deep-crawl modes are exposed through web_extract. - kanban.md: failure_limit default is 2, not '~5'. - plugins.md: drop hard-coded '33 providers' count. - honcho.md: fix unclosed quote in echo HONCHO_API_KEY snippet; document that 'hermes honcho' subcommand is gated on memory.provider=honcho; reconcile subcommand list with actual --help output. - memory-providers.md: legacy 'hermes honcho setup' redirect documented. Verified via 'npm run build' — site builds cleanly; broken-link count went from 149 to 146 (no regressions, fixed a few in passing). * docs: round 2 audit fixes + regenerate skill catalogs Follow-up to the previous commit on this branch: Round 2 manual fixes: - quickstart.md: KIMI_CODING_API_KEY mentioned alongside KIMI_API_KEY; voice-mode and ACP install commands rewritten — bare 'pip install ...' doesn't work for curl-installed setups (no pip on PATH, not in repo dir); replaced with 'cd ~/.hermes/hermes-agent && uv pip install -e ".[voice]"'. ACP already ships in [all] so the curl install includes it. - cli.md / configuration.md: 'auxiliary.compression.model' shown as 'google/gemini-3-flash-preview' (the doc's own claimed default); actual default is empty (= use main model). Reworded as 'leave empty (default) or pin a cheap model'. - built-in-plugins.md: added the bundled 'kanban/dashboard' plugin row that was missing from the table. Regenerated skill catalogs: - ran website/scripts/generate-skill-docs.py to refresh all 163 per-skill pages and both reference catalogs (skills-catalog.md, optional-skills-catalog.md). This adds the entries that were genuinely missing — productivity/teams-meeting-pipeline (bundled), optional/finance/* (entire category — 7 skills: 3-statement-model, comps-analysis, dcf-model, excel-author, lbo-model, merger-model, pptx-author), creative/hyperframes, creative/kanban-video-orchestrator, devops/watchers, productivity/shop-app, research/searxng-search, apple/macos-computer-use — and rewrites every other per-skill page from the current SKILL.md. Most diffs are tiny (one line of refreshed metadata). Validation: - 'npm run build' succeeded. - Broken-link count moved 146 -> 155 — the +9 are zh-Hans translation shells that lag every newly-added skill page (pre-existing pattern). No regressions on any en/ page.
16 KiB
| title | description | sidebar_label | sidebar_position |
|---|---|---|---|
| Fallback Providers | Configure automatic failover to backup LLM providers when your primary model is unavailable. | Fallback Providers | 8 |
Fallback Providers
Hermes Agent has three layers of resilience that keep your sessions running when providers hit issues:
- Credential pools — rotate across multiple API keys for the same provider (tried first)
- Primary model fallback — automatically switches to a different provider:model when your main model fails
- Auxiliary task fallback — independent provider resolution for side tasks like vision, compression, and web extraction
Credential pools handle same-provider rotation (e.g., multiple OpenRouter keys). This page covers cross-provider fallback. Both are optional and work independently.
Primary Model Fallback
When your main LLM provider encounters errors — rate limits, server overload, auth failures, connection drops — Hermes can automatically switch to a backup provider:model pair mid-session without losing your conversation.
Configuration
The easiest path is the interactive manager:
hermes fallback
hermes fallback reuses the provider picker from hermes model — same provider list, same credential prompts, same validation. Use the subcommands add, list (alias ls), remove (alias rm), and clear to manage the chain. Changes persist under the top-level fallback_providers: list in config.yaml.
If you'd rather edit the YAML directly, add a fallback_model section to ~/.hermes/config.yaml:
fallback_model:
provider: openrouter
model: anthropic/claude-sonnet-4
Both provider and model are required. If either is missing, the fallback is disabled.
:::note fallback_model vs fallback_providers
fallback_model (singular) is the legacy single-fallback key — Hermes still honors it for back-compat. fallback_providers (plural, list) supports multiple fallbacks tried in order; hermes fallback writes to this key. When both are set, Hermes merges them with fallback_providers taking priority.
:::
Supported Providers
| Provider | Value | Requirements |
|---|---|---|
| AI Gateway | ai-gateway |
AI_GATEWAY_API_KEY |
| OpenRouter | openrouter |
OPENROUTER_API_KEY |
| Nous Portal | nous |
hermes auth (OAuth) |
| OpenAI Codex | openai-codex |
hermes model (ChatGPT OAuth) |
| GitHub Copilot | copilot |
COPILOT_GITHUB_TOKEN, GH_TOKEN, or GITHUB_TOKEN |
| GitHub Copilot ACP | copilot-acp |
External process (editor integration) |
| Anthropic | anthropic |
ANTHROPIC_API_KEY or Claude Code credentials |
| z.ai / GLM | zai |
GLM_API_KEY |
| Kimi / Moonshot | kimi-coding |
KIMI_API_KEY |
| MiniMax | minimax |
MINIMAX_API_KEY |
| MiniMax (China) | minimax-cn |
MINIMAX_CN_API_KEY |
| DeepSeek | deepseek |
DEEPSEEK_API_KEY |
| NVIDIA NIM | nvidia |
NVIDIA_API_KEY (optional: NVIDIA_BASE_URL) |
| GMI Cloud | gmi |
GMI_API_KEY (optional: GMI_BASE_URL) |
| StepFun | stepfun |
STEPFUN_API_KEY (optional: STEPFUN_BASE_URL) |
| Ollama Cloud | ollama-cloud |
OLLAMA_API_KEY |
| Google Gemini (OAuth) | google-gemini-cli |
hermes model (Google OAuth; optional: HERMES_GEMINI_PROJECT_ID) |
| Google AI Studio | gemini |
GOOGLE_API_KEY (alias: GEMINI_API_KEY) |
| xAI (Grok) | xai (alias grok) |
XAI_API_KEY (optional: XAI_BASE_URL) |
| AWS Bedrock | bedrock |
Standard boto3 auth (AWS_REGION + AWS_PROFILE or AWS_ACCESS_KEY_ID) |
| Qwen Portal (OAuth) | qwen-oauth |
hermes model (Qwen Portal OAuth; optional: HERMES_QWEN_BASE_URL) |
| MiniMax (OAuth) | minimax-oauth |
hermes model (MiniMax portal OAuth) |
| OpenCode Zen | opencode-zen |
OPENCODE_ZEN_API_KEY |
| OpenCode Go | opencode-go |
OPENCODE_GO_API_KEY |
| Kilo Code | kilocode |
KILOCODE_API_KEY |
| Xiaomi MiMo | xiaomi |
XIAOMI_API_KEY |
| Arcee AI | arcee |
ARCEEAI_API_KEY |
| GMI Cloud | gmi |
GMI_API_KEY |
| Alibaba / DashScope | alibaba |
DASHSCOPE_API_KEY |
| Alibaba Coding Plan | alibaba-coding-plan |
ALIBABA_CODING_PLAN_API_KEY (falls back to DASHSCOPE_API_KEY) |
| Kimi / Moonshot (China) | kimi-coding-cn |
KIMI_CN_API_KEY |
| StepFun | stepfun |
STEPFUN_API_KEY |
| Tencent TokenHub | tencent-tokenhub |
TOKENHUB_API_KEY |
| Azure AI Foundry | azure-foundry |
AZURE_FOUNDRY_API_KEY + AZURE_FOUNDRY_BASE_URL |
| LM Studio (local) | lmstudio |
LM_API_KEY (or none for local) + LM_BASE_URL |
| Hugging Face | huggingface |
HF_TOKEN |
| Custom endpoint | custom |
base_url + key_env (see below) |
Custom Endpoint Fallback
For a custom OpenAI-compatible endpoint, add base_url and optionally key_env:
fallback_model:
provider: custom
model: my-local-model
base_url: http://localhost:8000/v1
key_env: MY_LOCAL_KEY # env var name containing the API key
When Fallback Triggers
The fallback activates automatically when the primary model fails with:
- Rate limits (HTTP 429) — after exhausting retry attempts
- Server errors (HTTP 500, 502, 503) — after exhausting retry attempts
- Auth failures (HTTP 401, 403) — immediately (no point retrying)
- Not found (HTTP 404) — immediately
- Invalid responses — when the API returns malformed or empty responses repeatedly
When triggered, Hermes:
- Resolves credentials for the fallback provider
- Builds a new API client
- Swaps the model, provider, and client in-place
- Resets the retry counter and continues the conversation
The switch is seamless — your conversation history, tool calls, and context are preserved. The agent continues from exactly where it left off, just using a different model.
:::info Per-Turn, Not Per-Session Fallback is turn-scoped: each new user message starts with the primary model restored. If the primary fails mid-turn, fallback activates for that turn only. On the next message, Hermes tries the primary again. Within a single turn, fallback activates at most once — if the fallback also fails, normal error handling takes over (retries, then error message). This prevents cascading failover loops within a turn while giving the primary model a fresh chance every turn. :::
Examples
OpenRouter as fallback for Anthropic native:
model:
provider: anthropic
default: claude-sonnet-4-6
fallback_model:
provider: openrouter
model: anthropic/claude-sonnet-4
Nous Portal as fallback for OpenRouter:
model:
provider: openrouter
default: anthropic/claude-opus-4
fallback_model:
provider: nous
model: nous-hermes-3
Local model as fallback for cloud:
fallback_model:
provider: custom
model: llama-3.1-70b
base_url: http://localhost:8000/v1
key_env: LOCAL_API_KEY
Codex OAuth as fallback:
fallback_model:
provider: openai-codex
model: gpt-5.3-codex
Where Fallback Works
| Context | Fallback Supported |
|---|---|
| CLI sessions | ✔ |
| Messaging gateway (Telegram, Discord, etc.) | ✔ |
| Subagent delegation | ✘ (subagents do not inherit fallback config) |
| Cron jobs | ✘ (run with a fixed provider) |
| Auxiliary tasks (vision, compression) | ✘ (use their own provider chain — see below) |
:::tip
There are no environment variables for fallback_model — it is configured exclusively through config.yaml. This is intentional: fallback configuration is a deliberate choice, not something a stale shell export should override.
:::
Auxiliary Task Fallback
Hermes uses separate lightweight models for side tasks. Each task has its own provider resolution chain that acts as a built-in fallback system.
Tasks with Independent Provider Resolution
| Task | What It Does | Config Key |
|---|---|---|
| Vision | Image analysis, browser screenshots | auxiliary.vision |
| Web Extract | Web page summarization | auxiliary.web_extract |
| Compression | Context compression summaries | auxiliary.compression |
| Session Search | Past session summarization | auxiliary.session_search |
| Skills Hub | Skill search and discovery | auxiliary.skills_hub |
| MCP | MCP helper operations | auxiliary.mcp |
| Approval | Smart command-approval classification | auxiliary.approval |
| Title Generation | Session title summaries | auxiliary.title_generation |
| Triage Specifier | hermes kanban specify / dashboard ✨ button — fleshes out a one-liner triage task into a real spec |
auxiliary.triage_specifier |
Auto-Detection Chain
When a task's provider is set to "auto" (the default), Hermes tries providers in order until one works:
For text tasks (compression, web extract, etc.):
OpenRouter → Nous Portal → Custom endpoint → Codex OAuth →
API-key providers (z.ai, Kimi, MiniMax, Xiaomi MiMo, Hugging Face, Anthropic) → give up
For vision tasks:
Main provider (if vision-capable) → OpenRouter → Nous Portal →
Codex OAuth → Anthropic → Custom endpoint → give up
If the resolved provider fails at call time, Hermes also has an internal retry: if the provider is not OpenRouter and no explicit base_url is set, it tries OpenRouter as a last-resort fallback.
Configuring Auxiliary Providers
Each task can be configured independently in config.yaml:
auxiliary:
vision:
provider: "auto" # auto | openrouter | nous | codex | main | anthropic
model: "" # e.g. "openai/gpt-4o"
base_url: "" # direct endpoint (takes precedence over provider)
api_key: "" # API key for base_url
web_extract:
provider: "auto"
model: ""
compression:
provider: "auto"
model: ""
session_search:
provider: "auto"
model: ""
timeout: 30
max_concurrency: 3
extra_body: {}
skills_hub:
provider: "auto"
model: ""
mcp:
provider: "auto"
model: ""
Every task above follows the same provider / model / base_url pattern. Context compression is configured under auxiliary.compression:
auxiliary:
compression:
provider: main # Same provider options as other auxiliary tasks
model: google/gemini-3-flash-preview
base_url: null # Custom OpenAI-compatible endpoint
And the fallback model uses:
fallback_model:
provider: openrouter
model: anthropic/claude-sonnet-4
# base_url: http://localhost:8000/v1 # Optional custom endpoint
For auxiliary.session_search, Hermes also supports:
max_concurrencyto limit how many session summaries run at onceextra_bodyto pass provider-specific OpenAI-compatible request fields through on the summarization calls
Example:
auxiliary:
session_search:
provider: main
model: glm-4.5-air
max_concurrency: 2
extra_body:
enable_thinking: false
If your provider does not support a native OpenAI-compatible reasoning-control field, extra_body will not help for that part; in that case max_concurrency is still useful for reducing request-burst 429s.
All three — auxiliary, compression, fallback — work the same way: set provider to pick who handles the request, model to pick which model, and base_url to point at a custom endpoint (overrides provider).
Provider Options for Auxiliary Tasks
These options apply to auxiliary:, compression:, and fallback_model: configs only — "main" is not a valid value for your top-level model.provider. For custom endpoints, use provider: custom in your model: section (see AI Providers).
| Provider | Description | Requirements |
|---|---|---|
"auto" |
Try providers in order until one works (default) | At least one provider configured |
"openrouter" |
Force OpenRouter | OPENROUTER_API_KEY |
"nous" |
Force Nous Portal | hermes auth |
"codex" |
Force Codex OAuth | hermes model → Codex |
"main" |
Use whatever provider the main agent uses (auxiliary tasks only) | Active main provider configured |
"anthropic" |
Force Anthropic native | ANTHROPIC_API_KEY or Claude Code credentials |
Direct Endpoint Override
For any auxiliary task, setting base_url bypasses provider resolution entirely and sends requests directly to that endpoint:
auxiliary:
vision:
base_url: "http://localhost:1234/v1"
api_key: "local-key"
model: "qwen2.5-vl"
base_url takes precedence over provider. Hermes uses the configured api_key for authentication, falling back to OPENAI_API_KEY if not set. It does not reuse OPENROUTER_API_KEY for custom endpoints.
Context Compression Fallback
Context compression uses the auxiliary.compression config block to control which model and provider handles summarization:
auxiliary:
compression:
provider: "auto" # auto | openrouter | nous | main
model: "google/gemini-3-flash-preview"
:::info Legacy migration
Older configs with compression.summary_model / compression.summary_provider / compression.summary_base_url are automatically migrated to auxiliary.compression.* on first load (config version 17).
:::
If no provider is available for compression, Hermes drops middle conversation turns without generating a summary rather than failing the session.
Delegation Provider Override
Subagents spawned by delegate_task do not use the primary fallback model. However, they can be routed to a different provider:model pair for cost optimization:
delegation:
provider: "openrouter" # override provider for all subagents
model: "google/gemini-3-flash-preview" # override model
# base_url: "http://localhost:1234/v1" # or use a direct endpoint
# api_key: "local-key"
See Subagent Delegation for full configuration details.
Cron Job Providers
Cron jobs run with whatever provider is configured at execution time. They do not support a fallback model. To use a different provider for cron jobs, configure provider and model overrides on the cron job itself:
cronjob(
action="create",
schedule="every 2h",
prompt="Check server status",
provider="openrouter",
model="google/gemini-3-flash-preview"
)
See Scheduled Tasks (Cron) for full configuration details.
Summary
| Feature | Fallback Mechanism | Config Location |
|---|---|---|
| Main agent model | fallback_model in config.yaml — per-turn failover on errors (primary restored each turn) |
fallback_model: (top-level) |
| Vision | Auto-detection chain + internal OpenRouter retry | auxiliary.vision |
| Web extraction | Auto-detection chain + internal OpenRouter retry | auxiliary.web_extract |
| Context compression | Auto-detection chain, degrades to no-summary if unavailable | auxiliary.compression |
| Session search | Auto-detection chain | auxiliary.session_search |
| Skills hub | Auto-detection chain | auxiliary.skills_hub |
| MCP helpers | Auto-detection chain | auxiliary.mcp |
| Approval classification | Auto-detection chain | auxiliary.approval |
| Title generation | Auto-detection chain | auxiliary.title_generation |
| Triage specifier | Auto-detection chain | auxiliary.triage_specifier |
| Delegation | Provider override only (no automatic fallback) | delegation.provider / delegation.model |
| Cron jobs | Per-job provider override only (no automatic fallback) | Per-job provider / model |