docs: fallback providers + /background command documentation

* docs: comprehensive fallback providers documentation

- New dedicated page: user-guide/features/fallback-providers.md covering
  both primary model fallback and auxiliary task fallback systems
- Updated configuration.md with fallback_model config section
- Updated environment-variables.md noting fallback is config-only
- Fleshed out developer-guide/provider-runtime.md fallback section with
  internal architecture details (trigger points, activation flow, config flow)
- Added cross-reference from provider-routing.md distinguishing OpenRouter
  sub-provider routing from Hermes-level model fallback
- Added new page to sidebar under Integrations

* docs: comprehensive /background command documentation

- Added Background Sessions section to cli.md covering how it works
  (daemon threads, isolated sessions, config inheritance, Rich panel
  output, bell notification, concurrent tasks)
- Added Background Sessions section to messaging/index.md covering
  messaging-specific behavior (async execution, result delivery back
  to same chat, fire-and-forget pattern)
- Documented background_process_notifications config
  (all/result/error/off) in messaging docs and configuration.md
- Added HERMES_BACKGROUND_NOTIFICATIONS env var to reference page
- Fixed inconsistency in slash-commands.md: /background was listed as
  messaging-only but works in both CLI and messaging. Moved it to the
  'both surfaces' note.
- Expanded one-liner table descriptions with detail and cross-references
This commit is contained in:
Teknium 2026-03-15 06:24:28 -07:00 committed by GitHub
parent 60cce9ca6d
commit 463239ed85
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
9 changed files with 495 additions and 5 deletions

View file

@ -259,6 +259,55 @@ compression:
When compression triggers, middle turns are summarized while the first 3 and last 4 turns are always preserved.
## Background Sessions
Run a prompt in a separate background session while continuing to use the CLI for other work:
```
/background Analyze the logs in /var/log and summarize any errors from today
```
Hermes immediately confirms the task and gives you back the prompt:
```
🔄 Background task #1 started: "Analyze the logs in /var/log and summarize..."
Task ID: bg_143022_a1b2c3
```
### How It Works
Each `/background` prompt spawns a **completely separate agent session** in a daemon thread:
- **Isolated conversation** — the background agent has no knowledge of your current session's history. It receives only the prompt you provide.
- **Same configuration** — the background agent inherits your model, provider, toolsets, reasoning settings, and fallback model from the current session.
- **Non-blocking** — your foreground session stays fully interactive. You can chat, run commands, or even start more background tasks.
- **Multiple tasks** — you can run several background tasks simultaneously. Each gets a numbered ID.
### Results
When a background task finishes, the result appears as a panel in your terminal:
```
╭─ ⚕ Hermes (background #1) ──────────────────────────────────╮
│ Found 3 errors in syslog from today: │
│ 1. OOM killer invoked at 03:22 — killed process nginx │
│ 2. Disk I/O error on /dev/sda1 at 07:15 │
│ 3. Failed SSH login attempts from 192.168.1.50 at 14:30 │
╰──────────────────────────────────────────────────────────────╯
```
If the task fails, you'll see an error notification instead. If `display.bell_on_complete` is enabled in your config, the terminal bell rings when the task finishes.
### Use Cases
- **Long-running research** — "/background research the latest developments in quantum error correction" while you work on code
- **File processing** — "/background analyze all Python files in this repo and list any security issues" while you continue a conversation
- **Parallel investigations** — start multiple background tasks to explore different angles simultaneously
:::info
Background sessions do not appear in your main conversation history. They are standalone sessions with their own task ID (e.g., `bg_143022_a1b2c3`).
:::
## Quiet Mode
By default, the CLI runs in quiet mode which:

View file

@ -421,6 +421,26 @@ provider_routing:
**Shortcuts:** Append `:nitro` to any model name for throughput sorting (e.g., `anthropic/claude-sonnet-4:nitro`), or `:floor` for price sorting.
## Fallback Model
Configure a backup provider:model that Hermes switches to automatically when your primary model fails (rate limits, server errors, auth failures):
```yaml
fallback_model:
provider: openrouter # required
model: anthropic/claude-sonnet-4 # required
# base_url: http://localhost:8000/v1 # optional, for custom endpoints
# api_key_env: MY_CUSTOM_KEY # optional, env var name for custom endpoint API key
```
When activated, the fallback swaps the model and provider mid-session without losing your conversation. It fires **at most once** per session.
Supported providers: `openrouter`, `nous`, `openai-codex`, `anthropic`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `custom`.
:::tip
Fallback is configured exclusively through `config.yaml` — there are no environment variables for it. For full details on when it triggers, supported providers, and how it interacts with auxiliary tasks and delegation, see [Fallback Providers](/docs/user-guide/features/fallback-providers).
:::
## Terminal Backend Configuration
Configure which environment the agent uses for terminal commands:
@ -733,6 +753,7 @@ display:
resume_display: full # full (show previous messages on resume) | minimal (one-liner only)
bell_on_complete: false # Play terminal bell when agent finishes (great for long tasks)
show_reasoning: false # Show model reasoning/thinking above each response (toggle with /reasoning show|hide)
background_process_notifications: all # all | result | error | off (gateway only)
```
| Mode | What you see |

View file

@ -0,0 +1,311 @@
---
title: Fallback Providers
description: Configure automatic failover to backup LLM providers when your primary model is unavailable.
sidebar_label: Fallback Providers
sidebar_position: 8
---
# Fallback Providers
Hermes Agent has two separate fallback systems that keep your sessions running when providers hit issues:
1. **Primary model fallback** — automatically switches to a backup provider:model when your main model fails
2. **Auxiliary task fallback** — independent provider resolution for side tasks like vision, compression, and web extraction
Both are optional and work independently.
## Primary Model Fallback
When your main LLM provider encounters errors — rate limits, server overload, auth failures, connection drops — Hermes can automatically switch to a backup provider:model pair mid-session without losing your conversation.
### Configuration
Add a `fallback_model` section to `~/.hermes/config.yaml`:
```yaml
fallback_model:
provider: openrouter
model: anthropic/claude-sonnet-4
```
Both `provider` and `model` are **required**. If either is missing, the fallback is disabled.
### Supported Providers
| Provider | Value | Requirements |
|----------|-------|-------------|
| OpenRouter | `openrouter` | `OPENROUTER_API_KEY` |
| Nous Portal | `nous` | `hermes login` (OAuth) |
| OpenAI Codex | `openai-codex` | `hermes model` (ChatGPT OAuth) |
| Anthropic | `anthropic` | `ANTHROPIC_API_KEY` or Claude Code credentials |
| z.ai / GLM | `zai` | `GLM_API_KEY` |
| Kimi / Moonshot | `kimi-coding` | `KIMI_API_KEY` |
| MiniMax | `minimax` | `MINIMAX_API_KEY` |
| MiniMax (China) | `minimax-cn` | `MINIMAX_CN_API_KEY` |
| Custom endpoint | `custom` | `base_url` + `api_key_env` (see below) |
### Custom Endpoint Fallback
For a custom OpenAI-compatible endpoint, add `base_url` and optionally `api_key_env`:
```yaml
fallback_model:
provider: custom
model: my-local-model
base_url: http://localhost:8000/v1
api_key_env: MY_LOCAL_KEY # env var name containing the API key
```
### When Fallback Triggers
The fallback activates automatically when the primary model fails with:
- **Rate limits** (HTTP 429) — after exhausting retry attempts
- **Server errors** (HTTP 500, 502, 503) — after exhausting retry attempts
- **Auth failures** (HTTP 401, 403) — immediately (no point retrying)
- **Not found** (HTTP 404) — immediately
- **Invalid responses** — when the API returns malformed or empty responses repeatedly
When triggered, Hermes:
1. Resolves credentials for the fallback provider
2. Builds a new API client
3. Swaps the model, provider, and client in-place
4. Resets the retry counter and continues the conversation
The switch is seamless — your conversation history, tool calls, and context are preserved. The agent continues from exactly where it left off, just using a different model.
:::info One-Shot
Fallback activates **at most once** per session. If the fallback provider also fails, normal error handling takes over (retries, then error message). This prevents cascading failover loops.
:::
### Examples
**OpenRouter as fallback for Anthropic native:**
```yaml
model:
provider: anthropic
default: claude-sonnet-4-6
fallback_model:
provider: openrouter
model: anthropic/claude-sonnet-4
```
**Nous Portal as fallback for OpenRouter:**
```yaml
model:
provider: openrouter
default: anthropic/claude-opus-4
fallback_model:
provider: nous
model: nous-hermes-3
```
**Local model as fallback for cloud:**
```yaml
fallback_model:
provider: custom
model: llama-3.1-70b
base_url: http://localhost:8000/v1
api_key_env: LOCAL_API_KEY
```
**Codex OAuth as fallback:**
```yaml
fallback_model:
provider: openai-codex
model: gpt-5.3-codex
```
### Where Fallback Works
| Context | Fallback Supported |
|---------|-------------------|
| CLI sessions | ✔ |
| Messaging gateway (Telegram, Discord, etc.) | ✔ |
| Subagent delegation | ✘ (subagents do not inherit fallback config) |
| Cron jobs | ✘ (run with a fixed provider) |
| Auxiliary tasks (vision, compression) | ✘ (use their own provider chain — see below) |
:::tip
There are no environment variables for `fallback_model` — it is configured exclusively through `config.yaml`. This is intentional: fallback configuration is a deliberate choice, not something a stale shell export should override.
:::
---
## Auxiliary Task Fallback
Hermes uses separate lightweight models for side tasks. Each task has its own provider resolution chain that acts as a built-in fallback system.
### Tasks with Independent Provider Resolution
| Task | What It Does | Config Key |
|------|-------------|-----------|
| Vision | Image analysis, browser screenshots | `auxiliary.vision` |
| Web Extract | Web page summarization | `auxiliary.web_extract` |
| Compression | Context compression summaries | `auxiliary.compression` or `compression.summary_provider` |
| Session Search | Past session summarization | `auxiliary.session_search` |
| Skills Hub | Skill search and discovery | `auxiliary.skills_hub` |
| MCP | MCP helper operations | `auxiliary.mcp` |
| Memory Flush | Memory consolidation | `auxiliary.flush_memories` |
### Auto-Detection Chain
When a task's provider is set to `"auto"` (the default), Hermes tries providers in order until one works:
**For text tasks (compression, web extract, etc.):**
```text
OpenRouter → Nous Portal → Custom endpoint → Codex OAuth →
API-key providers (z.ai, Kimi, MiniMax, Anthropic) → give up
```
**For vision tasks:**
```text
Main provider (if vision-capable) → OpenRouter → Nous Portal →
Codex OAuth → Anthropic → Custom endpoint → give up
```
If the resolved provider fails at call time, Hermes also has an internal retry: if the provider is not OpenRouter and no explicit `base_url` is set, it tries OpenRouter as a last-resort fallback.
### Configuring Auxiliary Providers
Each task can be configured independently in `config.yaml`:
```yaml
auxiliary:
vision:
provider: "auto" # auto | openrouter | nous | codex | main | anthropic
model: "" # e.g. "openai/gpt-4o"
base_url: "" # direct endpoint (takes precedence over provider)
api_key: "" # API key for base_url
web_extract:
provider: "auto"
model: ""
compression:
provider: "auto"
model: ""
session_search:
provider: "auto"
model: ""
skills_hub:
provider: "auto"
model: ""
mcp:
provider: "auto"
model: ""
flush_memories:
provider: "auto"
model: ""
```
Or via environment variables:
```bash
AUXILIARY_VISION_PROVIDER=openrouter
AUXILIARY_VISION_MODEL=openai/gpt-4o
AUXILIARY_WEB_EXTRACT_PROVIDER=nous
CONTEXT_COMPRESSION_PROVIDER=main
CONTEXT_COMPRESSION_MODEL=google/gemini-3-flash-preview
```
### Provider Options for Auxiliary Tasks
| Provider | Description | Requirements |
|----------|-------------|-------------|
| `"auto"` | Try providers in order until one works (default) | At least one provider configured |
| `"openrouter"` | Force OpenRouter | `OPENROUTER_API_KEY` |
| `"nous"` | Force Nous Portal | `hermes login` |
| `"codex"` | Force Codex OAuth | `hermes model` → Codex |
| `"main"` | Use whatever provider the main agent uses | Active main provider configured |
| `"anthropic"` | Force Anthropic native | `ANTHROPIC_API_KEY` or Claude Code credentials |
### Direct Endpoint Override
For any auxiliary task, setting `base_url` bypasses provider resolution entirely and sends requests directly to that endpoint:
```yaml
auxiliary:
vision:
base_url: "http://localhost:1234/v1"
api_key: "local-key"
model: "qwen2.5-vl"
```
`base_url` takes precedence over `provider`. Hermes uses the configured `api_key` for authentication, falling back to `OPENAI_API_KEY` if not set. It does **not** reuse `OPENROUTER_API_KEY` for custom endpoints.
---
## Context Compression Fallback
Context compression has a legacy configuration path in addition to the auxiliary system:
```yaml
compression:
summary_provider: "auto" # auto | openrouter | nous | main
summary_model: "google/gemini-3-flash-preview"
```
This is equivalent to configuring `auxiliary.compression.provider` and `auxiliary.compression.model`. If both are set, the `auxiliary.compression` values take precedence.
If no provider is available for compression, Hermes drops middle conversation turns without generating a summary rather than failing the session.
---
## Delegation Provider Override
Subagents spawned by `delegate_task` do **not** use the primary fallback model. However, they can be routed to a different provider:model pair for cost optimization:
```yaml
delegation:
provider: "openrouter" # override provider for all subagents
model: "google/gemini-3-flash-preview" # override model
# base_url: "http://localhost:1234/v1" # or use a direct endpoint
# api_key: "local-key"
```
See [Subagent Delegation](/docs/user-guide/features/delegation) for full configuration details.
---
## Cron Job Providers
Cron jobs run with whatever provider is configured at execution time. They do not support a fallback model. To use a different provider for cron jobs, configure `provider` and `model` overrides on the cron job itself:
```python
cronjob(
action="create",
schedule="every 2h",
prompt="Check server status",
provider="openrouter",
model="google/gemini-3-flash-preview"
)
```
See [Scheduled Tasks (Cron)](/docs/user-guide/features/cron) for full configuration details.
---
## Summary
| Feature | Fallback Mechanism | Config Location |
|---------|-------------------|----------------|
| Main agent model | `fallback_model` in config.yaml — one-shot failover on errors | `fallback_model:` (top-level) |
| Vision | Auto-detection chain + internal OpenRouter retry | `auxiliary.vision` |
| Web extraction | Auto-detection chain + internal OpenRouter retry | `auxiliary.web_extract` |
| Context compression | Auto-detection chain, degrades to no-summary if unavailable | `auxiliary.compression` or `compression.summary_provider` |
| Session search | Auto-detection chain | `auxiliary.session_search` |
| Skills hub | Auto-detection chain | `auxiliary.skills_hub` |
| MCP helpers | Auto-detection chain | `auxiliary.mcp` |
| Memory flush | Auto-detection chain | `auxiliary.flush_memories` |
| Delegation | Provider override only (no automatic fallback) | `delegation.provider` / `delegation.model` |
| Cron jobs | Per-job provider override only (no automatic fallback) | Per-job `provider` / `model` |

View file

@ -194,3 +194,7 @@ provider_routing:
## Default Behavior
When no `provider_routing` section is configured (the default), OpenRouter uses its own default routing logic, which generally balances cost and availability automatically.
:::tip Provider Routing vs. Fallback Models
Provider routing controls which **sub-providers within OpenRouter** handle your requests. For automatic failover to an entirely different provider when your primary model fails, see [Fallback Providers](/docs/user-guide/features/fallback-providers).
:::

View file

@ -181,6 +181,63 @@ When enabled, the bot sends status messages as it works:
🐍 execute_code...
```
## Background Sessions
Run a prompt in a separate background session so the agent works on it independently while your main chat stays responsive:
```
/background Check all servers in the cluster and report any that are down
```
Hermes confirms immediately:
```
🔄 Background task started: "Check all servers in the cluster..."
Task ID: bg_143022_a1b2c3
```
### How It Works
Each `/background` prompt spawns a **separate agent instance** that runs asynchronously:
- **Isolated session** — the background agent has its own session with its own conversation history. It has no knowledge of your current chat context and receives only the prompt you provide.
- **Same configuration** — inherits your model, provider, toolsets, reasoning settings, and provider routing from the current gateway setup.
- **Non-blocking** — your main chat stays fully interactive. Send messages, run other commands, or start more background tasks while it works.
- **Result delivery** — when the task finishes, the result is sent back to the **same chat or channel** where you issued the command, prefixed with "✅ Background task complete". If it fails, you'll see "❌ Background task failed" with the error.
### Background Process Notifications
When the agent running a background session uses `terminal(background=true)` to start long-running processes (servers, builds, etc.), the gateway can push status updates to your chat. Control this with `display.background_process_notifications` in `~/.hermes/config.yaml`:
```yaml
display:
background_process_notifications: all # all | result | error | off
```
| Mode | What you receive |
|------|-----------------|
| `all` | Running-output updates **and** the final completion message (default) |
| `result` | Only the final completion message (regardless of exit code) |
| `error` | Only the final message when the exit code is non-zero |
| `off` | No process watcher messages at all |
You can also set this via environment variable:
```bash
HERMES_BACKGROUND_NOTIFICATIONS=result
```
### Use Cases
- **Server monitoring** — "/background Check the health of all services and alert me if anything is down"
- **Long builds** — "/background Build and deploy the staging environment" while you continue chatting
- **Research tasks** — "/background Research competitor pricing and summarize in a table"
- **File operations** — "/background Organize the photos in ~/Downloads by date into folders"
:::tip
Background tasks on messaging platforms are fire-and-forget — you don't need to wait or check on them. Results arrive in the same chat automatically when the task finishes.
:::
## Service Management
### Linux (systemd)