docs: fallback providers + /background command documentation

* docs: comprehensive fallback providers documentation

  - New dedicated page: user-guide/features/fallback-providers.md covering both primary model fallback and auxiliary task fallback systems
  - Updated configuration.md with fallback_model config section
  - Updated environment-variables.md noting fallback is config-only
  - Fleshed out developer-guide/provider-runtime.md fallback section with internal architecture details (trigger points, activation flow, config flow)
  - Added cross-reference from provider-routing.md distinguishing OpenRouter sub-provider routing from Hermes-level model fallback
  - Added new page to sidebar under Integrations

* docs: comprehensive /background command documentation

  - Added Background Sessions section to cli.md covering how it works (daemon threads, isolated sessions, config inheritance, Rich panel output, bell notification, concurrent tasks)
  - Added Background Sessions section to messaging/index.md covering messaging-specific behavior (async execution, result delivery back to same chat, fire-and-forget pattern)
  - Documented background_process_notifications config (all/result/error/off) in messaging docs and configuration.md
  - Added HERMES_BACKGROUND_NOTIFICATIONS env var to reference page
  - Fixed inconsistency in slash-commands.md: /background was listed as messaging-only but works in both CLI and messaging. Moved it to the 'both surfaces' note.
  - Expanded one-liner table descriptions with detail and cross-references
2026-04-26 01:01:40 +00:00 · 2026-03-15 06:24:28 -07:00 · 2026-03-15 06:24:28 -07:00 · 463239ed85
commit 463239ed85
parent 60cce9ca6d
9 changed files with 495 additions and 5 deletions
--- a/website/docs/user-guide/cli.md
+++ b/website/docs/user-guide/cli.md
@@ -259,6 +259,55 @@ compression:

 When compression triggers, middle turns are summarized while the first 3 and last 4 turns are always preserved.

+## Background Sessions
+
+Run a prompt in a separate background session while continuing to use the CLI for other work:
+
+```
+/background Analyze the logs in /var/log and summarize any errors from today
+```
+
+Hermes immediately confirms the task and returns you to the input prompt:
+
+```
+🔄 Background task #1 started: "Analyze the logs in /var/log and summarize..."
+   Task ID: bg_143022_a1b2c3
+```
+
+### How It Works
+
+Each `/background` prompt spawns a **completely separate agent session** in a daemon thread:
+
+- **Isolated conversation** — the background agent has no knowledge of your current session's history. It receives only the prompt you provide.
+- **Same configuration** — the background agent inherits your model, provider, toolsets, reasoning settings, and fallback model from the current session.
+- **Non-blocking** — your foreground session stays fully interactive. You can chat, run commands, or even start more background tasks.
+- **Multiple tasks** — you can run several background tasks simultaneously. Each gets a numbered ID.
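The daemon-thread model described in the list above can be sketched as follows. This is a minimal illustration, not Hermes's actual code: `start_background`, `run_agent`, and `on_result` are hypothetical names, and the task ID layout (`bg_<HHMMSS>_<6 hex chars>`) is only inferred from the example ID shown earlier.

```python
import secrets
import threading
import time
from typing import Callable, Optional, Tuple


def start_background(
    prompt: str,
    run_agent: Callable[[str], str],
    on_result: Callable[[str, Optional[str], Optional[Exception]], None],
) -> Tuple[str, threading.Thread]:
    """Run `prompt` in a daemon thread and report the outcome via `on_result`."""
    # Assumed ID layout, guessed from the documented example `bg_143022_a1b2c3`.
    task_id = f"bg_{time.strftime('%H%M%S')}_{secrets.token_hex(3)}"

    def worker() -> None:
        # The background agent receives only the prompt, never the caller's history.
        try:
            result, error = run_agent(prompt), None
        except Exception as exc:
            result, error = None, exc
        on_result(task_id, result, error)

    # daemon=True: the thread never blocks interpreter exit, and the caller
    # regains control as soon as start() returns.
    thread = threading.Thread(target=worker, name=task_id, daemon=True)
    thread.start()
    return task_id, thread
```

Calling `start_background` several times yields independent threads, which matches the "multiple tasks" behavior described above. The returned thread handle is only there so a caller (or a test) can `join()` it.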
+
+### Results
+
+When a background task finishes, the result appears as a panel in your terminal:
+
+```
+╭─ ⚕ Hermes (background #1) ──────────────────────────────────╮
+│ Found 3 errors in syslog from today:                         │
+│ 1. OOM killer invoked at 03:22 — killed process nginx        │
+│ 2. Disk I/O error on /dev/sda1 at 07:15                      │
+│ 3. Failed SSH login attempts from 192.168.1.50 at 14:30      │
+╰──────────────────────────────────────────────────────────────╯
+```
+
+If the task fails, you'll see an error notification instead. If `display.bell_on_complete` is enabled in your config, the terminal bell rings when the task finishes.
+
+### Use Cases
+
+- **Long-running research** — "/background research the latest developments in quantum error correction" while you work on code
+- **File processing** — "/background analyze all Python files in this repo and list any security issues" while you continue a conversation
+- **Parallel investigations** — start multiple background tasks to explore different angles simultaneously
+
+:::info
+Background sessions do not appear in your main conversation history. They are standalone sessions with their own task ID (e.g., `bg_143022_a1b2c3`).
+:::
+
 ## Quiet Mode

 By default, the CLI runs in quiet mode which:
--- a/website/docs/user-guide/configuration.md
+++ b/website/docs/user-guide/configuration.md
@@ -421,6 +421,26 @@ provider_routing:

 **Shortcuts:** Append `:nitro` to any model name for throughput sorting (e.g., `anthropic/claude-sonnet-4:nitro`), or `:floor` for price sorting.

+## Fallback Model
+
+Configure a backup provider:model that Hermes switches to automatically when your primary model fails (rate limits, server errors, auth failures):
+
+```yaml
+fallback_model:
+  provider: openrouter                    # required
+  model: anthropic/claude-sonnet-4        # required
+  # base_url: http://localhost:8000/v1    # optional, for custom endpoints
+  # api_key_env: MY_CUSTOM_KEY           # optional, env var name for custom endpoint API key
+```
+
+When activated, the fallback swaps the model and provider mid-session without losing your conversation. It fires **at most once** per session.
+
+Supported providers: `openrouter`, `nous`, `openai-codex`, `anthropic`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `custom`.
+
+:::tip
+Fallback is configured exclusively through `config.yaml` — there are no environment variables for it. For full details on when it triggers, supported providers, and how it interacts with auxiliary tasks and delegation, see [Fallback Providers](/docs/user-guide/features/fallback-providers).
+:::
+
 ## Terminal Backend Configuration

 Configure which environment the agent uses for terminal commands:
@@ -733,6 +753,7 @@ display:
  resume_display: full    # full (show previous messages on resume) | minimal (one-liner only)
  bell_on_complete: false # Play terminal bell when agent finishes (great for long tasks)
  show_reasoning: false   # Show model reasoning/thinking above each response (toggle with /reasoning show|hide)
+  background_process_notifications: all  # all | result | error | off (gateway only)
 ```

 | Mode | What you see |
--- a/website/docs/user-guide/features/fallback-providers.md
+++ b/website/docs/user-guide/features/fallback-providers.md
@@ -0,0 +1,311 @@
+---
+title: Fallback Providers
+description: Configure automatic failover to backup LLM providers when your primary model is unavailable.
+sidebar_label: Fallback Providers
+sidebar_position: 8
+---
+
+# Fallback Providers
+
+Hermes Agent has two separate fallback systems that keep your sessions running when providers hit issues:
+
+1. **Primary model fallback** — automatically switches to a backup provider:model when your main model fails
+2. **Auxiliary task fallback** — independent provider resolution for side tasks like vision, compression, and web extraction
+
+Both are optional and work independently.
+
+## Primary Model Fallback
+
+When your main LLM provider encounters errors — rate limits, server overload, auth failures, connection drops — Hermes can automatically switch to a backup provider:model pair mid-session without losing your conversation.
+
+### Configuration
+
+Add a `fallback_model` section to `~/.hermes/config.yaml`:
+
+```yaml
+fallback_model:
+  provider: openrouter
+  model: anthropic/claude-sonnet-4
+```
+
+Both `provider` and `model` are **required**. If either is missing, the fallback is disabled.
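A minimal sketch of the "both keys required" rule, assuming the YAML file has already been loaded into a plain dict. `parse_fallback_model` is a hypothetical helper name used for illustration and does not come from the Hermes codebase.

```python
from typing import Optional


def parse_fallback_model(config: dict) -> Optional[dict]:
    """Return the fallback settings, or None when fallback is disabled."""
    section = config.get("fallback_model") or {}
    provider = section.get("provider")
    model = section.get("model")
    # Both keys are required; a missing or empty value disables fallback.
    if not provider or not model:
        return None
    parsed = {"provider": provider, "model": model}
    # Optional keys, only relevant for custom endpoints.
    for key in ("base_url", "api_key_env"):
        if section.get(key):
            parsed[key] = section[key]
    return parsed
```

Returning `None` instead of raising keeps a half-written `fallback_model` section from breaking startup, which is consistent with the "fallback is disabled" wording above.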
+
+### Supported Providers
+
+| Provider | Value | Requirements |
+|----------|-------|-------------|
+| OpenRouter | `openrouter` | `OPENROUTER_API_KEY` |
+| Nous Portal | `nous` | `hermes login` (OAuth) |
+| OpenAI Codex | `openai-codex` | `hermes model` (ChatGPT OAuth) |
+| Anthropic | `anthropic` | `ANTHROPIC_API_KEY` or Claude Code credentials |
+| z.ai / GLM | `zai` | `GLM_API_KEY` |
+| Kimi / Moonshot | `kimi-coding` | `KIMI_API_KEY` |
+| MiniMax | `minimax` | `MINIMAX_API_KEY` |
+| MiniMax (China) | `minimax-cn` | `MINIMAX_CN_API_KEY` |
+| Custom endpoint | `custom` | `base_url`, optionally `api_key_env` (see below) |
+
+### Custom Endpoint Fallback
+
+For a custom OpenAI-compatible endpoint, add `base_url` and optionally `api_key_env`:
+
+```yaml
+fallback_model:
+  provider: custom
+  model: my-local-model
+  base_url: http://localhost:8000/v1
+  api_key_env: MY_LOCAL_KEY          # env var name containing the API key
+```
+
+### When Fallback Triggers
+
+The fallback activates automatically when the primary model fails with:
+
+- **Rate limits** (HTTP 429) — after exhausting retry attempts
+- **Server errors** (HTTP 500, 502, 503) — after exhausting retry attempts
+- **Auth failures** (HTTP 401, 403) — immediately (no point retrying)
+- **Not found** (HTTP 404) — immediately
+- **Invalid responses** — when the API returns malformed or empty responses repeatedly
+
+When triggered, Hermes:
+
+1. Resolves credentials for the fallback provider
+2. Builds a new API client
+3. Swaps the model, provider, and client in-place
+4. Resets the retry counter and continues the conversation
+
+The switch is seamless — your conversation history, tool calls, and context are preserved. The agent continues from exactly where it left off, just using a different model.
+
+:::info One-Shot
+Fallback activates **at most once** per session. If the fallback provider also fails, normal error handling takes over (retries, then error message). This prevents cascading failover loops.
+:::
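The trigger rules and the one-shot behavior above can be sketched as a small decision function. This is an illustration under stated assumptions, not the Hermes implementation: `Session`, `handle_failure`, and the `max_retries=3` default are hypothetical, and only the HTTP status codes come from the list above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

RETRYABLE = {429, 500, 502, 503}  # fall back only after retries are exhausted
IMMEDIATE = {401, 403, 404}       # fall back right away, retrying cannot help


@dataclass
class Session:
    provider: str
    model: str
    fallback: Optional[Tuple[str, str]] = None  # (provider, model) or None
    fallback_used: bool = False
    retries: int = 0


def handle_failure(session: Session, status: int, max_retries: int = 3) -> str:
    """Return 'retry', 'fallback', or 'error' for a failed API call."""
    if status in RETRYABLE and session.retries < max_retries:
        session.retries += 1
        return "retry"
    if status in RETRYABLE or status in IMMEDIATE:
        if session.fallback is not None and not session.fallback_used:
            # Swap provider and model in place; the conversation is untouched.
            session.provider, session.model = session.fallback
            session.fallback_used = True  # one-shot: never fires again
            session.retries = 0           # fresh retry budget for the new provider
            return "fallback"
    return "error"
```

Because `fallback_used` is never reset, a failing fallback provider goes through ordinary retries and then ends in `"error"`, so two providers cannot hand failures back and forth.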
+
+### Examples
+
+**OpenRouter as fallback for Anthropic native:**
+```yaml
+model:
+  provider: anthropic
+  default: claude-sonnet-4-6
+
+fallback_model:
+  provider: openrouter
+  model: anthropic/claude-sonnet-4
+```
+
+**Nous Portal as fallback for OpenRouter:**
+```yaml
+model:
+  provider: openrouter
+  default: anthropic/claude-opus-4
+
+fallback_model:
+  provider: nous
+  model: nous-hermes-3
+```
+
+**Local model as fallback for cloud:**
+```yaml
+fallback_model:
+  provider: custom
+  model: llama-3.1-70b
+  base_url: http://localhost:8000/v1
+  api_key_env: LOCAL_API_KEY
+```
+
+**Codex OAuth as fallback:**
+```yaml
+fallback_model:
+  provider: openai-codex
+  model: gpt-5.3-codex
+```
+
+### Where Fallback Works
+
+| Context | Fallback Supported |
+|---------|-------------------|
+| CLI sessions | ✔ |
+| Messaging gateway (Telegram, Discord, etc.) | ✔ |
+| Subagent delegation | ✘ (subagents do not inherit fallback config) |
+| Cron jobs | ✘ (run with a fixed provider) |
+| Auxiliary tasks (vision, compression) | ✘ (use their own provider chain — see below) |
+
+:::tip
+There are no environment variables for `fallback_model` — it is configured exclusively through `config.yaml`. This is intentional: fallback configuration is a deliberate choice, not something a stale shell export should override.
+:::
+
+---
+
+## Auxiliary Task Fallback
+
+Hermes uses separate lightweight models for side tasks. Each task has its own provider resolution chain that acts as a built-in fallback system.
+
+### Tasks with Independent Provider Resolution
+
+| Task | What It Does | Config Key |
+|------|-------------|-----------|
+| Vision | Image analysis, browser screenshots | `auxiliary.vision` |
+| Web Extract | Web page summarization | `auxiliary.web_extract` |
+| Compression | Context compression summaries | `auxiliary.compression` or `compression.summary_provider` |
+| Session Search | Past session summarization | `auxiliary.session_search` |
+| Skills Hub | Skill search and discovery | `auxiliary.skills_hub` |
+| MCP | MCP helper operations | `auxiliary.mcp` |
+| Memory Flush | Memory consolidation | `auxiliary.flush_memories` |
+
+### Auto-Detection Chain
+
+When a task's provider is set to `"auto"` (the default), Hermes tries providers in order until one works:
+
+**For text tasks (compression, web extract, etc.):**
+
+```text
+OpenRouter → Nous Portal → Custom endpoint → Codex OAuth →
+API-key providers (z.ai, Kimi, MiniMax, Anthropic) → give up
+```
+
+**For vision tasks:**
+
+```text
+Main provider (if vision-capable) → OpenRouter → Nous Portal →
+Codex OAuth → Anthropic → Custom endpoint → give up
+```
+
+If the resolved provider fails at call time, Hermes also has an internal retry: if the provider is not OpenRouter and no explicit `base_url` is set, it tries OpenRouter as a last-resort fallback.
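The ordered lookup behind `"auto"` can be sketched as a first-match scan. The chain contents mirror the two diagrams above, but the string labels, `resolve_auto`, and the `is_available` callback are hypothetical names chosen for this example. The call-time OpenRouter retry is not modeled here.

```python
from typing import Callable, Optional, Sequence

# Orders copied from the diagrams above; the labels themselves are illustrative.
TEXT_CHAIN = ["openrouter", "nous", "custom", "codex", "api-key"]
VISION_CHAIN = ["main", "openrouter", "nous", "codex", "anthropic", "custom"]


def resolve_auto(
    chain: Sequence[str], is_available: Callable[[str], bool]
) -> Optional[str]:
    """Return the first provider in `chain` that is usable, else None."""
    for name in chain:
        if is_available(name):
            return name
    return None  # "give up": nothing in the chain is configured
```

For instance, with only Nous Portal and a custom endpoint configured, `resolve_auto(TEXT_CHAIN, {"nous", "custom"}.__contains__)` picks `"nous"` because it appears earlier in the text chain.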
+
+### Configuring Auxiliary Providers
+
+Each task can be configured independently in `config.yaml`:
+
+```yaml
+auxiliary:
+  vision:
+    provider: "auto"              # auto | openrouter | nous | codex | main | anthropic
+    model: ""                     # e.g. "openai/gpt-4o"
+    base_url: ""                  # direct endpoint (takes precedence over provider)
+    api_key: ""                   # API key for base_url
+
+  web_extract:
+    provider: "auto"
+    model: ""
+
+  compression:
+    provider: "auto"
+    model: ""
+
+  session_search:
+    provider: "auto"
+    model: ""
+
+  skills_hub:
+    provider: "auto"
+    model: ""
+
+  mcp:
+    provider: "auto"
+    model: ""
+
+  flush_memories:
+    provider: "auto"
+    model: ""
+```
+
+Or via environment variables:
+
+```bash
+AUXILIARY_VISION_PROVIDER=openrouter
+AUXILIARY_VISION_MODEL=openai/gpt-4o
+AUXILIARY_WEB_EXTRACT_PROVIDER=nous
+CONTEXT_COMPRESSION_PROVIDER=main
+CONTEXT_COMPRESSION_MODEL=google/gemini-3-flash-preview
+```
+
+### Provider Options for Auxiliary Tasks
+
+| Provider | Description | Requirements |
+|----------|-------------|-------------|
+| `"auto"` | Try providers in order until one works (default) | At least one provider configured |
+| `"openrouter"` | Force OpenRouter | `OPENROUTER_API_KEY` |
+| `"nous"` | Force Nous Portal | `hermes login` |
+| `"codex"` | Force Codex OAuth | `hermes model` → Codex |
+| `"main"` | Use whatever provider the main agent uses | Active main provider configured |
+| `"anthropic"` | Force Anthropic native | `ANTHROPIC_API_KEY` or Claude Code credentials |
+
+### Direct Endpoint Override
+
+For any auxiliary task, setting `base_url` bypasses provider resolution entirely and sends requests directly to that endpoint:
+
+```yaml
+auxiliary:
+  vision:
+    base_url: "http://localhost:1234/v1"
+    api_key: "local-key"
+    model: "qwen2.5-vl"
+```
+
+`base_url` takes precedence over `provider`. Hermes uses the configured `api_key` for authentication, falling back to `OPENAI_API_KEY` if not set. It does **not** reuse `OPENROUTER_API_KEY` for custom endpoints.
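The key lookup for a direct endpoint can be sketched as follows, assuming the task's settings are available as a dict. `resolve_endpoint_key` is a hypothetical helper, and the `env` parameter exists only to make the example testable.

```python
import os
from typing import Mapping, Optional


def resolve_endpoint_key(
    task_cfg: Mapping[str, str], env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Pick the API key for a task that sets `base_url` directly."""
    # 1. An explicitly configured key always wins.
    if task_cfg.get("api_key"):
        return task_cfg["api_key"]
    # 2. Otherwise fall back to OPENAI_API_KEY.
    # OPENROUTER_API_KEY is deliberately never consulted for custom endpoints.
    return env.get("OPENAI_API_KEY") or None
```

Skipping `OPENROUTER_API_KEY` means an OpenRouter credential is never sent to an unrelated endpoint.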
+
+---
+
+## Context Compression Fallback
+
+Context compression has a legacy configuration path in addition to the auxiliary system:
+
+```yaml
+compression:
+  summary_provider: "auto"                    # auto | openrouter | nous | main
+  summary_model: "google/gemini-3-flash-preview"
+```
+
+This is equivalent to configuring `auxiliary.compression.provider` and `auxiliary.compression.model`. If both are set, the `auxiliary.compression` values take precedence.
+
+If no provider is available for compression, Hermes drops middle conversation turns without generating a summary rather than failing the session.
+
+---
+
+## Delegation Provider Override
+
+Subagents spawned by `delegate_task` do **not** use the primary fallback model. However, they can be routed to a different provider:model pair for cost optimization:
+
+```yaml
+delegation:
+  provider: "openrouter"                      # override provider for all subagents
+  model: "google/gemini-3-flash-preview"      # override model
+  # base_url: "http://localhost:1234/v1"      # or use a direct endpoint
+  # api_key: "local-key"
+```
+
+See [Subagent Delegation](/docs/user-guide/features/delegation) for full configuration details.
+
+---
+
+## Cron Job Providers
+
+Cron jobs run with whatever provider is configured at execution time. They do not support a fallback model. To use a different provider for cron jobs, configure `provider` and `model` overrides on the cron job itself:
+
+```python
+cronjob(
+    action="create",
+    schedule="every 2h",
+    prompt="Check server status",
+    provider="openrouter",
+    model="google/gemini-3-flash-preview"
+)
+```
+
+See [Scheduled Tasks (Cron)](/docs/user-guide/features/cron) for full configuration details.
+
+---
+
+## Summary
+
+| Feature | Fallback Mechanism | Config Location |
+|---------|-------------------|----------------|
+| Main agent model | `fallback_model` in config.yaml — one-shot failover on errors | `fallback_model:` (top-level) |
+| Vision | Auto-detection chain + internal OpenRouter retry | `auxiliary.vision` |
+| Web extraction | Auto-detection chain + internal OpenRouter retry | `auxiliary.web_extract` |
+| Context compression | Auto-detection chain, degrades to no-summary if unavailable | `auxiliary.compression` or `compression.summary_provider` |
+| Session search | Auto-detection chain | `auxiliary.session_search` |
+| Skills hub | Auto-detection chain | `auxiliary.skills_hub` |
+| MCP helpers | Auto-detection chain | `auxiliary.mcp` |
+| Memory flush | Auto-detection chain | `auxiliary.flush_memories` |
+| Delegation | Provider override only (no automatic fallback) | `delegation.provider` / `delegation.model` |
+| Cron jobs | Per-job provider override only (no automatic fallback) | Per-job `provider` / `model` |
--- a/website/docs/user-guide/features/provider-routing.md
+++ b/website/docs/user-guide/features/provider-routing.md
@@ -194,3 +194,7 @@ provider_routing:
 ## Default Behavior

 When no `provider_routing` section is configured (the default), OpenRouter uses its own default routing logic, which generally balances cost and availability automatically.
+
+:::tip Provider Routing vs. Fallback Models
+Provider routing controls which **sub-providers within OpenRouter** handle your requests. For automatic failover to an entirely different provider when your primary model fails, see [Fallback Providers](/docs/user-guide/features/fallback-providers).
+:::
--- a/website/docs/user-guide/messaging/index.md
+++ b/website/docs/user-guide/messaging/index.md
@@ -181,6 +181,63 @@ When enabled, the bot sends status messages as it works:
 🐍 execute_code...
 ```

+## Background Sessions
+
+Run a prompt in a separate background session so the agent works on it independently while your main chat stays responsive:
+
+```
+/background Check all servers in the cluster and report any that are down
+```
+
+Hermes confirms immediately:
+
+```
+🔄 Background task started: "Check all servers in the cluster..."
+   Task ID: bg_143022_a1b2c3
+```
+
+### How It Works
+
+Each `/background` prompt spawns a **separate agent instance** that runs asynchronously:
+
+- **Isolated session** — the background agent has its own session with its own conversation history. It has no knowledge of your current chat context and receives only the prompt you provide.
+- **Same configuration** — inherits your model, provider, toolsets, reasoning settings, and provider routing from the current gateway setup.
+- **Non-blocking** — your main chat stays fully interactive. Send messages, run other commands, or start more background tasks while it works.
+- **Result delivery** — when the task finishes, the result is sent back to the **same chat or channel** where you issued the command, prefixed with "✅ Background task complete". If it fails, you'll see "❌ Background task failed" with the error.
+
+### Background Process Notifications
+
+When the agent running a background session uses `terminal(background=true)` to start long-running processes (servers, builds, etc.), the gateway can push status updates to your chat. Control this with `display.background_process_notifications` in `~/.hermes/config.yaml`:
+
+```yaml
+display:
+  background_process_notifications: all    # all | result | error | off
+```
+
+| Mode | What you receive |
+|------|-----------------|
+| `all` | Running-output updates **and** the final completion message (default) |
+| `result` | Only the final completion message (regardless of exit code) |
+| `error` | Only the final message when the exit code is non-zero |
+| `off` | No process watcher messages at all |
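The four modes in the table above reduce to a small filter. This sketch assumes each watcher message is either a running-output update or the final completion message. `should_send` is a hypothetical name, not the gateway's actual function.

```python
def should_send(mode: str, is_final: bool, exit_code: int = 0) -> bool:
    """Decide whether a process watcher message is delivered to the chat."""
    if mode == "all":
        return True                          # running output and final message
    if mode == "result":
        return is_final                      # final message, any exit code
    if mode == "error":
        return is_final and exit_code != 0   # final message, failures only
    if mode == "off":
        return False
    raise ValueError(f"unknown notification mode: {mode!r}")
```

Raising on an unknown mode is a choice made for this sketch. The docs do not say how the gateway treats an invalid value.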
+
+You can also set this via environment variable:
+
+```bash
+HERMES_BACKGROUND_NOTIFICATIONS=result
+```
+
+### Use Cases
+
+- **Server monitoring** — "/background Check the health of all services and alert me if anything is down"
+- **Long builds** — "/background Build and deploy the staging environment" while you continue chatting
+- **Research tasks** — "/background Research competitor pricing and summarize in a table"
+- **File operations** — "/background Organize the photos in ~/Downloads by date into folders"
+
+:::tip
+Background tasks on messaging platforms are fire-and-forget — you don't need to wait or check on them. Results arrive in the same chat automatically when the task finishes.
+:::
+
 ## Service Management

 ### Linux (systemd)