docs(config): document auxiliary task fallback_chain

2026-06-23 10:42:00 +00:00 · 2026-06-16 00:54:08 +05:30 · cc30e0b659
commit cc30e0b659
parent 5eb158e317
3 changed files with 77 additions and 7 deletions
--- a/website/docs/user-guide/configuration.md
+++ b/website/docs/user-guide/configuration.md
@@ -1006,6 +1006,13 @@ auxiliary:
  # Context compression timeout (separate from compression.* config)
  compression:
    timeout: 120               # seconds — compression summarizes long conversations, needs more time
+    # fallback_chain:           # Optional — providers to try on rate-limit / connectivity failure
+    #   - provider: nous
+    #     model: deepseek/deepseek-chat
+    #   - provider: openrouter
+    #     model: google/gemini-2.5-flash
+    #     base_url: ""
+    #     api_key: ""

  # Auto-generated session titles. Empty language follows the conversation;
  # set e.g. "English" or "Japanese" to pin titles to one language.
@@ -1054,6 +1061,34 @@ Each auxiliary task has a configurable `timeout` (in seconds). Defaults: vision
 Context compression has its own `compression:` block for thresholds and an `auxiliary.compression:` block for model/provider settings — see [Context Compression](#context-compression) above. The primary fallback chain uses a top-level `fallback_providers:` list — see [Fallback Providers](/integrations/providers#fallback-providers). All three follow the same provider/model/base_url pattern.
 :::

+### Per-task fallback chain for auxiliary tasks
+
+Each auxiliary task can optionally define a `fallback_chain`: an ordered list of provider/model entries that Hermes tries when the task's primary provider fails because of rate limits, connectivity problems, or payment restrictions. For example:
+
+```yaml
+auxiliary:
+  compression:
+    provider: openrouter
+    model: openai/gpt-4o-mini
+    fallback_chain:
+      - provider: nous
+        model: deepseek/deepseek-chat
+      - provider: openrouter
+        model: google/gemini-2.5-flash
+```
+
+When the primary auxiliary provider (`openrouter` / `openai/gpt-4o-mini`) returns a rate-limit, connection timeout, or payment-required error, Hermes walks the `fallback_chain` in order. It skips entries whose provider matches the already-failed provider, and tries each remaining entry until one succeeds or the chain is exhausted. If all fallbacks fail, Hermes falls back to the main agent model as a final safety net.
+
+Each entry supports the same three knobs as any auxiliary task config:
+
+| Key | Description |
+|-----|-------------|
+| `provider` | Provider name (`nous`, `openrouter`, `anthropic`, `gemini`, `main`, etc.) |
+| `model` | Model name for that provider |
+| `base_url` | (Optional) Custom OpenAI-compatible endpoint; the commented `auxiliary.compression` example pairs it with an optional `api_key` |
+
+`fallback_chain` is available on any auxiliary task — `compression`, `vision`, `web_extract`, `approval`, `skills_hub`, `mcp`, etc.
+
 ### OpenRouter routing & Pareto Code for auxiliary tasks

 When an auxiliary task resolves to OpenRouter (either explicitly or via `provider: "main"` while your main agent is on OpenRouter), the main agent's `provider_routing` and `openrouter.min_coding_score` settings **do not propagate** — by design, each auxiliary task is independent. To set OpenRouter provider preferences or use the [Pareto Code router](/integrations/providers#openrouter-pareto-code-router) for a specific aux task, set them per-task via `extra_body`:
--- a/website/i18n/zh-Hans/docusaurus-plugin-content-docs/current/user-guide/configuration.md
+++ b/website/i18n/zh-Hans/docusaurus-plugin-content-docs/current/user-guide/configuration.md
@@ -820,6 +820,13 @@ auxiliary:
  # 上下文压缩超时（与 compression.* 配置分开）
  compression:
    timeout: 120               # 秒 —— 压缩摘要长对话，需要更多时间
+    # fallback_chain:           # 可选 —— 发生速率限制/连接故障时尝试的 provider
+    #   - provider: nous
+    #     model: deepseek/deepseek-chat
+    #   - provider: openrouter
+    #     model: google/gemini-2.5-flash
+    #     base_url: ""
+    #     api_key: ""

  # 技能中心 —— 技能匹配和搜索
  skills_hub:
@@ -855,9 +862,37 @@ auxiliary:
 :::

 :::info
-上下文压缩有自己的 `compression:` 块用于阈值，以及 `auxiliary.compression:` 块用于模型/provider 设置 —— 参阅上方的[上下文压缩](#context-compression)。回退模型使用 `fallback_model:` 块 —— 参阅[回退模型](/integrations/providers#fallback-model)。三者都遵循相同的 provider/model/base_url 模式。
+上下文压缩有自己的 `compression:` 块用于阈值，以及 `auxiliary.compression:` 块用于模型/provider 设置 —— 参阅上方的[上下文压缩](#context-compression)。主备用链使用顶层的 `fallback_providers:` 列表 —— 参阅[备用提供商](/integrations/providers#fallback-providers)。三者都遵循相同的 provider/model/base_url 模式。
 :::

+### 辅助任务的每任务回退链
+
+每个辅助任务都可以选择性地定义一个 `fallback_chain` —— 一个 provider/model 条目列表，当主要辅助 provider 因速率限制、网络连接问题或付费限制而失败时，Hermes 会尝试使用该列表：
+
+```yaml
+auxiliary:
+  compression:
+    provider: openrouter
+    model: openai/gpt-4o-mini
+    fallback_chain:
+      - provider: nous
+        model: deepseek/deepseek-chat
+      - provider: openrouter
+        model: google/gemini-2.5-flash
+```
+
+当主要辅助 provider（`openrouter` / `openai/gpt-4o-mini`）返回速率限制、连接超时或需要付费错误时，Hermes 将依次遍历 `fallback_chain`。它会跳过 provider 与已失败 provider 相同的条目，并尝试每个剩余条目，直到有一个成功或该链耗尽。如果所有回退都失败，Hermes 会回退到主 agent 模型作为最终的安全网。
+
+每个条目支持与任何辅助任务配置相同的三个旋钮：
+
+| 键 | 描述 |
+|-----|-------------|
+| `provider` | Provider 名称（`nous`、`openrouter`、`anthropic`、`gemini`、`main` 等） |
+| `model` | 该 provider 的模型名称 |
+| `base_url` | （可选）自定义 OpenAI 兼容端点 |
+
+`fallback_chain` 适用于任何辅助任务 —— `compression`、`vision`、`web_extract`、`approval`、`skills_hub`、`mcp` 等。
+
 ### OpenRouter 路由和辅助任务的 Pareto Code

 当辅助任务解析到 OpenRouter（显式或通过 `provider: "main"` 而您的主 agent 在 OpenRouter 上）时，主 agent 的 `provider_routing` 和 `openrouter.min_coding_score` 设置**不会传播** —— 按设计，每个辅助任务是独立的。要为特定辅助任务设置 OpenRouter provider 偏好或使用 [Pareto Code 路由器](/integrations/providers#openrouter-pareto-code-router)，请通过 `extra_body` 按任务设置：
--- a/website/i18n/zh-Hans/docusaurus-plugin-content-docs/current/user-guide/features/fallback-providers.md
+++ b/website/i18n/zh-Hans/docusaurus-plugin-content-docs/current/user-guide/features/fallback-providers.md
@@ -166,12 +166,12 @@ fallback_model:
 |---------|-------------------|
 | CLI 会话 | ✔ |
 | 消息网关（Telegram、Discord 等） | ✔ |
-| 子 Agent 委派 | ✘（子 Agent 不继承备用配置） |
-| Cron 任务 | ✘（使用固定提供商运行） |
+| 子 Agent 委派 | ✔（子 Agent 继承父 Agent 的备用链） |
+| Cron 任务 | ✔（Cron Agent 继承配置的备用提供商） |
 | 辅助任务（视觉、压缩等） | ✘（使用各自的提供商链——见下文） |

 :::tip
-`fallback_model` 没有对应的环境变量——它只能通过 `config.yaml` 配置。这是有意为之：备用配置是一个经过深思熟虑的选择，不应被过期的 shell 导出变量覆盖。
+没有针对主备用链的环境变量——只能通过 `config.yaml` 或 `hermes fallback` 进行配置。这是有意为之：备用配置是一个经过深思熟虑的选择，不应被过期的 shell 导出变量覆盖。
 :::

 ---
@@ -362,7 +362,7 @@ auxiliary:

 ## 委派提供商覆盖

-由 `delegate_task` 生成的子 Agent **不会**使用主备用模型。但可以将它们路由到不同的提供商:模型对以优化成本：
+由 `delegate_task` 生成的子 Agent 会继承父 Agent 的主备用链。你仍然可以将子 Agent 路由到不同的主提供商:模型对以进行成本优化：

 ```yaml
 delegation:
@@ -378,7 +378,7 @@ delegation:

 ## Cron 任务提供商

-Cron 任务使用执行时配置的提供商运行，不支持备用模型。若要为 Cron 任务使用不同的提供商，请在 Cron 任务本身上配置 `provider` 和 `model` 覆盖：
+Cron 任务在创建 Agent 时会继承你配置的 `fallback_providers` 链（或旧版 `fallback_model`）。要为 Cron 任务使用不同的主提供商，请在 Cron 任务本身配置 `provider` 和 `model` 覆盖：

 ```python
 cronjob(
@@ -398,7 +398,7 @@ cronjob(

 | 功能 | 备用机制 | 配置位置 |
 |---------|-------------------|----------------|
-| 主 Agent 模型 | `fallback_model`（config.yaml 中）——出错时按轮次故障转移（每轮次恢复主模型） | `fallback_model:`（顶层） |
+| 主 Agent 模型 | `fallback_providers`（config.yaml 中）——出错时按轮次故障转移（每轮次恢复主模型） | `fallback_providers:`（顶层列表） |
 | 辅助任务（任意）— auto 用户 | 容量错误时完整自动检测链（主 Agent 模型优先，然后提供商链） | `auxiliary.<task>.provider: auto` |
 | 辅助任务（任意）— 显式提供商 | `fallback_chain`（若已设置）→ 主 Agent 模型 → 警告 + 抛出，仅在容量错误时触发 | `auxiliary.<task>.fallback_chain` |
 | 视觉 | 分层（见上文）+ 内部 OpenRouter 重试 | `auxiliary.vision` |
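
Review note (not part of the patch): the fallback order this commit documents for explicit-provider auxiliary tasks can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not Hermes's implementation. `CapacityError`, `Entry`, `run_with_fallback`, and `fake_call` are hypothetical names, the `main` entry's model name is a placeholder, and the skip rule is read literally as comparing each entry's provider with the failed primary provider.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class CapacityError(Exception):
    """Hypothetical stand-in for rate-limit, connectivity, or payment-required failures."""


@dataclass(frozen=True)
class Entry:
    """One provider/model pair, mirroring the keys documented for fallback_chain entries."""
    provider: str
    model: str
    base_url: Optional[str] = None


def run_with_fallback(
    primary: Entry,
    fallback_chain: List[Entry],
    main_agent: Entry,
    call: Callable[[Entry], str],
) -> str:
    """Try the primary, then the chain in order, then the main agent model."""
    try:
        return call(primary)
    except CapacityError:
        failed_provider = primary.provider

    for entry in fallback_chain:
        # Assumption: an entry is skipped when its provider equals the failed primary provider.
        if entry.provider == failed_provider:
            continue
        try:
            return call(entry)
        except CapacityError:
            continue  # this fallback is also unavailable; move to the next entry

    # Final safety net: the main agent model. An error here propagates to the caller.
    return call(main_agent)


# Usage with the configuration from the documented example.
primary = Entry("openrouter", "openai/gpt-4o-mini")
chain = [
    Entry("nous", "deepseek/deepseek-chat"),
    Entry("openrouter", "google/gemini-2.5-flash"),
]
main_agent = Entry("main", "placeholder-main-model")  # placeholder model name

down = {"openrouter", "nous"}  # providers that are simulated as failing
attempts: List[str] = []


def fake_call(entry: Entry) -> str:
    """Simulated provider call that records every attempt."""
    attempts.append(f"{entry.provider}/{entry.model}")
    if entry.provider in down:
        raise CapacityError(f"{entry.provider} unavailable")
    return f"response from {entry.provider}"


result = run_with_fallback(primary, chain, main_agent, fake_call)
print(attempts)
# ['openrouter/openai/gpt-4o-mini', 'nous/deepseek/deepseek-chat', 'main/placeholder-main-model']
print(result)
# response from main
```

Under this literal reading, the `openrouter` / `google/gemini-2.5-flash` entry from the documented example is never attempted when the primary provider is also `openrouter`. The added documentation does not say whether the real check compares providers only or provider and model together, so that point may be worth confirming against the implementation.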