docs(fallback): document layered auxiliary fallback ladder
Some checks are pending
Deploy Site / deploy-vercel (push) Waiting to run
Deploy Site / deploy-docs (push) Waiting to run
Docker Build and Publish / build-amd64 (push) Waiting to run
Docker Build and Publish / build-arm64 (push) Waiting to run
Docker Build and Publish / merge (push) Blocked by required conditions
Docker Build and Publish / move-main (push) Blocked by required conditions
Docker Build and Publish / move-latest (push) Blocked by required conditions
Lint (ruff + ty) / ruff + ty diff (push) Waiting to run
Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run
Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run
Nix / nix (macos-latest) (push) Waiting to run
Nix / nix (ubuntu-latest) (push) Waiting to run
Tests / test (push) Waiting to run
Tests / e2e (push) Waiting to run

Adds a new 'Auxiliary Capacity-Error Fallback' section to
website/docs/user-guide/features/fallback-providers.md covering:

- The 4-step ladder (primary → fallback_chain → main agent → warn)
- Which errors trigger fallback (402, 429 quota, connection) vs
  which respect explicit provider choice (transient 429 rate limits)
- Optional fallback_chain config schema with vision + compression examples
- Recognized quota-error phrases (Bedrock, Vertex AI, generic)

Updates the bottom summary table — every auxiliary task now shows
'Layered (see above)' instead of 'Auto-detection chain' since
explicit-provider users also get the main-agent safety net.
This commit is contained in:
teknium1 2026-05-17 16:53:11 -07:00 committed by Teknium
parent 766f263bd2
commit 43e566f77e

View file

@ -320,6 +320,55 @@ auxiliary:
---
## Auxiliary Capacity-Error Fallback
When you set an explicit auxiliary provider (e.g. `auxiliary.vision.provider: glm`), Hermes treats that as your preferred choice — but if the provider literally cannot serve the request because of a **capacity error** (HTTP 402 payment required, HTTP 429 daily-quota exhaustion, connection failure), Hermes falls back through a layered chain instead of failing silently:
1. **Primary aux provider** — the one you configured (tried first, always)
2. **`auxiliary.<task>.fallback_chain`** — your per-task override list, if you wrote one
3. **Main agent provider + model** — last-resort safety net (always tried, even if you didn't write a chain)
4. **Warn + re-raise** — if every layer fails, Hermes logs `Auxiliary <task>: ... all fallbacks exhausted` at WARNING level and re-raises the original error
Transient HTTP 429 rate limits (`Retry-After: ...`) are treated as request constraints, not capacity problems — they respect your explicit provider choice and do **not** trigger the fallback ladder. Only daily/monthly quota exhaustion, payment errors, and connection failures bypass the explicit-provider gate.
For users on `provider: auto` (no explicit aux provider), the existing auto-detection chain runs in place of steps 23. Its first step is already the main agent model, so `auto` users get the same outcome with zero config.
### Optional: per-task fallback chain
If you want a different fallback ordering than "main agent model first", configure `fallback_chain` explicitly. Each entry needs at least `provider`; `model`, `base_url`, and `api_key` are optional.
```yaml
auxiliary:
vision:
provider: glm
model: glm-4v-flash
fallback_chain:
- provider: openrouter
model: google/gemini-3-flash-preview
- provider: nous
model: anthropic/claude-sonnet-4
compression:
provider: openrouter
fallback_chain:
- provider: openai
model: gpt-4o-mini
```
You do **not** need to configure `fallback_chain` to get fallback — the main-agent safety net runs regardless. Use it only when you specifically want a different order than the default.
### Provider quota errors that trigger fallback
Hermes recognizes these as capacity-equivalent to 402 credit exhaustion (not transient rate limits):
- Bedrock / LiteLLM: `Too many tokens per day`, `daily limit`, `tokens per day`
- Vertex AI / GCP: `quota exceeded`, `resource exhausted`, `RESOURCE_EXHAUSTED`
- Generic: `daily quota`, `quota_exceeded`
If your provider returns a different phrase for daily-quota exhaustion and Hermes doesn't trigger fallback, that's a bug — open an issue with the exact error string.
---
## Context Compression Fallback
Context compression uses the `auxiliary.compression` config block to control which model and provider handles summarization:
@ -378,14 +427,16 @@ See [Scheduled Tasks (Cron)](/docs/user-guide/features/cron) for full configurat
| Feature | Fallback Mechanism | Config Location |
|---------|-------------------|----------------|
| Main agent model | `fallback_model` in config.yaml — per-turn failover on errors (primary restored each turn) | `fallback_model:` (top-level) |
| Vision | Auto-detection chain + internal OpenRouter retry | `auxiliary.vision` |
| Web extraction | Auto-detection chain + internal OpenRouter retry | `auxiliary.web_extract` |
| Context compression | Auto-detection chain, degrades to no-summary if unavailable | `auxiliary.compression` |
| Session search | Auto-detection chain | `auxiliary.session_search` |
| Skills hub | Auto-detection chain | `auxiliary.skills_hub` |
| MCP helpers | Auto-detection chain | `auxiliary.mcp` |
| Approval classification | Auto-detection chain | `auxiliary.approval` |
| Title generation | Auto-detection chain | `auxiliary.title_generation` |
| Triage specifier | Auto-detection chain | `auxiliary.triage_specifier` |
| Auxiliary tasks (any) — auto users | Full auto-detection chain (main agent model first, then provider chain) on capacity errors | `auxiliary.<task>.provider: auto` |
| Auxiliary tasks (any) — explicit provider | `fallback_chain` (if set) → main agent model → warn + raise, on capacity errors only | `auxiliary.<task>.fallback_chain` |
| Vision | Layered (see above) + internal OpenRouter retry | `auxiliary.vision` |
| Web extraction | Layered (see above) + internal OpenRouter retry | `auxiliary.web_extract` |
| Context compression | Layered (see above); degrades to no-summary if all layers unavailable | `auxiliary.compression` |
| Session search | Layered (see above) | `auxiliary.session_search` |
| Skills hub | Layered (see above) | `auxiliary.skills_hub` |
| MCP helpers | Layered (see above) | `auxiliary.mcp` |
| Approval classification | Layered (see above) | `auxiliary.approval` |
| Title generation | Layered (see above) | `auxiliary.title_generation` |
| Triage specifier | Layered (see above) | `auxiliary.triage_specifier` |
| Delegation | Provider override only (no automatic fallback) | `delegation.provider` / `delegation.model` |
| Cron jobs | Per-job provider override only (no automatic fallback) | Per-job `provider` / `model` |