mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-29 06:31:32 +00:00
docs(fallback): document layered auxiliary fallback ladder
Some checks are pending
Deploy Site / deploy-vercel (push) Waiting to run
Deploy Site / deploy-docs (push) Waiting to run
Docker Build and Publish / build-amd64 (push) Waiting to run
Docker Build and Publish / build-arm64 (push) Waiting to run
Docker Build and Publish / merge (push) Blocked by required conditions
Docker Build and Publish / move-main (push) Blocked by required conditions
Docker Build and Publish / move-latest (push) Blocked by required conditions
Lint (ruff + ty) / ruff + ty diff (push) Waiting to run
Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run
Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run
Nix / nix (macos-latest) (push) Waiting to run
Nix / nix (ubuntu-latest) (push) Waiting to run
Tests / test (push) Waiting to run
Tests / e2e (push) Waiting to run
Adds a new 'Auxiliary Capacity-Error Fallback' section to website/docs/user-guide/features/fallback-providers.md covering:

- The 4-step ladder (primary → fallback_chain → main agent → warn)
- Which errors trigger fallback (402, 429 quota, connection) vs which respect explicit provider choice (transient 429 rate limits)
- Optional fallback_chain config schema with vision + compression examples
- Recognized quota-error phrases (Bedrock, Vertex AI, generic)

Updates the bottom summary table: every auxiliary task now shows 'Layered (see above)' instead of 'Auto-detection chain', since explicit-provider users also get the main-agent safety net.
parent 766f263bd2
commit 43e566f77e
1 changed file with 60 additions and 9 deletions
@@ -320,6 +320,55 @@ auxiliary:

---

## Auxiliary Capacity-Error Fallback
When you set an explicit auxiliary provider (e.g. `auxiliary.vision.provider: glm`), Hermes treats it as your preferred choice. If that provider cannot serve the request because of a **capacity error** (HTTP 402 payment required, HTTP 429 daily-quota exhaustion, or a connection failure), Hermes falls back through a layered chain instead of failing silently:

1. **Primary aux provider**: the one you configured (always tried first)
2. **`auxiliary.<task>.fallback_chain`**: your per-task override list, if you wrote one
3. **Main agent provider + model**: last-resort safety net (always tried, even if you didn't write a chain)
4. **Warn + re-raise**: if every layer fails, Hermes logs `Auxiliary <task>: ... all fallbacks exhausted` at WARNING level and re-raises the original error

Transient HTTP 429 rate limits (`Retry-After: ...`) are treated as request constraints, not capacity problems. They respect your explicit provider choice and do **not** trigger the fallback ladder. Only daily/monthly quota exhaustion, payment errors, and connection failures bypass the explicit-provider gate.

For users on `provider: auto` (no explicit aux provider), the existing auto-detection chain runs in place of steps 2–3. Its first step is already the main agent model, so `auto` users get the same outcome with zero config.
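The four-step ladder described above can be sketched roughly as follows. This is an illustration only, not Hermes's actual implementation: `CapacityError`, `run_with_ladder`, and the zero-argument "caller" functions are hypothetical names, and the warning text is a simplified form of the log line quoted above.

```python
import logging
from typing import Callable, Optional, Sequence

logger = logging.getLogger("aux_fallback_sketch")


class CapacityError(Exception):
    """Hypothetical marker for 402, quota-exhaustion 429, or connection failures."""


# A "caller" is any zero-argument function that performs the auxiliary request
# against one provider and either returns a result or raises.
Caller = Callable[[], str]


def run_with_ladder(
    task: str,
    primary: Caller,
    fallback_chain: Sequence[Caller],
    main_agent: Optional[Caller],
) -> str:
    """Try primary, then the per-task chain, then the main agent; re-raise if all fail."""
    try:
        return primary()  # step 1: the configured provider always goes first
    except CapacityError as original:
        # Other exception types (e.g. a transient rate limit) are not caught
        # here, so they propagate and the explicit provider choice is respected.
        candidates = list(fallback_chain)  # step 2: per-task override list
        if main_agent is not None:
            candidates.append(main_agent)  # step 3: main-agent safety net
        for candidate in candidates:
            try:
                return candidate()
            except Exception as exc:
                logger.debug("Auxiliary %s: fallback candidate failed: %s", task, exc)
        # step 4: warn, then surface the original error (message simplified)
        logger.warning("Auxiliary %s: all fallbacks exhausted", task)
        raise original
```

In this sketch an empty `fallback_chain` still reaches the main-agent caller, which mirrors the statement that the safety net applies even when no chain is configured.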
### Optional: per-task fallback chain

If you want a different fallback ordering than "main agent model first", configure `fallback_chain` explicitly. Each entry needs at least `provider`; `model`, `base_url`, and `api_key` are optional.

```yaml
auxiliary:
  vision:
    provider: glm
    model: glm-4v-flash
    fallback_chain:
      - provider: openrouter
        model: google/gemini-3-flash-preview
      - provider: nous
        model: anthropic/claude-sonnet-4

  compression:
    provider: openrouter
    fallback_chain:
      - provider: openai
        model: gpt-4o-mini
```

You do **not** need to configure `fallback_chain` to get fallback — the main-agent safety net runs regardless. Use it only when you specifically want a different order than the default.
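To make the entry schema concrete, here is a minimal sketch of how a `fallback_chain` list could be validated once the YAML has been loaded into plain dictionaries. `FallbackEntry` and `parse_fallback_chain` are hypothetical names, and raising `ValueError` on a missing `provider` is an assumption; the documentation does not say how Hermes reacts to a malformed entry.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class FallbackEntry:
    provider: str                   # required
    model: Optional[str] = None     # optional
    base_url: Optional[str] = None  # optional
    api_key: Optional[str] = None   # optional


def parse_fallback_chain(task_config: Dict[str, Any]) -> List[FallbackEntry]:
    """Turn a task's `fallback_chain` list into validated entries (empty if absent)."""
    entries = []
    for index, raw in enumerate(task_config.get("fallback_chain") or []):
        if not isinstance(raw, dict) or not raw.get("provider"):
            raise ValueError(f"fallback_chain[{index}] needs a 'provider' key")
        entries.append(
            FallbackEntry(
                provider=str(raw["provider"]),
                model=raw.get("model"),
                base_url=raw.get("base_url"),
                api_key=raw.get("api_key"),
            )
        )
    return entries


# The `auxiliary.vision` block from the YAML example, as a plain dict.
vision = {
    "provider": "glm",
    "model": "glm-4v-flash",
    "fallback_chain": [
        {"provider": "openrouter", "model": "google/gemini-3-flash-preview"},
        {"provider": "nous", "model": "anthropic/claude-sonnet-4"},
    ],
}
chain = parse_fallback_chain(vision)
print([entry.provider for entry in chain])
```

A task block without `fallback_chain` yields an empty list here, consistent with the chain being optional.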
### Provider quota errors that trigger fallback

Hermes treats the following error phrases as equivalent to 402 credit exhaustion, not as transient rate limits:

- Bedrock / LiteLLM: `Too many tokens per day`, `daily limit`, `tokens per day`
- Vertex AI / GCP: `quota exceeded`, `resource exhausted`, `RESOURCE_EXHAUSTED`
- Generic: `daily quota`, `quota_exceeded`

If your provider returns a different phrase for daily-quota exhaustion and Hermes doesn't trigger fallback, that's a bug. Open an issue with the exact error string.
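The distinction between capacity errors and transient rate limits could be expressed as a small classifier like the one below. It is a sketch under stated assumptions: `is_capacity_error` and `QUOTA_PHRASES` are hypothetical names, and case-insensitive substring matching applied regardless of status code is a guess at the matching rule, not a description of Hermes's code. Only the phrases themselves come from the list above.

```python
from typing import Optional

# Phrases from the list above, lowercased so matching is case-insensitive.
QUOTA_PHRASES = (
    "too many tokens per day",
    "daily limit",
    "tokens per day",
    "quota exceeded",
    "resource exhausted",
    "resource_exhausted",
    "daily quota",
    "quota_exceeded",
)


def is_capacity_error(
    status: Optional[int],
    message: str,
    connection_failed: bool = False,
) -> bool:
    """Return True for errors that should bypass the explicit-provider gate."""
    if connection_failed or status == 402:
        return True
    # Assumption: a quota phrase anywhere in the message marks exhaustion.
    # A plain 429 without such a phrase is treated as a transient rate limit.
    lowered = message.lower()
    return any(phrase in lowered for phrase in QUOTA_PHRASES)
```

Under these assumptions a 429 whose body mentions `RESOURCE_EXHAUSTED` counts as a capacity error, while a 429 that only asks the client to retry later does not.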

---

## Context Compression Fallback

Context compression uses the `auxiliary.compression` config block to control which model and provider handles summarization:
@@ -378,14 +427,16 @@ See [Scheduled Tasks (Cron)](/docs/user-guide/features/cron) for full configurat

| Feature | Fallback Mechanism | Config Location |
|---------|-------------------|----------------|
| Main agent model | `fallback_model` in config.yaml: per-turn failover on errors (primary restored each turn) | `fallback_model:` (top-level) |
| Vision | Auto-detection chain + internal OpenRouter retry | `auxiliary.vision` |
| Web extraction | Auto-detection chain + internal OpenRouter retry | `auxiliary.web_extract` |
| Context compression | Auto-detection chain, degrades to no-summary if unavailable | `auxiliary.compression` |
| Session search | Auto-detection chain | `auxiliary.session_search` |
| Skills hub | Auto-detection chain | `auxiliary.skills_hub` |
| MCP helpers | Auto-detection chain | `auxiliary.mcp` |
| Approval classification | Auto-detection chain | `auxiliary.approval` |
| Title generation | Auto-detection chain | `auxiliary.title_generation` |
| Triage specifier | Auto-detection chain | `auxiliary.triage_specifier` |
| Auxiliary tasks (any), auto users | Full auto-detection chain (main agent model first, then provider chain) on capacity errors | `auxiliary.<task>.provider: auto` |
| Auxiliary tasks (any), explicit provider | `fallback_chain` (if set) → main agent model → warn + raise, on capacity errors only | `auxiliary.<task>.fallback_chain` |
| Vision | Layered (see above) + internal OpenRouter retry | `auxiliary.vision` |
| Web extraction | Layered (see above) + internal OpenRouter retry | `auxiliary.web_extract` |
| Context compression | Layered (see above); degrades to no-summary if all layers unavailable | `auxiliary.compression` |
| Session search | Layered (see above) | `auxiliary.session_search` |
| Skills hub | Layered (see above) | `auxiliary.skills_hub` |
| MCP helpers | Layered (see above) | `auxiliary.mcp` |
| Approval classification | Layered (see above) | `auxiliary.approval` |
| Title generation | Layered (see above) | `auxiliary.title_generation` |
| Triage specifier | Layered (see above) | `auxiliary.triage_specifier` |
| Delegation | Provider override only (no automatic fallback) | `delegation.provider` / `delegation.model` |
| Cron jobs | Per-job provider override only (no automatic fallback) | Per-job `provider` / `model` |