mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-25 00:51:20 +00:00
docs: comprehensive update for recent merged PRs (#9019)
Audit and update documentation across 12 files to match changes from
~50 recently merged PRs. Key updates:
Slash commands (slash-commands.md):
- Add 5 missing commands: /snapshot, /fast, /image, /debug, /restart
- Fix /status incorrectly labeled as messaging-only (available in both)
- Add --global flag to /model docs
- Add [focus topic] arg to /compress docs
CLI commands (cli-commands.md):
- Add hermes debug share section with options and examples
- Add hermes backup section with --quick and --label flags
- Add hermes import section
Feature docs:
- TTS: document global tts.speed and per-provider speed for Edge/OpenAI
- Web dashboard: add docs for 5 missing pages (Sessions, Logs,
Analytics, Cron, Skills) and 15+ API endpoints
- WhatsApp: add streaming, 4K chunking, and markdown formatting docs
- Skills: add GitHub rate-limit/GITHUB_TOKEN troubleshooting tip
- Budget: document CLI notification on iteration budget exhaustion
Config migration (compression.summary_* → auxiliary.compression.*):
- Update configuration.md, environment-variables.md,
fallback-providers.md, cli.md, and context-compression-and-caching.md
- Replace legacy compression.summary_model/provider/base_url references
with auxiliary.compression.model/provider/base_url
- Add legacy migration info boxes explaining auto-migration
Minor fixes:
- wecom-callback.md: clarify 'text only' limitation (input only)
- Escape {session_id}/{job_id} in web-dashboard.md headings for MDX
This commit is contained in:
parent
c449cd1af5
commit
4ca6668daf
12 changed files with 299 additions and 40 deletions
|
|
@ -441,11 +441,19 @@ compression:
|
|||
threshold: 0.50 # Compress at this % of context limit
|
||||
target_ratio: 0.20 # Fraction of threshold to preserve as recent tail
|
||||
protect_last_n: 20 # Min recent messages to keep uncompressed
|
||||
summary_model: "google/gemini-3-flash-preview" # Model for summarization
|
||||
summary_provider: "auto" # Provider: "auto", "openrouter", "nous", "codex", "main", etc.
|
||||
summary_base_url: null # Custom OpenAI-compatible endpoint (overrides provider)
|
||||
|
||||
# The summarization model/provider is configured under auxiliary:
|
||||
auxiliary:
|
||||
compression:
|
||||
model: "google/gemini-3-flash-preview" # Model for summarization
|
||||
provider: "auto" # Provider: "auto", "openrouter", "nous", "codex", "main", etc.
|
||||
base_url: null # Custom OpenAI-compatible endpoint (overrides provider)
|
||||
```
|
||||
|
||||
:::info Legacy config migration
|
||||
Older configs with `compression.summary_model`, `compression.summary_provider`, and `compression.summary_base_url` are automatically migrated to `auxiliary.compression.*` on first load (config version 17). No manual action needed.
|
||||
:::
|
||||
|
||||
### Common setups
|
||||
|
||||
**Default (auto-detect) — no configuration needed:**
|
||||
|
|
@ -458,30 +466,32 @@ Uses the first available provider (OpenRouter → Nous → Codex) with Gemini Fl
|
|||
|
||||
**Force a specific provider** (OAuth or API-key based):
|
||||
```yaml
|
||||
compression:
|
||||
summary_provider: nous
|
||||
summary_model: gemini-3-flash
|
||||
auxiliary:
|
||||
compression:
|
||||
provider: nous
|
||||
model: gemini-3-flash
|
||||
```
|
||||
Works with any provider: `nous`, `openrouter`, `codex`, `anthropic`, `main`, etc.
|
||||
|
||||
**Custom endpoint** (self-hosted, Ollama, zai, DeepSeek, etc.):
|
||||
```yaml
|
||||
compression:
|
||||
summary_model: glm-4.7
|
||||
summary_base_url: https://api.z.ai/api/coding/paas/v4
|
||||
auxiliary:
|
||||
compression:
|
||||
model: glm-4.7
|
||||
base_url: https://api.z.ai/api/coding/paas/v4
|
||||
```
|
||||
Points at a custom OpenAI-compatible endpoint. Uses `OPENAI_API_KEY` for auth.
|
||||
|
||||
### How the three knobs interact
|
||||
|
||||
| `summary_provider` | `summary_base_url` | Result |
|
||||
| `auxiliary.compression.provider` | `auxiliary.compression.base_url` | Result |
|
||||
|---------------------|---------------------|--------|
|
||||
| `auto` (default) | not set | Auto-detect best available provider |
|
||||
| `nous` / `openrouter` / etc. | not set | Force that provider, use its auth |
|
||||
| any | set | Use the custom endpoint directly (provider ignored) |
|
||||
|
||||
:::warning Summary model context length requirement
|
||||
The `summary_model` **must** have a context window at least as large as your main agent model's. The compressor sends the full middle section of the conversation to the summary model — if that model's context window is smaller than the main model's, the summarization call will fail with a context length error. When this happens, the middle turns are **dropped without a summary**, losing conversation context silently. If you override `summary_model`, verify its context length meets or exceeds your main model's.
|
||||
The summary model **must** have a context window at least as large as your main agent model's. The compressor sends the full middle section of the conversation to the summary model — if that model's context window is smaller than the main model's, the summarization call will fail with a context length error. When this happens, the middle turns are **dropped without a summary**, losing conversation context silently. If you override the model, verify its context length meets or exceeds your main model's.
|
||||
:::
|
||||
|
||||
## Context Engine
|
||||
|
|
@ -522,6 +532,8 @@ agent:
|
|||
|
||||
Budget pressure is enabled by default. The agent sees warnings naturally as part of tool results, encouraging it to consolidate its work and deliver a response before running out of iterations.
|
||||
|
||||
When the iteration budget is fully exhausted, the CLI shows a notification to the user: `⚠ Iteration budget reached (90/90) — response may be incomplete`. If the budget runs out during active work, the agent generates a summary of what was accomplished before stopping.
|
||||
|
||||
### Streaming Timeouts
|
||||
|
||||
The LLM streaming connection has two timeout layers. Both auto-adjust for local providers (localhost, LAN IPs) — no configuration needed for most setups.
|
||||
|
|
@ -666,7 +678,7 @@ Each auxiliary task has a configurable `timeout` (in seconds). Defaults: vision
|
|||
:::
|
||||
|
||||
:::info
|
||||
Context compression has its own top-level `compression:` block with `summary_provider`, `summary_model`, and `summary_base_url` — see [Context Compression](#context-compression) above. The fallback model uses a `fallback_model:` block — see [Fallback Model](/docs/integrations/providers#fallback-model). All three follow the same provider/model/base_url pattern.
|
||||
Context compression has its own `compression:` block for thresholds and an `auxiliary.compression:` block for model/provider settings — see [Context Compression](#context-compression) above. The fallback model uses a `fallback_model:` block — see [Fallback Model](/docs/integrations/providers#fallback-model). All three follow the same provider/model/base_url pattern.
|
||||
:::
|
||||
|
||||
### Changing the Vision Model
|
||||
|
|
@ -839,16 +851,21 @@ agent:
|
|||
|
||||
```yaml
|
||||
tts:
|
||||
provider: "edge" # "edge" | "elevenlabs" | "openai" | "neutts"
|
||||
provider: "edge" # "edge" | "elevenlabs" | "openai" | "neutts" | "minimax"
|
||||
speed: 1.0 # Global speed multiplier (fallback for all providers)
|
||||
edge:
|
||||
voice: "en-US-AriaNeural" # 322 voices, 74 languages
|
||||
speed: 1.0 # Speed multiplier (converted to rate percentage, e.g. 1.5 → +50%)
|
||||
elevenlabs:
|
||||
voice_id: "pNInz6obpgDQGcFmaJgB"
|
||||
model_id: "eleven_multilingual_v2"
|
||||
openai:
|
||||
model: "gpt-4o-mini-tts"
|
||||
voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
|
||||
speed: 1.0 # Speed multiplier (clamped to 0.25–4.0 by the API)
|
||||
base_url: "https://api.openai.com/v1" # Override for OpenAI-compatible TTS endpoints
|
||||
minimax:
|
||||
speed: 1.0 # Speech speed multiplier
|
||||
neutts:
|
||||
ref_audio: ''
|
||||
ref_text: ''
|
||||
|
|
@ -858,6 +875,8 @@ tts:
|
|||
|
||||
This controls both the `text_to_speech` tool and spoken replies in voice mode (`/voice tts` in the CLI or messaging gateway).
|
||||
|
||||
**Speed fallback hierarchy:** provider-specific speed (e.g. `tts.edge.speed`) → global `tts.speed` → `1.0` default. Set the global `tts.speed` to apply a uniform speed across all providers, or override per-provider for fine-grained control.
|
||||
|
||||
## Display Settings
|
||||
|
||||
```yaml
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue