docs: comprehensive update for recent merged PRs (#9019)

Audit and update documentation across 12 files to match changes from
~50 recently merged PRs. Key updates:

Slash commands (slash-commands.md):
- Add 5 missing commands: /snapshot, /fast, /image, /debug, /restart
- Fix /status incorrectly labeled as messaging-only (available in both)
- Add --global flag to /model docs
- Add [focus topic] arg to /compress docs

CLI commands (cli-commands.md):
- Add hermes debug share section with options and examples
- Add hermes backup section with --quick and --label flags
- Add hermes import section

Feature docs:
- TTS: document global tts.speed and per-provider speed for Edge/OpenAI
- Web dashboard: add docs for 5 missing pages (Sessions, Logs,
  Analytics, Cron, Skills) and 15+ API endpoints
- WhatsApp: add streaming, 4K chunking, and markdown formatting docs
- Skills: add GitHub rate-limit/GITHUB_TOKEN troubleshooting tip
- Budget: document CLI notification on iteration budget exhaustion

Config migration (compression.summary_* → auxiliary.compression.*):
- Update configuration.md, environment-variables.md,
  fallback-providers.md, cli.md, and context-compression-and-caching.md
- Replace legacy compression.summary_model/provider/base_url references
  with auxiliary.compression.model/provider/base_url
- Add legacy migration info boxes explaining auto-migration

Minor fixes:
- wecom-callback.md: clarify 'text only' limitation (input only)
- Escape {session_id}/{job_id} in web-dashboard.md headings for MDX
Teknium 2026-04-13 10:50:59 -07:00 committed by GitHub
parent c449cd1af5
commit 4ca6668daf
GPG key ID: B5690EEEBB952194
12 changed files with 299 additions and 40 deletions


@ -322,7 +322,11 @@ Long conversations are automatically summarized when approaching context limits:
 compression:
   enabled: true
   threshold: 0.50 # Compress at 50% of context limit by default
-  summary_model: "google/gemini-3-flash-preview" # Model used for summarization
+# Summarization model configured under auxiliary:
+auxiliary:
+  compression:
+    model: "google/gemini-3-flash-preview" # Model used for summarization
```
When compression triggers, middle turns are summarized while the first 3 and last 4 turns are always preserved.
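The keep-first-3/keep-last-4 rule can be sketched as follows (illustrative only; the function name and exact boundary handling are assumptions, not Hermes internals):

```python
def split_for_compression(turns, head=3, tail=4):
    """Keep the first `head` and last `tail` turns verbatim; everything in
    between is the candidate region for summarization."""
    if len(turns) <= head + tail:
        # Too short to compress; nothing is summarized.
        return turns, [], []
    return turns[:head], turns[head:-tail], turns[-tail:]

first, middle, last = split_for_compression(list(range(10)))
# first -> [0, 1, 2]; middle -> [3, 4, 5]; last -> [6, 7, 8, 9]
```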


@ -441,11 +441,19 @@ compression:
   threshold: 0.50 # Compress at this % of context limit
   target_ratio: 0.20 # Fraction of threshold to preserve as recent tail
   protect_last_n: 20 # Min recent messages to keep uncompressed
-  summary_model: "google/gemini-3-flash-preview" # Model for summarization
-  summary_provider: "auto" # Provider: "auto", "openrouter", "nous", "codex", "main", etc.
-  summary_base_url: null # Custom OpenAI-compatible endpoint (overrides provider)
+# The summarization model/provider is configured under auxiliary:
+auxiliary:
+  compression:
+    model: "google/gemini-3-flash-preview" # Model for summarization
+    provider: "auto" # Provider: "auto", "openrouter", "nous", "codex", "main", etc.
+    base_url: null # Custom OpenAI-compatible endpoint (overrides provider)
```
+:::info Legacy config migration
+Older configs with `compression.summary_model`, `compression.summary_provider`, and `compression.summary_base_url` are automatically migrated to `auxiliary.compression.*` on first load (config version 17). No manual action needed.
+:::
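The auto-migration described in the box above amounts to a key rename; a minimal sketch under stated assumptions (the real loader also bumps the config version to 17, which this omits):

```python
def migrate_compression_config(cfg):
    """Move legacy compression.summary_* keys to auxiliary.compression.*,
    preserving any value the user already set under the new path."""
    legacy = cfg.get("compression", {})
    aux = cfg.setdefault("auxiliary", {}).setdefault("compression", {})
    for old_key, new_key in [("summary_model", "model"),
                             ("summary_provider", "provider"),
                             ("summary_base_url", "base_url")]:
        if old_key in legacy and new_key not in aux:
            aux[new_key] = legacy.pop(old_key)
    return cfg

cfg = migrate_compression_config({"compression": {"summary_model": "glm-4.7"}})
# cfg["auxiliary"]["compression"]["model"] == "glm-4.7"
```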
### Common setups
**Default (auto-detect) — no configuration needed:**
@ -458,30 +466,32 @@ Uses the first available provider (OpenRouter → Nous → Codex) with Gemini Fl
**Force a specific provider** (OAuth or API-key based):
```yaml
-compression:
-  summary_provider: nous
-  summary_model: gemini-3-flash
+auxiliary:
+  compression:
+    provider: nous
+    model: gemini-3-flash
```
Works with any provider: `nous`, `openrouter`, `codex`, `anthropic`, `main`, etc.
**Custom endpoint** (self-hosted, Ollama, zai, DeepSeek, etc.):
```yaml
-compression:
-  summary_model: glm-4.7
-  summary_base_url: https://api.z.ai/api/coding/paas/v4
+auxiliary:
+  compression:
+    model: glm-4.7
+    base_url: https://api.z.ai/api/coding/paas/v4
```
Points at a custom OpenAI-compatible endpoint. Uses `OPENAI_API_KEY` for auth.
### How the three knobs interact
-| `summary_provider` | `summary_base_url` | Result |
+| `auxiliary.compression.provider` | `auxiliary.compression.base_url` | Result |
|---------------------|---------------------|--------|
| `auto` (default) | not set | Auto-detect best available provider |
| `nous` / `openrouter` / etc. | not set | Force that provider, use its auth |
| any | set | Use the custom endpoint directly (provider ignored) |
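The table's rows can be read as a small resolution function; this is a sketch of the documented behavior (names are illustrative, not Hermes internals):

```python
def resolve_compression_endpoint(provider="auto", base_url=None):
    """base_url wins outright; an explicit provider is forced; else auto-detect."""
    if base_url:
        return ("custom-endpoint", base_url)  # provider is ignored
    if provider and provider != "auto":
        return ("forced-provider", provider)
    return ("auto-detect", None)

resolve_compression_endpoint("nous")  # -> ("forced-provider", "nous")
```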
:::warning Summary model context length requirement
-The `summary_model` **must** have a context window at least as large as your main agent model's. The compressor sends the full middle section of the conversation to the summary model — if that model's context window is smaller than the main model's, the summarization call will fail with a context length error. When this happens, the middle turns are **dropped without a summary**, losing conversation context silently. If you override `summary_model`, verify its context length meets or exceeds your main model's.
+The summary model **must** have a context window at least as large as your main agent model's. The compressor sends the full middle section of the conversation to the summary model — if that model's context window is smaller than the main model's, the summarization call will fail with a context length error. When this happens, the middle turns are **dropped without a summary**, losing conversation context silently. If you override the model, verify its context length meets or exceeds your main model's.
:::
## Context Engine
@ -522,6 +532,8 @@ agent:
Budget pressure is enabled by default. The agent sees warnings naturally as part of tool results, encouraging it to consolidate its work and deliver a response before running out of iterations.
When the iteration budget is fully exhausted, the CLI shows a notification to the user: `⚠ Iteration budget reached (90/90) — response may be incomplete`. If the budget runs out during active work, the agent generates a summary of what was accomplished before stopping.
### Streaming Timeouts
The LLM streaming connection has two timeout layers. Both auto-adjust for local providers (localhost, LAN IPs) — no configuration needed for most setups.
@ -666,7 +678,7 @@ Each auxiliary task has a configurable `timeout` (in seconds). Defaults: vision
:::
:::info
-Context compression has its own top-level `compression:` block with `summary_provider`, `summary_model`, and `summary_base_url` — see [Context Compression](#context-compression) above. The fallback model uses a `fallback_model:` block — see [Fallback Model](/docs/integrations/providers#fallback-model). All three follow the same provider/model/base_url pattern.
+Context compression has its own `compression:` block for thresholds and an `auxiliary.compression:` block for model/provider settings — see [Context Compression](#context-compression) above. The fallback model uses a `fallback_model:` block — see [Fallback Model](/docs/integrations/providers#fallback-model). All three follow the same provider/model/base_url pattern.
:::
### Changing the Vision Model
@ -839,16 +851,21 @@ agent:
```yaml
 tts:
-  provider: "edge" # "edge" | "elevenlabs" | "openai" | "neutts"
+  provider: "edge" # "edge" | "elevenlabs" | "openai" | "neutts" | "minimax"
+  speed: 1.0 # Global speed multiplier (fallback for all providers)
   edge:
     voice: "en-US-AriaNeural" # 322 voices, 74 languages
+    speed: 1.0 # Speed multiplier (converted to rate percentage, e.g. 1.5 → +50%)
   elevenlabs:
     voice_id: "pNInz6obpgDQGcFmaJgB"
     model_id: "eleven_multilingual_v2"
   openai:
     model: "gpt-4o-mini-tts"
     voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
+    speed: 1.0 # Speed multiplier (clamped to 0.25–4.0 by the API)
     base_url: "https://api.openai.com/v1" # Override for OpenAI-compatible TTS endpoints
+  minimax:
+    speed: 1.0 # Speech speed multiplier
   neutts:
     ref_audio: ''
     ref_text: ''
@ -858,6 +875,8 @@ tts:
This controls both the `text_to_speech` tool and spoken replies in voice mode (`/voice tts` in the CLI or messaging gateway).
**Speed fallback hierarchy:** provider-specific speed (e.g. `tts.edge.speed`) → global `tts.speed` → `1.0` default. Set the global `tts.speed` to apply a uniform speed across all providers, or override per-provider for fine-grained control.
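The fallback hierarchy can be sketched as a small lookup (an illustrative helper, not Hermes code):

```python
def resolve_tts_speed(tts_cfg, provider):
    """Provider-specific speed -> global tts.speed -> 1.0 default."""
    speed = tts_cfg.get(provider, {}).get("speed")
    if speed is None:
        speed = tts_cfg.get("speed")
    return 1.0 if speed is None else speed

resolve_tts_speed({"speed": 1.2, "edge": {"speed": 1.5}}, "edge")  # -> 1.5
resolve_tts_speed({"speed": 1.2}, "openai")                        # -> 1.2
```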
## Display Settings
```yaml


@ -156,7 +156,7 @@ Hermes uses separate lightweight models for side tasks. Each task has its own pr
|------|-------------|-----------|
| Vision | Image analysis, browser screenshots | `auxiliary.vision` |
| Web Extract | Web page summarization | `auxiliary.web_extract` |
-| Compression | Context compression summaries | `auxiliary.compression` or `compression.summary_provider` |
+| Compression | Context compression summaries | `auxiliary.compression` |
| Session Search | Past session summarization | `auxiliary.session_search` |
| Skills Hub | Skill search and discovery | `auxiliary.skills_hub` |
| MCP | MCP helper operations | `auxiliary.mcp` |
@ -219,13 +219,14 @@ auxiliary:
model: ""
```
-Every task above follows the same **provider / model / base_url** pattern. Context compression uses its own top-level block:
+Every task above follows the same **provider / model / base_url** pattern. Context compression is configured under `auxiliary.compression`:
```yaml
-compression:
-  summary_provider: main # Same provider options as auxiliary tasks
-  summary_model: google/gemini-3-flash-preview
-  summary_base_url: null # Custom OpenAI-compatible endpoint
+auxiliary:
+  compression:
+    provider: main # Same provider options as other auxiliary tasks
+    model: google/gemini-3-flash-preview
+    base_url: null # Custom OpenAI-compatible endpoint
```
And the fallback model uses:
@ -270,15 +271,18 @@ auxiliary:
## Context Compression Fallback
-Context compression has a legacy configuration path in addition to the auxiliary system:
+Context compression uses the `auxiliary.compression` config block to control which model and provider handle summarization:
```yaml
-compression:
-  summary_provider: "auto" # auto | openrouter | nous | main
-  summary_model: "google/gemini-3-flash-preview"
+auxiliary:
+  compression:
+    provider: "auto" # auto | openrouter | nous | main
+    model: "google/gemini-3-flash-preview"
```
-This is equivalent to configuring `auxiliary.compression.provider` and `auxiliary.compression.model`. If both are set, the `auxiliary.compression` values take precedence.
+:::info Legacy migration
+Older configs with `compression.summary_model` / `compression.summary_provider` / `compression.summary_base_url` are automatically migrated to `auxiliary.compression.*` on first load (config version 17).
+:::
If no provider is available for compression, Hermes drops middle conversation turns without generating a summary rather than failing the session.
@ -325,7 +329,7 @@ See [Scheduled Tasks (Cron)](/docs/user-guide/features/cron) for full configurat
| Main agent model | `fallback_model` in config.yaml — one-shot failover on errors | `fallback_model:` (top-level) |
| Vision | Auto-detection chain + internal OpenRouter retry | `auxiliary.vision` |
| Web extraction | Auto-detection chain + internal OpenRouter retry | `auxiliary.web_extract` |
-| Context compression | Auto-detection chain, degrades to no-summary if unavailable | `auxiliary.compression` or `compression.summary_provider` |
+| Context compression | Auto-detection chain, degrades to no-summary if unavailable | `auxiliary.compression` |
| Session search | Auto-detection chain | `auxiliary.session_search` |
| Skills hub | Auto-detection chain | `auxiliary.skills_hub` |
| MCP helpers | Auto-detection chain | `auxiliary.mcp` |


@ -426,6 +426,10 @@ hermes skills update react # Update one specific installed hub skill
This uses the stored source identifier plus the current upstream bundle content hash to detect drift.
:::tip GitHub rate limits
Skills hub operations use the GitHub API, which has a rate limit of 60 requests/hour for unauthenticated users. If you see rate-limit errors during install or search, set `GITHUB_TOKEN` in your `.env` file to increase the limit to 5,000 requests/hour. The error message includes an actionable hint when this happens.
:::
### Slash commands (inside chat)
All the same commands work with `/skills`:


@ -36,8 +36,10 @@ Convert text to speech with six providers:
# In ~/.hermes/config.yaml
tts:
  provider: "edge" # "edge" | "elevenlabs" | "openai" | "minimax" | "mistral" | "neutts"
  speed: 1.0 # Global speed multiplier (provider-specific settings override this)
  edge:
    voice: "en-US-AriaNeural" # 322 voices, 74 languages
    speed: 1.0 # Converted to rate percentage (+/-%)
  elevenlabs:
    voice_id: "pNInz6obpgDQGcFmaJgB" # Adam
    model_id: "eleven_multilingual_v2"
@ -45,6 +47,7 @@ tts:
    model: "gpt-4o-mini-tts"
    voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
    base_url: "https://api.openai.com/v1" # Override for OpenAI-compatible TTS endpoints
    speed: 1.0 # 0.25 - 4.0
  minimax:
    model: "speech-2.8-hd" # speech-2.8-hd (default), speech-2.8-turbo
    voice_id: "English_Graceful_Lady" # See https://platform.minimax.io/faq/system-voice-id
@ -61,6 +64,8 @@ tts:
    device: cpu
```
**Speed control**: The global `tts.speed` value applies to all providers by default. Each provider can override it with its own `speed` setting (e.g., `tts.openai.speed: 1.5`). Provider-specific speed takes precedence over the global value. Default is `1.0` (normal speed).
### Telegram Voice Bubbles & ffmpeg
Telegram voice bubbles require Opus/OGG audio format:


@ -1,7 +1,7 @@
---
sidebar_position: 15
title: "Web Dashboard"
-description: "Browser-based dashboard for managing configuration, API keys, and monitoring sessions"
+description: "Browser-based dashboard for managing configuration, API keys, sessions, logs, analytics, cron jobs, and skills"
---
# Web Dashboard
@ -104,6 +104,54 @@ Each key shows:
Advanced/rarely-used keys are hidden by default behind a toggle.
### Sessions
Browse and inspect all agent sessions. Each row shows the session title, source platform icon (CLI, Telegram, Discord, Slack, cron), model name, message count, tool call count, and how long ago it was active. Live sessions are marked with a pulsing badge.
- **Search** — full-text search across all message content using FTS5. Results show highlighted snippets and auto-scroll to the first matching message when expanded.
- **Expand** — click a session to load its full message history. Messages are color-coded by role (user, assistant, system, tool) and rendered as Markdown with syntax highlighting.
- **Tool calls** — assistant messages with tool calls show collapsible blocks with the function name and JSON arguments.
- **Delete** — remove a session and its message history with the trash icon.
### Logs
View agent, gateway, and error log files with filtering and live tailing.
- **File** — switch between `agent`, `errors`, and `gateway` log files
- **Level** — filter by log level: ALL, DEBUG, INFO, WARNING, or ERROR
- **Component** — filter by source component: all, gateway, agent, tools, cli, or cron
- **Lines** — choose how many lines to display (50, 100, 200, or 500)
- **Auto-refresh** — toggle live tailing that polls for new log lines every 5 seconds
- **Color-coded** — log lines are colored by severity (red for errors, yellow for warnings, dim for debug)
### Analytics
Usage and cost analytics computed from session history. Select a time period (7, 30, or 90 days) to see:
- **Summary cards** — total tokens (input/output), cache hit percentage, total estimated or actual cost, and total session count with daily average
- **Daily token chart** — stacked bar chart showing input and output token usage per day, with hover tooltips showing breakdowns and cost
- **Daily breakdown table** — date, session count, input tokens, output tokens, cache hit rate, and cost for each day
- **Per-model breakdown** — table showing each model used, its session count, token usage, and estimated cost
### Cron
Create and manage scheduled cron jobs that run agent prompts on a recurring schedule.
- **Create** — fill in a name (optional), prompt, cron expression (e.g. `0 9 * * *`), and delivery target (local, Telegram, Discord, Slack, or email)
- **Job list** — each job shows its name, prompt preview, schedule expression, state badge (enabled/paused/error), delivery target, last run time, and next run time
- **Pause / Resume** — toggle a job between active and paused states
- **Trigger now** — immediately execute a job outside its normal schedule
- **Delete** — permanently remove a cron job
### Skills
Browse, search, and toggle skills and toolsets. Skills are loaded from `~/.hermes/skills/` and grouped by category.
- **Search** — filter skills and toolsets by name, description, or category
- **Category filter** — click category pills to narrow the list (e.g. MLOps, MCP, Red Teaming, AI)
- **Toggle** — enable or disable individual skills with a switch. Changes take effect on the next session.
- **Toolsets** — a separate section shows built-in toolsets (file operations, web browsing, etc.) with their active/inactive status, setup requirements, and list of included tools
:::warning Security
The web dashboard reads and writes your `.env` file, which contains API keys and secrets. It binds to `127.0.0.1` by default — only accessible from your local machine. If you bind to `0.0.0.0`, anyone on your network can view and modify your credentials. The dashboard has no authentication of its own.
:::
@ -159,6 +207,66 @@ Sets an environment variable. Body: `{"key": "VAR_NAME", "value": "secret"}`.
Removes an environment variable. Body: `{"key": "VAR_NAME"}`.
### GET /api/sessions/\{session_id\}
Returns metadata for a single session.
### GET /api/sessions/\{session_id\}/messages
Returns the full message history for a session, including tool calls and timestamps.
### GET /api/sessions/search
Full-text search across message content. Query parameter: `q`. Returns matching session IDs with highlighted snippets.
### DELETE /api/sessions/\{session_id\}
Deletes a session and its message history.
### GET /api/logs
Returns log lines. Query parameters: `file` (agent/errors/gateway), `lines` (count), `level`, `component`.
### GET /api/analytics/usage
Returns token usage, cost, and session analytics. Query parameter: `days` (default 30). Response includes daily breakdowns and per-model aggregates.
### GET /api/cron/jobs
Returns all configured cron jobs with their state, schedule, and run history.
### POST /api/cron/jobs
Creates a new cron job. Body: `{"prompt": "...", "schedule": "0 9 * * *", "name": "...", "deliver": "local"}`.
### POST /api/cron/jobs/\{job_id\}/pause
Pauses a cron job.
### POST /api/cron/jobs/\{job_id\}/resume
Resumes a paused cron job.
### POST /api/cron/jobs/\{job_id\}/trigger
Immediately triggers a cron job outside its schedule.
### DELETE /api/cron/jobs/\{job_id\}
Deletes a cron job.
### GET /api/skills
Returns all skills with their name, description, category, and enabled status.
### PUT /api/skills/toggle
Enables or disables a skill. Body: `{"name": "skill-name", "enabled": true}`.
### GET /api/tools/toolsets
Returns all toolsets with their label, description, tools list, and active/configured status.
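Request bodies for two of the endpoints above can be built like this (the payload shapes come from the endpoint docs; the base URL and port are assumptions for illustration, since the dashboard binds to `127.0.0.1`):

```python
import json

BASE_URL = "http://127.0.0.1:8000"  # assumed port; adjust to your dashboard

# POST /api/cron/jobs -- create a job that runs every day at 09:00
create_cron_job = {
    "prompt": "Summarize yesterday's error logs",
    "schedule": "0 9 * * *",
    "name": "daily-log-summary",
    "deliver": "local",
}

# PUT /api/skills/toggle -- disable a skill by name
toggle_skill = {"name": "skill-name", "enabled": False}

print(json.dumps(create_cron_job))
```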
## CORS
The web server restricts CORS to localhost origins only:


@ -143,5 +143,5 @@ The crypto implementation is compatible with Tencent's official WXBizMsgCrypt SD
- **No streaming** — replies arrive as complete messages after the agent finishes
- **No typing indicators** — the callback model doesn't support typing status
-- **Text only** — currently supports text messages; image/file/voice not yet implemented
+- **Text only** — currently supports text messages for input; image/file/voice input not yet implemented. The agent is aware of outbound media capabilities via the WeCom platform hint (images, documents, video, voice).
- **Response latency** — agent sessions take 3–30 minutes; users see the reply when processing completes


@ -174,6 +174,33 @@ whatsapp:
---
## Message Formatting & Delivery
WhatsApp supports **streaming (progressive) responses** — the bot edits its message in real-time as the AI generates text, just like Discord and Telegram. Internally, WhatsApp is classified as a TIER_MEDIUM platform for delivery capabilities.
### Chunking
Long responses are automatically split into multiple messages at **4,096 characters** per chunk (WhatsApp's practical display limit). You don't need to configure anything — the gateway handles splitting and sends chunks sequentially.
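A naive sketch of the splitting (the gateway's real splitter may be smarter, e.g. breaking at word or line boundaries, which this does not):

```python
def chunk_message(text, limit=4096):
    """Split a long reply into WhatsApp-sized chunks, sent sequentially."""
    return [text[i:i + limit] for i in range(0, len(text), limit)] or [""]

chunks = chunk_message("x" * 10000)
# -> 3 chunks of 4096 + 4096 + 1808 characters
```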
### WhatsApp-Compatible Markdown
Standard Markdown in AI responses is automatically converted to WhatsApp's native formatting:
| Markdown | WhatsApp | Renders as |
|----------|----------|------------|
| `**bold**` | `*bold*` | **bold** |
| `~~strikethrough~~` | `~strikethrough~` | ~~strikethrough~~ |
| `# Heading` | `*Heading*` | Bold text (no native headings) |
| `[link text](url)` | `link text (url)` | Inline URL |
Code blocks and inline code are preserved as-is since WhatsApp supports triple-backtick formatting natively.
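The table's mappings can be sketched with a few regex substitutions (illustrative only; the gateway's actual converter also has to leave code spans untouched, which this sketch does not attempt):

```python
import re

def to_whatsapp(text):
    """Convert the documented Markdown constructs to WhatsApp formatting."""
    text = re.sub(r"\*\*(.+?)\*\*", r"*\1*", text)                # **bold** -> *bold*
    text = re.sub(r"~~(.+?)~~", r"~\1~", text)                    # strikethrough
    text = re.sub(r"^#{1,6}\s+(.+)$", r"*\1*", text, flags=re.M)  # headings -> bold
    text = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", r"\1 (\2)", text)   # links -> text (url)
    return text

to_whatsapp("# Title\n**hi** [docs](https://example.com)")
# -> "*Title*\n*hi* docs (https://example.com)"
```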
### Tool Progress
When the agent calls tools (web search, file operations, etc.), WhatsApp displays real-time progress indicators showing which tool is running. This is enabled by default — no configuration needed.
---
## Troubleshooting
| Problem | Solution |