docs: comprehensive update for recent merged PRs (#9019)

Audit and update documentation across 12 files to match changes from
~50 recently merged PRs. Key updates:

Slash commands (slash-commands.md):
- Add 5 missing commands: /snapshot, /fast, /image, /debug, /restart
- Fix /status incorrectly labeled as messaging-only (available in both)
- Add --global flag to /model docs
- Add [focus topic] arg to /compress docs

CLI commands (cli-commands.md):
- Add hermes debug share section with options and examples
- Add hermes backup section with --quick and --label flags
- Add hermes import section

Feature docs:
- TTS: document global tts.speed and per-provider speed for Edge/OpenAI
- Web dashboard: add docs for 5 missing pages (Sessions, Logs,
  Analytics, Cron, Skills) and 15+ API endpoints
- WhatsApp: add streaming, 4K chunking, and markdown formatting docs
- Skills: add GitHub rate-limit/GITHUB_TOKEN troubleshooting tip
- Budget: document CLI notification on iteration budget exhaustion

Config migration (compression.summary_* → auxiliary.compression.*):
- Update configuration.md, environment-variables.md,
  fallback-providers.md, cli.md, and context-compression-and-caching.md
- Replace legacy compression.summary_model/provider/base_url references
  with auxiliary.compression.model/provider/base_url
- Add legacy migration info boxes explaining auto-migration

Minor fixes:
- wecom-callback.md: clarify 'text only' limitation (input only)
- Escape {session_id}/{job_id} in web-dashboard.md headings for MDX
Teknium 2026-04-13 10:50:59 -07:00 committed by GitHub
parent c449cd1af5
commit 4ca6668daf
GPG key ID: B5690EEEBB952194
12 changed files with 299 additions and 40 deletions


@ -322,7 +322,11 @@ Long conversations are automatically summarized when approaching context limits:
 compression:
   enabled: true
   threshold: 0.50 # Compress at 50% of context limit by default
-  summary_model: "google/gemini-3-flash-preview" # Model used for summarization
+# Summarization model configured under auxiliary:
+auxiliary:
+  compression:
+    model: "google/gemini-3-flash-preview" # Model used for summarization
```
When compression triggers, middle turns are summarized while the first 3 and last 4 turns are always preserved.
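The keep-first-3/keep-last-4 rule can be sketched as follows (illustrative only; the function name and exact boundary handling are assumptions, not Hermes internals):

```python
def split_for_compression(turns, head=3, tail=4):
    """Keep the first `head` and last `tail` turns verbatim; everything in
    between is the candidate region for summarization."""
    if len(turns) <= head + tail:
        # Too short to compress; nothing is summarized.
        return turns, [], []
    return turns[:head], turns[head:-tail], turns[-tail:]

first, middle, last = split_for_compression(list(range(10)))
# first -> [0, 1, 2]; middle -> [3, 4, 5]; last -> [6, 7, 8, 9]
```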


@ -441,11 +441,19 @@ compression:
   threshold: 0.50 # Compress at this % of context limit
   target_ratio: 0.20 # Fraction of threshold to preserve as recent tail
   protect_last_n: 20 # Min recent messages to keep uncompressed
-  summary_model: "google/gemini-3-flash-preview" # Model for summarization
-  summary_provider: "auto" # Provider: "auto", "openrouter", "nous", "codex", "main", etc.
-  summary_base_url: null # Custom OpenAI-compatible endpoint (overrides provider)
+# The summarization model/provider is configured under auxiliary:
+auxiliary:
+  compression:
+    model: "google/gemini-3-flash-preview" # Model for summarization
+    provider: "auto" # Provider: "auto", "openrouter", "nous", "codex", "main", etc.
+    base_url: null # Custom OpenAI-compatible endpoint (overrides provider)
```
+:::info Legacy config migration
+Older configs with `compression.summary_model`, `compression.summary_provider`, and `compression.summary_base_url` are automatically migrated to `auxiliary.compression.*` on first load (config version 17). No manual action needed.
+:::
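The auto-migration described in the box above amounts to a key rename; a minimal sketch under stated assumptions (the real loader also bumps the config version to 17, which this omits):

```python
def migrate_compression_config(cfg):
    """Move legacy compression.summary_* keys to auxiliary.compression.*,
    preserving any value the user already set under the new path."""
    legacy = cfg.get("compression", {})
    aux = cfg.setdefault("auxiliary", {}).setdefault("compression", {})
    for old_key, new_key in [("summary_model", "model"),
                             ("summary_provider", "provider"),
                             ("summary_base_url", "base_url")]:
        if old_key in legacy and new_key not in aux:
            aux[new_key] = legacy.pop(old_key)
    return cfg

cfg = migrate_compression_config({"compression": {"summary_model": "glm-4.7"}})
# cfg["auxiliary"]["compression"]["model"] == "glm-4.7"
```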
### Common setups
**Default (auto-detect) — no configuration needed:**
@ -458,30 +466,32 @@ Uses the first available provider (OpenRouter → Nous → Codex) with Gemini Fl
**Force a specific provider** (OAuth or API-key based):
```yaml
-compression:
-  summary_provider: nous
-  summary_model: gemini-3-flash
+auxiliary:
+  compression:
+    provider: nous
+    model: gemini-3-flash
```
Works with any provider: `nous`, `openrouter`, `codex`, `anthropic`, `main`, etc.
**Custom endpoint** (self-hosted, Ollama, zai, DeepSeek, etc.):
```yaml
-compression:
-  summary_model: glm-4.7
-  summary_base_url: https://api.z.ai/api/coding/paas/v4
+auxiliary:
+  compression:
+    model: glm-4.7
+    base_url: https://api.z.ai/api/coding/paas/v4
```
Points at a custom OpenAI-compatible endpoint. Uses `OPENAI_API_KEY` for auth.
### How the three knobs interact
-| `summary_provider` | `summary_base_url` | Result |
+| `auxiliary.compression.provider` | `auxiliary.compression.base_url` | Result |
|---------------------|---------------------|--------|
| `auto` (default) | not set | Auto-detect best available provider |
| `nous` / `openrouter` / etc. | not set | Force that provider, use its auth |
| any | set | Use the custom endpoint directly (provider ignored) |
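The table's rows can be read as a small resolution function; this is a sketch of the documented behavior (names are illustrative, not Hermes internals):

```python
def resolve_compression_endpoint(provider="auto", base_url=None):
    """base_url wins outright; an explicit provider is forced; else auto-detect."""
    if base_url:
        return ("custom-endpoint", base_url)  # provider is ignored
    if provider and provider != "auto":
        return ("forced-provider", provider)
    return ("auto-detect", None)

resolve_compression_endpoint("nous")  # -> ("forced-provider", "nous")
```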
:::warning Summary model context length requirement
-The `summary_model` **must** have a context window at least as large as your main agent model's. The compressor sends the full middle section of the conversation to the summary model — if that model's context window is smaller than the main model's, the summarization call will fail with a context length error. When this happens, the middle turns are **dropped without a summary**, losing conversation context silently. If you override `summary_model`, verify its context length meets or exceeds your main model's.
+The summary model **must** have a context window at least as large as your main agent model's. The compressor sends the full middle section of the conversation to the summary model — if that model's context window is smaller than the main model's, the summarization call will fail with a context length error. When this happens, the middle turns are **dropped without a summary**, losing conversation context silently. If you override the model, verify its context length meets or exceeds your main model's.
:::
## Context Engine
@ -522,6 +532,8 @@ agent:
Budget pressure is enabled by default. The agent sees warnings naturally as part of tool results, encouraging it to consolidate its work and deliver a response before running out of iterations.
When the iteration budget is fully exhausted, the CLI shows a notification to the user: `⚠ Iteration budget reached (90/90) — response may be incomplete`. If the budget runs out during active work, the agent generates a summary of what was accomplished before stopping.
### Streaming Timeouts
The LLM streaming connection has two timeout layers. Both auto-adjust for local providers (localhost, LAN IPs) — no configuration needed for most setups.
@ -666,7 +678,7 @@ Each auxiliary task has a configurable `timeout` (in seconds). Defaults: vision
:::
:::info
-Context compression has its own top-level `compression:` block with `summary_provider`, `summary_model`, and `summary_base_url` — see [Context Compression](#context-compression) above. The fallback model uses a `fallback_model:` block — see [Fallback Model](/docs/integrations/providers#fallback-model). All three follow the same provider/model/base_url pattern.
+Context compression has its own `compression:` block for thresholds and an `auxiliary.compression:` block for model/provider settings — see [Context Compression](#context-compression) above. The fallback model uses a `fallback_model:` block — see [Fallback Model](/docs/integrations/providers#fallback-model). All three follow the same provider/model/base_url pattern.
:::
### Changing the Vision Model
@ -839,16 +851,21 @@ agent:
```yaml
 tts:
-  provider: "edge" # "edge" | "elevenlabs" | "openai" | "neutts"
+  provider: "edge" # "edge" | "elevenlabs" | "openai" | "neutts" | "minimax"
+  speed: 1.0 # Global speed multiplier (fallback for all providers)
   edge:
     voice: "en-US-AriaNeural" # 322 voices, 74 languages
+    speed: 1.0 # Speed multiplier (converted to rate percentage, e.g. 1.5 → +50%)
   elevenlabs:
     voice_id: "pNInz6obpgDQGcFmaJgB"
     model_id: "eleven_multilingual_v2"
   openai:
     model: "gpt-4o-mini-tts"
     voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
+    speed: 1.0 # Speed multiplier (clamped to 0.25–4.0 by the API)
     base_url: "https://api.openai.com/v1" # Override for OpenAI-compatible TTS endpoints
+  minimax:
+    speed: 1.0 # Speech speed multiplier
   neutts:
     ref_audio: ''
     ref_text: ''
@ -858,6 +875,8 @@ tts:
This controls both the `text_to_speech` tool and spoken replies in voice mode (`/voice tts` in the CLI or messaging gateway).
**Speed fallback hierarchy:** provider-specific speed (e.g. `tts.edge.speed`) → global `tts.speed` → `1.0` default. Set the global `tts.speed` to apply a uniform speed across all providers, or override per-provider for fine-grained control.
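The fallback hierarchy can be sketched as a small lookup (an illustrative helper, not Hermes code):

```python
def resolve_tts_speed(tts_cfg, provider):
    """Provider-specific speed -> global tts.speed -> 1.0 default."""
    speed = tts_cfg.get(provider, {}).get("speed")
    if speed is None:
        speed = tts_cfg.get("speed")
    return 1.0 if speed is None else speed

resolve_tts_speed({"speed": 1.2, "edge": {"speed": 1.5}}, "edge")  # -> 1.5
resolve_tts_speed({"speed": 1.2}, "openai")                        # -> 1.2
```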
## Display Settings
```yaml


@ -156,7 +156,7 @@ Hermes uses separate lightweight models for side tasks. Each task has its own pr
|------|-------------|-----------|
| Vision | Image analysis, browser screenshots | `auxiliary.vision` |
| Web Extract | Web page summarization | `auxiliary.web_extract` |
-| Compression | Context compression summaries | `auxiliary.compression` or `compression.summary_provider` |
+| Compression | Context compression summaries | `auxiliary.compression` |
| Session Search | Past session summarization | `auxiliary.session_search` |
| Skills Hub | Skill search and discovery | `auxiliary.skills_hub` |
| MCP | MCP helper operations | `auxiliary.mcp` |
@ -219,13 +219,14 @@ auxiliary:
model: ""
```
-Every task above follows the same **provider / model / base_url** pattern. Context compression uses its own top-level block:
+Every task above follows the same **provider / model / base_url** pattern. Context compression is configured under `auxiliary.compression`:
```yaml
-compression:
-  summary_provider: main # Same provider options as auxiliary tasks
-  summary_model: google/gemini-3-flash-preview
-  summary_base_url: null # Custom OpenAI-compatible endpoint
+auxiliary:
+  compression:
+    provider: main # Same provider options as other auxiliary tasks
+    model: google/gemini-3-flash-preview
+    base_url: null # Custom OpenAI-compatible endpoint
```
And the fallback model uses:
@ -270,15 +271,18 @@ auxiliary:
## Context Compression Fallback
-Context compression has a legacy configuration path in addition to the auxiliary system:
+Context compression uses the `auxiliary.compression` config block to control which model and provider handle summarization:
```yaml
-compression:
-  summary_provider: "auto" # auto | openrouter | nous | main
-  summary_model: "google/gemini-3-flash-preview"
+auxiliary:
+  compression:
+    provider: "auto" # auto | openrouter | nous | main
+    model: "google/gemini-3-flash-preview"
```
-This is equivalent to configuring `auxiliary.compression.provider` and `auxiliary.compression.model`. If both are set, the `auxiliary.compression` values take precedence.
+:::info Legacy migration
+Older configs with `compression.summary_model` / `compression.summary_provider` / `compression.summary_base_url` are automatically migrated to `auxiliary.compression.*` on first load (config version 17).
+:::
If no provider is available for compression, Hermes drops middle conversation turns without generating a summary rather than failing the session.
@ -325,7 +329,7 @@ See [Scheduled Tasks (Cron)](/docs/user-guide/features/cron) for full configurat
| Main agent model | `fallback_model` in config.yaml — one-shot failover on errors | `fallback_model:` (top-level) |
| Vision | Auto-detection chain + internal OpenRouter retry | `auxiliary.vision` |
| Web extraction | Auto-detection chain + internal OpenRouter retry | `auxiliary.web_extract` |
-| Context compression | Auto-detection chain, degrades to no-summary if unavailable | `auxiliary.compression` or `compression.summary_provider` |
+| Context compression | Auto-detection chain, degrades to no-summary if unavailable | `auxiliary.compression` |
| Session search | Auto-detection chain | `auxiliary.session_search` |
| Skills hub | Auto-detection chain | `auxiliary.skills_hub` |
| MCP helpers | Auto-detection chain | `auxiliary.mcp` |


@ -426,6 +426,10 @@ hermes skills update react # Update one specific installed hub skill
This uses the stored source identifier plus the current upstream bundle content hash to detect drift.
:::tip GitHub rate limits
Skills hub operations use the GitHub API, which has a rate limit of 60 requests/hour for unauthenticated users. If you see rate-limit errors during install or search, set `GITHUB_TOKEN` in your `.env` file to increase the limit to 5,000 requests/hour. The error message includes an actionable hint when this happens.
:::
### Slash commands (inside chat)
All the same commands work with `/skills`:


@ -36,8 +36,10 @@ Convert text to speech with six providers:
# In ~/.hermes/config.yaml
tts:
  provider: "edge" # "edge" | "elevenlabs" | "openai" | "minimax" | "mistral" | "neutts"
  speed: 1.0 # Global speed multiplier (provider-specific settings override this)
  edge:
    voice: "en-US-AriaNeural" # 322 voices, 74 languages
    speed: 1.0 # Converted to rate percentage (+/-%)
  elevenlabs:
    voice_id: "pNInz6obpgDQGcFmaJgB" # Adam
    model_id: "eleven_multilingual_v2"
@ -45,6 +47,7 @@ tts:
    model: "gpt-4o-mini-tts"
    voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
    base_url: "https://api.openai.com/v1" # Override for OpenAI-compatible TTS endpoints
    speed: 1.0 # 0.25 - 4.0
  minimax:
    model: "speech-2.8-hd" # speech-2.8-hd (default), speech-2.8-turbo
    voice_id: "English_Graceful_Lady" # See https://platform.minimax.io/faq/system-voice-id
@ -61,6 +64,8 @@ tts:
    device: cpu
```
**Speed control**: The global `tts.speed` value applies to all providers by default. Each provider can override it with its own `speed` setting (e.g., `tts.openai.speed: 1.5`). Provider-specific speed takes precedence over the global value. Default is `1.0` (normal speed).
### Telegram Voice Bubbles & ffmpeg
Telegram voice bubbles require Opus/OGG audio format:


@ -1,7 +1,7 @@
---
sidebar_position: 15
title: "Web Dashboard"
-description: "Browser-based dashboard for managing configuration, API keys, and monitoring sessions"
+description: "Browser-based dashboard for managing configuration, API keys, sessions, logs, analytics, cron jobs, and skills"
---
# Web Dashboard
@ -104,6 +104,54 @@ Each key shows:
Advanced/rarely-used keys are hidden by default behind a toggle.
### Sessions
Browse and inspect all agent sessions. Each row shows the session title, source platform icon (CLI, Telegram, Discord, Slack, cron), model name, message count, tool call count, and how long ago it was active. Live sessions are marked with a pulsing badge.
- **Search** — full-text search across all message content using FTS5. Results show highlighted snippets and auto-scroll to the first matching message when expanded.
- **Expand** — click a session to load its full message history. Messages are color-coded by role (user, assistant, system, tool) and rendered as Markdown with syntax highlighting.
- **Tool calls** — assistant messages with tool calls show collapsible blocks with the function name and JSON arguments.
- **Delete** — remove a session and its message history with the trash icon.
### Logs
View agent, gateway, and error log files with filtering and live tailing.
- **File** — switch between `agent`, `errors`, and `gateway` log files
- **Level** — filter by log level: ALL, DEBUG, INFO, WARNING, or ERROR
- **Component** — filter by source component: all, gateway, agent, tools, cli, or cron
- **Lines** — choose how many lines to display (50, 100, 200, or 500)
- **Auto-refresh** — toggle live tailing that polls for new log lines every 5 seconds
- **Color-coded** — log lines are colored by severity (red for errors, yellow for warnings, dim for debug)
### Analytics
Usage and cost analytics computed from session history. Select a time period (7, 30, or 90 days) to see:
- **Summary cards** — total tokens (input/output), cache hit percentage, total estimated or actual cost, and total session count with daily average
- **Daily token chart** — stacked bar chart showing input and output token usage per day, with hover tooltips showing breakdowns and cost
- **Daily breakdown table** — date, session count, input tokens, output tokens, cache hit rate, and cost for each day
- **Per-model breakdown** — table showing each model used, its session count, token usage, and estimated cost
### Cron
Create and manage scheduled cron jobs that run agent prompts on a recurring schedule.
- **Create** — fill in a name (optional), prompt, cron expression (e.g. `0 9 * * *`), and delivery target (local, Telegram, Discord, Slack, or email)
- **Job list** — each job shows its name, prompt preview, schedule expression, state badge (enabled/paused/error), delivery target, last run time, and next run time
- **Pause / Resume** — toggle a job between active and paused states
- **Trigger now** — immediately execute a job outside its normal schedule
- **Delete** — permanently remove a cron job
### Skills
Browse, search, and toggle skills and toolsets. Skills are loaded from `~/.hermes/skills/` and grouped by category.
- **Search** — filter skills and toolsets by name, description, or category
- **Category filter** — click category pills to narrow the list (e.g. MLOps, MCP, Red Teaming, AI)
- **Toggle** — enable or disable individual skills with a switch. Changes take effect on the next session.
- **Toolsets** — a separate section shows built-in toolsets (file operations, web browsing, etc.) with their active/inactive status, setup requirements, and list of included tools
:::warning Security
The web dashboard reads and writes your `.env` file, which contains API keys and secrets. It binds to `127.0.0.1` by default — only accessible from your local machine. If you bind to `0.0.0.0`, anyone on your network can view and modify your credentials. The dashboard has no authentication of its own.
:::
@ -159,6 +207,66 @@ Sets an environment variable. Body: `{"key": "VAR_NAME", "value": "secret"}`.
Removes an environment variable. Body: `{"key": "VAR_NAME"}`.
### GET /api/sessions/\{session_id\}
Returns metadata for a single session.
### GET /api/sessions/\{session_id\}/messages
Returns the full message history for a session, including tool calls and timestamps.
### GET /api/sessions/search
Full-text search across message content. Query parameter: `q`. Returns matching session IDs with highlighted snippets.
### DELETE /api/sessions/\{session_id\}
Deletes a session and its message history.
### GET /api/logs
Returns log lines. Query parameters: `file` (agent/errors/gateway), `lines` (count), `level`, `component`.
### GET /api/analytics/usage
Returns token usage, cost, and session analytics. Query parameter: `days` (default 30). Response includes daily breakdowns and per-model aggregates.
### GET /api/cron/jobs
Returns all configured cron jobs with their state, schedule, and run history.
### POST /api/cron/jobs
Creates a new cron job. Body: `{"prompt": "...", "schedule": "0 9 * * *", "name": "...", "deliver": "local"}`.
### POST /api/cron/jobs/\{job_id\}/pause
Pauses a cron job.
### POST /api/cron/jobs/\{job_id\}/resume
Resumes a paused cron job.
### POST /api/cron/jobs/\{job_id\}/trigger
Immediately triggers a cron job outside its schedule.
### DELETE /api/cron/jobs/\{job_id\}
Deletes a cron job.
### GET /api/skills
Returns all skills with their name, description, category, and enabled status.
### PUT /api/skills/toggle
Enables or disables a skill. Body: `{"name": "skill-name", "enabled": true}`.
### GET /api/tools/toolsets
Returns all toolsets with their label, description, tools list, and active/configured status.
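Request bodies for two of the endpoints above can be built like this (the payload shapes come from the endpoint docs; the base URL and port are assumptions for illustration, since the dashboard binds to `127.0.0.1`):

```python
import json

BASE_URL = "http://127.0.0.1:8000"  # assumed port; adjust to your dashboard

# POST /api/cron/jobs -- create a job that runs every day at 09:00
create_cron_job = {
    "prompt": "Summarize yesterday's error logs",
    "schedule": "0 9 * * *",
    "name": "daily-log-summary",
    "deliver": "local",
}

# PUT /api/skills/toggle -- disable a skill by name
toggle_skill = {"name": "skill-name", "enabled": False}

print(json.dumps(create_cron_job))
```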
## CORS
The web server restricts CORS to localhost origins only:


@ -143,5 +143,5 @@ The crypto implementation is compatible with Tencent's official WXBizMsgCrypt SD
- **No streaming** — replies arrive as complete messages after the agent finishes
- **No typing indicators** — the callback model doesn't support typing status
-- **Text only** — currently supports text messages; image/file/voice not yet implemented
+- **Text only** — currently supports text messages for input; image/file/voice input not yet implemented. The agent is aware of outbound media capabilities via the WeCom platform hint (images, documents, video, voice).
- **Response latency** — agent sessions take 3–30 minutes; users see the reply when processing completes


@ -174,6 +174,33 @@ whatsapp:
---
## Message Formatting & Delivery
WhatsApp supports **streaming (progressive) responses** — the bot edits its message in real-time as the AI generates text, just like Discord and Telegram. Internally, WhatsApp is classified as a TIER_MEDIUM platform for delivery capabilities.
### Chunking
Long responses are automatically split into multiple messages at **4,096 characters** per chunk (WhatsApp's practical display limit). You don't need to configure anything — the gateway handles splitting and sends chunks sequentially.
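A naive sketch of the splitting (the gateway's real splitter may be smarter, e.g. breaking at word or line boundaries, which this does not):

```python
def chunk_message(text, limit=4096):
    """Split a long reply into WhatsApp-sized chunks, sent sequentially."""
    return [text[i:i + limit] for i in range(0, len(text), limit)] or [""]

chunks = chunk_message("x" * 10000)
# -> 3 chunks of 4096 + 4096 + 1808 characters
```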
### WhatsApp-Compatible Markdown
Standard Markdown in AI responses is automatically converted to WhatsApp's native formatting:
| Markdown | WhatsApp | Renders as |
|----------|----------|------------|
| `**bold**` | `*bold*` | **bold** |
| `~~strikethrough~~` | `~strikethrough~` | ~~strikethrough~~ |
| `# Heading` | `*Heading*` | Bold text (no native headings) |
| `[link text](url)` | `link text (url)` | Inline URL |
Code blocks and inline code are preserved as-is since WhatsApp supports triple-backtick formatting natively.
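The table's mappings can be sketched with a few regex substitutions (illustrative only; the gateway's actual converter also has to leave code spans untouched, which this sketch does not attempt):

```python
import re

def to_whatsapp(text):
    """Convert the documented Markdown constructs to WhatsApp formatting."""
    text = re.sub(r"\*\*(.+?)\*\*", r"*\1*", text)                # **bold** -> *bold*
    text = re.sub(r"~~(.+?)~~", r"~\1~", text)                    # strikethrough
    text = re.sub(r"^#{1,6}\s+(.+)$", r"*\1*", text, flags=re.M)  # headings -> bold
    text = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", r"\1 (\2)", text)   # links -> text (url)
    return text

to_whatsapp("# Title\n**hi** [docs](https://example.com)")
# -> "*Title*\n*hi* docs (https://example.com)"
```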
### Tool Progress
When the agent calls tools (web search, file operations, etc.), WhatsApp displays real-time progress indicators showing which tool is running. This is enabled by default — no configuration needed.
---
## Troubleshooting
| Problem | Solution |