Mirror of https://github.com/NousResearch/hermes-agent.git
Synced 2026-04-27 01:11:40 +00:00

Merge branch 'main' into rewbs/tool-use-charge-to-subscription

Commit a2e56d044b
175 changed files with 18848 additions and 3772 deletions
@@ -28,7 +28,7 @@ A built-in provider has to line up across a few layers:
    - `api_key`
    - `source`
 3. `run_agent.py` uses `api_mode` to decide how requests are built and sent.
-4. `hermes_cli/models.py`, `hermes_cli/main.py`, and `hermes_cli/setup.py` make the provider show up in the CLI.
+4. `hermes_cli/models.py` and `hermes_cli/main.py` make the provider show up in the CLI. (`hermes_cli/setup.py` delegates to `main.py` automatically — no changes needed there.)
 5. `agent/auxiliary_client.py` and `agent/model_metadata.py` keep side tasks and token budgeting working.
 
 The important abstraction is `api_mode`.
@@ -78,11 +78,14 @@ This path includes everything from Path A plus:
 2. `hermes_cli/models.py`
 3. `hermes_cli/runtime_provider.py`
 4. `hermes_cli/main.py`
-5. `hermes_cli/setup.py`
-6. `agent/auxiliary_client.py`
-7. `agent/model_metadata.py`
-8. tests
-9. user-facing docs under `website/docs/`
+5. `agent/auxiliary_client.py`
+6. `agent/model_metadata.py`
+7. tests
+8. user-facing docs under `website/docs/`
+
+:::tip
+`hermes_cli/setup.py` does **not** need changes. The setup wizard delegates provider/model selection to `select_provider_and_model()` in `main.py` — any provider added there is automatically available in `hermes setup`.
+:::
 
 ### Additional for native / non-OpenAI providers
@@ -185,29 +188,22 @@ If the provider is OpenAI-compatible, `api_mode` should usually stay `chat_compl
 
 Be careful with API-key precedence. Hermes already contains logic to avoid leaking an OpenRouter key to unrelated endpoints. A new provider should be equally explicit about which key goes to which base URL.
 
-## Step 5: Wire the CLI in `hermes_cli/main.py` and `hermes_cli/setup.py`
+## Step 5: Wire the CLI in `hermes_cli/main.py`
 
-A provider is not discoverable until it shows up in the interactive flows.
+A provider is not discoverable until it shows up in the interactive `hermes model` flow.
 
-Update:
+Update these in `hermes_cli/main.py`:
 
-### `hermes_cli/main.py`
-
-- `provider_labels`
-- provider dispatch inside the `model` command
+- `provider_labels` dict
+- `providers` list in `select_provider_and_model()`
+- provider dispatch (`if selected_provider == ...`)
 - `--provider` argument choices
 - login/logout choices if the provider supports those flows
 - a `_model_flow_<provider>()` function, or reuse `_model_flow_api_key_provider()` if it fits
 
-### `hermes_cli/setup.py`
-
-- `provider_choices`
-- auth branch for the provider
-- model-selection branch
-- any provider-specific explanatory text
-- any place where a provider should be excluded from OpenRouter-only prompts or routing settings
-
-If you only update one of these files, `hermes model` and `hermes setup` will drift.
+:::tip
+`hermes_cli/setup.py` does not need changes — it calls `select_provider_and_model()` from `main.py`, so your new provider appears in both `hermes model` and `hermes setup` automatically.
+:::
 
 ## Step 6: Keep auxiliary calls working
@@ -353,8 +349,7 @@ Use this if the provider is standard chat completions.
 - [ ] aliases added in `hermes_cli/auth.py` and `hermes_cli/models.py`
 - [ ] model catalog added in `hermes_cli/models.py`
 - [ ] runtime branch added in `hermes_cli/runtime_provider.py`
-- [ ] CLI wiring added in `hermes_cli/main.py`
-- [ ] setup wiring added in `hermes_cli/setup.py`
+- [ ] CLI wiring added in `hermes_cli/main.py` (setup.py inherits automatically)
 - [ ] aux model added in `agent/auxiliary_client.py`
 - [ ] context lengths added in `agent/model_metadata.py`
 - [ ] runtime / CLI tests updated
@@ -412,7 +407,7 @@ If you are hunting for all the places a provider touches, search these symbols:
 - `_PROVIDER_MODELS`
 - `resolve_runtime_provider`
 - `_model_flow_`
-- `provider_choices`
+- `select_provider_and_model`
 - `api_mode`
 - `_API_KEY_PROVIDER_AUX_MODELS`
 - `self.client.`

@@ -1,72 +1,321 @@

---
sidebar_position: 6
title: "Context Compression & Prompt Caching"
description: "How Hermes compresses long conversations and applies provider-side prompt caching"
---

# Context Compression & Prompt Caching

Hermes Agent uses a dual compression system and Anthropic prompt caching to
manage context window usage efficiently across long conversations.

Source files: `agent/context_compressor.py`, `agent/prompt_caching.py`,
`gateway/run.py` (session hygiene), `run_agent.py` (lines 1146-1204)

## Dual Compression System

Hermes has two separate compression layers that operate independently:

```
                   ┌──────────────────────────┐
 Incoming message  │ Gateway Session Hygiene  │  Fires at 85% of context
 ────────────────► │ (pre-agent, rough est.)  │  Safety net for large sessions
                   └─────────────┬────────────┘
                                 │
                                 ▼
                   ┌──────────────────────────┐
                   │ Agent ContextCompressor  │  Fires at 50% of context (default)
                   │ (in-loop, real tokens)   │  Normal context management
                   └──────────────────────────┘
```

### 1. Gateway Session Hygiene (85% threshold)

Located in `gateway/run.py` (around line 2220). This is a **safety net** that
runs before the agent processes a message. It prevents API failures when sessions
grow too large between turns (e.g., overnight accumulation in Telegram/Discord).

- **Threshold**: Fixed at 85% of model context length
- **Token source**: Prefers actual API-reported tokens from last turn; falls back
  to rough character-based estimate (`estimate_messages_tokens_rough`)
- **Fires**: Only when `len(history) >= 4` and compression is enabled
- **Purpose**: Catch sessions that escaped the agent's own compressor

The gateway hygiene threshold is intentionally higher than the agent's compressor.
Setting it at 50% (same as the agent) caused premature compression on every turn
in long gateway sessions.
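In sketch form, the hygiene gate reduces to the following (illustrative Python, not the actual gateway code; `estimate_messages_tokens_rough` is approximated here as a chars/4 heuristic):

```python
# Illustrative sketch of the gateway hygiene gate (names simplified;
# the real implementation in gateway/run.py differs).

def estimate_messages_tokens_rough(history):
    """Rough token estimate: roughly 4 characters per token."""
    return sum(len(str(m.get("content") or "")) for m in history) // 4

def needs_hygiene_compression(history, context_length,
                              last_api_tokens=None, enabled=True):
    """Return True when the 85% safety-net threshold is exceeded."""
    if not enabled or len(history) < 4:
        return False
    # Prefer the accurate API-reported count from the previous turn.
    if last_api_tokens is not None:
        tokens = last_api_tokens
    else:
        tokens = estimate_messages_tokens_rough(history)
    return tokens >= 0.85 * context_length
```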

### 2. Agent ContextCompressor (50% threshold, configurable)

Located in `agent/context_compressor.py`. This is the **primary compression
system** that runs inside the agent's tool loop with access to accurate,
API-reported token counts.

## Configuration

All compression settings are read from `config.yaml` under the `compression` key:

```yaml
compression:
  enabled: true          # Enable/disable compression (default: true)
  threshold: 0.50        # Fraction of context window (default: 0.50 = 50%)
  target_ratio: 0.20     # How much of threshold to keep as tail (default: 0.20)
  protect_last_n: 20     # Minimum protected tail messages (default: 20)
  summary_model: null    # Override model for summaries (default: uses auxiliary)
```

### Parameter Details

| Parameter | Default | Range | Description |
|-----------|---------|-------|-------------|
| `threshold` | `0.50` | 0.0-1.0 | Compression triggers when prompt tokens ≥ `threshold × context_length` |
| `target_ratio` | `0.20` | 0.10-0.80 | Controls tail protection token budget: `threshold_tokens × target_ratio` |
| `protect_last_n` | `20` | ≥1 | Minimum number of recent messages always preserved |
| `protect_first_n` | `3` | (hardcoded) | System prompt + first exchange always preserved |

### Computed Values (for a 200K context model at defaults)

```
context_length      = 200,000
threshold_tokens    = 200,000 × 0.50 = 100,000
tail_token_budget   = 100,000 × 0.20 = 20,000
max_summary_tokens  = min(200,000 × 0.05, 12,000) = 10,000
```
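The derived budgets above can be expressed directly (illustrative; the constant names are assumptions, not the real code):

```python
# Sketch of the derived compression budgets, mirroring the documented
# formulas (constant names are assumptions, not the actual source).
SUMMARY_MAX_FRACTION = 0.05
SUMMARY_HARD_CAP = 12_000

def compression_budgets(context_length, threshold=0.50, target_ratio=0.20):
    threshold_tokens = int(context_length * threshold)
    tail_token_budget = int(threshold_tokens * target_ratio)
    max_summary_tokens = min(int(context_length * SUMMARY_MAX_FRACTION),
                             SUMMARY_HARD_CAP)
    return threshold_tokens, tail_token_budget, max_summary_tokens
```

For the 200K-context example this yields exactly the numbers shown above.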

## Pre-compression memory flush

Before compression, Hermes can give the model one last chance to persist memory so facts are not lost when middle turns are summarized away.

## Session lineage after compression

Compression can split the session into a new session ID while preserving parent lineage in the state DB.
This lets Hermes continue operating with a smaller active context while retaining a searchable ancestry chain.

## Re-injected state after compression

After compression, Hermes may re-inject compact operational state such as:

- todo snapshot
- prior-read-files summary

## Compression Algorithm

The `ContextCompressor.compress()` method follows a 4-phase algorithm:

### Phase 1: Prune Old Tool Results (cheap, no LLM call)

Old tool results (>200 chars) outside the protected tail are replaced with:

```
[Old tool output cleared to save context space]
```

This is a cheap pre-pass that saves significant tokens from verbose tool
outputs (file contents, terminal output, search results).
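The prune pass can be sketched as follows (illustrative; the message shape, dicts with `role` and string `content`, is an assumption and the real implementation differs):

```python
# Illustrative sketch of the Phase 1 prune pass (assumed message shape:
# dicts with "role" and string "content").
PLACEHOLDER = "[Old tool output cleared to save context space]"

def prune_old_tool_results(messages, tail_start, max_len=200):
    """Replace long tool outputs that fall before the protected tail."""
    pruned = []
    for i, msg in enumerate(messages):
        if (i < tail_start and msg.get("role") == "tool"
                and len(msg.get("content") or "") > max_len):
            msg = {**msg, "content": PLACEHOLDER}
        pruned.append(msg)
    return pruned
```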

### Phase 2: Determine Boundaries

```
┌─────────────────────────────────────────────────────────────┐
│ Message list                                                │
│                                                             │
│ [0..2]    ← protect_first_n (system + first exchange)       │
│ [3..N]    ← middle turns → SUMMARIZED                       │
│ [N..end]  ← tail (by token budget OR protect_last_n)        │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

Tail protection is **token-budget based**: it walks backward from the end,
accumulating tokens until the budget is exhausted, and falls back to the fixed
`protect_last_n` count if the budget would protect fewer messages.

Boundaries are aligned to avoid splitting tool_call/tool_result groups.
The `_align_boundary_backward()` method walks past consecutive tool results
to find the parent assistant message, keeping groups intact.
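The backward token-budget walk can be sketched as (illustrative; per-message token counts are read from an assumed `token_count` field):

```python
# Sketch of token-budget tail protection (illustrative; token counts are
# taken from a per-message "token_count" field here as an assumption).
def protected_tail_start(messages, tail_token_budget, protect_last_n=20):
    """Index where the protected tail begins."""
    total = 0
    start = len(messages)
    # Walk backward, accumulating tokens until the budget is exhausted.
    for i in range(len(messages) - 1, -1, -1):
        total += messages[i].get("token_count", 0)
        if total > tail_token_budget:
            break
        start = i
    # Never protect fewer than protect_last_n messages.
    return min(start, max(len(messages) - protect_last_n, 0))
```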

### Phase 3: Generate Structured Summary

The middle turns are summarized using the auxiliary LLM with a structured
template:

```
## Goal
[What the user is trying to accomplish]

## Constraints & Preferences
[User preferences, coding style, constraints, important decisions]

## Progress
### Done
[Completed work — specific file paths, commands run, results]
### In Progress
[Work currently underway]
### Blocked
[Any blockers or issues encountered]

## Key Decisions
[Important technical decisions and why]

## Relevant Files
[Files read, modified, or created — with brief note on each]

## Next Steps
[What needs to happen next]

## Critical Context
[Specific values, error messages, configuration details]
```

Summary budget scales with the amount of content being compressed:
- Formula: `content_tokens × 0.20` (the `_SUMMARY_RATIO` constant)
- Minimum: 2,000 tokens
- Maximum: `min(context_length × 0.05, 12,000)` tokens
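The budget clamp above can be written out as (illustrative sketch of the documented min/max bounds):

```python
# Sketch of the summary token budget: 20% of compressed content,
# clamped to the documented minimum and maximum.
_SUMMARY_RATIO = 0.20

def summary_token_budget(content_tokens, context_length):
    budget = int(content_tokens * _SUMMARY_RATIO)
    maximum = min(int(context_length * 0.05), 12_000)
    return max(2_000, min(budget, maximum))
```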

### Phase 4: Assemble Compressed Messages

The compressed message list is:
1. Head messages (with a note appended to system prompt on first compression)
2. Summary message (role chosen to avoid consecutive same-role violations)
3. Tail messages (unmodified)

Orphaned tool_call/tool_result pairs are cleaned up by `_sanitize_tool_pairs()`:
- Tool results referencing removed calls → removed
- Tool calls whose results were removed → stub result injected

### Iterative Re-compression

On subsequent compressions, the previous summary is passed to the LLM with
instructions to **update** it rather than summarize from scratch. This preserves
information across multiple compactions — items move from "In Progress" to "Done",
new progress is added, and obsolete information is removed.

The `_previous_summary` field on the compressor instance stores the last summary
text for this purpose.
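The sanitize step can be sketched as (illustrative, not the real `_sanitize_tool_pairs()`; assumes OpenAI-style messages where assistant tool calls carry `tool_calls` with `id`s and results are `role="tool"` with a `tool_call_id`):

```python
# Illustrative cleanup of orphaned tool-call/tool-result pairs
# (assumed OpenAI-style message shape; stub text is an assumption).
def sanitize_tool_pairs(messages):
    call_ids = {tc["id"] for m in messages if m.get("tool_calls")
                for tc in m["tool_calls"]}
    result_ids = {m["tool_call_id"] for m in messages
                  if m.get("role") == "tool" and m.get("tool_call_id")}
    out = []
    for m in messages:
        # Drop tool results whose originating call was summarized away.
        if m.get("role") == "tool" and m.get("tool_call_id") not in call_ids:
            continue
        out.append(m)
        # Inject stub results for calls whose results were removed.
        for tc in m.get("tool_calls") or []:
            if tc["id"] not in result_ids:
                out.append({"role": "tool", "tool_call_id": tc["id"],
                            "content": "[result removed during compression]"})
    return out
```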

## Before/After Example

### Before Compression (45 messages, ~95K tokens)

```
[0] system: "You are a helpful assistant..." (system prompt)
[1] user: "Help me set up a FastAPI project"
[2] assistant: <tool_call> terminal: mkdir project </tool_call>
[3] tool: "directory created"
[4] assistant: <tool_call> write_file: main.py </tool_call>
[5] tool: "file written (2.3KB)"
... 30 more turns of file editing, testing, debugging ...
[38] assistant: <tool_call> terminal: pytest </tool_call>
[39] tool: "8 passed, 2 failed\n..." (5KB output)
[40] user: "Fix the failing tests"
[41] assistant: <tool_call> read_file: tests/test_api.py </tool_call>
[42] tool: "import pytest\n..." (3KB)
[43] assistant: "I see the issue with the test fixtures..."
[44] user: "Great, also add error handling"
```

### After Compression (25 messages, ~45K tokens)

```
[0] system: "You are a helpful assistant...
    [Note: Some earlier conversation turns have been compacted...]"
[1] user: "Help me set up a FastAPI project"
[2] assistant: "[CONTEXT COMPACTION] Earlier turns were compacted...

## Goal
Set up a FastAPI project with tests and error handling

## Progress
### Done
- Created project structure: main.py, tests/, requirements.txt
- Implemented 5 API endpoints in main.py
- Wrote 10 test cases in tests/test_api.py
- 8/10 tests passing

### In Progress
- Fixing 2 failing tests (test_create_user, test_delete_user)

## Relevant Files
- main.py — FastAPI app with 5 endpoints
- tests/test_api.py — 10 test cases
- requirements.txt — fastapi, pytest, httpx

## Next Steps
- Fix failing test fixtures
- Add error handling"
[3] user: "Fix the failing tests"
[4] assistant: <tool_call> read_file: tests/test_api.py </tool_call>
[5] tool: "import pytest\n..."
[6] assistant: "I see the issue with the test fixtures..."
[7] user: "Great, also add error handling"
```

## Prompt Caching (Anthropic)

Source: `agent/prompt_caching.py`

Reduces input token costs by ~75% on multi-turn conversations by caching the
conversation prefix. Uses Anthropic's `cache_control` breakpoints.

### Strategy: system_and_3

Anthropic allows a maximum of 4 `cache_control` breakpoints per request. Hermes
uses the "system_and_3" strategy:

```
Breakpoint 1: System prompt (stable across all turns)
Breakpoint 2: 3rd-to-last non-system message  ─┐
Breakpoint 3: 2nd-to-last non-system message   ├─ Rolling window
Breakpoint 4: Last non-system message          ─┘
```

### How It Works

`apply_anthropic_cache_control()` deep-copies the messages and injects
`cache_control` markers:

```python
# Cache marker format
marker = {"type": "ephemeral"}
# Or for 1-hour TTL:
marker = {"type": "ephemeral", "ttl": "1h"}
```

The marker is applied differently based on content type:

| Content Type | Where Marker Goes |
|-------------|-------------------|
| String content | Converted to `[{"type": "text", "text": ..., "cache_control": ...}]` |
| List content | Added to the last element's dict |
| None/empty | Added as `msg["cache_control"]` |
| Tool messages | Added as `msg["cache_control"]` (native Anthropic only) |
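Putting the strategy and the string-content rule together, a minimal sketch of the breakpoint placement might look like this (illustrative only; the real function handles all the content types in the table above):

```python
# Illustrative "system_and_3" breakpoint placement (assumed message shape:
# dicts with "role" and string "content"; not the real implementation).
import copy

def apply_cache_breakpoints(messages, ttl="5m"):
    msgs = copy.deepcopy(messages)  # never mutate the caller's messages
    if ttl == "5m":
        marker = {"type": "ephemeral"}
    else:
        marker = {"type": "ephemeral", "ttl": ttl}

    def mark(msg):
        # String content is converted to the list form carrying the marker.
        msg["content"] = [{"type": "text", "text": msg.get("content") or "",
                           "cache_control": marker}]

    targets = [m for m in msgs if m["role"] == "system"][:1]
    non_system = [m for m in msgs if m["role"] != "system"]
    targets += non_system[-3:]      # rolling 3-message window
    for m in targets:               # at most 4 breakpoints total
        mark(m)
    return msgs
```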

### Cache-Aware Design Patterns

1. **Stable system prompt**: The system prompt is breakpoint 1 and cached across
   all turns. Avoid mutating it mid-conversation (compression appends a note
   only on the first compaction).

2. **Message ordering matters**: Cache hits require prefix matching. Adding or
   removing messages in the middle invalidates the cache for everything after.

3. **Compression cache interaction**: After compression, the cache is invalidated
   for the compressed region but the system prompt cache survives. The rolling
   3-message window re-establishes caching within 1-2 turns.

4. **TTL selection**: Default is `5m` (5 minutes). Use `1h` for long-running
   sessions where the user takes breaks between turns.

### Enabling Prompt Caching

Prompt caching is automatically enabled when:
- The model is an Anthropic Claude model (detected by model name)
- The provider supports `cache_control` (native Anthropic API or OpenRouter)

```yaml
# config.yaml — TTL is configurable
model:
  cache_ttl: "5m"   # "5m" or "1h"
```

The CLI shows caching status at startup:

```
💾 Prompt caching: ENABLED (Claude via OpenRouter, 5m TTL)
```

## Context Pressure Warnings

The agent emits context pressure warnings at 85% of the compression threshold
(not 85% of context — 85% of the threshold, which is itself 50% of context):

```
⚠️ Context is 85% to compaction threshold (42,500/50,000 tokens)
```

After compression, if usage drops below 85% of threshold, the warning state
is cleared. If compression fails to reduce usage below the warning level (the
conversation is too dense), the warning persists, but compression won't
re-trigger until the threshold is exceeded again.

## Related docs

- [Prompt Assembly](./prompt-assembly.md)
- [Session Storage](./session-storage.md)
- [Agent Loop Internals](./agent-loop.md)

@@ -41,6 +41,163 @@ The cached system prompt is assembled in roughly this order:

When `skip_context_files` is set (e.g., subagent delegation), SOUL.md is not loaded and the hardcoded `DEFAULT_AGENT_IDENTITY` is used instead.

### Concrete example: assembled system prompt

Here is a simplified view of what the final system prompt looks like when all layers are present (comments show the source of each section):

```
# Layer 1: Agent Identity (from ~/.hermes/SOUL.md)
You are Hermes, an AI assistant created by Nous Research.
You are an expert software engineer and researcher.
You value correctness, clarity, and efficiency.
...

# Layer 2: Tool-aware behavior guidance
You have persistent memory across sessions. Save durable facts using
the memory tool: user preferences, environment details, tool quirks,
and stable conventions. Memory is injected into every turn, so keep
it compact and focused on facts that will still matter later.
...
When the user references something from a past conversation or you
suspect relevant cross-session context exists, use session_search
to recall it before asking them to repeat themselves.

# Tool-use enforcement (for GPT/Codex models only)
You MUST use your tools to take action — do not describe what you
would do or plan to do without actually doing it.
...

# Layer 3: Honcho static block (when active)
[Honcho personality/context data]

# Layer 4: Optional system message (from config or API)
[User-configured system message override]

# Layer 5: Frozen MEMORY snapshot
## Persistent Memory
- User prefers Python 3.12, uses pyproject.toml
- Default editor is nvim
- Working on project "atlas" in ~/code/atlas
- Timezone: US/Pacific

# Layer 6: Frozen USER profile snapshot
## User Profile
- Name: Alice
- GitHub: alice-dev

# Layer 7: Skills index
## Skills (mandatory)
Before replying, scan the skills below. If one clearly matches
your task, load it with skill_view(name) and follow its instructions.
...
<available_skills>
software-development:
  - code-review: Structured code review workflow
  - test-driven-development: TDD methodology
research:
  - arxiv: Search and summarize arXiv papers
</available_skills>

# Layer 8: Context files (from project directory)
# Project Context
The following project context files have been loaded and should be followed:

## AGENTS.md
This is the atlas project. Use pytest for testing. The main
entry point is src/atlas/main.py. Always run `make lint` before
committing.

# Layer 9: Timestamp + session
Current time: 2026-03-30T14:30:00-07:00
Session: abc123

# Layer 10: Platform hint
You are a CLI AI Agent. Try not to use markdown but simple text
renderable inside a terminal.
```

## How SOUL.md appears in the prompt

`SOUL.md` lives at `~/.hermes/SOUL.md` and serves as the agent's identity — the very first section of the system prompt. The loading logic in `prompt_builder.py` works as follows:

```python
# From agent/prompt_builder.py (simplified)
def load_soul_md() -> Optional[str]:
    soul_path = get_hermes_home() / "SOUL.md"
    if not soul_path.exists():
        return None
    content = soul_path.read_text(encoding="utf-8").strip()
    content = _scan_context_content(content, "SOUL.md")  # Security scan
    content = _truncate_content(content, "SOUL.md")      # Cap at 20k chars
    return content
```

When `load_soul_md()` returns content, it replaces the hardcoded `DEFAULT_AGENT_IDENTITY`. The `build_context_files_prompt()` function is then called with `skip_soul=True` to prevent SOUL.md from appearing twice (once as identity, once as a context file).

If `SOUL.md` doesn't exist, the system falls back to:

```
You are Hermes Agent, an intelligent AI assistant created by Nous Research.
You are helpful, knowledgeable, and direct. You assist users with a wide
range of tasks including answering questions, writing and editing code,
analyzing information, creative work, and executing actions via your tools.
You communicate clearly, admit uncertainty when appropriate, and prioritize
being genuinely useful over being verbose unless otherwise directed below.
Be targeted and efficient in your exploration and investigations.
```

## How context files are injected

`build_context_files_prompt()` uses a **priority system** — only one project context type is loaded (first match wins):

```python
# From agent/prompt_builder.py (simplified)
def build_context_files_prompt(cwd=None, skip_soul=False):
    cwd_path = Path(cwd).resolve()

    # Priority: first match wins — only ONE project context loaded
    project_context = (
        _load_hermes_md(cwd_path)       # 1. .hermes.md / HERMES.md (walks to git root)
        or _load_agents_md(cwd_path)    # 2. AGENTS.md (cwd only)
        or _load_claude_md(cwd_path)    # 3. CLAUDE.md (cwd only)
        or _load_cursorrules(cwd_path)  # 4. .cursorrules / .cursor/rules/*.mdc
    )

    sections = []
    if project_context:
        sections.append(project_context)

    # SOUL.md from HERMES_HOME (independent of project context)
    if not skip_soul:
        soul_content = load_soul_md()
        if soul_content:
            sections.append(soul_content)

    if not sections:
        return ""

    return (
        "# Project Context\n\n"
        "The following project context files have been loaded "
        "and should be followed:\n\n"
        + "\n".join(sections)
    )
```

### Context file discovery details

| Priority | Files | Search scope | Notes |
|----------|-------|-------------|-------|
| 1 | `.hermes.md`, `HERMES.md` | CWD up to git root | Hermes-native project config |
| 2 | `AGENTS.md` | CWD only | Common agent instruction file |
| 3 | `CLAUDE.md` | CWD only | Claude Code compatibility |
| 4 | `.cursorrules`, `.cursor/rules/*.mdc` | CWD only | Cursor compatibility |

All context files are:
- **Security scanned** — checked for prompt injection patterns (invisible unicode, "ignore previous instructions", credential exfiltration attempts)
- **Truncated** — capped at 20,000 characters using a 70/20 head/tail ratio with a truncation marker
- **YAML frontmatter stripped** — `.hermes.md` frontmatter is removed (reserved for future config overrides)
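A minimal sketch of the 70/20 head/tail truncation described above (illustrative; the function name, exact split, and marker text are assumptions, not the real `_truncate_content`):

```python
# Illustrative 70/20 head/tail truncation. The remaining ~10% of the cap
# is left as slack for the marker; marker text is an assumption.
def truncate_content(content, name, max_chars=20_000):
    if len(content) <= max_chars:
        return content
    head = content[: int(max_chars * 0.70)]
    tail = content[-int(max_chars * 0.20):]
    marker = f"\n\n[... {name} truncated ...]\n\n"
    return head + marker + tail
```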

## API-call-time-only layers

These are intentionally *not* persisted as part of the cached system prompt:

@@ -1,66 +1,388 @@

---
sidebar_position: 8
title: "Session Storage"
description: "How Hermes stores sessions in SQLite, maintains lineage, and exposes recall/search"
---

# Session Storage

Hermes Agent uses a SQLite database (`~/.hermes/state.db`) to persist session
metadata, full message history, and model configuration across CLI and gateway
sessions. This replaces the earlier per-session JSONL file approach.

Source file: `hermes_state.py` (related: `gateway/session.py`, `tools/session_search_tool.py`)

## Architecture Overview

```text
~/.hermes/state.db (SQLite, WAL mode)
├── sessions        — Session metadata, token counts, billing
├── messages        — Full message history per session
├── messages_fts    — FTS5 virtual table for full-text search
└── schema_version  — Single-row table tracking migration state
```

Key design decisions:
- **WAL mode** for concurrent readers + one writer (gateway multi-platform)
- **FTS5 virtual table** for fast text search across all session messages
- **Session lineage** via `parent_session_id` chains (compression-triggered splits)
- **Source tagging** (`cli`, `telegram`, `discord`, etc.) for platform filtering
- Batch runner and RL trajectories are NOT stored here (separate systems)

## SQLite Schema

### Sessions Table

```sql
CREATE TABLE IF NOT EXISTS sessions (
    id TEXT PRIMARY KEY,
    source TEXT NOT NULL,
    user_id TEXT,
    model TEXT,
    model_config TEXT,
    system_prompt TEXT,
    parent_session_id TEXT,
    started_at REAL NOT NULL,
    ended_at REAL,
    end_reason TEXT,
    message_count INTEGER DEFAULT 0,
    tool_call_count INTEGER DEFAULT 0,
    input_tokens INTEGER DEFAULT 0,
    output_tokens INTEGER DEFAULT 0,
    cache_read_tokens INTEGER DEFAULT 0,
    cache_write_tokens INTEGER DEFAULT 0,
    reasoning_tokens INTEGER DEFAULT 0,
    billing_provider TEXT,
    billing_base_url TEXT,
    billing_mode TEXT,
    estimated_cost_usd REAL,
    actual_cost_usd REAL,
    cost_status TEXT,
    cost_source TEXT,
    pricing_version TEXT,
    title TEXT,
    FOREIGN KEY (parent_session_id) REFERENCES sessions(id)
);

CREATE INDEX IF NOT EXISTS idx_sessions_source ON sessions(source);
CREATE INDEX IF NOT EXISTS idx_sessions_parent ON sessions(parent_session_id);
CREATE INDEX IF NOT EXISTS idx_sessions_started ON sessions(started_at DESC);
CREATE UNIQUE INDEX IF NOT EXISTS idx_sessions_title_unique
    ON sessions(title) WHERE title IS NOT NULL;
```

When Hermes compresses a conversation, it can continue in a new session ID while preserving ancestry via `parent_session_id`. This means resume and search can follow session families instead of treating each compressed shard as unrelated.

### Messages Table

```sql
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL REFERENCES sessions(id),
    role TEXT NOT NULL,
    content TEXT,
    tool_call_id TEXT,
    tool_calls TEXT,
    tool_name TEXT,
    timestamp REAL NOT NULL,
    token_count INTEGER,
    finish_reason TEXT,
    reasoning TEXT,
    reasoning_details TEXT,
    codex_reasoning_items TEXT
);

CREATE INDEX IF NOT EXISTS idx_messages_session ON messages(session_id, timestamp);
```

Notes:
- `tool_calls` is stored as a JSON string (serialized list of tool call objects)
- `reasoning_details` and `codex_reasoning_items` are stored as JSON strings
- `reasoning` stores the raw reasoning text for providers that expose it
- Timestamps are Unix epoch floats (`time.time()`)

Gateway vs CLI persistence:
- The CLI uses the state DB directly for resume/history/search
- The gateway keeps active-session mappings and may also maintain additional platform transcript/state files
- Some legacy JSON/JSONL artifacts still exist for compatibility, but SQLite is the main historical store
### FTS5 Full-Text Search

The `session_search` tool uses the session DB's search features to retrieve and summarize relevant past work. Search is backed by an FTS5 virtual table over message content:

```sql
CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts USING fts5(
    content,
    content=messages,
    content_rowid=id
);
```

The FTS5 table is kept in sync via three triggers that fire on INSERT, UPDATE,
and DELETE of the `messages` table:

```sql
CREATE TRIGGER IF NOT EXISTS messages_fts_insert AFTER INSERT ON messages BEGIN
    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;

CREATE TRIGGER IF NOT EXISTS messages_fts_delete AFTER DELETE ON messages BEGIN
    INSERT INTO messages_fts(messages_fts, rowid, content)
    VALUES ('delete', old.id, old.content);
END;

CREATE TRIGGER IF NOT EXISTS messages_fts_update AFTER UPDATE ON messages BEGIN
    INSERT INTO messages_fts(messages_fts, rowid, content)
    VALUES ('delete', old.id, old.content);
    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;
```
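
As a concrete check of how these pieces fit together, here is a minimal round-trip using Python's built-in `sqlite3` module. This is a self-contained sketch with a trimmed-down `messages` table and only the insert trigger, not Hermes code; it assumes an SQLite build with FTS5 enabled (standard in CPython):

```python
import sqlite3

# Trimmed schema: just the columns the FTS layer touches.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    role TEXT NOT NULL,
    content TEXT,
    timestamp REAL NOT NULL
);
CREATE VIRTUAL TABLE messages_fts USING fts5(
    content, content=messages, content_rowid=id
);
CREATE TRIGGER messages_fts_insert AFTER INSERT ON messages BEGIN
    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;
""")

# Inserting a message fires the trigger, which indexes the content.
conn.execute(
    "INSERT INTO messages (session_id, role, content, timestamp) VALUES (?, ?, ?, ?)",
    ("sess_abc123", "user", "docker deployment failed on staging", 1700000000.0),
)

# snippet(table, column, before, after, ellipsis, max_tokens) wraps each hit.
rows = conn.execute(
    "SELECT rowid, snippet(messages_fts, 0, '>>>', '<<<', '...', 8) "
    "FROM messages_fts WHERE messages_fts MATCH ?",
    ("docker",),
).fetchall()
print(rows)
```

The same `snippet()` call is what produces the `>>>match<<<` markers described later in the search-results format.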

## Schema Version and Migrations

Current schema version: **6**

The `schema_version` table stores a single integer. On initialization,
`_init_schema()` checks the current version and applies migrations sequentially:

| Version | Change |
|---------|--------|
| 1 | Initial schema (sessions, messages, FTS5) |
| 2 | Add `finish_reason` column to messages |
| 3 | Add `title` column to sessions |
| 4 | Add unique index on `title` (NULLs allowed, non-NULL must be unique) |
| 5 | Add billing columns: `cache_read_tokens`, `cache_write_tokens`, `reasoning_tokens`, `billing_provider`, `billing_base_url`, `billing_mode`, `estimated_cost_usd`, `actual_cost_usd`, `cost_status`, `cost_source`, `pricing_version` |
| 6 | Add reasoning columns to messages: `reasoning`, `reasoning_details`, `codex_reasoning_items` |

Each migration uses `ALTER TABLE ADD COLUMN` wrapped in try/except to handle
the column-already-exists case (idempotent). The version number is bumped after
each successful migration block.
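
The idempotent pattern can be sketched like this. This is an illustration of the try/except approach, not the actual `_init_schema()` code; the `migrate_to_v2` helper name is hypothetical:

```python
import sqlite3

def migrate_to_v2(conn: sqlite3.Connection) -> None:
    """Illustrative version of the idempotent ALTER TABLE migration pattern."""
    try:
        conn.execute("ALTER TABLE messages ADD COLUMN finish_reason TEXT")
    except sqlite3.OperationalError:
        pass  # column already exists -- safe to ignore
    conn.execute("UPDATE schema_version SET version = 2")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, content TEXT)")
conn.execute("CREATE TABLE schema_version (version INTEGER)")
conn.execute("INSERT INTO schema_version VALUES (1)")

migrate_to_v2(conn)
migrate_to_v2(conn)  # running twice is harmless: the second ALTER fails quietly

version = conn.execute("SELECT version FROM schema_version").fetchone()[0]
print(version)  # 2
```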

## Write Contention Handling

Multiple hermes processes (gateway + CLI sessions + worktree agents) share one
`state.db`. The `SessionDB` class handles write contention with:

- **Short SQLite timeout** (1 second) instead of the default 30s
- **Application-level retry** with random jitter (20-150ms, up to 15 retries)
- **BEGIN IMMEDIATE** transactions to surface lock contention at transaction start
- **Periodic WAL checkpoints** every 50 successful writes (PASSIVE mode)

This avoids the "convoy effect" where SQLite's deterministic internal backoff
causes all competing writers to retry at the same intervals.

```python
_WRITE_MAX_RETRIES = 15
_WRITE_RETRY_MIN_S = 0.020  # 20ms
_WRITE_RETRY_MAX_S = 0.150  # 150ms
_CHECKPOINT_EVERY_N_WRITES = 50
```
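
Putting those constants together, a jittered retry loop looks roughly like this. It is a sketch of the idea only, not the actual `SessionDB` write path:

```python
import random
import sqlite3
import time

def write_with_retry(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> None:
    """Illustrative jittered-backoff writer (not the real SessionDB code)."""
    for attempt in range(15):  # _WRITE_MAX_RETRIES
        try:
            conn.execute("BEGIN IMMEDIATE")  # fail fast if another writer holds the lock
            conn.execute(sql, params)
            conn.commit()
            return
        except sqlite3.OperationalError:
            conn.rollback()
            # Random jitter desynchronizes competing writers (anti-convoy).
            time.sleep(random.uniform(0.020, 0.150))
    raise RuntimeError("database stayed locked after 15 attempts")

conn = sqlite3.connect(":memory:", timeout=1.0)  # short timeout, as described above
conn.execute("CREATE TABLE kv (k TEXT, v TEXT)")
conn.commit()

write_with_retry(conn, "INSERT INTO kv VALUES (?, ?)", ("a", "1"))
print(conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0])  # 1
```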

## Common Operations

### Initialize

```python
from pathlib import Path

from hermes_state import SessionDB

db = SessionDB()                              # Default: ~/.hermes/state.db
db = SessionDB(db_path=Path("/tmp/test.db"))  # Custom path
```

### Create and Manage Sessions

```python
# Create a new session
db.create_session(
    session_id="sess_abc123",
    source="cli",
    model="anthropic/claude-sonnet-4.6",
    user_id="user_1",
    parent_session_id=None,  # or previous session ID for lineage
)

# End a session
db.end_session("sess_abc123", end_reason="user_exit")

# Reopen a session (clear ended_at/end_reason)
db.reopen_session("sess_abc123")
```

### Store Messages

```python
msg_id = db.append_message(
    session_id="sess_abc123",
    role="assistant",
    content="Here's the answer...",
    tool_calls=[{"id": "call_1", "function": {"name": "terminal", "arguments": "{}"}}],
    token_count=150,
    finish_reason="stop",
    reasoning="Let me think about this...",
)
```

### Retrieve Messages

```python
# Raw messages with all metadata
messages = db.get_messages("sess_abc123")

# OpenAI conversation format (for API replay)
conversation = db.get_messages_as_conversation("sess_abc123")
# Returns: [{"role": "user", "content": "..."}, {"role": "assistant", ...}]
```

### Session Titles

```python
# Set a title (must be unique among non-NULL titles)
db.set_session_title("sess_abc123", "Fix Docker Build")

# Resolve by title (returns most recent in lineage)
session_id = db.resolve_session_by_title("Fix Docker Build")

# Auto-generate next title in lineage
next_title = db.get_next_title_in_lineage("Fix Docker Build")
# Returns: "Fix Docker Build #2"
```

## Full-Text Search

The `search_messages()` method supports FTS5 query syntax with automatic
sanitization of user input.

### Basic Search

```python
results = db.search_messages("docker deployment")
```

### FTS5 Query Syntax

| Syntax | Example | Meaning |
|--------|---------|---------|
| Keywords | `docker deployment` | Both terms (implicit AND) |
| Quoted phrase | `"exact phrase"` | Exact phrase match |
| Boolean OR | `docker OR kubernetes` | Either term |
| Boolean NOT | `python NOT java` | Exclude term |
| Prefix | `deploy*` | Prefix match |

### Filtered Search

```python
# Search only CLI sessions
results = db.search_messages("error", source_filter=["cli"])

# Exclude gateway sessions
results = db.search_messages("bug", exclude_sources=["telegram", "discord"])

# Search only user messages
results = db.search_messages("help", role_filter=["user"])
```

### Search Results Format

Each result includes:

- `id`, `session_id`, `role`, `timestamp`
- `snippet` — FTS5-generated snippet with `>>>match<<<` markers
- `context` — 1 message before and after the match (content truncated to 200 chars)
- `source`, `model`, `session_started` — from the parent session

The `_sanitize_fts5_query()` method handles edge cases:

- Strips unmatched quotes and special characters
- Wraps hyphenated terms in quotes (`chat-send` → `"chat-send"`)
- Removes dangling boolean operators (`hello AND` → `hello`)
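
Those three rules can be sketched as a small function. This is a hypothetical simplified re-implementation for illustration, not the actual `_sanitize_fts5_query()` from `hermes_state`:

```python
import re

def sanitize_fts5_query(query: str) -> str:
    """Hypothetical sketch of the sanitization rules listed above."""
    # Strip unmatched double quotes so FTS5 doesn't see a dangling phrase.
    if query.count('"') % 2 != 0:
        query = query.replace('"', "")
    # Quote hyphenated terms so FTS5 doesn't treat '-' as syntax.
    query = re.sub(r'(?<!")\b(\w+(?:-\w+)+)\b(?!")', r'"\1"', query)
    # Drop a dangling boolean operator left at the end of the query.
    query = re.sub(r"\s+(AND|OR|NOT)\s*$", "", query.strip())
    return query

print(sanitize_fts5_query("chat-send"))  # "chat-send"
print(sanitize_fts5_query("hello AND"))  # hello
```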

## Session Lineage

Sessions can form chains via `parent_session_id`. This happens when context
compression triggers a session split in the gateway.

### Query: Find Session Lineage

```sql
-- Find all ancestors of a session
WITH RECURSIVE lineage AS (
    SELECT * FROM sessions WHERE id = ?
    UNION ALL
    SELECT s.* FROM sessions s
    JOIN lineage l ON s.id = l.parent_session_id
)
SELECT id, title, started_at, parent_session_id FROM lineage;

-- Find all descendants of a session
WITH RECURSIVE descendants AS (
    SELECT * FROM sessions WHERE id = ?
    UNION ALL
    SELECT s.* FROM sessions s
    JOIN descendants d ON s.parent_session_id = d.id
)
SELECT id, title, started_at FROM descendants;
```

### Query: Recent Sessions with Preview

```sql
SELECT s.*,
       COALESCE(
           (SELECT SUBSTR(m.content, 1, 63)
            FROM messages m
            WHERE m.session_id = s.id AND m.role = 'user' AND m.content IS NOT NULL
            ORDER BY m.timestamp, m.id LIMIT 1),
           ''
       ) AS preview,
       COALESCE(
           (SELECT MAX(m2.timestamp) FROM messages m2 WHERE m2.session_id = s.id),
           s.started_at
       ) AS last_active
FROM sessions s
ORDER BY s.started_at DESC
LIMIT 20;
```

### Query: Token Usage Statistics

```sql
-- Total tokens by model
SELECT model,
       COUNT(*) AS session_count,
       SUM(input_tokens) AS total_input,
       SUM(output_tokens) AS total_output,
       SUM(estimated_cost_usd) AS total_cost
FROM sessions
WHERE model IS NOT NULL
GROUP BY model
ORDER BY total_cost DESC;

-- Sessions with highest token usage
SELECT id, title, model, input_tokens + output_tokens AS total_tokens,
       estimated_cost_usd
FROM sessions
ORDER BY total_tokens DESC
LIMIT 10;
```

## Export and Cleanup

```python
# Export a single session with messages
data = db.export_session("sess_abc123")

# Export all sessions (with messages) as a list of dicts
all_data = db.export_all(source="cli")

# Delete old sessions (only ended sessions)
deleted_count = db.prune_sessions(older_than_days=90)
deleted_count = db.prune_sessions(older_than_days=30, source="telegram")

# Clear messages but keep the session record
db.clear_messages("sess_abc123")

# Delete a session and all of its messages
db.delete_session("sess_abc123")
```

## Database Location

Default path: `~/.hermes/state.db`

This is derived from `hermes_constants.get_hermes_home()`, which resolves to
`~/.hermes/` by default, or to the value of the `HERMES_HOME` environment variable.

The database file, WAL file (`state.db-wal`), and shared-memory file
(`state.db-shm`) are all created in the same directory.
@@ -22,6 +22,89 @@ Each tool module calls `registry.register(...)` at import time.

`model_tools.py` is responsible for importing/discovering tool modules and building the schema list used by the model.

### How `registry.register()` works

Every tool file in `tools/` calls `registry.register()` at module level to declare itself. The function signature is:

```python
registry.register(
    name="terminal",             # Unique tool name (used in API schemas)
    toolset="terminal",          # Toolset this tool belongs to
    schema={...},                # OpenAI function-calling schema (description, parameters)
    handler=handle_terminal,     # The function that executes when the tool is called
    check_fn=check_terminal,     # Optional: returns True/False for availability
    requires_env=["SOME_VAR"],   # Optional: env vars needed (for UI display)
    is_async=False,              # Whether the handler is an async coroutine
    description="Run commands",  # Human-readable description
    emoji="💻",                  # Emoji for spinner/progress display
)
```

Each call creates a `ToolEntry` stored in the singleton `ToolRegistry._tools` dict, keyed by tool name. If a name collision occurs across toolsets, a warning is logged and the later registration wins.

### Discovery: `_discover_tools()`

When `model_tools.py` is imported, it calls `_discover_tools()`, which imports every tool module in order:

```python
_modules = [
    "tools.web_tools",
    "tools.terminal_tool",
    "tools.file_tools",
    "tools.vision_tools",
    "tools.mixture_of_agents_tool",
    "tools.image_generation_tool",
    "tools.skills_tool",
    "tools.browser_tool",
    "tools.cronjob_tools",
    "tools.rl_training_tool",
    "tools.tts_tool",
    "tools.todo_tool",
    "tools.memory_tool",
    "tools.session_search_tool",
    "tools.clarify_tool",
    "tools.code_execution_tool",
    "tools.delegate_tool",
    "tools.process_registry",
    "tools.send_message_tool",
    "tools.honcho_tools",
    "tools.homeassistant_tool",
]
```

Each import triggers the module's `registry.register()` calls. Errors in optional tools (e.g., missing `fal_client` for image generation) are caught and logged — they don't prevent other tools from loading.

After core tool discovery, MCP tools and plugin tools are also discovered:

1. **MCP tools** — `tools.mcp_tool.discover_mcp_tools()` reads MCP server config and registers tools from external servers.
2. **Plugin tools** — `hermes_cli.plugins.discover_plugins()` loads user/project/pip plugins that may register additional tools.

## Tool availability checking (`check_fn`)

Each tool can optionally provide a `check_fn` — a callable that returns `True` when the tool is available and `False` otherwise. Typical checks include:

- **API key present** — e.g., `lambda: bool(os.environ.get("SERP_API_KEY"))` for web search
- **Service running** — e.g., checking if the Honcho server is configured
- **Binary installed** — e.g., verifying `playwright` is available for browser tools

When `registry.get_definitions()` builds the schema list for the model, it runs each tool's `check_fn()`:

```python
# Simplified from registry.py
if entry.check_fn:
    try:
        available = bool(entry.check_fn())
    except Exception:
        available = False  # Exceptions = unavailable
if not available:
    continue  # Skip this tool entirely
```

Key behaviors:

- Check results are **cached per call** — if multiple tools share the same `check_fn`, it only runs once.
- Exceptions in `check_fn()` are treated as "unavailable" (fail-safe).
- The `is_toolset_available()` method checks whether a toolset's `check_fn` passes; it is used for UI display and toolset resolution.

## Toolset resolution

Toolsets are named bundles of tools. Hermes resolves them through:

@@ -31,10 +114,108 @@ Toolsets are named bundles of tools. Hermes resolves them through:

- dynamic MCP toolsets
- curated special-purpose sets like `hermes-acp`

### How `get_tool_definitions()` filters tools

The main entry point is `model_tools.get_tool_definitions(enabled_toolsets, disabled_toolsets, quiet_mode)`:

1. **If `enabled_toolsets` is provided** — only tools from those toolsets are included. Each toolset name is resolved via `resolve_toolset()`, which expands composite toolsets into individual tool names.

2. **If `disabled_toolsets` is provided** — start with ALL toolsets, then subtract the disabled ones.

3. **If neither** — include all known toolsets.

4. **Registry filtering** — the resolved tool name set is passed to `registry.get_definitions()`, which applies `check_fn` filtering and returns OpenAI-format schemas.

5. **Dynamic schema patching** — after filtering, `execute_code` and `browser_navigate` schemas are dynamically adjusted to only reference tools that actually passed filtering (prevents model hallucination of unavailable tools).
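
The enable/disable resolution in steps 1-3 can be sketched as follows. The toolset names and data here are hypothetical; the real logic lives in `get_tool_definitions()` and `resolve_toolset()`:

```python
# Toy toolset registry for illustration (names are hypothetical).
ALL_TOOLSETS = {
    "terminal": {"terminal"},
    "files": {"read_file", "write_file"},
    "web": {"web_search", "fetch_url"},
}

def resolve_tools(enabled=None, disabled=None):
    if enabled is not None:
        names = set(enabled)                       # only what was asked for
    elif disabled is not None:
        names = set(ALL_TOOLSETS) - set(disabled)  # everything minus disabled
    else:
        names = set(ALL_TOOLSETS)                  # default: everything
    tools = set()
    for toolset in names:
        tools |= ALL_TOOLSETS.get(toolset, set())  # expand toolset -> tool names
    return tools

print(sorted(resolve_tools(enabled=["files"])))  # ['read_file', 'write_file']
print(sorted(resolve_tools(disabled=["web"])))   # ['read_file', 'terminal', 'write_file']
```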

### Legacy toolset names

Old toolset names with `_tools` suffixes (e.g., `web_tools`, `terminal_tools`) are mapped to their modern tool names via `_LEGACY_TOOLSET_MAP` for backward compatibility.

## Dispatch

At runtime, tools are dispatched through the central registry, with agent-loop exceptions for some agent-level tools such as memory/todo/session-search handling.

### Dispatch flow: model tool_call → handler execution

When the model returns a `tool_call`, the flow is:

```
Model response with tool_call
  ↓
run_agent.py agent loop
  ↓
model_tools.handle_function_call(name, args, task_id, user_task)
  ↓
[Agent-loop tools?] → handled directly by agent loop (todo, memory, session_search, delegate_task)
  ↓
[Plugin pre-hook] → invoke_hook("pre_tool_call", ...)
  ↓
registry.dispatch(name, args, **kwargs)
  ↓
Look up ToolEntry by name
  ↓
[Async handler?] → bridge via _run_async()
[Sync handler?]  → call directly
  ↓
Return result string (or JSON error)
  ↓
[Plugin post-hook] → invoke_hook("post_tool_call", ...)
```

### Error wrapping

All tool execution is wrapped in error handling at two levels:

1. **`registry.dispatch()`** — catches any exception from the handler and returns `{"error": "Tool execution failed: ExceptionType: message"}` as JSON.

2. **`handle_function_call()`** — wraps the entire dispatch in a secondary try/except that returns `{"error": "Error executing tool_name: message"}`.

This ensures the model always receives a well-formed JSON string, never an unhandled exception.
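
The two levels can be sketched like this. The function names mirror the ones above, but the bodies are illustrative only, not the real registry code:

```python
import json

def dispatch(handlers: dict, name: str, args: dict) -> str:
    try:
        return handlers[name](**args)
    except Exception as exc:  # level 1: wrap handler failures as JSON
        return json.dumps(
            {"error": f"Tool execution failed: {type(exc).__name__}: {exc}"}
        )

def handle_function_call(handlers: dict, name: str, args: dict) -> str:
    try:
        return dispatch(handlers, name, args)
    except Exception as exc:  # level 2: belt-and-suspenders wrapper
        return json.dumps({"error": f"Error executing {name}: {exc}"})

def boom():
    raise ValueError("bad input")

result = handle_function_call({"boom": boom}, "boom", {})
print(result)  # {"error": "Tool execution failed: ValueError: bad input"}
```

Either way the failure surfaces, the model sees a parseable JSON error string rather than a stack trace.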

### Agent-loop tools

Four tools are intercepted before registry dispatch because they need agent-level state (TodoStore, MemoryStore, etc.):

- `todo` — planning/task tracking
- `memory` — persistent memory writes
- `session_search` — cross-session recall
- `delegate_task` — spawns subagent sessions

These tools' schemas are still registered in the registry (for `get_tool_definitions`), but their handlers return a stub error if dispatch somehow reaches them directly.

### Async bridging

When a tool handler is async, `_run_async()` bridges it to the sync dispatch path:

- **CLI path (no running loop)** — uses a persistent event loop to keep cached async clients alive
- **Gateway path (running loop)** — spins up a disposable thread with `asyncio.run()`
- **Worker threads (parallel tools)** — uses per-thread persistent loops stored in thread-local storage
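
The first two branches can be sketched as follows. This is a simplified illustration of the bridging idea (it uses `asyncio.run()` where the real `_run_async()` keeps persistent loops):

```python
import asyncio
import threading

def run_async(coro):
    """Simplified sketch of bridging an async handler into sync dispatch."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No running loop (CLI path): run the coroutine on a loop we own.
        return asyncio.run(coro)
    # A loop is already running (gateway path): execute the coroutine on a
    # disposable thread so we neither block nor re-enter the running loop.
    result = {}
    def _worker():
        result["value"] = asyncio.run(coro)
    t = threading.Thread(target=_worker)
    t.start()
    t.join()
    return result["value"]

async def tool_handler(x: int) -> int:
    await asyncio.sleep(0)
    return x * 2

print(run_async(tool_handler(21)))  # 42
```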

## The DANGEROUS_PATTERNS approval flow

The terminal tool integrates a dangerous-command approval system defined in `tools/approval.py`:

1. **Pattern detection** — `DANGEROUS_PATTERNS` is a list of `(regex, description)` tuples covering destructive operations:
   - Recursive deletes (`rm -rf`)
   - Filesystem formatting (`mkfs`, `dd`)
   - SQL destructive operations (`DROP TABLE`, `DELETE FROM` without `WHERE`)
   - System config overwrites (`> /etc/`)
   - Service manipulation (`systemctl stop`)
   - Remote code execution (`curl | sh`)
   - Fork bombs, process kills, etc.

2. **Detection** — before executing any terminal command, `detect_dangerous_command(command)` checks against all patterns.

3. **Approval prompt** — if a match is found:
   - **CLI mode** — an interactive prompt asks the user to approve, deny, or allow permanently
   - **Gateway mode** — an async approval callback sends the request to the messaging platform
   - **Smart approval** — optionally, an auxiliary LLM can auto-approve low-risk commands that match patterns (e.g., `rm -rf node_modules/` is safe but matches "recursive delete")

4. **Session state** — approvals are tracked per session. Once you approve "recursive delete" for a session, subsequent `rm -rf` commands don't re-prompt.

5. **Permanent allowlist** — the "allow permanently" option writes the pattern to `config.yaml`'s `command_allowlist`, persisting across sessions.
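
The shape of steps 1-2 can be sketched with a toy pattern list. These regexes are illustrative placeholders, not the real `DANGEROUS_PATTERNS` from `tools/approval.py`:

```python
import re

# Hypothetical (regex, description) tuples -- far fewer and looser than the real list.
DANGEROUS_PATTERNS = [
    (re.compile(r"\brm\s+-[a-z]*r[a-z]*f"), "recursive delete"),
    (re.compile(r"\bmkfs\b|\bdd\s+if="), "filesystem formatting"),
    (re.compile(r"curl\s+[^|]*\|\s*(sh|bash)"), "remote code execution"),
]

def detect_dangerous_command(command: str):
    """Return the first matching description, or None if no pattern matches."""
    for pattern, description in DANGEROUS_PATTERNS:
        if pattern.search(command):
            return description
    return None

print(detect_dangerous_command("rm -rf node_modules/"))     # recursive delete
print(detect_dangerous_command("curl example.com/x | sh"))  # remote code execution
print(detect_dangerous_command("ls -la"))                   # None
```

Note how the first example illustrates the smart-approval point above: `rm -rf node_modules/` trips the pattern even though it is usually harmless.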

## Terminal/runtime environments

The terminal system supports multiple backends:
@@ -1,56 +1,233 @@

---
sidebar_position: 10
title: "Trajectories & Training Format"
description: "How Hermes saves trajectories, normalizes tool calls, and produces training-friendly outputs"
---

# Trajectories & Training Format

Hermes Agent saves conversation trajectories in ShareGPT-compatible JSONL format
for use as training data, debugging artifacts, and reinforcement learning datasets.

Source files: `agent/trajectory.py`, `run_agent.py` (lines 1788-1975), `batch_runner.py`, `trajectory_compressor.py`

## File Naming Convention

Trajectories are written to files in the current working directory:

| File | When |
|------|------|
| `trajectory_samples.jsonl` | Conversations that completed successfully (`completed=True`) |
| `failed_trajectories.jsonl` | Conversations that failed or were interrupted (`completed=False`) |

The batch runner (`batch_runner.py`) writes to a custom output file per batch
(e.g., `batch_001_output.jsonl`) with additional metadata fields.

You can override the filename via the `filename` parameter in `save_trajectory()`.

## JSONL Entry Format

Each line in the file is a self-contained JSON object. Trajectory files do **not**
blindly mirror all runtime prompt state: some prompt-time-only layers are
intentionally excluded from persisted trajectory content so datasets are cleaner
and less environment-specific. There are two variants:

### CLI/Interactive Format (from `_save_trajectory`)

```json
{
  "conversations": [ ... ],
  "timestamp": "2026-03-30T14:22:31.456789",
  "model": "anthropic/claude-sonnet-4.6",
  "completed": true
}
```

### Batch Runner Format (from `batch_runner.py`)

```json
{
  "prompt_index": 42,
  "conversations": [ ... ],
  "metadata": { "prompt_source": "gsm8k", "difficulty": "hard" },
  "completed": true,
  "partial": false,
  "api_calls": 7,
  "toolsets_used": ["code_tools", "file_tools"],
  "tool_stats": {
    "terminal": {"count": 3, "success": 3, "failure": 0},
    "read_file": {"count": 2, "success": 2, "failure": 0},
    "write_file": {"count": 0, "success": 0, "failure": 0}
  },
  "tool_error_counts": {
    "terminal": 0,
    "read_file": 0,
    "write_file": 0
  }
}
```

The `tool_stats` and `tool_error_counts` dictionaries are normalized to include
ALL possible tools (from `model_tools.TOOL_TO_TOOLSET_MAP`) with zero defaults,
ensuring consistent schema across entries for HuggingFace dataset loading.

## Conversations Array (ShareGPT Format)

The `conversations` array uses ShareGPT role conventions:

| API Role | ShareGPT `from` |
|----------|-----------------|
| system | `"system"` |
| user | `"human"` |
| assistant | `"gpt"` |
| tool | `"tool"` |
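
The mapping above as a lookup, plus a tiny turn converter. This is a sketch for illustration, not the actual conversion code in `agent/trajectory.py`:

```python
ROLE_TO_SHAREGPT = {
    "system": "system",
    "user": "human",
    "assistant": "gpt",
    "tool": "tool",
}

def to_sharegpt_turn(msg: dict) -> dict:
    """Map one API-format message to a ShareGPT turn (content only)."""
    return {"from": ROLE_TO_SHAREGPT[msg["role"]], "value": msg.get("content") or ""}

turn = to_sharegpt_turn({"role": "user", "content": "What Python version is installed?"})
print(turn)  # {'from': 'human', 'value': 'What Python version is installed?'}
```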

### Complete Example

```json
{
  "conversations": [
    {
      "from": "system",
      "value": "You are a function calling AI model. You are provided with function signatures within <tools> </tools> XML tags. You may call one or more functions to assist with the user query. If available tools are not relevant in assisting with user query, just respond in natural conversational language. Don't make assumptions about what values to plug into functions. After calling & executing the functions, you will be provided with function results within <tool_response> </tool_response> XML tags. Here are the available tools:\n<tools>\n[{\"name\": \"terminal\", \"description\": \"Execute shell commands\", \"parameters\": {\"type\": \"object\", \"properties\": {\"command\": {\"type\": \"string\"}}}, \"required\": null}]\n</tools>\nFor each function call return a JSON object, with the following pydantic model json schema for each:\n{'title': 'FunctionCall', 'type': 'object', 'properties': {'name': {'title': 'Name', 'type': 'string'}, 'arguments': {'title': 'Arguments', 'type': 'object'}}, 'required': ['name', 'arguments']}\nEach function call should be enclosed within <tool_call> </tool_call> XML tags.\nExample:\n<tool_call>\n{'name': <function-name>,'arguments': <args-dict>}\n</tool_call>"
    },
    {
      "from": "human",
      "value": "What Python version is installed?"
    },
    {
      "from": "gpt",
      "value": "<think>\nThe user wants to know the Python version. I should run python3 --version.\n</think>\n<tool_call>\n{\"name\": \"terminal\", \"arguments\": {\"command\": \"python3 --version\"}}\n</tool_call>"
    },
    {
      "from": "tool",
      "value": "<tool_response>\n{\"tool_call_id\": \"call_abc123\", \"name\": \"terminal\", \"content\": \"Python 3.11.6\"}\n</tool_response>"
    },
    {
      "from": "gpt",
      "value": "<think>\nGot the version. I can now answer the user.\n</think>\nPython 3.11.6 is installed on this system."
    }
  ],
  "timestamp": "2026-03-30T14:22:31.456789",
  "model": "anthropic/claude-sonnet-4.6",
  "completed": true
}
```

## Normalization Rules

### Reasoning Content Markup

The trajectory converter normalizes ALL reasoning into `<think>` tags, regardless
of how the model originally produced it:

1. **Native thinking tokens** (`msg["reasoning"]` field from providers like
   Anthropic, OpenAI o-series): Wrapped as `<think>\n{reasoning}\n</think>\n`
   and prepended before the content.

2. **REASONING_SCRATCHPAD XML** (when native thinking is disabled and the model
   reasons via system-prompt-instructed XML): `<REASONING_SCRATCHPAD>` tags are
   converted to `<think>` via `convert_scratchpad_to_think()`.

3. **Empty think blocks**: Every `gpt` turn is guaranteed to have a `<think>`
   block. If no reasoning was produced, an empty block is inserted:
   `<think>\n</think>\n` — this ensures consistent format for training data.
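
The three rules can be sketched in one function. This is an illustration of the behavior described above, not the real logic from `agent/trajectory.py`:

```python
def normalize_reasoning(msg: dict) -> str:
    """Sketch of the three <think> normalization rules."""
    content = msg.get("content") or ""
    reasoning = msg.get("reasoning")
    if reasoning:                            # rule 1: native thinking tokens
        return f"<think>\n{reasoning}\n</think>\n{content}"
    if "<REASONING_SCRATCHPAD>" in content:  # rule 2: scratchpad XML -> think
        content = content.replace("<REASONING_SCRATCHPAD>", "<think>")
        content = content.replace("</REASONING_SCRATCHPAD>", "</think>")
        return content
    return f"<think>\n</think>\n{content}"   # rule 3: guarantee an empty block

print(normalize_reasoning({"content": "Done.", "reasoning": "check the file"}))
print(normalize_reasoning({"content": "Done."}))
```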

### Tool Call Normalization

Tool calls from the API format (with `tool_call_id`, function name, and arguments as
a JSON string) are converted to XML-wrapped JSON:

```
<tool_call>
{"name": "terminal", "arguments": {"command": "ls -la"}}
</tool_call>
```

- Arguments are parsed from JSON strings back to objects (not double-encoded)
- If JSON parsing fails (which shouldn't happen — arguments are validated during
  the conversation), an empty `{}` is used and a warning is logged
- Multiple tool calls in one assistant turn produce multiple `<tool_call>` blocks
  in a single `gpt` message
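
The conversion and its JSON-failure fallback can be sketched like this (illustrative only, not the real converter):

```python
import json

def tool_calls_to_xml(tool_calls: list) -> str:
    """Render API-format tool calls as newline-joined <tool_call> blocks."""
    blocks = []
    for call in tool_calls:
        fn = call["function"]
        try:
            args = json.loads(fn["arguments"])  # decode the JSON string once
        except json.JSONDecodeError:
            args = {}                           # fallback: empty args
        payload = json.dumps({"name": fn["name"], "arguments": args})
        blocks.append(f"<tool_call>\n{payload}\n</tool_call>")
    return "\n".join(blocks)

xml = tool_calls_to_xml([
    {"id": "call_1", "function": {"name": "terminal", "arguments": '{"command": "ls -la"}'}}
])
print(xml)
```

Decoding the arguments before re-serializing is what prevents the double-encoded `"arguments": "{\"command\": ...}"` shape.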

### Tool Response Normalization

All tool results following an assistant message are grouped into a single `tool`
turn with XML-wrapped JSON responses:

```
<tool_response>
{"tool_call_id": "call_abc123", "name": "terminal", "content": "output here"}
</tool_response>
```

- If tool content looks like JSON (starts with `{` or `[`), it's parsed so the
  content field contains a JSON object/array rather than a string
- Multiple tool results are joined with newlines in one message
- The tool name is matched by position against the parent assistant's `tool_calls`
  array

### System Message

The system message is generated at save time (not taken from the conversation).
It follows the Hermes function-calling prompt template with:

- Preamble explaining the function-calling protocol
- `<tools>` XML block containing the JSON tool definitions
- Schema reference for `FunctionCall` objects
- `<tool_call>` example

Tool definitions include `name`, `description`, `parameters`, and `required`
(set to `null` to match the canonical format).

## Loading Trajectories

Trajectories are standard JSONL — load with any JSON-lines reader:

```python
import json

def load_trajectories(path: str):
    """Load trajectory entries from a JSONL file."""
    entries = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                entries.append(json.loads(line))
    return entries

# Filter to successful completions only
successful = [e for e in load_trajectories("trajectory_samples.jsonl")
              if e.get("completed")]

# Extract just the conversations for training
training_data = [e["conversations"] for e in successful]
```

### Loading for HuggingFace Datasets

```python
from datasets import load_dataset

ds = load_dataset("json", data_files="trajectory_samples.jsonl")
```

The normalized `tool_stats` schema ensures all entries have the same columns,
preventing Arrow schema mismatch errors during dataset loading.
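
The zero-default normalization behind that guarantee can be sketched as follows (the tool names here are hypothetical placeholders for `TOOL_TO_TOOLSET_MAP` keys):

```python
ALL_TOOLS = ["terminal", "read_file", "write_file"]  # hypothetical tool universe

def normalize_tool_stats(observed: dict) -> dict:
    """Fill in zero-valued stats for every known tool, observed or not."""
    return {
        tool: observed.get(tool, {"count": 0, "success": 0, "failure": 0})
        for tool in ALL_TOOLS
    }

stats = normalize_tool_stats({"terminal": {"count": 3, "success": 3, "failure": 0}})
print(stats["read_file"])  # {'count': 0, 'success': 0, 'failure': 0}
```

Because every entry carries the same keys, Arrow can infer one stable schema for the whole file.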
|
||||
|
||||
|
||||
## Controlling Trajectory Saving
|
||||
|
||||
In the CLI, trajectory saving is controlled by:
|
||||
|
||||
```yaml
|
||||
# config.yaml
|
||||
agent:
|
||||
save_trajectories: true # default: false
|
||||
```
|
||||
|
||||
Or via the `--save-trajectories` flag. When the agent initializes with
|
||||
`save_trajectories=True`, the `_save_trajectory()` method is called at the end
|
||||
of each conversation turn.
|
||||
|
||||
The batch runner always saves trajectories (that's its primary purpose).
|
||||
|
||||
Samples with zero reasoning across all turns are automatically discarded by the
|
||||
batch runner to avoid polluting training data with non-reasoning examples.

@@ -54,11 +54,14 @@ hermes setup # Or configure everything at once

| **Kilo Code** | KiloCode-hosted models | Set `KILOCODE_API_KEY` |
| **OpenCode Zen** | Pay-as-you-go access to curated models | Set `OPENCODE_ZEN_API_KEY` |
| **OpenCode Go** | $10/month subscription for open models | Set `OPENCODE_GO_API_KEY` |
| **DeepSeek** | Direct DeepSeek API access | Set `DEEPSEEK_API_KEY` |
| **GitHub Copilot** | GitHub Copilot subscription (GPT-5.x, Claude, Gemini, etc.) | OAuth via `hermes model`, or `COPILOT_GITHUB_TOKEN` / `GH_TOKEN` |
| **GitHub Copilot ACP** | Copilot ACP agent backend (spawns local `copilot` CLI) | `hermes model` (requires `copilot` CLI + `copilot login`) |
| **Vercel AI Gateway** | Vercel AI Gateway routing | Set `AI_GATEWAY_API_KEY` |
| **Custom Endpoint** | VLLM, SGLang, Ollama, or any OpenAI-compatible API | Set base URL + API key |

:::tip
You can switch providers at any time with `hermes model` — no code changes, no lock-in. When configuring a custom endpoint, Hermes will prompt for the context window size and auto-detect it when possible. See [Context Length Detection](../integrations/providers.md#context-length-detection) for details.
:::

## 3. Start Chatting

@@ -20,6 +20,43 @@ This pulls the latest code, updates dependencies, and prompts you to configure a
`hermes update` automatically detects new configuration options and prompts you to add them. If you skipped that prompt, you can manually run `hermes config check` to see missing options, then `hermes config migrate` to interactively add them.
:::

### What happens during an update

When you run `hermes update`, the following steps occur:

1. **Git pull** — pulls the latest code from the `main` branch and updates submodules
2. **Dependency install** — runs `uv pip install -e ".[all]"` to pick up new or changed dependencies
3. **Config migration** — detects new config options added since your version and prompts you to set them
4. **Gateway auto-restart** — if the gateway service is running (systemd on Linux, launchd on macOS), it is **automatically restarted** after the update completes so the new code takes effect immediately

Expected output looks like:

```
$ hermes update
Updating Hermes Agent...
📥 Pulling latest code...
Already up to date. (or: Updating abc1234..def5678)
📦 Updating dependencies...
✅ Dependencies updated
🔍 Checking for new config options...
✅ Config is up to date (or: Found 2 new options — running migration...)
🔄 Restarting gateway service...
✅ Gateway restarted
✅ Hermes Agent updated successfully!
```

### Checking your current version

```bash
hermes version
```

Compare against the latest release at the [GitHub releases page](https://github.com/NousResearch/hermes-agent/releases) or check for available updates:

```bash
hermes update --check
```

### Updating from Messaging Platforms

You can also update directly from Telegram, Discord, Slack, or WhatsApp by sending:

@@ -28,7 +65,7 @@ You can also update directly from Telegram, Discord, Slack, or WhatsApp by sending:
/update
```

This pulls the latest code, updates dependencies, and restarts the gateway. The bot will briefly go offline during the restart (typically 5–15 seconds) and then resume.

### Manual Update

@@ -51,6 +88,57 @@ hermes config check

hermes config migrate  # Interactively add any missing options
```

### Rollback instructions

If an update introduces a problem, you can roll back to a previous version:

```bash
cd /path/to/hermes-agent

# List recent versions
git log --oneline -10

# Roll back to a specific commit
git checkout <commit-hash>
git submodule update --init --recursive
uv pip install -e ".[all]"

# Restart the gateway if running
hermes gateway restart
```

To roll back to a specific release tag:

```bash
git checkout v0.6.0
git submodule update --init --recursive
uv pip install -e ".[all]"
```

:::warning
Rolling back may cause config incompatibilities if new options were added. Run `hermes config check` after rolling back and remove any unrecognized options from `config.yaml` if you encounter errors.
:::

### Note for Nix users

If you installed via Nix flake, updates are managed through the Nix package manager:

```bash
# Update the flake input
nix flake update hermes-agent

# Or rebuild with the latest
nix profile upgrade hermes-agent
```

Nix installations are immutable — rollback is handled by Nix's generation system:

```bash
nix profile rollback
```

See [Nix Setup](./nix-setup.md) for more details.

---

## Uninstalling

@@ -1,5 +1,8 @@
---
sidebar_position: 8
sidebar_label: "Build a Plugin"
title: "Build a Hermes Plugin"
description: "Step-by-step guide to building a complete Hermes plugin with tools, hooks, data files, and skills"
---

# Build a Hermes Plugin
82 website/docs/integrations/index.md (new file)

@@ -0,0 +1,82 @@
---
title: "Integrations"
sidebar_label: "Overview"
sidebar_position: 0
---

# Integrations

Hermes Agent connects to external systems for AI inference, tool servers, IDE workflows, programmatic access, and more. These integrations extend what Hermes can do and where it can run.

## AI Providers & Routing

Hermes supports multiple AI inference providers out of the box. Use `hermes model` to configure interactively, or set them in `config.yaml`.

- **[AI Providers](/docs/integrations/providers)** — OpenRouter, Anthropic, OpenAI, Google, and any OpenAI-compatible endpoint. Hermes auto-detects capabilities like vision, streaming, and tool use per provider.
- **[Provider Routing](/docs/user-guide/features/provider-routing)** — Fine-grained control over which underlying providers handle your OpenRouter requests. Optimize for cost, speed, or quality with sorting, whitelists, blacklists, and explicit priority ordering.
- **[Fallback Providers](/docs/user-guide/features/fallback-providers)** — Automatic failover to backup LLM providers when your primary model encounters errors. Includes primary model fallback and independent auxiliary task fallback for vision, compression, and web extraction.

## Tool Servers (MCP)

- **[MCP Servers](/docs/user-guide/features/mcp)** — Connect Hermes to external tool servers via Model Context Protocol. Access tools from GitHub, databases, file systems, browser stacks, internal APIs, and more without writing native Hermes tools. Supports both stdio and SSE transports, per-server tool filtering, and capability-aware resource/prompt registration.

## Web Search Backends

The `web_search`, `web_extract`, and `web_crawl` tools support four backend providers, configured via `config.yaml` or `hermes tools`:

| Backend | Env Var | Search | Extract | Crawl |
|---------|---------|--------|---------|-------|
| **Firecrawl** (default) | `FIRECRAWL_API_KEY` | ✔ | ✔ | ✔ |
| **Parallel** | `PARALLEL_API_KEY` | ✔ | ✔ | — |
| **Tavily** | `TAVILY_API_KEY` | ✔ | ✔ | ✔ |
| **Exa** | `EXA_API_KEY` | ✔ | ✔ | — |

Quick setup example:

```yaml
web:
  backend: firecrawl  # firecrawl | parallel | tavily | exa
```

If `web.backend` is not set, the backend is auto-detected from whichever API key is available. Self-hosted Firecrawl is also supported via `FIRECRAWL_API_URL`.
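A minimal sketch of that key-based auto-detection, using the env var names from the table above (the ordering shown is an assumption; Hermes's actual precedence may differ):

```python
import os

# Backend-to-env-var mapping, taken from the table above
BACKEND_KEYS = [
    ("firecrawl", "FIRECRAWL_API_KEY"),
    ("parallel", "PARALLEL_API_KEY"),
    ("tavily", "TAVILY_API_KEY"),
    ("exa", "EXA_API_KEY"),
]

def detect_backend(env=os.environ):
    """Return the first backend whose API key is present, else None."""
    for backend, key in BACKEND_KEYS:
        if env.get(key):
            return backend
    return None
```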

## Browser Automation

Hermes includes full browser automation with multiple backend options for navigating websites, filling forms, and extracting information:

- **Browserbase** — Managed cloud browsers with anti-bot tooling, CAPTCHA solving, and residential proxies
- **Browser Use** — Alternative cloud browser provider
- **Local Chrome via CDP** — Connect to your running Chrome instance using `/browser connect`
- **Local Chromium** — Headless local browser via the `agent-browser` CLI

See [Browser Automation](/docs/user-guide/features/browser) for setup and usage.

## Voice & TTS Providers

Text-to-speech and speech-to-text across all messaging platforms:

| Provider | Quality | Cost | API Key |
|----------|---------|------|---------|
| **Edge TTS** (default) | Good | Free | None needed |
| **ElevenLabs** | Excellent | Paid | `ELEVENLABS_API_KEY` |
| **OpenAI TTS** | Good | Paid | `VOICE_TOOLS_OPENAI_KEY` |
| **NeuTTS** | Good | Free | None needed |

Speech-to-text uses Whisper for voice message transcription on Telegram, Discord, and WhatsApp. See [Voice & TTS](/docs/user-guide/features/tts) and [Voice Mode](/docs/user-guide/features/voice-mode) for details.

## IDE & Editor Integration

- **[IDE Integration (ACP)](/docs/user-guide/features/acp)** — Use Hermes Agent inside ACP-compatible editors such as VS Code, Zed, and JetBrains. Hermes runs as an ACP server, rendering chat messages, tool activity, file diffs, and terminal commands inside your editor.

## Programmatic Access

- **[API Server](/docs/user-guide/features/api-server)** — Expose Hermes as an OpenAI-compatible HTTP endpoint. Any frontend that speaks the OpenAI format — Open WebUI, LobeChat, LibreChat, NextChat, ChatBox — can connect and use Hermes as a backend with its full toolset.

## Memory & Personalization

- **[Honcho Memory](/docs/user-guide/features/honcho)** — AI-native persistent memory for cross-session user modeling and personalization. Honcho adds deep user modeling via dialectic reasoning on top of Hermes's built-in memory system.

## Training & Evaluation

- **[RL Training](/docs/user-guide/features/rl-training)** — Generate trajectory data from agent sessions for reinforcement learning and model fine-tuning.
- **[Batch Processing](/docs/user-guide/features/batch-processing)** — Run the agent across hundreds of prompts in parallel, generating structured ShareGPT-format trajectory data for training data generation or evaluation.
834 website/docs/integrations/providers.md (new file)

@@ -0,0 +1,834 @@
---
title: "AI Providers"
sidebar_label: "AI Providers"
sidebar_position: 1
---

# AI Providers

This page covers setting up inference providers for Hermes Agent — from cloud APIs like OpenRouter and Anthropic, to self-hosted endpoints like Ollama and vLLM, to advanced routing and fallback configurations. You need at least one provider configured to use Hermes.

## Inference Providers

You need at least one way to connect to an LLM. Use `hermes model` to switch providers and models interactively, or configure directly:

| Provider | Setup |
|----------|-------|
| **Nous Portal** | `hermes model` (OAuth, subscription-based) |
| **OpenAI Codex** | `hermes model` (ChatGPT OAuth, uses Codex models) |
| **GitHub Copilot** | `hermes model` (OAuth device code flow, `COPILOT_GITHUB_TOKEN`, `GH_TOKEN`, or `gh auth token`) |
| **GitHub Copilot ACP** | `hermes model` (spawns local `copilot --acp --stdio`) |
| **Anthropic** | `hermes model` (Claude Pro/Max via Claude Code auth, Anthropic API key, or manual setup-token) |
| **OpenRouter** | `OPENROUTER_API_KEY` in `~/.hermes/.env` |
| **AI Gateway** | `AI_GATEWAY_API_KEY` in `~/.hermes/.env` (provider: `ai-gateway`) |
| **z.ai / GLM** | `GLM_API_KEY` in `~/.hermes/.env` (provider: `zai`) |
| **Kimi / Moonshot** | `KIMI_API_KEY` in `~/.hermes/.env` (provider: `kimi-coding`) |
| **MiniMax** | `MINIMAX_API_KEY` in `~/.hermes/.env` (provider: `minimax`) |
| **MiniMax China** | `MINIMAX_CN_API_KEY` in `~/.hermes/.env` (provider: `minimax-cn`) |
| **Alibaba Cloud** | `DASHSCOPE_API_KEY` in `~/.hermes/.env` (provider: `alibaba`, aliases: `dashscope`, `qwen`) |
| **Kilo Code** | `KILOCODE_API_KEY` in `~/.hermes/.env` (provider: `kilocode`) |
| **OpenCode Zen** | `OPENCODE_ZEN_API_KEY` in `~/.hermes/.env` (provider: `opencode-zen`) |
| **OpenCode Go** | `OPENCODE_GO_API_KEY` in `~/.hermes/.env` (provider: `opencode-go`) |
| **DeepSeek** | `DEEPSEEK_API_KEY` in `~/.hermes/.env` (provider: `deepseek`) |
| **Hugging Face** | `HF_TOKEN` in `~/.hermes/.env` (provider: `huggingface`, alias: `hf`) |
| **Custom Endpoint** | `hermes model` (saved in `config.yaml`) or `OPENAI_BASE_URL` + `OPENAI_API_KEY` in `~/.hermes/.env` |

:::tip Model key alias
In the `model:` config section, you can use either `default:` or `model:` as the key name for your model ID. Both `model: { default: my-model }` and `model: { model: my-model }` work identically.
:::

:::info Codex Note
The OpenAI Codex provider authenticates via device code (open a URL, enter a code). Hermes stores the resulting credentials in its own auth store under `~/.hermes/auth.json` and can import existing Codex CLI credentials from `~/.codex/auth.json` when present. No Codex CLI installation is required.
:::

:::warning
Even when using Nous Portal, Codex, or a custom endpoint, some tools (vision, web summarization, MoA) use a separate "auxiliary" model — by default Gemini Flash via OpenRouter. An `OPENROUTER_API_KEY` enables these tools automatically. You can also configure which model and provider these tools use — see [Auxiliary Models](/docs/user-guide/configuration#auxiliary-models).
:::
### Anthropic (Native)

Use Claude models directly through the Anthropic API — no OpenRouter proxy needed. Supports three auth methods:

```bash
# With an API key (pay-per-token)
export ANTHROPIC_API_KEY=***
hermes chat --provider anthropic --model claude-sonnet-4-6

# Preferred: authenticate through `hermes model`
# Hermes will use Claude Code's credential store directly when available
hermes model

# Manual override with a setup-token (fallback / legacy)
export ANTHROPIC_TOKEN=***  # setup-token or manual OAuth token
hermes chat --provider anthropic

# Auto-detect Claude Code credentials (if you already use Claude Code)
hermes chat --provider anthropic  # reads Claude Code credential files automatically
```

When you choose Anthropic OAuth through `hermes model`, Hermes prefers Claude Code's own credential store over copying the token into `~/.hermes/.env`. That way Claude's credentials can still be refreshed automatically instead of going stale as a frozen copy.

Or set it permanently:

```yaml
model:
  provider: "anthropic"
  default: "claude-sonnet-4-6"
```

:::tip Aliases
`--provider claude` and `--provider claude-code` also work as shorthand for `--provider anthropic`.
:::
### GitHub Copilot

Hermes supports GitHub Copilot as a first-class provider with two modes:

**`copilot` — Direct Copilot API** (recommended). Uses your GitHub Copilot subscription to access GPT-5.x, Claude, Gemini, and other models through the Copilot API.

```bash
hermes chat --provider copilot --model gpt-5.4
```

**Authentication options** (checked in this order):

1. `COPILOT_GITHUB_TOKEN` environment variable
2. `GH_TOKEN` environment variable
3. `GITHUB_TOKEN` environment variable
4. `gh auth token` CLI fallback

If no token is found, `hermes model` offers an **OAuth device code login** — the same flow used by the Copilot CLI and opencode.

:::warning Token types
The Copilot API does **not** support classic Personal Access Tokens (`ghp_*`). Supported token types:

| Type | Prefix | How to get |
|------|--------|------------|
| OAuth token | `gho_` | `hermes model` → GitHub Copilot → Login with GitHub |
| Fine-grained PAT | `github_pat_` | GitHub Settings → Developer settings → Fine-grained tokens (needs **Copilot Requests** permission) |
| GitHub App token | `ghu_` | Via GitHub App installation |

If your `gh auth token` returns a `ghp_*` token, use `hermes model` to authenticate via OAuth instead.
:::
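A tiny client-side prefix check based on the table above. The prefixes are the documented ones; the helper itself is hypothetical:

```python
# Accepted Copilot token prefixes (from the table above); the helper is illustrative
SUPPORTED_PREFIXES = ("gho_", "github_pat_", "ghu_")

def copilot_token_supported(token: str) -> bool:
    """Reject classic PATs (ghp_*) before sending them to the Copilot API."""
    return token.startswith(SUPPORTED_PREFIXES)
```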

**API routing**: GPT-5+ models (except `gpt-5-mini`) automatically use the Responses API. All other models (GPT-4o, Claude, Gemini, etc.) use Chat Completions. Models are auto-detected from the live Copilot catalog.

**`copilot-acp` — Copilot ACP agent backend**. Spawns the local Copilot CLI as a subprocess:

```bash
hermes chat --provider copilot-acp --model copilot-acp
# Requires the GitHub Copilot CLI in PATH and an existing `copilot login` session
```

**Permanent config:**

```yaml
model:
  provider: "copilot"
  default: "gpt-5.4"
```

| Environment variable | Description |
|---------------------|-------------|
| `COPILOT_GITHUB_TOKEN` | GitHub token for Copilot API (first priority) |
| `HERMES_COPILOT_ACP_COMMAND` | Override the Copilot CLI binary path (default: `copilot`) |
| `HERMES_COPILOT_ACP_ARGS` | Override ACP args (default: `--acp --stdio`) |
### First-Class Chinese AI Providers

These providers have built-in support with dedicated provider IDs. Set the API key and use `--provider` to select:

```bash
# z.ai / ZhipuAI GLM
hermes chat --provider zai --model glm-4-plus
# Requires: GLM_API_KEY in ~/.hermes/.env

# Kimi / Moonshot AI
hermes chat --provider kimi-coding --model moonshot-v1-auto
# Requires: KIMI_API_KEY in ~/.hermes/.env

# MiniMax (global endpoint)
hermes chat --provider minimax --model MiniMax-M2.7
# Requires: MINIMAX_API_KEY in ~/.hermes/.env

# MiniMax (China endpoint)
hermes chat --provider minimax-cn --model MiniMax-M2.7
# Requires: MINIMAX_CN_API_KEY in ~/.hermes/.env

# Alibaba Cloud / DashScope (Qwen models)
hermes chat --provider alibaba --model qwen3.5-plus
# Requires: DASHSCOPE_API_KEY in ~/.hermes/.env
```

Or set the provider permanently in `config.yaml`:

```yaml
model:
  provider: "zai"  # or: kimi-coding, minimax, minimax-cn, alibaba
  default: "glm-4-plus"
```

Base URLs can be overridden with the `GLM_BASE_URL`, `KIMI_BASE_URL`, `MINIMAX_BASE_URL`, `MINIMAX_CN_BASE_URL`, or `DASHSCOPE_BASE_URL` environment variables.
### Hugging Face Inference Providers

[Hugging Face Inference Providers](https://huggingface.co/docs/inference-providers) routes to 20+ open models through a unified OpenAI-compatible endpoint (`router.huggingface.co/v1`). Requests are automatically routed to the fastest available backend (Groq, Together, SambaNova, etc.) with automatic failover.

```bash
# Use any available model
hermes chat --provider huggingface --model Qwen/Qwen3-235B-A22B-Thinking-2507
# Requires: HF_TOKEN in ~/.hermes/.env

# Short alias
hermes chat --provider hf --model deepseek-ai/DeepSeek-V3.2
```

Or set it permanently in `config.yaml`:

```yaml
model:
  provider: "huggingface"
  default: "Qwen/Qwen3-235B-A22B-Thinking-2507"
```

Get your token at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens) — make sure to enable the "Make calls to Inference Providers" permission. A free tier is included ($0.10/month credit, no markup on provider rates).

You can append routing suffixes to model names: `:fastest` (default), `:cheapest`, or `:provider_name` to force a specific backend.
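For example, splitting the suffix off a model ID (a sketch; Hermes and the router handle this for you):

```python
# Illustrative: separating a routing suffix from an HF model ID
model_id = "deepseek-ai/DeepSeek-V3.2:cheapest"
base, _, route = model_id.partition(":")
# base  is the model repo ID
# route is the routing hint (empty string when no suffix is given)
```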

The base URL can be overridden with `HF_BASE_URL`.

## Custom & Self-Hosted LLM Providers

Hermes Agent works with **any OpenAI-compatible API endpoint**. If a server implements `/v1/chat/completions`, you can point Hermes at it. This means you can use local models, GPU inference servers, multi-provider routers, or any third-party API.

### General Setup

Two ways to configure a custom endpoint:

**Interactive setup (recommended):**

```bash
hermes model
# Select "Custom endpoint (self-hosted / VLLM / etc.)"
# Enter: API base URL, API key, Model name
```

**Manual config (`config.yaml`):**

```yaml
# In ~/.hermes/config.yaml
model:
  default: your-model-name
  provider: custom
  base_url: http://localhost:8000/v1
  api_key: your-key-or-leave-empty-for-local
```

:::warning Legacy env vars
`OPENAI_BASE_URL` and `LLM_MODEL` in `.env` are **deprecated**. The CLI ignores `LLM_MODEL` entirely (only the gateway reads it). Use `hermes model` or edit `config.yaml` directly — both persist correctly across restarts and Docker containers.
:::

Both approaches persist to `config.yaml`, which is the source of truth for model, provider, and base URL.

### Switching Models with `/model`

Once a custom endpoint is configured, you can switch models mid-session:

```
/model custom:qwen-2.5              # Switch to a model on your custom endpoint
/model custom                       # Auto-detect the model from the endpoint
/model openrouter:claude-sonnet-4   # Switch back to a cloud provider
```

If you have **named custom providers** configured (see below), use the triple syntax:

```
/model custom:local:qwen-2.5   # Use the "local" custom provider with model qwen-2.5
/model custom:work:llama3      # Use the "work" custom provider with llama3
```

When switching providers, Hermes persists the base URL and provider to config so the change survives restarts. When switching away from a custom endpoint to a built-in provider, the stale base URL is automatically cleared.

:::tip
`/model custom` (bare, no model name) queries your endpoint's `/models` API and auto-selects the model if exactly one is loaded. Useful for local servers running a single model.
:::
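The argument shapes above can be sketched as a simple split (illustrative; the real parser lives in the CLI):

```python
def parse_model_arg(arg: str):
    """Split a /model argument into (provider, endpoint_name, model); a sketch only."""
    parts = arg.split(":", 2)
    if len(parts) == 3:            # e.g. custom:local:qwen-2.5
        return parts[0], parts[1], parts[2]
    if len(parts) == 2:            # e.g. openrouter:claude-sonnet-4
        return parts[0], None, parts[1]
    return parts[0], None, None    # bare, e.g. custom

provider, endpoint, model = parse_model_arg("custom:local:qwen-2.5")
```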

Everything below follows this same pattern — just change the URL, key, and model name.

---

### Ollama — Local Models, Zero Config

[Ollama](https://ollama.com/) runs open-weight models locally with one command. Best for: quick local experimentation, privacy-sensitive work, offline use. Supports tool calling via the OpenAI-compatible API.

```bash
# Install and run a model
ollama pull qwen2.5-coder:32b
ollama serve  # Starts on port 11434
```

Then configure Hermes:

```bash
hermes model
# Select "Custom endpoint (self-hosted / VLLM / etc.)"
# Enter URL: http://localhost:11434/v1
# Skip API key (Ollama doesn't need one)
# Enter model name (e.g. qwen2.5-coder:32b)
```

Or configure `config.yaml` directly:

```yaml
model:
  default: qwen2.5-coder:32b
  provider: custom
  base_url: http://localhost:11434/v1
  context_length: 32768  # See warning below
```

:::caution Ollama defaults to very low context lengths
Ollama does **not** use your model's full context window by default. Depending on your VRAM, the default is:

| Available VRAM | Default context |
|----------------|----------------|
| Less than 24 GB | **4,096 tokens** |
| 24–48 GB | 32,768 tokens |
| 48+ GB | 256,000 tokens |

For agent use with tools, **you need at least 16k–32k context**. At 4k, the system prompt + tool schemas alone can fill the window, leaving no room for conversation.

**How to increase it** (pick one):

```bash
# Option 1: Set server-wide via environment variable (recommended)
OLLAMA_CONTEXT_LENGTH=32768 ollama serve

# Option 2: For systemd-managed Ollama
sudo systemctl edit ollama.service
# Add: Environment="OLLAMA_CONTEXT_LENGTH=32768"
# Then: sudo systemctl daemon-reload && sudo systemctl restart ollama

# Option 3: Bake it into a custom model (persistent per-model)
echo -e "FROM qwen2.5-coder:32b\nPARAMETER num_ctx 32768" > Modelfile
ollama create qwen2.5-coder-32k -f Modelfile
```

**You cannot set context length through the OpenAI-compatible API** (`/v1/chat/completions`). It must be configured server-side or via a Modelfile. This is the #1 source of confusion when integrating Ollama with tools like Hermes.
:::
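To see why the 4k default is too tight, some back-of-the-envelope arithmetic (the prompt and schema sizes are illustrative, not measured):

```python
# Illustrative token budget at Ollama's 4k default
context = 4096
system_prompt = 1500   # a typical agent system prompt
tool_schemas = 1800    # JSON schemas for a full toolset
remaining = context - system_prompt - tool_schemas  # little room left for chat
```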

**Verify your context is set correctly:**

```bash
ollama ps
# Look at the CONTEXT column — it should show your configured value
```

:::tip
List available models with `ollama list`. Pull any model from the [Ollama library](https://ollama.com/library) with `ollama pull <model>`. Ollama handles GPU offloading automatically — no configuration needed for most setups.
:::

---

### vLLM — High-Performance GPU Inference

[vLLM](https://docs.vllm.ai/) is the standard for production LLM serving. Best for: maximum throughput on GPU hardware, serving large models, continuous batching.

```bash
pip install vllm
vllm serve meta-llama/Llama-3.1-70B-Instruct \
  --port 8000 \
  --max-model-len 65536 \
  --tensor-parallel-size 2 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```

Then configure Hermes:

```bash
hermes model
# Select "Custom endpoint (self-hosted / VLLM / etc.)"
# Enter URL: http://localhost:8000/v1
# Skip API key (or enter one if you configured vLLM with --api-key)
# Enter model name: meta-llama/Llama-3.1-70B-Instruct
```

**Context length:** vLLM reads the model's `max_position_embeddings` by default. If that exceeds your GPU memory, it errors and asks you to set `--max-model-len` lower. You can also use `--max-model-len auto` to automatically find the maximum that fits. Set `--gpu-memory-utilization 0.95` (default 0.9) to squeeze more context into VRAM.

**Tool calling requires explicit flags:**

| Flag | Purpose |
|------|---------|
| `--enable-auto-tool-choice` | Required for `tool_choice: "auto"` (the default in Hermes) |
| `--tool-call-parser <name>` | Parser for the model's tool call format |

Supported parsers: `hermes` (Qwen 2.5, Hermes 2/3), `llama3_json` (Llama 3.x), `mistral`, `deepseek_v3`, `deepseek_v31`, `xlam`, `pythonic`. Without these flags, tool calls won't work — the model will output tool calls as text.

:::tip
vLLM supports human-readable sizes: `--max-model-len 64k` (lowercase k = 1000, uppercase K = 1024).
:::
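The suffix semantics can be sketched like this (an illustration of the documented k/K convention, not vLLM's actual parser):

```python
def parse_model_len(value: str) -> int:
    """Interpret vLLM-style sizes: lowercase k = 1000, uppercase K = 1024."""
    if value.endswith("k"):
        return int(value[:-1]) * 1000
    if value.endswith("K"):
        return int(value[:-1]) * 1024
    return int(value)
```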

---

### SGLang — Fast Serving with RadixAttention

[SGLang](https://github.com/sgl-project/sglang) is an alternative to vLLM with RadixAttention for KV cache reuse. Best for: multi-turn conversations (prefix caching), constrained decoding, structured output.

```bash
pip install "sglang[all]"
python -m sglang.launch_server \
  --model meta-llama/Llama-3.1-70B-Instruct \
  --port 30000 \
  --context-length 65536 \
  --tp 2 \
  --tool-call-parser qwen
```

Then configure Hermes:

```bash
hermes model
# Select "Custom endpoint (self-hosted / VLLM / etc.)"
# Enter URL: http://localhost:30000/v1
# Enter model name: meta-llama/Llama-3.1-70B-Instruct
```

**Context length:** SGLang reads from the model's config by default. Use `--context-length` to override. If you need to exceed the model's declared maximum, set `SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1`.

**Tool calling:** Use `--tool-call-parser` with the appropriate parser for your model family: `qwen` (Qwen 2.5), `llama3`, `llama4`, `deepseekv3`, `mistral`, `glm`. Without this flag, tool calls come back as plain text.

:::caution SGLang defaults to 128 max output tokens
If responses seem truncated, add `max_tokens` to your requests or set `--default-max-tokens` on the server. SGLang's default is only 128 tokens per response if not specified in the request.
:::
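A sketch of a request payload that avoids the truncation (the values are illustrative):

```python
# Always set max_tokens explicitly when calling SGLang's OpenAI-compatible API
payload = {
    "model": "meta-llama/Llama-3.1-70B-Instruct",
    "messages": [{"role": "user", "content": "Summarize this repo."}],
    "max_tokens": 2048,  # without this, SGLang may cap the reply at ~128 tokens
}
```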

---

### llama.cpp / llama-server — CPU & Metal Inference

[llama.cpp](https://github.com/ggml-org/llama.cpp) runs quantized models on CPU, Apple Silicon (Metal), and consumer GPUs. Best for: running models without a datacenter GPU, Mac users, edge deployment.

```bash
# Build and start llama-server
cmake -B build && cmake --build build --config Release
./build/bin/llama-server \
  --jinja -fa \
  -c 32768 \
  -ngl 99 \
  -m models/qwen2.5-coder-32b-instruct-Q4_K_M.gguf \
  --port 8080 --host 0.0.0.0
```

**Context length (`-c`):** Recent builds default to `0`, which reads the model's training context from the GGUF metadata. For models with 128k+ training context, this can OOM trying to allocate the full KV cache. Set `-c` explicitly to what you need (32k–64k is a good range for agent use). If using parallel slots (`-np`), the total context is divided among slots — with `-c 32768 -np 4`, each slot only gets 8k.
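The slot arithmetic is worth spelling out (illustrative numbers):

```python
# With parallel slots, llama-server divides the total context evenly
total_ctx = 32768
slots = 4                      # -np 4
per_slot = total_ctx // slots  # each request sees only this much context
```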

Then configure Hermes to point at it:

```bash
hermes model
# Select "Custom endpoint (self-hosted / VLLM / etc.)"
# Enter URL: http://localhost:8080/v1
# Skip API key (local servers don't need one)
# Enter model name — or leave blank to auto-detect if only one model is loaded
```

This saves the endpoint to `config.yaml` so it persists across sessions.

:::caution `--jinja` is required for tool calling
Without `--jinja`, llama-server ignores the `tools` parameter entirely. The model will try to call tools by writing JSON in its response text, but Hermes won't recognize it as a tool call — you'll see raw JSON like `{"name": "web_search", ...}` printed as a message instead of an actual search.

Native tool calling support (best performance): Llama 3.x, Qwen 2.5 (including Coder), Hermes 2/3, Mistral, DeepSeek, Functionary. All other models use a generic handler that works but may be less efficient. See the [llama.cpp function calling docs](https://github.com/ggml-org/llama.cpp/blob/master/docs/function-calling.md) for the full list.

You can verify tool support is active by checking `http://localhost:8080/props` — the `chat_template` field should be present.
:::

:::tip
Download GGUF models from [Hugging Face](https://huggingface.co/models?library=gguf). Q4_K_M quantization offers the best balance of quality vs. memory usage.
:::

---

### LM Studio — Desktop App with Local Models

[LM Studio](https://lmstudio.ai/) is a desktop app for running local models with a GUI. Best for: users who prefer a visual interface, quick model testing, developers on macOS/Windows/Linux.

Start the server from the LM Studio app (Developer tab → Start Server), or use the CLI:

```bash
lms server start   # Starts on port 1234
lms load qwen2.5-coder --context-length 32768
```

Then configure Hermes:

```bash
hermes model
# Select "Custom endpoint (self-hosted / VLLM / etc.)"
# Enter URL: http://localhost:1234/v1
# Skip API key (LM Studio doesn't require one)
# Enter model name
```

:::caution Context length often defaults to 2048
LM Studio reads context length from the model's metadata, but many GGUF models report low defaults (2048 or 4096). **Always set context length explicitly** in the LM Studio model settings:

1. Click the gear icon next to the model picker
2. Set "Context Length" to at least 16384 (preferably 32768)
3. Reload the model for the change to take effect

Alternatively, use the CLI: `lms load model-name --context-length 32768`

To set persistent per-model defaults: My Models tab → gear icon on the model → set context size.
:::

**Tool calling:** Supported since LM Studio 0.3.6. Models with native tool-calling training (Qwen 2.5, Llama 3.x, Mistral, Hermes) are auto-detected and shown with a tool badge. Other models use a generic fallback that may be less reliable.

---

### Troubleshooting Local Models

These issues affect **all** local inference servers when used with Hermes.

#### Tool calls appear as text instead of executing

The model outputs something like `{"name": "web_search", "arguments": {...}}` as a message instead of actually calling the tool.

**Cause:** Your server doesn't have tool calling enabled, or the model doesn't support it through the server's tool calling implementation.

| Server | Fix |
|--------|-----|
| **llama.cpp** | Add `--jinja` to the startup command |
| **vLLM** | Add `--enable-auto-tool-choice --tool-call-parser hermes` |
| **SGLang** | Add `--tool-call-parser qwen` (or appropriate parser) |
| **Ollama** | Tool calling is enabled by default — make sure your model supports it (check with `ollama show model-name`) |
| **LM Studio** | Update to 0.3.6+ and use a model with native tool support |

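A quick way to spot this failure mode in logs or transcripts is to look for tool-call-shaped JSON inside message text. A minimal sketch of that heuristic (the response string is hypothetical):

```shell
# A reply that should have been a tool call, leaked as plain message text.
resp='{"name": "web_search", "arguments": {"query": "hermes agent"}}'

# Message text containing both a "name" and an "arguments" key usually
# means the server never parsed the tool call.
if printf '%s' "$resp" | grep -q '"name"' && printf '%s' "$resp" | grep -q '"arguments"'; then
  echo "raw tool call leaked as text"
fi
```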
#### Model seems to forget context or give incoherent responses

**Cause:** Context window is too small. When the conversation exceeds the context limit, most servers silently drop older messages. Hermes's system prompt + tool schemas alone can use 4k–8k tokens.

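The arithmetic is unforgiving. A sketch with assumed numbers (6k of fixed overhead, the midpoint of the 4k–8k range above) — note that a 2048-token window goes negative before the conversation even starts:

```shell
overhead=6000   # assumed: system prompt + tool schemas
for ctx in 2048 8192 32768; do
  echo "ctx=$ctx -> room for conversation: $((ctx - overhead)) tokens"
done
```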
**Diagnosis:**

```bash
# Check what Hermes thinks the context is
# Look at startup line: "Context limit: X tokens"

# Check your server's actual context
# Ollama: ollama ps (CONTEXT column)
# llama.cpp: curl http://localhost:8080/props | jq '.default_generation_settings.n_ctx'
# vLLM: check --max-model-len in startup args
```

**Fix:** Set context to at least **32,768 tokens** for agent use. See each server's section above for the specific flag.

#### "Context limit: 2048 tokens" at startup

Hermes auto-detects context length from your server's `/v1/models` endpoint. If the server reports a low value (or doesn't report one at all), Hermes uses the model's declared limit, which may be wrong.

**Fix:** Set it explicitly in `config.yaml`:

```yaml
model:
  default: your-model
  provider: custom
  base_url: http://localhost:11434/v1
  context_length: 32768
```

#### Responses get cut off mid-sentence

**Possible causes:**
1. **Low `max_tokens` on the server** — SGLang defaults to 128 tokens per response. Set `--default-max-tokens` on the server or configure Hermes with `model.max_tokens` in config.yaml.
2. **Context exhaustion** — The model filled its context window. Increase context length or enable [context compression](/docs/user-guide/configuration#context-compression) in Hermes.

---

### LiteLLM Proxy — Multi-Provider Gateway

[LiteLLM](https://docs.litellm.ai/) is an OpenAI-compatible proxy that unifies 100+ LLM providers behind a single API. Best for: switching between providers without config changes, load balancing, fallback chains, budget controls.

```bash
# Install and start
pip install "litellm[proxy]"
litellm --model anthropic/claude-sonnet-4 --port 4000

# Or with a config file for multiple models:
litellm --config litellm_config.yaml --port 4000
```

Then configure Hermes with `hermes model` → Custom endpoint → `http://localhost:4000/v1`.

Example `litellm_config.yaml` with fallback:
```yaml
model_list:
  - model_name: "best"
    litellm_params:
      model: anthropic/claude-sonnet-4
      api_key: sk-ant-...
  - model_name: "best"
    litellm_params:
      model: openai/gpt-4o
      api_key: sk-...
router_settings:
  routing_strategy: "latency-based-routing"
```

---

### ClawRouter — Cost-Optimized Routing

[ClawRouter](https://github.com/BlockRunAI/ClawRouter) by BlockRunAI is a local routing proxy that auto-selects models based on query complexity. It classifies requests across 14 dimensions and routes to the cheapest model that can handle the task. Payment is via USDC cryptocurrency (no API keys).

```bash
# Install and start
npx @blockrun/clawrouter   # Starts on port 8402
```

Then configure Hermes with `hermes model` → Custom endpoint → `http://localhost:8402/v1` → model name `blockrun/auto`.

Routing profiles:

| Profile | Strategy | Savings |
|---------|----------|---------|
| `blockrun/auto` | Balanced quality/cost | 74-100% |
| `blockrun/eco` | Cheapest possible | 95-100% |
| `blockrun/premium` | Best quality models | 0% |
| `blockrun/free` | Free models only | 100% |
| `blockrun/agentic` | Optimized for tool use | varies |

:::note
ClawRouter requires a USDC-funded wallet on Base or Solana for payment. All requests route through BlockRun's backend API. Run `npx @blockrun/clawrouter doctor` to check wallet status.
:::

---

### Other Compatible Providers

Any service with an OpenAI-compatible API works. Some popular options:

| Provider | Base URL | Notes |
|----------|----------|-------|
| [Together AI](https://together.ai) | `https://api.together.xyz/v1` | Cloud-hosted open models |
| [Groq](https://groq.com) | `https://api.groq.com/openai/v1` | Ultra-fast inference |
| [DeepSeek](https://deepseek.com) | `https://api.deepseek.com/v1` | DeepSeek models |
| [Fireworks AI](https://fireworks.ai) | `https://api.fireworks.ai/inference/v1` | Fast open model hosting |
| [Cerebras](https://cerebras.ai) | `https://api.cerebras.ai/v1` | Wafer-scale chip inference |
| [Mistral AI](https://mistral.ai) | `https://api.mistral.ai/v1` | Mistral models |
| [OpenAI](https://openai.com) | `https://api.openai.com/v1` | Direct OpenAI access |
| [Azure OpenAI](https://azure.microsoft.com) | `https://YOUR.openai.azure.com/` | Enterprise OpenAI |
| [LocalAI](https://localai.io) | `http://localhost:8080/v1` | Self-hosted, multi-model |
| [Jan](https://jan.ai) | `http://localhost:1337/v1` | Desktop app with local models |

Configure any of these with `hermes model` → Custom endpoint, or in `config.yaml`:

```yaml
model:
  default: meta-llama/Llama-3.1-70B-Instruct-Turbo
  provider: custom
  base_url: https://api.together.xyz/v1
  api_key: your-together-key
```

---

### Context Length Detection

Hermes uses a multi-source resolution chain to detect the correct context window for your model and provider:

1. **Config override** — `model.context_length` in config.yaml (highest priority)
2. **Custom provider per-model** — `custom_providers[].models.<id>.context_length`
3. **Persistent cache** — previously discovered values (survives restarts)
4. **Endpoint `/models`** — queries your server's API (local/custom endpoints)
5. **Anthropic `/v1/models`** — queries Anthropic's API for `max_input_tokens` (API-key users only)
6. **OpenRouter API** — live model metadata from OpenRouter
7. **Nous Portal** — suffix-matches Nous model IDs against OpenRouter metadata
8. **[models.dev](https://models.dev)** — community-maintained registry with provider-specific context lengths for 3800+ models across 100+ providers
9. **Fallback defaults** — broad model family patterns (128K default)

For most setups this works out of the box. The system is provider-aware — the same model can have different context limits depending on who serves it (e.g., `claude-opus-4.6` is 1M on Anthropic direct but 128K on GitHub Copilot).

To set the context length explicitly, add `context_length` to your model config:

```yaml
model:
  default: "qwen3.5:9b"
  base_url: "http://localhost:8080/v1"
  context_length: 131072  # tokens
```

For custom endpoints, you can also set context length per model:

```yaml
custom_providers:
  - name: "My Local LLM"
    base_url: "http://localhost:11434/v1"
    models:
      qwen3.5:27b:
        context_length: 32768
      deepseek-r1:70b:
        context_length: 65536
```

`hermes model` will prompt for context length when configuring a custom endpoint. Leave it blank for auto-detection.

:::tip When to set this manually
- You're using Ollama with a custom `num_ctx` that's lower than the model's maximum
- You want to limit context below the model's maximum (e.g., 8k on a 128k model to save VRAM)
- You're running behind a proxy that doesn't expose `/v1/models`
:::

---

### Named Custom Providers

If you work with multiple custom endpoints (e.g., a local dev server and a remote GPU server), you can define them as named custom providers in `config.yaml`:

```yaml
custom_providers:
  - name: local
    base_url: http://localhost:8080/v1
    # api_key omitted — Hermes uses "no-key-required" for keyless local servers
  - name: work
    base_url: https://gpu-server.internal.corp/v1
    api_key: corp-api-key
    api_mode: chat_completions    # optional, auto-detected from URL
  - name: anthropic-proxy
    base_url: https://proxy.example.com/anthropic
    api_key: proxy-key
    api_mode: anthropic_messages  # for Anthropic-compatible proxies
```

Switch between them mid-session with the triple syntax:

```
/model custom:local:qwen-2.5                   # Use the "local" endpoint with qwen-2.5
/model custom:work:llama3-70b                  # Use the "work" endpoint with llama3-70b
/model custom:anthropic-proxy:claude-sonnet-4  # Use the proxy
```

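The triple splits on the first two colons only, which is what lets model names that themselves contain colons (Ollama-style tags) survive intact. A sketch of that parsing rule (illustrative only, not Hermes's actual implementation):

```shell
spec="custom:local:qwen3.5:27b"

kind=${spec%%:*}       # "custom"
rest=${spec#*:}
provider=${rest%%:*}   # "local"
model=${rest#*:}       # keeps any further colons: "qwen3.5:27b"

echo "$kind / $provider / $model"
```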
You can also select named custom providers from the interactive `hermes model` menu.

---

### Choosing the Right Setup

| Use Case | Recommended |
|----------|-------------|
| **Just want it to work** | OpenRouter (default) or Nous Portal |
| **Local models, easy setup** | Ollama |
| **Production GPU serving** | vLLM or SGLang |
| **Mac / no GPU** | Ollama or llama.cpp |
| **Multi-provider routing** | LiteLLM Proxy or OpenRouter |
| **Cost optimization** | ClawRouter or OpenRouter with `sort: "price"` |
| **Maximum privacy** | Ollama, vLLM, or llama.cpp (fully local) |
| **Enterprise / Azure** | Azure OpenAI with custom endpoint |
| **Chinese AI models** | z.ai (GLM), Kimi/Moonshot, or MiniMax (first-class providers) |

:::tip
You can switch between providers at any time with `hermes model` — no restart required. Your conversation history, memory, and skills carry over regardless of which provider you use.
:::

## Optional API Keys

| Feature | Provider | Env Variable |
|---------|----------|--------------|
| Web scraping | [Firecrawl](https://firecrawl.dev/) | `FIRECRAWL_API_KEY`, `FIRECRAWL_API_URL` |
| Browser automation | [Browserbase](https://browserbase.com/) | `BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID` |
| Image generation | [FAL](https://fal.ai/) | `FAL_KEY` |
| Premium TTS voices | [ElevenLabs](https://elevenlabs.io/) | `ELEVENLABS_API_KEY` |
| OpenAI TTS + voice transcription | [OpenAI](https://platform.openai.com/api-keys) | `VOICE_TOOLS_OPENAI_KEY` |
| RL Training | [Tinker](https://tinker-console.thinkingmachines.ai/) + [WandB](https://wandb.ai/) | `TINKER_API_KEY`, `WANDB_API_KEY` |
| Cross-session user modeling | [Honcho](https://honcho.dev/) | `HONCHO_API_KEY` |

### Self-Hosting Firecrawl

By default, Hermes uses the [Firecrawl cloud API](https://firecrawl.dev/) for web search and scraping. If you prefer to run Firecrawl locally, you can point Hermes at a self-hosted instance instead. See Firecrawl's [SELF_HOST.md](https://github.com/firecrawl/firecrawl/blob/main/SELF_HOST.md) for complete setup instructions.

**What you get:** No API key required, no rate limits, no per-page costs, full data sovereignty.

**What you lose:** The cloud version uses Firecrawl's proprietary "Fire-engine" for advanced anti-bot bypassing (Cloudflare, CAPTCHAs, IP rotation). Self-hosted uses basic fetch + Playwright, so some protected sites may fail. Search uses DuckDuckGo instead of Google.

**Setup:**

1. Clone and start the Firecrawl Docker stack (5 containers: API, Playwright, Redis, RabbitMQ, PostgreSQL — requires ~4-8 GB RAM):
   ```bash
   git clone https://github.com/firecrawl/firecrawl
   cd firecrawl
   # In .env, set: USE_DB_AUTHENTICATION=false, HOST=0.0.0.0, PORT=3002
   docker compose up -d
   ```

2. Point Hermes at your instance (no API key needed):
   ```bash
   hermes config set FIRECRAWL_API_URL http://localhost:3002
   ```

You can also set both `FIRECRAWL_API_KEY` and `FIRECRAWL_API_URL` if your self-hosted instance has authentication enabled.

## OpenRouter Provider Routing

When using OpenRouter, you can control how requests are routed across providers. Add a `provider_routing` section to `~/.hermes/config.yaml`:

```yaml
provider_routing:
  sort: "throughput"                 # "price" (default), "throughput", or "latency"
  # only: ["anthropic"]              # Only use these providers
  # ignore: ["deepinfra"]            # Skip these providers
  # order: ["anthropic", "google"]   # Try providers in this order
  # require_parameters: true         # Only use providers that support all request params
  # data_collection: "deny"          # Exclude providers that may store/train on data
```

**Shortcuts:** Append `:nitro` to any model name for throughput sorting (e.g., `anthropic/claude-sonnet-4:nitro`), or `:floor` for price sorting.

## Fallback Model

Configure a backup provider:model that Hermes switches to automatically when your primary model fails (rate limits, server errors, auth failures):

```yaml
fallback_model:
  provider: openrouter                  # required
  model: anthropic/claude-sonnet-4      # required
  # base_url: http://localhost:8000/v1  # optional, for custom endpoints
  # api_key_env: MY_CUSTOM_KEY          # optional, env var name for custom endpoint API key
```

When activated, the fallback swaps the model and provider mid-session without losing your conversation. It fires **at most once** per session.

Supported providers: `openrouter`, `nous`, `openai-codex`, `copilot`, `anthropic`, `huggingface`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `custom`.

:::tip
Fallback is configured exclusively through `config.yaml` — there are no environment variables for it. For full details on when it triggers, supported providers, and how it interacts with auxiliary tasks and delegation, see [Fallback Providers](/docs/user-guide/features/fallback-providers).
:::

## Smart Model Routing

Optional cheap-vs-strong routing lets Hermes keep your main model for complex work while sending very short/simple turns to a cheaper model.

```yaml
smart_model_routing:
  enabled: true
  max_simple_chars: 160
  max_simple_words: 28
  cheap_model:
    provider: openrouter
    model: google/gemini-2.5-flash
    # base_url: http://localhost:8000/v1  # optional custom endpoint
    # api_key_env: MY_CUSTOM_KEY          # optional env var name for that endpoint's API key
```

How it works:
- If a turn is short, single-line, and does not look code/tool/debug heavy, Hermes may route it to `cheap_model`
- If the turn looks complex, Hermes stays on your primary model/provider
- If the cheap route cannot be resolved cleanly, Hermes falls back to the primary model automatically

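The two size thresholds from the config above can be checked directly; a sketch of the length gate only (Hermes's real classifier also looks at code/tool/debug signals and multi-line input, which this omits):

```shell
turn="Summarize this paragraph in one sentence."
max_simple_chars=160
max_simple_words=28

chars=${#turn}
words=$(printf '%s' "$turn" | wc -w)

if [ "$chars" -le "$max_simple_chars" ] && [ "$words" -le "$max_simple_words" ]; then
  echo "length gate: candidate for cheap_model"
else
  echo "length gate: stay on primary"
fi
```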
This is intentionally conservative. It is meant for quick, low-stakes turns like:
- short factual questions
- quick rewrites
- lightweight summaries

It will avoid routing prompts that look like:
- coding/debugging work
- tool-heavy requests
- long or multi-line analysis asks

Use this when you want lower latency or cost without fully changing your default model.

---

## See Also

- [Configuration](/docs/user-guide/configuration) — General configuration (directory structure, config precedence, terminal backends, memory, compression, and more)
- [Environment Variables](/docs/reference/environment-variables) — Complete reference of all environment variables

@@ -21,6 +21,7 @@ hermes [global-options] <command> [subcommand/options]
| Option | Description |
|--------|-------------|
| `--version`, `-V` | Show version and exit. |
| `--profile <name>`, `-p <name>` | Select which Hermes profile to use for this invocation. Overrides the sticky default set by `hermes profile use`. |
| `--resume <session>`, `-r <session>` | Resume a previous session by ID or title. |
| `--continue [name]`, `-c [name]` | Resume the most recent session, or the most recent session matching a title. |
| `--worktree`, `-w` | Start in an isolated git worktree for parallel-agent workflows. |

@@ -37,6 +38,7 @@ hermes [global-options] <command> [subcommand/options]
| `hermes setup` | Interactive setup wizard for all or part of the configuration. |
| `hermes whatsapp` | Configure and pair the WhatsApp bridge. |
| `hermes login` / `logout` | Authenticate with OAuth-backed providers. |
| `hermes auth` | Manage credential pools — add, list, remove, reset, set strategy. |
| `hermes status` | Show agent, auth, and platform status. |
| `hermes cron` | Inspect and tick the cron scheduler. |
| `hermes webhook` | Manage dynamic webhook subscriptions for event-driven activation. |

@@ -46,10 +48,14 @@ hermes [global-options] <command> [subcommand/options]
| `hermes skills` | Browse, install, publish, audit, and configure skills. |
| `hermes honcho` | Manage Honcho cross-session memory integration. |
| `hermes acp` | Run Hermes as an ACP server for editor integration. |
| `hermes mcp` | Manage MCP server configurations and run Hermes as an MCP server. |
| `hermes plugins` | Manage Hermes Agent plugins (install, enable, disable, remove). |
| `hermes tools` | Configure enabled tools per platform. |
| `hermes sessions` | Browse, export, prune, rename, and delete sessions. |
| `hermes insights` | Show token/cost/activity analytics. |
| `hermes claw` | OpenClaw migration helpers. |
| `hermes profile` | Manage profiles — multiple isolated Hermes instances. |
| `hermes completion` | Print shell completion scripts (bash/zsh). |
| `hermes version` | Show version information. |
| `hermes update` | Pull latest code and reinstall dependencies. |
| `hermes uninstall` | Remove Hermes from the system. |

@@ -67,7 +73,7 @@ Common options:
| `-q`, `--query "..."` | One-shot, non-interactive prompt. |
| `-m`, `--model <model>` | Override the model for this run. |
| `-t`, `--toolsets <csv>` | Enable a comma-separated set of toolsets. |
| `--provider <provider>` | Force a provider: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `huggingface`, `alibaba`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `kilocode`. |
| `--provider <provider>` | Force a provider: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot-acp`, `copilot`, `anthropic`, `huggingface`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `kilocode`. |
| `-s`, `--skills <name>` | Preload one or more skills for the session (can be repeated or comma-separated). |
| `-v`, `--verbose` | Verbose output. |
| `-Q`, `--quiet` | Programmatic mode: suppress banner/spinner/tool previews. |

@@ -76,6 +82,7 @@ Common options:
| `--checkpoints` | Enable filesystem checkpoints before destructive file changes. |
| `--yolo` | Skip approval prompts. |
| `--pass-session-id` | Pass the session ID into the system prompt. |
| `--source <tag>` | Session source tag for filtering (default: `cli`). Use `tool` for third-party integrations that should not appear in user session lists. |

Examples:

@@ -186,6 +193,22 @@ Useful options for `login`:
- `--ca-bundle <pem>`
- `--insecure`

## `hermes auth`

Manage credential pools for same-provider key rotation. See [Credential Pools](/docs/user-guide/features/credential-pools) for full documentation.

```bash
hermes auth                                        # Interactive wizard
hermes auth list                                   # Show all pools
hermes auth list openrouter                        # Show specific provider
hermes auth add openrouter --api-key sk-or-v1-xxx  # Add API key
hermes auth add anthropic --type oauth             # Add OAuth credential
hermes auth remove openrouter 2                    # Remove by index
hermes auth reset openrouter                       # Clear cooldowns
```

Subcommands: `add`, `list`, `remove`, `reset`. When called with no subcommand, launches the interactive management wizard.

## `hermes status`

```bash
@@ -507,6 +530,56 @@ hermes claw migrate --preset user-data --overwrite
hermes claw migrate --source /home/user/old-openclaw
```

## `hermes profile`

```bash
hermes profile <subcommand>
```

Manage profiles — multiple isolated Hermes instances, each with its own config, sessions, skills, and home directory.

| Subcommand | Description |
|------------|-------------|
| `list` | List all profiles. |
| `use <name>` | Set a sticky default profile. |
| `create <name> [--clone] [--no-alias]` | Create a new profile. `--clone` copies config, `.env`, and `SOUL.md` from the active profile. |
| `delete <name> [-y]` | Delete a profile. |
| `show <name>` | Show profile details (home directory, config, etc.). |
| `alias <name> [--remove] [--name NAME]` | Manage wrapper scripts for quick profile access. |
| `rename <old> <new>` | Rename a profile. |
| `export <name> [-o FILE]` | Export a profile to a `.tar.gz` archive. |
| `import <archive> [--name NAME]` | Import a profile from a `.tar.gz` archive. |

Examples:

```bash
hermes profile list
hermes profile create work --clone
hermes profile use work
hermes profile alias work --name h-work
hermes profile export work -o work-backup.tar.gz
hermes profile import work-backup.tar.gz --name restored
hermes -p work chat -q "Hello from work profile"
```

## `hermes completion`

```bash
hermes completion [bash|zsh]
```

Print a shell completion script to stdout. Source the output in your shell profile for tab-completion of Hermes commands, subcommands, and profile names.

Examples:

```bash
# Bash
hermes completion bash >> ~/.bashrc

# Zsh
hermes completion zsh >> ~/.zshrc
```

## Maintenance commands

| Command | Description |

@@ -63,7 +63,7 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe

| Variable | Description |
|----------|-------------|
| `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `huggingface`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `kilocode`, `alibaba` (default: `auto`) |
| `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `huggingface`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `kilocode`, `alibaba`, `deepseek`, `opencode-zen`, `opencode-go`, `ai-gateway` (default: `auto`) |
| `HERMES_PORTAL_BASE_URL` | Override Nous Portal URL (for development/testing) |
| `NOUS_INFERENCE_BASE_URL` | Override Nous inference API URL |
| `HERMES_NOUS_MIN_KEY_TTL_SECONDS` | Min agent key TTL before re-mint (default: 1800 = 30min) |

@@ -80,10 +80,12 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe
| `FIRECRAWL_API_KEY` | Web scraping ([firecrawl.dev](https://firecrawl.dev/)) |
| `FIRECRAWL_API_URL` | Custom Firecrawl API endpoint for self-hosted instances (optional) |
| `TAVILY_API_KEY` | Tavily API key for AI-native web search, extract, and crawl ([app.tavily.com](https://app.tavily.com/home)) |
| `EXA_API_KEY` | Exa API key for AI-native web search and contents ([exa.ai](https://exa.ai/)) |
| `BROWSERBASE_API_KEY` | Browser automation ([browserbase.com](https://browserbase.com/)) |
| `BROWSERBASE_PROJECT_ID` | Browserbase project ID |
| `BROWSER_USE_API_KEY` | Browser Use cloud browser API key ([browser-use.com](https://browser-use.com/)) |
| `BROWSER_CDP_URL` | Chrome DevTools Protocol URL for local browser (set via `/browser connect`, e.g. `ws://localhost:9222`) |
| `CAMOFOX_URL` | Camofox local anti-detection browser URL (default: `http://localhost:9377`) |
| `BROWSER_INACTIVITY_TIMEOUT` | Browser session inactivity timeout in seconds |
| `FAL_KEY` | Image generation ([fal.ai](https://fal.ai/)) |
| `GROQ_API_KEY` | Groq Whisper STT API key ([groq.com](https://groq.com/)) |

@@ -154,6 +156,9 @@ For cloud sandbox backends, persistence is filesystem-oriented. `TERMINAL_LIFETI
| `TELEGRAM_ALLOWED_USERS` | Comma-separated user IDs allowed to use the bot |
| `TELEGRAM_HOME_CHANNEL` | Default Telegram chat/channel for cron delivery |
| `TELEGRAM_HOME_CHANNEL_NAME` | Display name for the Telegram home channel |
| `TELEGRAM_WEBHOOK_URL` | Public HTTPS URL for webhook mode (enables webhook instead of polling) |
| `TELEGRAM_WEBHOOK_PORT` | Local listen port for webhook server (default: `8443`) |
| `TELEGRAM_WEBHOOK_SECRET` | Secret token for verifying updates come from Telegram |
| `DISCORD_BOT_TOKEN` | Discord bot token |
| `DISCORD_ALLOWED_USERS` | Comma-separated Discord user IDs allowed to use the bot |
| `DISCORD_HOME_CHANNEL` | Default Discord channel for cron delivery |

@@ -168,7 +173,9 @@ For cloud sandbox backends, persistence is filesystem-oriented. `TERMINAL_LIFETI
| `SLACK_HOME_CHANNEL_NAME` | Display name for the Slack home channel |
| `WHATSAPP_ENABLED` | Enable the WhatsApp bridge (`true`/`false`) |
| `WHATSAPP_MODE` | `bot` (separate number) or `self-chat` (message yourself) |
| `WHATSAPP_ALLOWED_USERS` | Comma-separated phone numbers (with country code, no `+`) |
| `WHATSAPP_ALLOWED_USERS` | Comma-separated phone numbers (with country code, no `+`), or `*` to allow all senders |
| `WHATSAPP_ALLOW_ALL_USERS` | Allow all WhatsApp senders without an allowlist (`true`/`false`) |
| `WHATSAPP_DEBUG` | Log raw message events in the bridge for troubleshooting (`true`/`false`) |
| `SIGNAL_HTTP_URL` | signal-cli daemon HTTP endpoint (for example `http://127.0.0.1:8080`) |
| `SIGNAL_ACCOUNT` | Bot phone number in E.164 format |
| `SIGNAL_ALLOWED_USERS` | Comma-separated E.164 phone numbers or UUIDs |

@@ -254,7 +254,7 @@ custom_providers:
        context_length: 32768
```

See [Context Length Detection](../user-guide/configuration.md#context-length-detection) for how auto-detection works and all override options.
See [Context Length Detection](../integrations/providers.md#context-length-detection) for how auto-detection works and all override options.

---

@@ -48,6 +48,8 @@ mcp_servers:
| `timeout` | number | both | Tool call timeout |
| `connect_timeout` | number | both | Initial connection timeout |
| `tools` | mapping | both | Filtering and utility-tool policy |
| `auth` | string | HTTP | Authentication method. Set to `oauth` to enable OAuth 2.1 with PKCE |
| `sampling` | mapping | both | Server-initiated LLM request policy (see MCP guide) |

## `tools` policy keys

@@ -213,3 +215,33 @@ Utility tools follow the same prefixing pattern:
- `mcp_<server>_read_resource`
- `mcp_<server>_list_prompts`
- `mcp_<server>_get_prompt`

### Name sanitization

Hyphens (`-`) and dots (`.`) in both server names and tool names are replaced with underscores before registration. This ensures tool names are valid identifiers for LLM function-calling APIs.

For example, a server named `my-api` exposing a tool called `list-items.v2` becomes:

```text
mcp_my_api_list_items_v2
```

Keep this in mind when writing `include` / `exclude` filters — use the **original** MCP tool name (with hyphens/dots), not the sanitized version.

## OAuth 2.1 authentication
|
||||
|
||||
For HTTP servers that require OAuth, set `auth: oauth` on the server entry:
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
protected_api:
|
||||
url: "https://mcp.example.com/mcp"
|
||||
auth: oauth
|
||||
```
|
||||
|
||||
Behavior:
|
||||
- Hermes uses the MCP SDK's OAuth 2.1 PKCE flow (metadata discovery, dynamic client registration, token exchange, and refresh)
|
||||
- On first connect, a browser window opens for authorization
|
||||
- Tokens are persisted to `~/.hermes/mcp-tokens/<server>.json` and reused across sessions
|
||||
- Token refresh is automatic; re-authorization only happens when refresh fails
|
||||
- Only applies to HTTP/StreamableHTTP transport (`url`-based servers)
|
||||
|
|
|
|||
|
|
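The sanitization rule described in the hunk above (hyphens and dots become underscores) can be sketched in shell — an illustrative reimplementation, not the actual Hermes code:

```shell
server="my-api"
tool="list-items.v2"

# Replace every '-' and '.' with '_' in both names, then apply the
# mcp_<server>_<tool> prefixing pattern.
sanitized="mcp_${server//[-.]/_}_${tool//[-.]/_}"
echo "$sanitized"   # mcp_my_api_list_items_v2
```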
@@ -78,7 +78,7 @@ Creates a new profile.
 | `<name>` | Name for the new profile. Must be a valid directory name (alphanumeric, hyphens, underscores). |
 | `--clone` | Copy `config.yaml`, `.env`, and `SOUL.md` from the current profile. |
 | `--clone-all` | Copy everything (config, memories, skills, sessions, state) from the current profile. |
-| `--from <profile>` | Clone from a specific profile instead of the current one. Used with `--clone` or `--clone-all`. |
+| `--clone-from <profile>` | Clone from a specific profile instead of the current one. Used with `--clone` or `--clone-all`. |
 
 **Examples:**
 
@@ -93,7 +93,7 @@ hermes profile create work --clone
 hermes profile create backup --clone-all
 
 # Clone config from a specific profile
-hermes profile create work2 --clone --from work
+hermes profile create work2 --clone --clone-from work
 ```
 
 ## `hermes profile delete`
@@ -123,14 +123,14 @@ This permanently deletes the profile's entire directory including all config, me
 ## `hermes profile show`
 
 ```bash
-hermes profile show [name]
+hermes profile show <name>
 ```
 
 Displays details about a profile including its home directory, configured model, active platforms, and disk usage.
 
 | Argument | Description |
 |----------|-------------|
-| `[name]` | Profile to inspect. Defaults to the current active profile if omitted. |
+| `<name>` | Profile to inspect. |
 
 **Example:**
 
@@ -147,20 +147,28 @@ Disk: 48 MB
 ## `hermes profile alias`
 
 ```bash
-hermes profile alias <name>
+hermes profile alias <name> [options]
 ```
 
-Regenerates the shell alias script at `~/.local/bin/hermes-<name>`. Useful if the alias was accidentally deleted or if you need to update it after moving your Hermes installation.
+Regenerates the shell alias script at `~/.local/bin/<name>`. Useful if the alias was accidentally deleted or if you need to update it after moving your Hermes installation.
 
-| Argument | Description |
-|----------|-------------|
+| Argument / Option | Description |
+|-------------------|-------------|
 | `<name>` | Profile to create/update the alias for. |
+| `--remove` | Remove the wrapper script instead of creating it. |
+| `--name <alias>` | Custom alias name (default: profile name). |
 
 **Example:**
 
 ```bash
 hermes profile alias work
+# Creates/updates ~/.local/bin/work
+
+hermes profile alias work --name mywork
+# Creates ~/.local/bin/mywork
+
+hermes profile alias work --remove
+# Removes the wrapper script
 ```
 
 ## `hermes profile rename`
@@ -187,39 +195,45 @@ hermes profile rename mybot assistant
 ## `hermes profile export`
 
 ```bash
-hermes profile export <name> <output-path>
+hermes profile export <name> [options]
 ```
 
 Exports a profile as a compressed tar.gz archive.
 
-| Argument | Description |
-|----------|-------------|
+| Argument / Option | Description |
+|-------------------|-------------|
 | `<name>` | Profile to export. |
-| `<output-path>` | Path for the output archive (e.g., `./work-backup.tar.gz`). |
+| `-o`, `--output <path>` | Output file path (default: `<name>.tar.gz`). |
 
 **Example:**
 
 ```bash
-hermes profile export work ./work-2026-03-29.tar.gz
+hermes profile export work
+# Creates work.tar.gz in the current directory
+
+hermes profile export work -o ./work-2026-03-29.tar.gz
 ```
 
 ## `hermes profile import`
 
 ```bash
-hermes profile import <archive-path> [name]
+hermes profile import <archive> [options]
 ```
 
 Imports a profile from a tar.gz archive.
 
-| Argument | Description |
-|----------|-------------|
-| `<archive-path>` | Path to the tar.gz archive to import. |
-| `[name]` | Name for the imported profile. Defaults to the original profile name from the archive. |
+| Argument / Option | Description |
+|-------------------|-------------|
+| `<archive>` | Path to the tar.gz archive to import. |
+| `--name <name>` | Name for the imported profile (default: inferred from archive). |
 
 **Example:**
 
 ```bash
-hermes profile import ./work-2026-03-29.tar.gz work-restored
+hermes profile import ./work-2026-03-29.tar.gz
+# Infers profile name from the archive
+
+hermes profile import ./work-2026-03-29.tar.gz --name work-restored
 ```
 
 ## `hermes -p` / `hermes --profile`
@@ -254,7 +268,7 @@ Generates shell completion scripts. Includes completions for profile names and p
 
 | Argument | Description |
 |----------|-------------|
-| `<shell>` | Shell to generate completions for: `bash`, `zsh`, or `fish`. |
+| `<shell>` | Shell to generate completions for: `bash` or `zsh`. |
 
 **Examples:**
 
@@ -262,7 +276,6 @@ Generates shell completion scripts. Includes completions for profile names and p
 # Install completions
 hermes completion bash >> ~/.bashrc
 hermes completion zsh >> ~/.zshrc
-hermes completion fish > ~/.config/fish/completions/hermes.fish
 
 # Reload shell
 source ~/.bashrc
@@ -31,10 +31,10 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in
 | `/compress` | Manually compress conversation context (flush memories + summarize) |
 | `/rollback` | List or restore filesystem checkpoints (usage: /rollback [number]) |
 | `/stop` | Kill all running background processes |
-| `/queue <prompt>` (alias: `/q`) | Queue a prompt for the next turn (doesn't interrupt the current agent response) |
+| `/queue <prompt>` (alias: `/q`) | Queue a prompt for the next turn (doesn't interrupt the current agent response). **Note:** `/q` is claimed by both `/queue` and `/quit`; the last registration wins, so `/q` resolves to `/quit` in practice. Use `/queue` explicitly. |
 | `/resume [name]` | Resume a previously-named session |
 | `/statusbar` (alias: `/sb`) | Toggle the context/model status bar on or off |
-| `/background <prompt>` | Run a prompt in a separate background session. The agent processes your prompt independently — your current session stays free for other work. Results appear as a panel when the task finishes. See [CLI Background Sessions](/docs/user-guide/cli#background-sessions). |
+| `/background <prompt>` (alias: `/bg`) | Run a prompt in a separate background session. The agent processes your prompt independently — your current session stays free for other work. Results appear as a panel when the task finishes. See [CLI Background Sessions](/docs/user-guide/cli#background-sessions). |
 | `/plan [request]` | Load the bundled `plan` skill to write a markdown plan instead of executing the work. Plans are saved under `.hermes/plans/` relative to the active workspace/backend working directory. |
 
 ### Configuration
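The `/q` collision noted in the hunk above is a classic last-registration-wins map overwrite. A minimal sketch of the mechanism, assuming a map-style command registry (not the actual Hermes implementation):

```shell
# Simulate an alias registry as a bash associative array.
declare -A aliases
aliases["/q"]="queue"   # /queue registers its alias first
aliases["/q"]="quit"    # /quit registers the same alias later and overwrites it
echo "${aliases[/q]}"   # quit
```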
@@ -50,6 +50,7 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in
 | `/reasoning` | Manage reasoning effort and display (usage: /reasoning [level\|show\|hide]) |
 | `/skin` | Show or change the display skin/theme |
 | `/voice [on\|off\|tts\|status]` | Toggle CLI voice mode and spoken playback. Recording uses `voice.record_key` (default: `Ctrl+B`). |
+| `/yolo` | Toggle YOLO mode — skip all dangerous command approval prompts. |
 
 ### Tools & Skills
 
@@ -60,7 +61,7 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in
 | `/browser [connect\|disconnect\|status]` | Manage local Chrome CDP connection. `connect` attaches browser tools to a running Chrome instance (default: `ws://localhost:9222`). `disconnect` detaches. `status` shows current connection. Auto-launches Chrome if no debugger is detected. |
 | `/skills` | Search, install, inspect, or manage skills from online registries |
 | `/cron` | Manage scheduled tasks (list, add/create, edit, pause, resume, run, remove) |
-| `/reload-mcp` | Reload MCP servers from config.yaml |
+| `/reload-mcp` (alias: `/reload_mcp`) | Reload MCP servers from config.yaml |
 | `/plugins` | List installed plugins and their status |
 
 ### Info
@@ -70,14 +71,15 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in
 | `/help` | Show this help message |
 | `/usage` | Show token usage, cost breakdown, and session duration |
 | `/insights` | Show usage insights and analytics (last 30 days) |
-| `/platforms` | Show gateway/messaging platform status |
+| `/platforms` (alias: `/gateway`) | Show gateway/messaging platform status |
 | `/paste` | Check clipboard for an image and attach it |
 | `/profile` | Show active profile name and home directory |
 
 ### Exit
 
 | Command | Description |
 |---------|-------------|
-| `/quit` | Exit the CLI (also: /exit, /q) |
+| `/quit` | Exit the CLI (also: `/exit`). See note on `/q` under `/queue` above. |
 
 ### Dynamic CLI slash commands
@@ -105,7 +107,7 @@ The messaging gateway supports the following built-in commands inside Telegram, 
 | `/personality [name]` | Set a personality overlay for the session. |
 | `/retry` | Retry the last message. |
 | `/undo` | Remove the last exchange. |
-| `/sethome` | Mark the current chat as the platform home channel for deliveries. |
+| `/sethome` (alias: `/set-home`) | Mark the current chat as the platform home channel for deliveries. |
 | `/compress` | Manually compress conversation context. |
 | `/title [name]` | Set or show the session title. |
 | `/resume [name]` | Resume a previously named session. |
@@ -116,7 +118,9 @@ The messaging gateway supports the following built-in commands inside Telegram, 
 | `/rollback [number]` | List or restore filesystem checkpoints. |
 | `/background <prompt>` | Run a prompt in a separate background session. Results are delivered back to the same chat when the task finishes. See [Messaging Background Sessions](/docs/user-guide/messaging/#background-sessions). |
 | `/plan [request]` | Load the bundled `plan` skill to write a markdown plan instead of executing the work. Plans are saved under `.hermes/plans/` relative to the active workspace/backend working directory. |
-| `/reload-mcp` | Reload MCP servers from config. |
+| `/reload-mcp` (alias: `/reload_mcp`) | Reload MCP servers from config. |
+| `/yolo` | Toggle YOLO mode — skip all dangerous command approval prompts. |
+| `/commands [page]` | Browse all commands and skills (paginated). |
 | `/approve [session\|always]` | Approve and execute a pending dangerous command. `session` approves for this session only; `always` adds to permanent allowlist. |
 | `/deny` | Reject a pending dangerous command. |
 | `/update` | Update Hermes Agent to the latest version. |
@@ -127,6 +131,6 @@ The messaging gateway supports the following built-in commands inside Telegram, 
 
 - `/skin`, `/tools`, `/toolsets`, `/browser`, `/config`, `/prompt`, `/cron`, `/skills`, `/platforms`, `/paste`, `/statusbar`, and `/plugins` are **CLI-only** commands.
 - `/verbose` is **CLI-only by default**, but can be enabled for messaging platforms by setting `display.tool_progress_command: true` in `config.yaml`. When enabled, it cycles the `display.tool_progress` mode and saves to config.
-- `/status`, `/sethome`, `/update`, `/approve`, and `/deny` are **messaging-only** commands.
-- `/background`, `/voice`, `/reload-mcp`, and `/rollback` work in **both** the CLI and the messaging gateway.
+- `/status`, `/sethome`, `/update`, `/approve`, `/deny`, and `/commands` are **messaging-only** commands.
+- `/background`, `/voice`, `/reload-mcp`, `/rollback`, and `/yolo` work in **both** the CLI and the messaging gateway.
 - `/voice join`, `/voice channel`, and `/voice leave` are only meaningful on Discord.
@@ -151,8 +151,8 @@ This page documents the built-in Hermes tool registry as it exists in code. Avai
 
 | Tool | Description | Requires environment |
 |------|-------------|----------------------|
-| `web_search` | Search the web for information on any topic. Returns up to 5 relevant results with titles, URLs, and descriptions. | PARALLEL_API_KEY or FIRECRAWL_API_KEY or TAVILY_API_KEY |
-| `web_extract` | Extract content from web page URLs. Returns page content in markdown format. Also works with PDF URLs — pass the PDF link directly and it converts to markdown text. Pages under 5000 chars return full markdown; larger pages are LLM-summarized. | PARALLEL_API_KEY or FIRECRAWL_API_KEY or TAVILY_API_KEY |
+| `web_search` | Search the web for information on any topic. Returns up to 5 relevant results with titles, URLs, and descriptions. | EXA_API_KEY or PARALLEL_API_KEY or FIRECRAWL_API_KEY or TAVILY_API_KEY |
+| `web_extract` | Extract content from web page URLs. Returns page content in markdown format. Also works with PDF URLs — pass the PDF link directly and it converts to markdown text. Pages under 5000 chars return full markdown; larger pages are LLM-summarized. | EXA_API_KEY or PARALLEL_API_KEY or FIRECRAWL_API_KEY or TAVILY_API_KEY |
 
 ## `tts` toolset
 
@@ -19,7 +19,7 @@ Toolsets are named bundles of tools that you can enable with `hermes chat --tool
 | `file` | core | `patch`, `read_file`, `search_files`, `write_file` |
 | `hermes-acp` | platform | `browser_back`, `browser_click`, `browser_close`, `browser_console`, `browser_get_images`, `browser_navigate`, `browser_press`, `browser_scroll`, `browser_snapshot`, `browser_type`, `browser_vision`, `delegate_task`, `execute_code`, `memory`, `patch`, `process`, `read_file`, `search_files`, `session_search`, `skill_manage`, `skill_view`, `skills_list`, `terminal`, `todo`, `vision_analyze`, `web_extract`, `web_search`, `write_file` |
 | `hermes-cli` | platform | `browser_back`, `browser_click`, `browser_close`, `browser_console`, `browser_get_images`, `browser_navigate`, `browser_press`, `browser_scroll`, `browser_snapshot`, `browser_type`, `browser_vision`, `clarify`, `cronjob`, `delegate_task`, `execute_code`, `ha_call_service`, `ha_get_state`, `ha_list_entities`, `ha_list_services`, `honcho_conclude`, `honcho_context`, `honcho_profile`, `honcho_search`, `image_generate`, `memory`, `mixture_of_agents`, `patch`, `process`, `read_file`, `search_files`, `send_message`, `session_search`, `skill_manage`, `skill_view`, `skills_list`, `terminal`, `text_to_speech`, `todo`, `vision_analyze`, `web_extract`, `web_search`, `write_file` |
-| `hermes-api-server` | platform | _(same as hermes-cli)_ |
+| `hermes-api-server` | platform | `browser_back`, `browser_click`, `browser_close`, `browser_console`, `browser_get_images`, `browser_navigate`, `browser_press`, `browser_scroll`, `browser_snapshot`, `browser_type`, `browser_vision`, `cronjob`, `delegate_task`, `execute_code`, `ha_call_service`, `ha_get_state`, `ha_list_entities`, `ha_list_services`, `honcho_conclude`, `honcho_context`, `honcho_profile`, `honcho_search`, `image_generate`, `memory`, `mixture_of_agents`, `patch`, `process`, `read_file`, `search_files`, `session_search`, `skill_manage`, `skill_view`, `skills_list`, `terminal`, `todo`, `vision_analyze`, `web_extract`, `web_search`, `write_file` |
 | `hermes-dingtalk` | platform | _(same as hermes-cli)_ |
 | `hermes-feishu` | platform | _(same as hermes-cli)_ |
 | `hermes-wecom` | platform | _(same as hermes-cli)_ |
@@ -1,5 +1,6 @@
 ---
 sidebar_position: 8
+sidebar_label: "Checkpoints & Rollback"
 title: "Checkpoints and /rollback"
 description: "Filesystem safety nets for destructive operations using shadow git repos and automatic snapshots"
 ---
@@ -94,6 +94,7 @@ When resuming a previous session (`hermes -c` or `hermes --resume <id>`), a "Pre
 | `Ctrl+B` | Start/stop voice recording when voice mode is enabled (`voice.record_key`, default: `ctrl+b`) |
 | `Ctrl+C` | Interrupt agent (double-press within 2s to force exit) |
 | `Ctrl+D` | Exit |
+| `Ctrl+Z` | Suspend Hermes to background (Unix only). Run `fg` in the shell to resume. |
 | `Tab` | Accept auto-suggestion (ghost text) or autocomplete slash commands |
 
 ## Slash Commands
@@ -212,6 +213,33 @@ You can interrupt the agent at any point:
 - In-progress terminal commands are killed immediately (SIGTERM, then SIGKILL after 1s)
 - Multiple messages typed during interrupt are combined into one prompt
 
+### Busy Input Mode
+
+The `display.busy_input_mode` config key controls what happens when you press Enter while the agent is working:
+
+| Mode | Behavior |
+|------|----------|
+| `"interrupt"` (default) | Your message interrupts the current operation and is processed immediately |
+| `"queue"` | Your message is silently queued and sent as the next turn after the agent finishes |
+
+```yaml
+# ~/.hermes/config.yaml
+display:
+  busy_input_mode: "queue"  # or "interrupt" (default)
+```
+
+Queue mode is useful when you want to prepare follow-up messages without accidentally canceling in-flight work. Unknown values fall back to `"interrupt"`.
+
+### Suspending to Background
+
+On Unix systems, press **`Ctrl+Z`** to suspend Hermes to the background — just like any terminal process. The shell prints a confirmation:
+
+```
+Hermes Agent has been suspended. Run `fg` to bring Hermes Agent back.
+```
+
+Type `fg` in your shell to resume the session exactly where you left off. This is not supported on Windows.
+
 ## Tool Progress Display
 
 The CLI shows animated feedback as the agent works:
@@ -232,6 +260,18 @@ The CLI shows animated feedback as the agent works:
 
 Cycle through display modes with `/verbose`: `off → new → all → verbose`. This command can also be enabled for messaging platforms — see [configuration](/docs/user-guide/configuration#display-settings).
 
+### Tool Preview Length
+
+The `display.tool_preview_length` config key controls the maximum number of characters shown in tool call preview lines (e.g. file paths, terminal commands). The default is `0`, which means no limit — full paths and commands are shown.
+
+```yaml
+# ~/.hermes/config.yaml
+display:
+  tool_preview_length: 80  # Truncate tool previews to 80 chars (0 = no limit)
+```
+
+This is useful on narrow terminals or when tool arguments contain very long file paths.
+
 ## Session Management
 
 ### Resuming Sessions
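The truncation behavior added in the hunk above (`0` means no limit, otherwise cut to N characters) can be sketched in shell — an illustrative approximation, not the actual Hermes display code:

```shell
preview="/very/long/path/to/some/deeply/nested/file.txt"
limit=20   # 0 would mean "no limit"

# Truncate only when a positive limit is set and the preview exceeds it.
if [ "$limit" -gt 0 ] && [ "${#preview}" -gt "$limit" ]; then
  preview="${preview:0:limit}..."
fi
echo "$preview"   # /very/long/path/to/s...
```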
@ -71,631 +71,7 @@ delegation:
|
|||
|
||||
Multiple references in a single value work: `url: "${HOST}:${PORT}"`. If a referenced variable is not set, the placeholder is kept verbatim (`${UNDEFINED_VAR}` stays as-is). Only the `${VAR}` syntax is supported — bare `$VAR` is not expanded.
|
||||
|
||||
## Inference Providers
|
||||
|
||||
You need at least one way to connect to an LLM. Use `hermes model` to switch providers and models interactively, or configure directly:
|
||||
|
||||
| Provider | Setup |
|
||||
|----------|-------|
|
||||
| **Nous Portal** | `hermes model` (OAuth, subscription-based) |
|
||||
| **OpenAI Codex** | `hermes model` (ChatGPT OAuth, uses Codex models) |
|
||||
| **GitHub Copilot** | `hermes model` (OAuth device code flow, `COPILOT_GITHUB_TOKEN`, `GH_TOKEN`, or `gh auth token`) |
|
||||
| **GitHub Copilot ACP** | `hermes model` (spawns local `copilot --acp --stdio`) |
|
||||
| **Anthropic** | `hermes model` (Claude Pro/Max via Claude Code auth, Anthropic API key, or manual setup-token) |
|
||||
| **OpenRouter** | `OPENROUTER_API_KEY` in `~/.hermes/.env` |
|
||||
| **AI Gateway** | `AI_GATEWAY_API_KEY` in `~/.hermes/.env` (provider: `ai-gateway`) |
|
||||
| **z.ai / GLM** | `GLM_API_KEY` in `~/.hermes/.env` (provider: `zai`) |
|
||||
| **Kimi / Moonshot** | `KIMI_API_KEY` in `~/.hermes/.env` (provider: `kimi-coding`) |
|
||||
| **MiniMax** | `MINIMAX_API_KEY` in `~/.hermes/.env` (provider: `minimax`) |
|
||||
| **MiniMax China** | `MINIMAX_CN_API_KEY` in `~/.hermes/.env` (provider: `minimax-cn`) |
|
||||
| **Alibaba Cloud** | `DASHSCOPE_API_KEY` in `~/.hermes/.env` (provider: `alibaba`, aliases: `dashscope`, `qwen`) |
|
||||
| **Kilo Code** | `KILOCODE_API_KEY` in `~/.hermes/.env` (provider: `kilocode`) |
|
||||
| **OpenCode Zen** | `OPENCODE_ZEN_API_KEY` in `~/.hermes/.env` (provider: `opencode-zen`) |
|
||||
| **OpenCode Go** | `OPENCODE_GO_API_KEY` in `~/.hermes/.env` (provider: `opencode-go`) |
|
||||
| **Hugging Face** | `HF_TOKEN` in `~/.hermes/.env` (provider: `huggingface`, aliases: `hf`) |
|
||||
| **Custom Endpoint** | `hermes model` (saved in `config.yaml`) or `OPENAI_BASE_URL` + `OPENAI_API_KEY` in `~/.hermes/.env` |
|
||||
|
||||
:::tip Model key alias
|
||||
In the `model:` config section, you can use either `default:` or `model:` as the key name for your model ID. Both `model: { default: my-model }` and `model: { model: my-model }` work identically.
|
||||
:::
|
||||
|
||||
:::info Codex Note
|
||||
The OpenAI Codex provider authenticates via device code (open a URL, enter a code). Hermes stores the resulting credentials in its own auth store under `~/.hermes/auth.json` and can import existing Codex CLI credentials from `~/.codex/auth.json` when present. No Codex CLI installation is required.
|
||||
:::
|
||||
|
||||
:::warning
|
||||
Even when using Nous Portal, Codex, or a custom endpoint, some tools (vision, web summarization, MoA) use a separate "auxiliary" model — by default Gemini Flash via OpenRouter. An `OPENROUTER_API_KEY` enables these tools automatically. You can also configure which model and provider these tools use — see [Auxiliary Models](#auxiliary-models) below.
|
||||
:::
|
||||
|
||||
### Anthropic (Native)
|
||||
|
||||
Use Claude models directly through the Anthropic API — no OpenRouter proxy needed. Supports three auth methods:
|
||||
|
||||
```bash
|
||||
# With an API key (pay-per-token)
|
||||
export ANTHROPIC_API_KEY=***
|
||||
hermes chat --provider anthropic --model claude-sonnet-4-6
|
||||
|
||||
# Preferred: authenticate through `hermes model`
|
||||
# Hermes will use Claude Code's credential store directly when available
|
||||
hermes model
|
||||
|
||||
# Manual override with a setup-token (fallback / legacy)
|
||||
export ANTHROPIC_TOKEN=*** # setup-token or manual OAuth token
|
||||
hermes chat --provider anthropic
|
||||
|
||||
# Auto-detect Claude Code credentials (if you already use Claude Code)
|
||||
hermes chat --provider anthropic # reads Claude Code credential files automatically
|
||||
```
|
||||
|
||||
When you choose Anthropic OAuth through `hermes model`, Hermes prefers Claude Code's own credential store over copying the token into `~/.hermes/.env`. That keeps refreshable Claude credentials refreshable.
|
||||
|
||||
Or set it permanently:
|
||||
```yaml
|
||||
model:
|
||||
provider: "anthropic"
|
||||
default: "claude-sonnet-4-6"
|
||||
```
|
||||
|
||||
:::tip Aliases
|
||||
`--provider claude` and `--provider claude-code` also work as shorthand for `--provider anthropic`.
|
||||
:::
|
||||
|
||||
### GitHub Copilot
|
||||
|
||||
Hermes supports GitHub Copilot as a first-class provider with two modes:
|
||||
|
||||
**`copilot` — Direct Copilot API** (recommended). Uses your GitHub Copilot subscription to access GPT-5.x, Claude, Gemini, and other models through the Copilot API.
|
||||
|
||||
```bash
|
||||
hermes chat --provider copilot --model gpt-5.4
|
||||
```
|
||||
|
||||
**Authentication options** (checked in this order):
|
||||
|
||||
1. `COPILOT_GITHUB_TOKEN` environment variable
|
||||
2. `GH_TOKEN` environment variable
|
||||
3. `GITHUB_TOKEN` environment variable
|
||||
4. `gh auth token` CLI fallback
|
||||
|
||||
If no token is found, `hermes model` offers an **OAuth device code login** — the same flow used by the Copilot CLI and opencode.
|
||||
|
||||
:::warning Token types
|
||||
The Copilot API does **not** support classic Personal Access Tokens (`ghp_*`). Supported token types:
|
||||
|
||||
| Type | Prefix | How to get |
|
||||
|------|--------|------------|
|
||||
| OAuth token | `gho_` | `hermes model` → GitHub Copilot → Login with GitHub |
|
||||
| Fine-grained PAT | `github_pat_` | GitHub Settings → Developer settings → Fine-grained tokens (needs **Copilot Requests** permission) |
|
||||
| GitHub App token | `ghu_` | Via GitHub App installation |
|
||||
|
||||
If your `gh auth token` returns a `ghp_*` token, use `hermes model` to authenticate via OAuth instead.
|
||||
:::
|
||||
|
||||
**API routing**: GPT-5+ models (except `gpt-5-mini`) automatically use the Responses API. All other models (GPT-4o, Claude, Gemini, etc.) use Chat Completions. Models are auto-detected from the live Copilot catalog.
|
||||
|
||||
**`copilot-acp` — Copilot ACP agent backend**. Spawns the local Copilot CLI as a subprocess:
|
||||
|
||||
```bash
|
||||
hermes chat --provider copilot-acp --model copilot-acp
|
||||
# Requires the GitHub Copilot CLI in PATH and an existing `copilot login` session
|
||||
```
|
||||
|
||||
**Permanent config:**
|
||||
```yaml
|
||||
model:
|
||||
provider: "copilot"
|
||||
default: "gpt-5.4"
|
||||
```
|
||||
|
||||
| Environment variable | Description |
|
||||
|---------------------|-------------|
|
||||
| `COPILOT_GITHUB_TOKEN` | GitHub token for Copilot API (first priority) |
|
||||
| `HERMES_COPILOT_ACP_COMMAND` | Override the Copilot CLI binary path (default: `copilot`) |
|
||||
| `HERMES_COPILOT_ACP_ARGS` | Override ACP args (default: `--acp --stdio`) |
|
||||
|
||||
### First-Class Chinese AI Providers
|
||||
|
||||
These providers have built-in support with dedicated provider IDs. Set the API key and use `--provider` to select:
|
||||
|
||||
```bash
|
||||
# z.ai / ZhipuAI GLM
|
||||
hermes chat --provider zai --model glm-4-plus
|
||||
# Requires: GLM_API_KEY in ~/.hermes/.env
|
||||
|
||||
# Kimi / Moonshot AI
|
||||
hermes chat --provider kimi-coding --model moonshot-v1-auto
|
||||
# Requires: KIMI_API_KEY in ~/.hermes/.env
|
||||
|
||||
# MiniMax (global endpoint)
|
||||
hermes chat --provider minimax --model MiniMax-M2.7
|
||||
# Requires: MINIMAX_API_KEY in ~/.hermes/.env
|
||||
|
||||
# MiniMax (China endpoint)
|
||||
hermes chat --provider minimax-cn --model MiniMax-M2.7
|
||||
# Requires: MINIMAX_CN_API_KEY in ~/.hermes/.env
|
||||
|
||||
# Alibaba Cloud / DashScope (Qwen models)
|
||||
hermes chat --provider alibaba --model qwen3.5-plus
|
||||
# Requires: DASHSCOPE_API_KEY in ~/.hermes/.env
|
||||
```
|
||||
|
||||
Or set the provider permanently in `config.yaml`:
|
||||
```yaml
|
||||
model:
|
||||
provider: "zai" # or: kimi-coding, minimax, minimax-cn, alibaba
|
||||
default: "glm-4-plus"
|
||||
```
|
||||
|
||||
Base URLs can be overridden with `GLM_BASE_URL`, `KIMI_BASE_URL`, `MINIMAX_BASE_URL`, `MINIMAX_CN_BASE_URL`, or `DASHSCOPE_BASE_URL` environment variables.
|
||||
|
||||
### Hugging Face Inference Providers
|
||||
|
||||
[Hugging Face Inference Providers](https://huggingface.co/docs/inference-providers) routes to 20+ open models through a unified OpenAI-compatible endpoint (`router.huggingface.co/v1`). Requests are automatically routed to the fastest available backend (Groq, Together, SambaNova, etc.) with automatic failover.
|
||||
|
||||
```bash
|
||||
# Use any available model
|
||||
hermes chat --provider huggingface --model Qwen/Qwen3-235B-A22B-Thinking-2507
|
||||
# Requires: HF_TOKEN in ~/.hermes/.env
|
||||
|
||||
# Short alias
|
||||
hermes chat --provider hf --model deepseek-ai/DeepSeek-V3.2
|
||||
```
|
||||
|
||||
Or set it permanently in `config.yaml`:
|
||||
```yaml
|
||||
model:
|
||||
provider: "huggingface"
|
||||
default: "Qwen/Qwen3-235B-A22B-Thinking-2507"
|
||||
```
|
||||
|
||||
Get your token at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens) — make sure to enable the "Make calls to Inference Providers" permission. Free tier included ($0.10/month credit, no markup on provider rates).
|
||||
|
||||
You can append routing suffixes to model names: `:fastest` (default), `:cheapest`, or `:provider_name` to force a specific backend.
|
||||
|
||||
The base URL can be overridden with `HF_BASE_URL`.
|
||||
|
||||
## Custom & Self-Hosted LLM Providers

Hermes Agent works with **any OpenAI-compatible API endpoint**. If a server implements `/v1/chat/completions`, you can point Hermes at it. This means you can use local models, GPU inference servers, multi-provider routers, or any third-party API.

### General Setup

Three ways to configure a custom endpoint:

**Interactive setup (recommended):**
```bash
hermes model
# Select "Custom endpoint (self-hosted / VLLM / etc.)"
# Enter: API base URL, API key, Model name
```

**Manual config (`config.yaml`):**
```yaml
# In ~/.hermes/config.yaml
model:
  default: your-model-name
  provider: custom
  base_url: http://localhost:8000/v1
  api_key: your-key-or-leave-empty-for-local
```

**Environment variables (`.env` file):**
```bash
# Add to ~/.hermes/.env
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_API_KEY=your-key        # Any non-empty string for local servers
LLM_MODEL=your-model-name
```

All three approaches end up in the same runtime path. `hermes model` persists provider, model, and base URL to `config.yaml` so later sessions keep using that endpoint even if env vars are not set.
### Switching Models with `/model`

Once a custom endpoint is configured, you can switch models mid-session:

```
/model custom:qwen-2.5             # Switch to a model on your custom endpoint
/model custom                      # Auto-detect the model from the endpoint
/model openrouter:claude-sonnet-4  # Switch back to a cloud provider
```

If you have **named custom providers** configured (see below), use the triple syntax:

```
/model custom:local:qwen-2.5   # Use the "local" custom provider with model qwen-2.5
/model custom:work:llama3      # Use the "work" custom provider with llama3
```

When switching providers, Hermes persists the base URL and provider to config so the change survives restarts. When switching away from a custom endpoint to a built-in provider, the stale base URL is automatically cleared.

:::tip
`/model custom` (bare, no model name) queries your endpoint's `/models` API and auto-selects the model if exactly one is loaded. Useful for local servers running a single model.
:::

Everything below follows this same pattern — just change the URL, key, and model name.

---
### Ollama — Local Models, Zero Config

[Ollama](https://ollama.com/) runs open-weight models locally with one command. Best for: quick local experimentation, privacy-sensitive work, offline use.

```bash
# Install and run a model
ollama pull llama3.1:70b
ollama serve   # Starts on port 11434

# Configure Hermes
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=ollama   # Any non-empty string
LLM_MODEL=llama3.1:70b
```

Ollama's OpenAI-compatible endpoint supports chat completions, streaming, and tool calling (for supported models). No GPU required for smaller models — Ollama handles CPU inference automatically.

:::tip
List available models with `ollama list`. Pull any model from the [Ollama library](https://ollama.com/library) with `ollama pull <model>`.
:::

---
### vLLM — High-Performance GPU Inference

[vLLM](https://docs.vllm.ai/) is the standard for production LLM serving. Best for: maximum throughput on GPU hardware, serving large models, continuous batching.

```bash
# Start vLLM server
pip install vllm
vllm serve meta-llama/Llama-3.1-70B-Instruct \
  --port 8000 \
  --tensor-parallel-size 2   # Multi-GPU

# Configure Hermes
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_API_KEY=dummy
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
```

vLLM supports tool calling, structured output, and multi-modal models. Use `--enable-auto-tool-choice` and `--tool-call-parser hermes` for Hermes-format tool calling with NousResearch models.

---
### SGLang — Fast Serving with RadixAttention

[SGLang](https://github.com/sgl-project/sglang) is an alternative to vLLM with RadixAttention for KV cache reuse. Best for: multi-turn conversations (prefix caching), constrained decoding, structured output.

```bash
# Start SGLang server
pip install "sglang[all]"
python -m sglang.launch_server \
  --model meta-llama/Llama-3.1-70B-Instruct \
  --port 8000 \
  --tp 2

# Configure Hermes
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_API_KEY=dummy
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
```

---
### llama.cpp / llama-server — CPU & Metal Inference

[llama.cpp](https://github.com/ggml-org/llama.cpp) runs quantized models on CPU, Apple Silicon (Metal), and consumer GPUs. Best for: running models without a datacenter GPU, Mac users, edge deployment.

```bash
# Build and start llama-server
cmake -B build && cmake --build build --config Release
./build/bin/llama-server \
  -m models/llama-3.1-8b-instruct-Q4_K_M.gguf \
  --port 8080 --host 0.0.0.0

# Configure Hermes
OPENAI_BASE_URL=http://localhost:8080/v1
OPENAI_API_KEY=dummy
LLM_MODEL=llama-3.1-8b-instruct
```

:::tip
Download GGUF models from [Hugging Face](https://huggingface.co/models?library=gguf). Q4_K_M quantization offers the best balance of quality vs. memory usage.
:::

---
### LiteLLM Proxy — Multi-Provider Gateway

[LiteLLM](https://docs.litellm.ai/) is an OpenAI-compatible proxy that unifies 100+ LLM providers behind a single API. Best for: switching between providers without config changes, load balancing, fallback chains, budget controls.

```bash
# Install and start
pip install "litellm[proxy]"
litellm --model anthropic/claude-sonnet-4 --port 4000

# Or with a config file for multiple models:
litellm --config litellm_config.yaml --port 4000

# Configure Hermes
OPENAI_BASE_URL=http://localhost:4000/v1
OPENAI_API_KEY=sk-your-litellm-key
LLM_MODEL=anthropic/claude-sonnet-4
```

Example `litellm_config.yaml` with fallback:
```yaml
model_list:
  - model_name: "best"
    litellm_params:
      model: anthropic/claude-sonnet-4
      api_key: sk-ant-...
  - model_name: "best"
    litellm_params:
      model: openai/gpt-4o
      api_key: sk-...
router_settings:
  routing_strategy: "latency-based-routing"
```

---
### ClawRouter — Cost-Optimized Routing

[ClawRouter](https://github.com/BlockRunAI/ClawRouter) by BlockRunAI is a local routing proxy that auto-selects models based on query complexity. It classifies requests across 14 dimensions and routes to the cheapest model that can handle the task. Payment is via USDC cryptocurrency (no API keys).

```bash
# Install and start
npx @blockrun/clawrouter   # Starts on port 8402

# Configure Hermes
OPENAI_BASE_URL=http://localhost:8402/v1
OPENAI_API_KEY=dummy
LLM_MODEL=blockrun/auto   # or: blockrun/eco, blockrun/premium, blockrun/agentic
```

Routing profiles:

| Profile | Strategy | Savings |
|---------|----------|---------|
| `blockrun/auto` | Balanced quality/cost | 74-100% |
| `blockrun/eco` | Cheapest possible | 95-100% |
| `blockrun/premium` | Best quality models | 0% |
| `blockrun/free` | Free models only | 100% |
| `blockrun/agentic` | Optimized for tool use | varies |

:::note
ClawRouter requires a USDC-funded wallet on Base or Solana for payment. All requests route through BlockRun's backend API. Run `npx @blockrun/clawrouter doctor` to check wallet status.
:::

---
### Other Compatible Providers

Any service with an OpenAI-compatible API works. Some popular options:

| Provider | Base URL | Notes |
|----------|----------|-------|
| [Together AI](https://together.ai) | `https://api.together.xyz/v1` | Cloud-hosted open models |
| [Groq](https://groq.com) | `https://api.groq.com/openai/v1` | Ultra-fast inference |
| [DeepSeek](https://deepseek.com) | `https://api.deepseek.com/v1` | DeepSeek models |
| [Fireworks AI](https://fireworks.ai) | `https://api.fireworks.ai/inference/v1` | Fast open model hosting |
| [Cerebras](https://cerebras.ai) | `https://api.cerebras.ai/v1` | Wafer-scale chip inference |
| [Mistral AI](https://mistral.ai) | `https://api.mistral.ai/v1` | Mistral models |
| [OpenAI](https://openai.com) | `https://api.openai.com/v1` | Direct OpenAI access |
| [Azure OpenAI](https://azure.microsoft.com) | `https://YOUR.openai.azure.com/` | Enterprise OpenAI |
| [LocalAI](https://localai.io) | `http://localhost:8080/v1` | Self-hosted, multi-model |
| [Jan](https://jan.ai) | `http://localhost:1337/v1` | Desktop app with local models |

```bash
# Example: Together AI
OPENAI_BASE_URL=https://api.together.xyz/v1
OPENAI_API_KEY=your-together-key
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct-Turbo
```

---
### Context Length Detection

Hermes uses a multi-source resolution chain to detect the correct context window for your model and provider:

1. **Config override** — `model.context_length` in config.yaml (highest priority)
2. **Custom provider per-model** — `custom_providers[].models.<id>.context_length`
3. **Persistent cache** — previously discovered values (survives restarts)
4. **Endpoint `/models`** — queries your server's API (local/custom endpoints)
5. **Anthropic `/v1/models`** — queries Anthropic's API for `max_input_tokens` (API-key users only)
6. **OpenRouter API** — live model metadata from OpenRouter
7. **Nous Portal** — suffix-matches Nous model IDs against OpenRouter metadata
8. **[models.dev](https://models.dev)** — community-maintained registry with provider-specific context lengths for 3800+ models across 100+ providers
9. **Fallback defaults** — broad model family patterns (128K default)

For most setups this works out of the box. The system is provider-aware — the same model can have different context limits depending on who serves it (e.g., `claude-opus-4.6` is 1M on Anthropic direct but 128K on GitHub Copilot).
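
The chain above is a first-match-wins lookup: each source either answers or defers to the next one, and discovered values are cached. A minimal sketch of the idea, with only three of the nine sources shown (resolver names and the cache shape are illustrative, not Hermes's actual internals):

```python
from typing import Callable, Optional

# Each resolver returns a context length in tokens, or None to defer
# to the next source in the chain. Names are illustrative.
def from_config(cfg: dict, model: str, provider: str) -> Optional[int]:
    return cfg.get("model", {}).get("context_length")

def from_cache(cache: dict, model: str, provider: str) -> Optional[int]:
    return cache.get((provider, model))

def from_family_defaults(model: str) -> Optional[int]:
    if "llama-3.1" in model.lower():
        return 131072
    return 128_000  # broad default

def resolve_context_length(cfg: dict, cache: dict, model: str, provider: str) -> int:
    resolvers: list[Callable[[], Optional[int]]] = [
        lambda: from_config(cfg, model, provider),   # 1. config override
        lambda: from_cache(cache, model, provider),  # 3. persistent cache
        lambda: from_family_defaults(model),         # 9. fallback defaults
    ]
    for resolver in resolvers:
        value = resolver()
        if value:
            cache[(provider, model)] = value  # remember for next time
            return value
    return 128_000
```

The network-backed sources (endpoint `/models`, OpenRouter, models.dev) would slot into the same list as extra resolvers that simply return `None` on failure, so the chain keeps moving.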

To set the context length explicitly, add `context_length` to your model config:

```yaml
model:
  default: "qwen3.5:9b"
  base_url: "http://localhost:8080/v1"
  context_length: 131072   # tokens
```

For custom endpoints, you can also set context length per model:

```yaml
custom_providers:
  - name: "My Local LLM"
    base_url: "http://localhost:11434/v1"
    models:
      "qwen3.5:27b":
        context_length: 32768
      "deepseek-r1:70b":
        context_length: 65536
```

`hermes model` will prompt for context length when configuring a custom endpoint. Leave it blank for auto-detection.

:::tip When to set this manually
- You're using Ollama with a custom `num_ctx` that's lower than the model's maximum
- You want to limit context below the model's maximum (e.g., 8k on a 128k model to save VRAM)
- You're running behind a proxy that doesn't expose `/v1/models`
:::

---
### Named Custom Providers

If you work with multiple custom endpoints (e.g., a local dev server and a remote GPU server), you can define them as named custom providers in `config.yaml`:

```yaml
custom_providers:
  - name: local
    base_url: http://localhost:8080/v1
    # api_key omitted — Hermes uses "no-key-required" for keyless local servers
  - name: work
    base_url: https://gpu-server.internal.corp/v1
    api_key: corp-api-key
    api_mode: chat_completions   # optional, auto-detected from URL
  - name: anthropic-proxy
    base_url: https://proxy.example.com/anthropic
    api_key: proxy-key
    api_mode: anthropic_messages   # for Anthropic-compatible proxies
```

Switch between them mid-session with the triple syntax:

```
/model custom:local:qwen-2.5                   # Use the "local" endpoint with qwen-2.5
/model custom:work:llama3-70b                  # Use the "work" endpoint with llama3-70b
/model custom:anthropic-proxy:claude-sonnet-4  # Use the proxy
```

You can also select named custom providers from the interactive `hermes model` menu.

---
### Choosing the Right Setup

| Use Case | Recommended |
|----------|-------------|
| **Just want it to work** | OpenRouter (default) or Nous Portal |
| **Local models, easy setup** | Ollama |
| **Production GPU serving** | vLLM or SGLang |
| **Mac / no GPU** | Ollama or llama.cpp |
| **Multi-provider routing** | LiteLLM Proxy or OpenRouter |
| **Cost optimization** | ClawRouter or OpenRouter with `sort: "price"` |
| **Maximum privacy** | Ollama, vLLM, or llama.cpp (fully local) |
| **Enterprise / Azure** | Azure OpenAI with custom endpoint |
| **Chinese AI models** | z.ai (GLM), Kimi/Moonshot, or MiniMax (first-class providers) |

:::tip
You can switch between providers at any time with `hermes model` — no restart required. Your conversation history, memory, and skills carry over regardless of which provider you use.
:::
## Optional API Keys

| Feature | Provider | Env Variable |
|---------|----------|--------------|
| Web scraping | [Firecrawl](https://firecrawl.dev/) | `FIRECRAWL_API_KEY`, `FIRECRAWL_API_URL` |
| Browser automation | [Browserbase](https://browserbase.com/) | `BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID` |
| Image generation | [FAL](https://fal.ai/) | `FAL_KEY` |
| Premium TTS voices | [ElevenLabs](https://elevenlabs.io/) | `ELEVENLABS_API_KEY` |
| OpenAI TTS + voice transcription | [OpenAI](https://platform.openai.com/api-keys) | `VOICE_TOOLS_OPENAI_KEY` |
| RL Training | [Tinker](https://tinker-console.thinkingmachines.ai/) + [WandB](https://wandb.ai/) | `TINKER_API_KEY`, `WANDB_API_KEY` |
| Cross-session user modeling | [Honcho](https://honcho.dev/) | `HONCHO_API_KEY` |

### Self-Hosting Firecrawl

By default, Hermes uses the [Firecrawl cloud API](https://firecrawl.dev/) for web search and scraping. If you prefer to run Firecrawl locally, you can point Hermes at a self-hosted instance instead. See Firecrawl's [SELF_HOST.md](https://github.com/firecrawl/firecrawl/blob/main/SELF_HOST.md) for complete setup instructions.

**What you get:** No API key required, no rate limits, no per-page costs, full data sovereignty.

**What you lose:** The cloud version uses Firecrawl's proprietary "Fire-engine" for advanced anti-bot bypassing (Cloudflare, CAPTCHAs, IP rotation). Self-hosted uses basic fetch + Playwright, so some protected sites may fail. Search uses DuckDuckGo instead of Google.

**Setup:**

1. Clone and start the Firecrawl Docker stack (5 containers: API, Playwright, Redis, RabbitMQ, PostgreSQL — requires ~4-8 GB RAM):

   ```bash
   git clone https://github.com/firecrawl/firecrawl
   cd firecrawl
   # In .env, set: USE_DB_AUTHENTICATION=false, HOST=0.0.0.0, PORT=3002
   docker compose up -d
   ```

2. Point Hermes at your instance (no API key needed):

   ```bash
   hermes config set FIRECRAWL_API_URL http://localhost:3002
   ```

You can also set both `FIRECRAWL_API_KEY` and `FIRECRAWL_API_URL` if your self-hosted instance has authentication enabled.
## OpenRouter Provider Routing

When using OpenRouter, you can control how requests are routed across providers. Add a `provider_routing` section to `~/.hermes/config.yaml`:

```yaml
provider_routing:
  sort: "throughput"                 # "price" (default), "throughput", or "latency"
  # only: ["anthropic"]              # Only use these providers
  # ignore: ["deepinfra"]            # Skip these providers
  # order: ["anthropic", "google"]   # Try providers in this order
  # require_parameters: true         # Only use providers that support all request params
  # data_collection: "deny"          # Exclude providers that may store/train on data
```

**Shortcuts:** Append `:nitro` to any model name for throughput sorting (e.g., `anthropic/claude-sonnet-4:nitro`), or `:floor` for price sorting.
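
A suffix shortcut is effectively sugar for the `sort` field. A hypothetical helper showing the mapping (not Hermes's actual code):

```python
def split_routing_suffix(model: str) -> tuple[str, dict]:
    """Map ':nitro'/':floor' model suffixes to an OpenRouter sort preference."""
    suffix_to_sort = {"nitro": "throughput", "floor": "price"}
    base, _, suffix = model.rpartition(":")
    if base and suffix in suffix_to_sort:
        return base, {"sort": suffix_to_sort[suffix]}
    return model, {}  # no recognized suffix: leave the model name alone
```

Note that unrelated colons (e.g. Ollama tags like `llama3.1:70b`) pass through untouched because only the two known suffixes are mapped.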

## Fallback Model

Configure a backup provider:model that Hermes switches to automatically when your primary model fails (rate limits, server errors, auth failures):

```yaml
fallback_model:
  provider: openrouter                    # required
  model: anthropic/claude-sonnet-4        # required
  # base_url: http://localhost:8000/v1    # optional, for custom endpoints
  # api_key_env: MY_CUSTOM_KEY            # optional, env var name for custom endpoint API key
```

When activated, the fallback swaps the model and provider mid-session without losing your conversation. It fires **at most once** per session.

Supported providers: `openrouter`, `nous`, `openai-codex`, `copilot`, `anthropic`, `huggingface`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `custom`.

:::tip
Fallback is configured exclusively through `config.yaml` — there are no environment variables for it. For full details on when it triggers, supported providers, and how it interacts with auxiliary tasks and delegation, see [Fallback Providers](/docs/user-guide/features/fallback-providers).
:::

## Smart Model Routing

Optional cheap-vs-strong routing lets Hermes keep your main model for complex work while sending very short/simple turns to a cheaper model.

```yaml
smart_model_routing:
  enabled: true
  max_simple_chars: 160
  max_simple_words: 28
  cheap_model:
    provider: openrouter
    model: google/gemini-2.5-flash
    # base_url: http://localhost:8000/v1   # optional custom endpoint
    # api_key_env: MY_CUSTOM_KEY           # optional env var name for that endpoint's API key
```

How it works:

- If a turn is short, single-line, and does not look code/tool/debug heavy, Hermes may route it to `cheap_model`
- If the turn looks complex, Hermes stays on your primary model/provider
- If the cheap route cannot be resolved cleanly, Hermes falls back to the primary model automatically

This is intentionally conservative. It is meant for quick, low-stakes turns like:

- short factual questions
- quick rewrites
- lightweight summaries

It will avoid routing prompts that look like:

- coding/debugging work
- tool-heavy requests
- long or multi-line analysis asks

Use this when you want lower latency or cost without fully changing your default model.
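
The routing decision can be sketched from the config knobs above. The marker list here is illustrative; Hermes's real heuristic may check different signals:

```python
def is_simple_turn(text: str, max_chars: int = 160, max_words: int = 28) -> bool:
    """Conservative check: route to the cheap model only for short, plain turns."""
    heavy_markers = ("```", "traceback", "error:", "def ", "{", "$(")
    if "\n" in text.strip():  # multi-line: stay on the primary model
        return False
    if len(text) > max_chars or len(text.split()) > max_words:
        return False
    lowered = text.lower()
    return not any(marker in lowered for marker in heavy_markers)
```

Anything that fails the check stays on the primary model, which matches the "intentionally conservative" framing: false negatives cost a little money, false positives would cost quality.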

For AI provider setup (OpenRouter, Anthropic, Copilot, custom endpoints, self-hosted LLMs, fallback models, etc.), see [AI Providers](/docs/integrations/providers).

## Terminal Backend Configuration

@@ -706,6 +82,10 @@ terminal:
  backend: local       # local | docker | ssh | modal | daytona | singularity
  cwd: "."             # Working directory ("." = current dir for local, "/root" for containers)
  timeout: 180         # Per-command timeout in seconds
  env_passthrough: []  # Env var names to forward to sandboxed execution (terminal + execute_code)
  singularity_image: "docker://nikolaik/python-nodejs:python3.11-nodejs20"  # Container image for Singularity backend
  modal_image: "nikolaik/python-nodejs:python3.11-nodejs20"                 # Container image for Modal backend
  daytona_image: "nikolaik/python-nodejs:python3.11-nodejs20"               # Container image for Daytona backend
```

For cloud sandboxes such as Modal and Daytona, `container_persistent: true` means Hermes will try to preserve filesystem state across sandbox recreation. It does not promise that the same live sandbox, PID space, or background processes will still be running later.

@@ -982,6 +362,26 @@ memory:
  user_char_limit: 1375   # ~500 tokens
```

## File Read Safety

Controls how much content a single `read_file` call can return. Reads that exceed the limit are rejected with an error telling the agent to use `offset` and `limit` for a smaller range. This prevents a single read of a minified JS bundle or large data file from flooding the context window.

```yaml
file_read_max_chars: 100000   # default — ~25-35K tokens
```

Raise it if you're on a model with a large context window and frequently read big files. Lower it for small-context models to keep reads efficient:

```yaml
# Large context model (200K+)
file_read_max_chars: 200000

# Small local model (16K context)
file_read_max_chars: 30000
```

The agent also deduplicates file reads automatically — if the same file region is read twice and the file hasn't changed, a lightweight stub is returned instead of re-sending the content. This resets on context compression so the agent can re-read files after their content is summarized away.

## Git Worktree Isolation

Enable isolated git worktrees for running multiple agents in parallel on the same repo:

@@ -1014,6 +414,8 @@ All compression settings live in `config.yaml` (no environment variables).
compression:
  enabled: true                                   # Toggle compression on/off
  threshold: 0.50                                 # Compress at this % of context limit
  target_ratio: 0.20                              # Fraction of threshold to preserve as recent tail
  protect_last_n: 20                              # Min recent messages to keep uncompressed
  summary_model: "google/gemini-3-flash-preview"  # Model for summarization
  summary_provider: "auto"                        # Provider: "auto", "openrouter", "nous", "codex", "main", etc.
  summary_base_url: null                          # Custom OpenAI-compatible endpoint (overrides provider)

@@ -1098,6 +500,18 @@ If auto-compression is disabled, the warning tells you context may be truncated

Context pressure is automatic — no configuration needed. It fires purely as a user-facing notification and does not modify the message stream or inject anything into the model's context.

## Credential Pool Strategies

When you have multiple API keys or OAuth tokens for the same provider, configure the rotation strategy:

```yaml
credential_pool_strategies:
  openrouter: round_robin   # cycle through keys evenly
  anthropic: least_used     # always pick the least-used key
```

Options: `fill_first` (default), `round_robin`, `least_used`, `random`. See [Credential Pools](/docs/user-guide/features/credential-pools) for full documentation.
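
The four strategies differ only in how the next credential is chosen from the pool. A compact sketch of all four (illustrative, not the actual pool implementation):

```python
import itertools
import random

class CredentialPool:
    def __init__(self, keys: list[str], strategy: str = "fill_first"):
        self.keys = keys
        self.strategy = strategy
        self.use_counts = {k: 0 for k in keys}
        self._cycle = itertools.cycle(keys)

    def next_key(self) -> str:
        if self.strategy == "round_robin":    # cycle through keys evenly
            key = next(self._cycle)
        elif self.strategy == "least_used":   # always pick the least-used key
            key = min(self.keys, key=self.use_counts.__getitem__)
        elif self.strategy == "random":
            key = random.choice(self.keys)
        else:                                 # fill_first: stick with the first key
            key = self.keys[0]
        self.use_counts[key] += 1
        return key
```

`fill_first` exhausts one key's quota before anyone would rotate manually; `round_robin` and `least_used` spread load, which matters when each key has its own rate limit.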

## Auxiliary Models

Hermes uses lightweight "auxiliary" models for side tasks like image analysis, web page summarization, and browser screenshot analysis. By default, these use **Gemini Flash** via auto-detection — you don't need to configure anything.

@@ -1148,6 +562,38 @@ auxiliary:
  # Context compression timeout (separate from compression.* config)
  compression:
    timeout: 120   # seconds — compression summarizes long conversations, needs more time

  # Session search — summarizes past session matches
  session_search:
    provider: "auto"
    model: ""
    base_url: ""
    api_key: ""
    timeout: 30

  # Skills hub — skill matching and search
  skills_hub:
    provider: "auto"
    model: ""
    base_url: ""
    api_key: ""
    timeout: 30

  # MCP tool dispatch
  mcp:
    provider: "auto"
    model: ""
    base_url: ""
    api_key: ""
    timeout: 30

  # Memory flush — summarizes conversation for persistent memory
  flush_memories:
    provider: "auto"
    model: ""
    base_url: ""
    api_key: ""
    timeout: 30
```

:::tip

@@ -1155,7 +601,7 @@ Each auxiliary task has a configurable `timeout` (in seconds). Defaults: vision
:::

:::info
-Context compression has its own top-level `compression:` block with `summary_provider`, `summary_model`, and `summary_base_url` — see [Context Compression](#context-compression) above. The fallback model uses a `fallback_model:` block — see [Fallback Model](#fallback-model) above. All three follow the same provider/model/base_url pattern.
+Context compression has its own top-level `compression:` block with `summary_provider`, `summary_model`, and `summary_base_url` — see [Context Compression](#context-compression) above. The fallback model uses a `fallback_model:` block — see [Fallback Model](/docs/integrations/providers#fallback-model). All three follow the same provider/model/base_url pattern.
:::

### Changing the Vision Model

@@ -1342,6 +788,7 @@ display:
  streaming: false                        # Stream tokens to terminal as they arrive (real-time output)
  background_process_notifications: all   # all | result | error | off (gateway only)
  show_cost: false                        # Show estimated $ cost in the CLI status bar
  tool_preview_length: 0                  # Max chars for tool call previews (0 = no limit, show full paths/commands)
```

### Theme mode

@@ -1447,12 +894,15 @@ When enabled, responses appear token-by-token inside a streaming box. Tool calls
```yaml
streaming:
  enabled: true          # Enable progressive message editing
  transport: edit        # "edit" (progressive message editing) or "off"
  edit_interval: 0.3     # Seconds between message edits
  buffer_threshold: 40   # Characters before forcing an edit flush
  cursor: " ▉"           # Cursor shown during streaming
```

-When enabled, the bot sends a message on the first token, then progressively edits it as more tokens arrive. Platforms that don't support message editing (Signal, Email) gracefully skip streaming and deliver the final response normally.
+When enabled, the bot sends a message on the first token, then progressively edits it as more tokens arrive. Platforms that don't support message editing (Signal, Email, Home Assistant) are auto-detected on the first attempt — streaming is gracefully disabled for that session with no flood of messages.

**Overflow handling:** If the streamed text exceeds the platform's message length limit (~4096 chars), the current message is finalized and a new one starts automatically.

:::note
Streaming is disabled by default. Enable it in `~/.hermes/config.yaml` to try the streaming UX.

@@ -1516,23 +966,6 @@ Usage: type `/status`, `/disk`, `/update`, or `/gpu` in the CLI or any messaging
- **Type** — only `exec` is supported (runs a shell command); other types show an error
- **Works everywhere** — CLI, Telegram, Discord, Slack, WhatsApp, Signal, Email, Home Assistant

## Gateway Streaming

Enable progressive token delivery on messaging platforms. When streaming is enabled, responses appear character-by-character in Telegram, Discord, and Slack via message editing, rather than waiting for the full response.

```yaml
streaming:
  enabled: false         # Enable streaming token delivery (default: off)
  transport: edit        # "edit" (progressive message editing) or "off"
  edit_interval: 0.3     # Min seconds between message edits
  buffer_threshold: 40   # Characters accumulated before forcing an edit
  cursor: " ▉"           # Cursor character shown during streaming
```

**Platform support:** Telegram, Discord, and Slack support edit-based streaming. Platforms that don't support message editing (Signal, Email, Home Assistant) are auto-detected on the first attempt — streaming is gracefully disabled for that session with no flood of messages.

**Overflow handling:** If the streamed text exceeds the platform's message length limit (~4096 chars), the current message is finalized and a new one starts automatically.

## Human Delay

Simulate human-like response pacing in messaging platforms:

@@ -1556,11 +989,11 @@ code_execution:

## Web Search Backends

-The `web_search`, `web_extract`, and `web_crawl` tools support three backend providers. Configure the backend in `config.yaml` or via `hermes tools`:
+The `web_search`, `web_extract`, and `web_crawl` tools support four backend providers. Configure the backend in `config.yaml` or via `hermes tools`:

```yaml
web:
-  backend: firecrawl   # firecrawl | parallel | tavily
+  backend: firecrawl   # firecrawl | parallel | tavily | exa
```

| Backend | Env Var | Search | Extract | Crawl |

@@ -1568,8 +1001,9 @@ web:
|---------|---------|--------|---------|-------|
| **Firecrawl** (default) | `FIRECRAWL_API_KEY` | ✔ | ✔ | ✔ |
| **Parallel** | `PARALLEL_API_KEY` | ✔ | ✔ | — |
| **Tavily** | `TAVILY_API_KEY` | ✔ | ✔ | ✔ |
+| **Exa** | `EXA_API_KEY` | ✔ | ✔ | — |

-**Backend selection:** If `web.backend` is not set, the backend is auto-detected from available API keys. If only `TAVILY_API_KEY` is set, Tavily is used. If only `PARALLEL_API_KEY` is set, Parallel is used. Otherwise Firecrawl is the default.
+**Backend selection:** If `web.backend` is not set, the backend is auto-detected from available API keys. If only `EXA_API_KEY` is set, Exa is used. If only `TAVILY_API_KEY` is set, Tavily is used. If only `PARALLEL_API_KEY` is set, Parallel is used. Otherwise Firecrawl is the default.

**Self-hosted Firecrawl:** Set `FIRECRAWL_API_URL` to point at your own instance. When a custom URL is set, the API key becomes optional (set `USE_DB_AUTHENTICATION=false` on the server to disable auth).

@@ -1582,11 +1016,62 @@ Configure browser automation behavior:
```yaml
browser:
  inactivity_timeout: 120   # Seconds before auto-closing idle sessions
  command_timeout: 30       # Timeout in seconds for browser commands (screenshot, navigate, etc.)
  record_sessions: false    # Auto-record browser sessions as WebM videos to ~/.hermes/browser_recordings/
  camofox:
    managed_persistence: false   # When true, Camofox sessions persist cookies/logins across restarts
```

The browser toolset supports multiple providers. See the [Browser feature page](/docs/user-guide/features/browser) for details on Browserbase, Browser Use, and local Chrome CDP setup.

## Timezone

Override the server-local timezone with an IANA timezone string. Affects timestamps in logs, cron scheduling, and system prompt time injection.

```yaml
timezone: "America/New_York"   # IANA timezone (default: "" = server-local time)
```

Supported values: any IANA timezone identifier (e.g. `America/New_York`, `Europe/London`, `Asia/Kolkata`, `UTC`). Leave empty or omit for server-local time.

## Discord

Configure Discord-specific behavior for the messaging gateway:

```yaml
discord:
  require_mention: true        # Require @mention to respond in server channels
  free_response_channels: ""   # Comma-separated channel IDs where bot responds without @mention
  auto_thread: true            # Auto-create threads on @mention in channels
```

- `require_mention` — when `true` (default), the bot only responds in server channels when mentioned with `@BotName`. DMs always work without mention.
- `free_response_channels` — comma-separated list of channel IDs where the bot responds to every message without requiring a mention.
- `auto_thread` — when `true` (default), mentions in channels automatically create a thread for the conversation, keeping channels clean (similar to Slack threading).
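
The mention settings combine into a small decision function. A sketch under the semantics described above (function shape is illustrative):

```python
def should_respond(is_dm: bool, mentioned: bool, channel_id: str,
                   require_mention: bool = True,
                   free_response_channels: str = "") -> bool:
    """Decide whether the Discord bot should answer a message."""
    if is_dm:
        return True  # DMs always work without a mention
    free = {c.strip() for c in free_response_channels.split(",") if c.strip()}
    if channel_id in free:
        return True  # free-response channel: answer everything
    if not require_mention:
        return True
    return mentioned  # otherwise require an @mention
```

The checks are ordered from most to least permissive, so a channel listed in `free_response_channels` answers everything even while `require_mention` stays on for the rest of the server.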

## Security

Pre-execution security scanning and secret redaction:

```yaml
security:
  redact_secrets: true     # Redact API key patterns in tool output and logs
  tirith_enabled: true     # Enable Tirith security scanning for terminal commands
  tirith_path: "tirith"    # Path to tirith binary (default: "tirith" in $PATH)
  tirith_timeout: 5        # Seconds to wait for tirith scan before timing out
  tirith_fail_open: true   # Allow command execution if tirith is unavailable
  website_blocklist:       # See Website Blocklist section below
    enabled: false
    domains: []
    shared_files: []
```

- `redact_secrets` — automatically detects and redacts patterns that look like API keys, tokens, and passwords in tool output before it enters the conversation context and logs.
- `tirith_enabled` — when `true`, terminal commands are scanned by [Tirith](https://github.com/StackGuardian/tirith) before execution to detect potentially dangerous operations.
- `tirith_path` — path to the tirith binary. Set this if tirith is installed in a non-standard location.
- `tirith_timeout` — maximum seconds to wait for a tirith scan. Commands proceed if the scan times out.
- `tirith_fail_open` — when `true` (default), commands are allowed to execute if tirith is unavailable or fails. Set to `false` to block commands when tirith cannot verify them.
|
||||
|
||||
## Website Blocklist

Block specific domains from being accessed by the agent's web and browser tools:
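A minimal example, using the keys shown in the Security section above (the domain values here are illustrative, not defaults):

```yaml
security:
  website_blocklist:
    enabled: true
    domains:
      - "example.com"
      - "tracker.example.net"
    shared_files: []
```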

## Checkpoints

Automatic filesystem snapshots before destructive file operations. See the [Checkpoints & Rollback](/docs/user-guide/checkpoints-and-rollback) page for details.

```yaml
checkpoints:
  enabled: true       # default: true
  max_snapshots: 50   # max checkpoints per directory
```

---
sidebar_position: 7
title: "Docker"
description: "Running Hermes Agent in Docker and using Docker as a terminal backend"
---

# Hermes Agent — Docker

There are two distinct ways Docker intersects with Hermes Agent:

1. **Running Hermes IN Docker** — the agent itself runs inside a container (this page's primary focus)
2. **Docker as a terminal backend** — the agent runs on your host but executes commands inside a Docker sandbox (see [Configuration → terminal.backend](./configuration.md))

This page covers option 1. The container stores all user data (config, API keys, sessions, skills, memories) in a single directory mounted from the host at `/opt/data`. The image itself is stateless and can be upgraded by pulling a new version without losing any configuration.

## Quick start

  nousresearch/hermes-agent
```

## Persistent volumes

The `/opt/data` volume is the single source of truth for all Hermes state. It maps to your host's `~/.hermes/` directory and contains:

| Path | Contents |
|------|----------|
| `.env` | API keys and secrets |
| `config.yaml` | All Hermes configuration |
| `SOUL.md` | Agent personality/identity |
| `sessions/` | Conversation history |
| `memories/` | Persistent memory store |
| `skills/` | Installed skills |
| `cron/` | Scheduled job definitions |
| `hooks/` | Event hooks |
| `logs/` | Runtime logs |
| `skins/` | Custom CLI skins |

:::warning
Never run two Hermes containers against the same data directory simultaneously — session files and memory stores are not designed for concurrent access.
:::

## Environment variable forwarding

API keys are read from `/opt/data/.env` inside the container. You can also pass environment variables directly:

```sh
docker run -it --rm \
  -v ~/.hermes:/opt/data \
  -e ANTHROPIC_API_KEY="sk-ant-..." \
  -e OPENAI_API_KEY="sk-..." \
  nousresearch/hermes-agent
```

Direct `-e` flags override values from `.env`. This is useful for CI/CD or secrets-manager integrations where you don't want keys on disk.

## Docker Compose example

For persistent gateway deployment, a `docker-compose.yaml` is convenient:

```yaml
version: "3.8"
services:
  hermes:
    image: nousresearch/hermes-agent:latest
    container_name: hermes
    restart: unless-stopped
    command: gateway run
    volumes:
      - ~/.hermes:/opt/data
    # Uncomment to forward specific env vars instead of using the .env file:
    # environment:
    #   - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    #   - OPENAI_API_KEY=${OPENAI_API_KEY}
    #   - TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN}
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: "2.0"
```

Start with `docker compose up -d` and view logs with `docker compose logs -f hermes`.

## Resource limits

The Hermes container needs moderate resources. Recommended minimums:

| Resource | Minimum | Recommended |
|----------|---------|-------------|
| Memory | 1 GB | 2–4 GB |
| CPU | 1 core | 2 cores |
| Disk (data volume) | 500 MB | 2+ GB (grows with sessions/skills) |

Browser automation (Playwright/Chromium) is the most memory-hungry feature. If you don't need browser tools, 1 GB is sufficient. With browser tools active, allocate at least 2 GB.

Set limits in Docker:

```sh
docker run -d \
  --name hermes \
  --restart unless-stopped \
  --memory=4g --cpus=2 \
  -v ~/.hermes:/opt/data \
  nousresearch/hermes-agent gateway run
```

## What the Dockerfile does

The official image is based on `debian:13.4` and includes:

- Python 3 with all Hermes dependencies (`pip install -e ".[all]"`)
- Node.js + npm (for browser automation and the WhatsApp bridge)
- Playwright with Chromium (`npx playwright install --with-deps chromium`)
- ripgrep and ffmpeg as system utilities
- The WhatsApp bridge (`scripts/whatsapp-bridge/`)

The entrypoint script (`docker/entrypoint.sh`) bootstraps the data volume on first run:

- Creates the directory structure (`sessions/`, `memories/`, `skills/`, etc.)
- Copies `.env.example` → `.env` if no `.env` exists
- Copies the default `config.yaml` if missing
- Copies the default `SOUL.md` if missing
- Syncs bundled skills using a manifest-based approach (preserves user edits)
- Then runs `hermes` with whatever arguments you pass

## Upgrading

Pull the latest image and recreate the container. Your data directory is untouched.

```sh
docker run -d \
  --name hermes \
  --restart unless-stopped \
  -v ~/.hermes:/opt/data \
  nousresearch/hermes-agent gateway run
```

Or with Docker Compose:

```sh
docker compose pull
docker compose up -d
```

## Skills and credential files

When using Docker as the execution environment (not the methods above, but when the agent runs commands inside a Docker sandbox), Hermes automatically bind-mounts the skills directory (`~/.hermes/skills/`) and any credential files declared by skills into the container as read-only volumes. This means skill scripts, templates, and references are available inside the sandbox without manual configuration.

The same syncing happens for the SSH and Modal backends — skills and credential files are uploaded via rsync or the Modal mount API before each command.

## Troubleshooting

### Container exits immediately

Check the logs: `docker logs hermes`. Common causes:

- Missing or invalid `.env` file — run interactively first to complete setup
- Port conflicts if running with exposed ports

### "Permission denied" errors

The container runs as root by default. If your host `~/.hermes/` was created by a non-root user, permissions should work. If you get errors, ensure the data directory is writable:

```sh
chmod -R 755 ~/.hermes
```

### Browser tools not working

Playwright needs shared memory. Add `--shm-size=1g` to your Docker run command:

```sh
docker run -d \
  --name hermes \
  --shm-size=1g \
  -v ~/.hermes:/opt/data \
  nousresearch/hermes-agent gateway run
```

### Gateway not reconnecting after network issues

The `--restart unless-stopped` flag handles most transient failures. If the gateway is stuck, restart the container:

```sh
docker restart hermes
```

### Checking container health

```sh
docker logs --tail 50 hermes       # Recent logs
docker exec hermes hermes version  # Verify version
docker stats hermes                # Resource usage
```

The API server exposes hermes-agent as an OpenAI-compatible HTTP endpoint. Any frontend that speaks the OpenAI format — Open WebUI, LobeChat, LibreChat, NextChat, ChatBox, and hundreds more — can connect to hermes-agent and use it as a backend.

Your agent handles requests with its full toolset (terminal, file operations, web search, memory, skills) and returns the final response. When streaming, tool progress indicators appear inline so frontends can show what the agent is doing.

## Quick Start

**Streaming** (`"stream": true`): Returns Server-Sent Events (SSE) with token-by-token response chunks. When streaming is enabled in config, tokens are emitted live as the LLM generates them. When disabled, the full response is sent as a single SSE chunk.

**Tool progress in streams**: When the agent calls tools during a streaming request, brief progress indicators are injected into the content stream as the tools start executing (e.g. `` `💻 pwd` ``, `` `🔍 Python docs` ``). These appear as inline markdown before the agent's response text, giving frontends like Open WebUI real-time visibility into tool execution.

### POST /v1/responses

OpenAI Responses API format. Supports server-side conversation state via `previous_response_id` — the server stores the full conversation history (including tool calls and results), so multi-turn context is preserved without the client managing it.

Hermes Agent includes a full browser automation toolset with multiple backend options:

- **Browserbase cloud mode** via [Browserbase](https://browserbase.com) for managed cloud browsers and anti-bot tooling
- **Browser Use cloud mode** via [Browser Use](https://browser-use.com) as an alternative cloud browser provider
- **Camofox local mode** via [Camofox](https://github.com/jo-inc/camofox-browser) for local anti-detection browsing (Firefox-based fingerprint spoofing)
- **Local Chrome via CDP** — connect browser tools to your own Chrome instance using `/browser connect`
- **Local browser mode** via the `agent-browser` CLI and a local Chromium installation

Get your API key at [browser-use.com](https://browser-use.com). Browser Use provides a cloud browser via its REST API. If both Browserbase and Browser Use credentials are set, Browserbase takes priority.

### Camofox local mode

[Camofox](https://github.com/jo-inc/camofox-browser) is a self-hosted Node.js server wrapping Camoufox (a Firefox fork with C++ fingerprint spoofing). It provides local anti-detection browsing without cloud dependencies.

```bash
# Install and run
git clone https://github.com/jo-inc/camofox-browser && cd camofox-browser
npm install && npm start  # downloads Camoufox (~300MB) on first run

# Or via Docker
docker run -d --network host -e CAMOFOX_PORT=9377 jo-inc/camofox-browser
```

Then set in `~/.hermes/.env`:

```bash
CAMOFOX_URL=http://localhost:9377
```

Or configure via `hermes tools` → Browser Automation → Camofox.

When `CAMOFOX_URL` is set, all browser tools automatically route through Camofox instead of Browserbase or agent-browser.

#### Persistent browser sessions

By default, each Camofox session gets a random identity — cookies and logins don't survive across agent restarts. To enable persistent browser sessions:

```yaml
# In ~/.hermes/config.yaml
browser:
  camofox:
    managed_persistence: true
```

When enabled, Hermes sends a stable profile-scoped identity to Camofox. The Camofox server maps this identity to a persistent browser profile directory, so cookies, logins, and localStorage survive across restarts. Different Hermes profiles get different browser profiles (profile isolation).

:::note
The Camofox server must also be configured with `CAMOFOX_PROFILE_DIR` on the server side for persistence to work.
:::

#### VNC live view

When Camofox runs in headed mode (with a visible browser window), it exposes a VNC port in its health check response. Hermes automatically discovers this and includes the VNC URL in navigation responses, so the agent can share a link for you to watch the browser live.

### Local Chrome via CDP (`/browser connect`)

Instead of a cloud provider, you can attach Hermes browser tools to your own running Chrome instance via the Chrome DevTools Protocol (CDP). This is useful when you want to see what the agent is doing in real time, interact with pages that require your own cookies/sessions, or avoid cloud browser costs.

---
sidebar_position: 9
sidebar_label: "Context References"
title: "Context References"
description: "Inline @-syntax for attaching files, folders, git diffs, and URLs directly into your messages"
---

website/docs/user-guide/features/credential-pools.md (new file)

---
title: Credential Pools
description: Pool multiple API keys or OAuth tokens per provider for automatic rotation and rate limit recovery.
sidebar_label: Credential Pools
sidebar_position: 9
---

# Credential Pools

Credential pools let you register multiple API keys or OAuth tokens for the same provider. When one key hits a rate limit or billing quota, Hermes automatically rotates to the next healthy key — keeping your session alive without switching providers.

This is different from [fallback providers](./fallback-providers.md), which switch to a *different* provider entirely. Credential pools are same-provider rotation; fallback providers are cross-provider failover. Pools are tried first — if all pool keys are exhausted, *then* the fallback provider activates.

## How It Works

```
Your request
  → Pick key from pool (round_robin / least_used / fill_first / random)
  → Send to provider
  → 429 rate limit?
      → Retry same key once (transient blip)
      → Second 429 → rotate to next pool key
      → All keys exhausted → fallback_model (different provider)
  → 402 billing error?
      → Immediately rotate to next pool key (24h cooldown)
  → 401 auth expired?
      → Try refreshing the token (OAuth)
      → Refresh failed → rotate to next pool key
  → Success → continue normally
```

## Quick Start

If you already have an API key set in `.env`, Hermes auto-discovers it as a 1-key pool. To benefit from pooling, add more keys:

```bash
# Add a second OpenRouter key
hermes auth add openrouter --api-key sk-or-v1-your-second-key

# Add a second Anthropic key
hermes auth add anthropic --type api-key --api-key sk-ant-api03-your-second-key

# Add an Anthropic OAuth credential (Claude Code subscription)
hermes auth add anthropic --type oauth
# Opens a browser for OAuth login
```

Check your pools:

```bash
hermes auth list
```

Output:

```
openrouter (2 credentials):
  #1  OPENROUTER_API_KEY  api_key  env:OPENROUTER_API_KEY  ←
  #2  backup-key          api_key  manual

anthropic (3 credentials):
  #1  hermes_pkce         oauth    hermes_pkce  ←
  #2  claude_code         oauth    claude_code
  #3  ANTHROPIC_API_KEY   api_key  env:ANTHROPIC_API_KEY
```

The `←` marks the currently selected credential.

## Interactive Management

Run `hermes auth` with no subcommand for an interactive wizard:

```bash
hermes auth
```

This shows your full pool status and offers a menu:

```
What would you like to do?
  1. Add a credential
  2. Remove a credential
  3. Reset cooldowns for a provider
  4. Set rotation strategy for a provider
  5. Exit
```

For providers that support both API keys and OAuth (Anthropic, Nous, Codex), the add flow asks which type:

```
anthropic supports both API keys and OAuth login.
  1. API key (paste a key from the provider dashboard)
  2. OAuth login (authenticate via browser)
Type [1/2]:
```

## CLI Commands

| Command | Description |
|---------|-------------|
| `hermes auth` | Interactive pool management wizard |
| `hermes auth list` | Show all pools and credentials |
| `hermes auth list <provider>` | Show a specific provider's pool |
| `hermes auth add <provider>` | Add a credential (prompts for type and key) |
| `hermes auth add <provider> --type api-key --api-key <key>` | Add an API key non-interactively |
| `hermes auth add <provider> --type oauth` | Add an OAuth credential via browser login |
| `hermes auth remove <provider> <index>` | Remove a credential by 1-based index |
| `hermes auth reset <provider>` | Clear all cooldowns/exhaustion status |

## Rotation Strategies

Configure via `hermes auth` → "Set rotation strategy" or in `config.yaml`:

```yaml
credential_pool_strategies:
  openrouter: round_robin
  anthropic: least_used
```

| Strategy | Behavior |
|----------|----------|
| `fill_first` (default) | Use the first healthy key until it's exhausted, then move to the next |
| `round_robin` | Cycle through keys evenly, rotating after each selection |
| `least_used` | Always pick the key with the lowest request count |
| `random` | Random selection among healthy keys |
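As a rough sketch (not the actual implementation in `agent/credential_pool.py` — field names here are assumptions), the four strategies reduce to simple selection functions over a credential list:

```python
import random

def select_credential(creds, strategy, state):
    """Pick a healthy credential according to the pool strategy.

    `creds`: list of dicts with "healthy" (bool) and "request_count" (int).
    `state`: mutable dict holding a rotation cursor for round_robin.
    Illustrative sketch only.
    """
    healthy = [c for c in creds if c["healthy"]]
    if not healthy:
        return None  # all keys exhausted -> caller falls through to fallback_model
    if strategy == "fill_first":   # first healthy key until it is exhausted
        return healthy[0]
    if strategy == "least_used":   # lowest request count wins
        return min(healthy, key=lambda c: c["request_count"])
    if strategy == "round_robin":  # cycle evenly, advancing after each pick
        chosen = healthy[state.get("cursor", 0) % len(healthy)]
        state["cursor"] = state.get("cursor", 0) + 1
        return chosen
    return random.choice(healthy)  # "random"
```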

## Error Recovery

The pool handles different errors differently:

| Error | Behavior | Cooldown |
|-------|----------|----------|
| **429 Rate Limit** | Retry the same key once (transient). A second consecutive 429 rotates to the next key | 1 hour |
| **402 Billing/Quota** | Immediately rotate to the next key | 24 hours |
| **401 Auth Expired** | Try refreshing the OAuth token first. Rotate only if the refresh fails | — |
| **All keys exhausted** | Fall through to `fallback_model` if configured | — |

The `has_retried_429` flag resets on every successful API call, so a single transient 429 doesn't trigger rotation.
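The policy in the table can be sketched as a small decision function (a hypothetical shape — the real recovery logic lives in `run_agent.py`):

```python
def recover(status, state):
    """Map an HTTP status to a pool action, per the table above.

    `state` tracks has_retried_429 across calls. Illustrative sketch only.
    """
    if status == 200:
        state["has_retried_429"] = False     # reset on every successful call
        return "ok"
    if status == 429:
        if not state.get("has_retried_429"):
            state["has_retried_429"] = True  # first 429: transient blip, retry same key
            return "retry_same_key"
        return "rotate"                      # second consecutive 429: rotate (1h cooldown)
    if status == 402:
        return "rotate"                      # billing/quota: rotate now (24h cooldown)
    if status == 401:
        return "refresh_then_maybe_rotate"   # try an OAuth token refresh first
    return "ok"
```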

## Custom Endpoint Pools

Custom OpenAI-compatible endpoints (Together.ai, RunPod, local servers) get their own pools, keyed by the endpoint name from `custom_providers` in `config.yaml`.

When you set up a custom endpoint via `hermes model`, it auto-generates a name like "Together.ai" or "Local (localhost:8080)". This name becomes the pool key.

```bash
# After setting up a custom endpoint via hermes model:
hermes auth list
# Shows:
#   Together.ai (1 credential):
#     #1  config key  api_key  config:Together.ai  ←

# Add a second key for the same endpoint:
hermes auth add Together.ai --api-key sk-together-second-key
```

Custom endpoint pools are stored in `auth.json` under `credential_pool` with a `custom:` prefix:

```json
{
  "credential_pool": {
    "openrouter": [...],
    "custom:together.ai": [...]
  }
}
```

## Auto-Discovery

Hermes automatically discovers credentials from multiple sources and seeds the pool on startup:

| Source | Example | Auto-seeded? |
|--------|---------|--------------|
| Environment variables | `OPENROUTER_API_KEY`, `ANTHROPIC_API_KEY` | Yes |
| OAuth tokens (auth.json) | Codex device code, Nous device code | Yes |
| Claude Code credentials | `~/.claude/.credentials.json` | Yes (Anthropic) |
| Hermes PKCE OAuth | `~/.hermes/auth.json` | Yes (Anthropic) |
| Custom endpoint config | `model.api_key` in config.yaml | Yes (custom endpoints) |
| Manual entries | Added via `hermes auth add` | Persisted in auth.json |

Auto-seeded entries are updated on each pool load — if you remove an env var, its pool entry is automatically pruned. Manual entries (added via `hermes auth add`) are never auto-pruned.

## Thread Safety

The credential pool uses a threading lock for all state mutations (`select()`, `mark_exhausted_and_rotate()`, `try_refresh_current()`, `mark_used()`). This ensures safe concurrent access when the gateway handles multiple chat sessions simultaneously.
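In miniature, the pattern looks like this (an illustrative sketch reusing two of the method names listed above, not the real class):

```python
import threading

class PoolSketch:
    """Lock-protected pool mutations, in miniature (illustrative only)."""

    def __init__(self, creds):
        self._lock = threading.Lock()
        self._creds = creds
        self._idx = 0

    def select(self):
        with self._lock:                 # serialize against concurrent sessions
            return self._creds[self._idx]

    def mark_exhausted_and_rotate(self):
        with self._lock:                 # rotation and selection never interleave
            self._idx = (self._idx + 1) % len(self._creds)
```

Holding one lock across every mutation keeps selection and rotation atomic relative to each other, which is what makes concurrent gateway sessions safe.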

## Architecture

For the full data flow diagram, see [`docs/credential-pool-flow.excalidraw`](https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g) in the repository.

The credential pool integrates at the provider resolution layer:

1. **`agent/credential_pool.py`** — Pool manager: storage, selection, rotation, cooldowns
2. **`hermes_cli/auth_commands.py`** — CLI commands and interactive wizard
3. **`hermes_cli/runtime_provider.py`** — Pool-aware credential resolution
4. **`run_agent.py`** — Error recovery: 429/402/401 → pool rotation → fallback

## Storage

Pool state is stored in `~/.hermes/auth.json` under the `credential_pool` key:

```json
{
  "version": 1,
  "credential_pool": {
    "openrouter": [
      {
        "id": "abc123",
        "label": "OPENROUTER_API_KEY",
        "auth_type": "api_key",
        "priority": 0,
        "source": "env:OPENROUTER_API_KEY",
        "access_token": "sk-or-v1-...",
        "last_status": "ok",
        "request_count": 142
      }
    ]
  },
  "credential_pool_strategies": {
    "openrouter": "round_robin"
  }
}
```

Strategies are stored in `config.yaml` (not `auth.json`):

```yaml
credential_pool_strategies:
  openrouter: round_robin
  anthropic: least_used
```

The agent's final response is automatically delivered — you do **not** need to include `send_message` in the cron prompt for that same destination. If a cron run calls `send_message` to the exact target the scheduler will already deliver to, Hermes skips that duplicate send and tells the model to put the user-facing content in the final response instead. Use `send_message` only for additional or different targets.

### Response wrapping

By default, delivered cron output is wrapped with a header and footer so the recipient knows it came from a scheduled task:

```
Cronjob Response: Morning feeds
-------------

<agent output here>

Note: The agent cannot see this message, and therefore cannot respond to it.
```

To deliver the raw agent output without the wrapper, set `cron.wrap_response` to `false`:

```yaml
# ~/.hermes/config.yaml
cron:
  wrap_response: false
```

### Silent suppression

If the agent's final response starts with `[SILENT]`, delivery is suppressed entirely. The output is still saved locally for audit (in `~/.hermes/cron/output/`), but no message is sent to the delivery target.

This is useful for monitoring jobs that should only report when something is wrong:

```text
Check if nginx is running. If everything is healthy, respond with only [SILENT].
Otherwise, report the issue.
```

Failed jobs always deliver regardless of the `[SILENT]` marker — only successful runs can be silenced.
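The delivery rule can be summarized as a tiny predicate (a hypothetical helper, not Hermes's actual code):

```python
def should_deliver(final_response: str, job_failed: bool) -> bool:
    """Apply the [SILENT] rule: failed jobs always deliver;
    successful runs are suppressed when the response starts with [SILENT]."""
    if job_failed:
        return True
    return not final_response.startswith("[SILENT]")
```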

## Schedule formats

# Fallback Providers

Hermes Agent has three layers of resilience that keep your sessions running when providers hit issues:

1. **[Credential pools](./credential-pools.md)** — rotate across multiple API keys for the *same* provider (tried first)
2. **Primary model fallback** — automatically switches to a *different* provider:model when your main model fails
3. **Auxiliary task fallback** — independent provider resolution for side tasks like vision, compression, and web extraction

Credential pools handle same-provider rotation (e.g., multiple OpenRouter keys). This page covers cross-provider fallback. Both fallback layers are optional and work independently.

## Primary Model Fallback

## Per-server filtering

You can control which tools each MCP server contributes to Hermes, allowing fine-grained management of your tool namespace.

### Disable a server entirely

That keeps the tool list clean.

Hermes discovers MCP servers at startup and registers their tools into the normal tool registry.

### Dynamic Tool Discovery

MCP servers can notify Hermes when their available tools change at runtime by sending a `notifications/tools/list_changed` notification. When Hermes receives this notification, it automatically re-fetches the server's tool list and updates the registry — no manual `/reload-mcp` required.

This is useful for MCP servers whose capabilities change dynamically (e.g. a server that adds tools when a new database schema is loaded, or removes tools when a service goes offline).

The refresh is lock-protected so rapid-fire notifications from the same server don't cause overlapping refreshes. Prompt and resource change notifications (`prompts/list_changed`, `resources/list_changed`) are received but not yet acted on.
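A lock-protected refresh of this kind can be sketched as follows (illustrative — the class and method names are assumptions, not Hermes's actual code):

```python
import asyncio

class McpServerHandle:
    """Sketch of a lock-protected tool refresh on `tools/list_changed`."""

    def __init__(self, fetch_tools):
        self._fetch_tools = fetch_tools  # coroutine returning the server's tool list
        self._lock = asyncio.Lock()
        self.tools = []

    async def on_tools_list_changed(self):
        async with self._lock:           # rapid-fire notifications cannot overlap
            self.tools = await self._fetch_tools()
```

Because every refresh takes the same per-server lock, a burst of notifications simply serializes into back-to-back refreshes instead of racing.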

### Reloading

If you change MCP config, use:

```
/reload-mcp
```

This reloads MCP servers from config and refreshes the available tool list. For runtime tool changes pushed by the server itself, see [Dynamic Tool Discovery](#dynamic-tool-discovery) above.

### Toolsets

@ -403,6 +409,39 @@ Because Hermes now only registers those wrappers when both are true:
|
|||
|
||||
This is intentional and keeps the tool list honest.
|
||||
|
||||
## MCP Sampling Support
|
||||
|
||||
MCP servers can request LLM inference from Hermes via the `sampling/createMessage` protocol. This allows an MCP server to ask Hermes to generate text on its behalf — useful for servers that need LLM capabilities but don't have their own model access.
Sampling is **enabled by default** for all MCP servers (when the MCP SDK supports it). Configure it per-server under the `sampling` key:

```yaml
mcp_servers:
  my_server:
    command: "my-mcp-server"
    sampling:
      enabled: true              # Enable sampling (default: true)
      model: "openai/gpt-4o"     # Override model for sampling requests (optional)
      max_tokens_cap: 4096       # Max tokens per sampling response (default: 4096)
      timeout: 30                # Timeout in seconds per request (default: 30)
      max_rpm: 10                # Rate limit: max requests per minute (default: 10)
      max_tool_rounds: 5         # Max tool-use rounds in sampling loops (default: 5)
      allowed_models: []         # Allowlist of model names the server may request (empty = any)
      log_level: "info"          # Audit log level: debug, info, or warning (default: info)
```

The sampling handler includes a sliding-window rate limiter, per-request timeouts, and tool-loop depth limits to prevent runaway usage. Metrics (request count, errors, tokens used) are tracked per server instance.
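A sliding-window limiter like the one `max_rpm` describes can be sketched in a few lines. This is a hypothetical illustration of the technique, not Hermes's actual handler:

```python
import time
from collections import deque

class SlidingWindowRateLimiter:
    """Allow at most `max_rpm` requests per rolling 60-second window."""

    def __init__(self, max_rpm, window_s=60.0, clock=time.monotonic):
        self.max_rpm = max_rpm
        self.window_s = window_s
        self.clock = clock
        self._timestamps = deque()

    def allow(self):
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_s:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_rpm:
            return False
        self._timestamps.append(now)
        return True

# With max_rpm=2, the third request inside the same window is rejected.
limiter = SlidingWindowRateLimiter(max_rpm=2)
results = [limiter.allow(), limiter.allow(), limiter.allow()]
```

Unlike a fixed-bucket counter, the deque of timestamps means the limit applies to any rolling 60-second span, so bursts at a window boundary can't double the effective rate.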
To disable sampling for a specific server:

```yaml
mcp_servers:
  untrusted_server:
    url: "https://mcp.example.com"
    sampling:
      enabled: false
```

## Running Hermes as an MCP server

In addition to connecting **to** MCP servers, Hermes can also **be** an MCP server. This lets other MCP-capable agents (Claude Code, Cursor, Codex, or any MCP client) use Hermes's messaging capabilities — list conversations, read message history, and send messages across all your connected platforms.
**`website/docs/user-guide/features/overview.md`** (new file)

---
title: "Features Overview"
sidebar_label: "Overview"
sidebar_position: 1
---

# Features Overview

Hermes Agent includes a rich set of capabilities that extend far beyond basic chat. From persistent memory and file-aware context to browser automation and voice conversations, these features work together to make Hermes a powerful autonomous assistant.
## Core

- **[Tools & Toolsets](tools.md)** — Tools are functions that extend the agent's capabilities. They're organized into logical toolsets that can be enabled or disabled per platform, covering web search, terminal execution, file editing, memory, delegation, and more.
- **[Skills System](skills.md)** — On-demand knowledge documents the agent can load when needed. Skills follow a progressive disclosure pattern to minimize token usage and are compatible with the [agentskills.io](https://agentskills.io/specification) open standard.
- **[Persistent Memory](memory.md)** — Bounded, curated memory that persists across sessions. Hermes remembers your preferences, projects, environment, and things it has learned via `MEMORY.md` and `USER.md`.
- **[Context Files](context-files.md)** — Hermes automatically discovers and loads project context files (`.hermes.md`, `AGENTS.md`, `CLAUDE.md`, `SOUL.md`, `.cursorrules`) that shape how it behaves in your project.
- **[Context References](context-references.md)** — Type `@` followed by a reference to inject files, folders, git diffs, and URLs directly into your messages. Hermes expands the reference inline and appends the content automatically.
- **[Checkpoints](../checkpoints-and-rollback.md)** — Hermes automatically snapshots your working directory before making file changes, giving you a safety net to roll back with `/rollback` if something goes wrong.

## Automation

- **[Scheduled Tasks (Cron)](cron.md)** — Schedule tasks to run automatically with natural language or cron expressions. Jobs can attach skills, deliver results to any platform, and support pause/resume/edit operations.
- **[Subagent Delegation](delegation.md)** — The `delegate_task` tool spawns child agent instances with isolated context, restricted toolsets, and their own terminal sessions. Run up to 3 concurrent subagents for parallel workstreams.
- **[Code Execution](code-execution.md)** — The `execute_code` tool lets the agent write Python scripts that call Hermes tools programmatically, collapsing multi-step workflows into a single LLM turn via sandboxed RPC execution.
- **[Event Hooks](hooks.md)** — Run custom code at key lifecycle points. Gateway hooks handle logging, alerts, and webhooks; plugin hooks handle tool interception, metrics, and guardrails.
- **[Batch Processing](batch-processing.md)** — Run the Hermes agent across hundreds or thousands of prompts in parallel, generating structured ShareGPT-format trajectory data for training data generation or evaluation.

## Media & Web

- **[Voice Mode](voice-mode.md)** — Full voice interaction across CLI and messaging platforms. Talk to the agent using your microphone, hear spoken replies, and have live voice conversations in Discord voice channels.
- **[Browser Automation](browser.md)** — Full browser automation with multiple backends: Browserbase cloud, Browser Use cloud, local Chrome via CDP, or local Chromium. Navigate websites, fill forms, and extract information.
- **[Vision & Image Paste](vision.md)** — Multimodal vision support. Paste images from your clipboard into the CLI and ask the agent to analyze, describe, or work with them using any vision-capable model.
- **[Image Generation](image-generation.md)** — Generate images from text prompts using FAL.ai's FLUX 2 Pro model with automatic 2x upscaling via the Clarity Upscaler.
- **[Voice & TTS](tts.md)** — Text-to-speech output and voice message transcription across all messaging platforms, with four provider options: Edge TTS (free), ElevenLabs, OpenAI TTS, and NeuTTS.

## Integrations

- **[Provider Routing](provider-routing.md)** — Fine-grained control over which AI providers handle your requests. Optimize for cost, speed, or quality with sorting, whitelists, blacklists, and priority ordering.
- **[Fallback Providers](fallback-providers.md)** — Automatic failover to backup LLM providers when your primary model encounters errors, including independent fallback for auxiliary tasks like vision and compression.
- **[API Server](api-server.md)** — Expose Hermes as an OpenAI-compatible HTTP endpoint. Connect any frontend that speaks the OpenAI format — Open WebUI, LobeChat, LibreChat, and more.
- **[IDE Integration (ACP)](acp.md)** — Use Hermes inside ACP-compatible editors such as VS Code, Zed, and JetBrains. Chat, tool activity, file diffs, and terminal commands render inside your editor.
- **[Honcho Memory](honcho.md)** — AI-native persistent memory for cross-session user modeling and personalization via dialectic reasoning.
- **[RL Training](rl-training.md)** — Generate trajectory data from agent sessions for reinforcement learning and model fine-tuning.

## Customization

- **[Personality & SOUL.md](personality.md)** — Fully customizable agent personality. `SOUL.md` is the primary identity file — the first thing in the system prompt — and you can swap in built-in or custom `/personality` presets per session.
- **[Skins & Themes](skins.md)** — Customize the CLI's visual presentation: banner colors, spinner faces and verbs, response-box labels, branding text, and the tool activity prefix.
- **[Plugins](plugins.md)** — Add custom tools, hooks, and integrations without modifying core code. Drop a directory into `~/.hermes/plugins/` with a `plugin.yaml` and Python code.
---
sidebar_position: 11
sidebar_label: "Plugins"
title: "Plugins"
description: "Extend Hermes with custom tools, hooks, and integrations via the plugin system"
---

# Plugins

Hermes has a plugin system for adding custom tools, hooks, and integrations without modifying core code.

**→ [Build a Hermes Plugin](/docs/guides/build-a-hermes-plugin)** — step-by-step guide with a complete working example.

Drop a directory into `~/.hermes/plugins/` with a `plugin.yaml` and Python code:

Start Hermes — your tools appear alongside built-in tools. The model can call them immediately.
### Minimal working example

Here is a complete plugin that adds a `hello_world` tool and logs every tool call via a hook.

**`~/.hermes/plugins/hello-world/plugin.yaml`**

```yaml
name: hello-world
version: "1.0"
description: A minimal example plugin
```

**`~/.hermes/plugins/hello-world/__init__.py`**

```python
"""Minimal Hermes plugin — registers a tool and a hook."""


def register(ctx):
    # --- Tool: hello_world ---
    schema = {
        "name": "hello_world",
        "description": "Returns a friendly greeting for the given name.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {
                    "type": "string",
                    "description": "Name to greet",
                }
            },
            "required": ["name"],
        },
    }

    def handle_hello(params):
        name = params.get("name", "World")
        return f"Hello, {name}! 👋 (from the hello-world plugin)"

    ctx.register_tool("hello_world", schema, handle_hello)

    # --- Hook: log every tool call ---
    def on_tool_call(tool_name, params, result):
        print(f"[hello-world] tool called: {tool_name}")

    ctx.register_hook("post_tool_call", on_tool_call)
```

Drop both files into `~/.hermes/plugins/hello-world/`, restart Hermes, and the model can immediately call `hello_world`. The hook prints a log line after every tool invocation.

Project-local plugins under `./.hermes/plugins/` are disabled by default. Enable them only for trusted repositories by setting `HERMES_ENABLE_PROJECT_PLUGINS=true` before starting Hermes.
## What plugins can do

| Capability | API |
|-----------|-----|
| Add tools | `ctx.register_tool(name, schema, handler)` |
| Add hooks | `ctx.register_hook("post_tool_call", callback)` |
| Add slash commands | `ctx.register_command("mycommand", handler)` |
| Inject messages | `ctx.inject_message(content, role="user")` — see [Injecting Messages](#injecting-messages) |
| Ship data files | `Path(__file__).parent / "data" / "file.yaml"` |
| Bundle skills | Copy `skill.md` to `~/.hermes/skills/` at load time |
| Gate on env vars | `requires_env: [API_KEY]` in plugin.yaml |
Plugins can register callbacks for these lifecycle events. See the **[Event Hooks](hooks.md)** page for details:

| Hook | When it fires |
|------|---------------|
| `on_session_start` | New session created (first turn only) |
| `on_session_end` | End of every `run_conversation` call |

## Slash commands

Plugins can register slash commands that work in both CLI and messaging platforms:

```python
def register(ctx):
    ctx.register_command(
        name="greet",
        handler=lambda args: f"Hello, {args or 'world'}!",
        description="Greet someone",
        args_hint="[name]",
        aliases=("hi",),
    )
```

The handler receives the argument string (everything after `/greet`) and returns a string to display. Registered commands automatically appear in `/help`, tab autocomplete, the Telegram bot menu, and Slack subcommand mapping.

| Parameter | Description |
|-----------|-------------|
| `name` | Command name without slash |
| `handler` | Callable that takes `args: str` and returns `str \| None` |
| `description` | Shown in `/help` |
| `args_hint` | Usage hint, e.g. `"[name]"` |
| `aliases` | Tuple of alternative names |
| `cli_only` | Only available in CLI |
| `gateway_only` | Only available in messaging platforms |
| `gateway_config_gate` | Config dotpath (e.g. `"display.my_option"`). When set on a `cli_only` command, the command becomes available in the gateway if the config value is truthy. |
## Managing plugins

In a running session, `/plugins` shows which plugins are currently loaded.
## Injecting Messages

Plugins can inject messages into the active conversation using `ctx.inject_message()`:

```python
ctx.inject_message("New data arrived from the webhook", role="user")
```

**Signature:** `ctx.inject_message(content: str, role: str = "user") -> bool`

How it works:

- If the agent is **idle** (waiting for user input), the message is queued as the next input and starts a new turn.
- If the agent is **mid-turn** (actively running), the message interrupts the current operation — the same as a user typing a new message and pressing Enter.
- For non-`"user"` roles, the content is prefixed with `[role]` (e.g. `[system] ...`).
- Returns `True` if the message was queued successfully, `False` if no CLI reference is available (e.g. in gateway mode).

This enables plugins like remote control viewers, messaging bridges, or webhook receivers to feed messages into the conversation from external sources.

:::note
`inject_message` is only available in CLI mode. In gateway mode, there is no CLI reference and the method returns `False`.
:::

See the **[full guide](/docs/guides/build-a-hermes-plugin)** for handler contracts, schema format, hook behavior, error handling, and common mistakes.
## Built-in skins

| Skin | Description | Agent branding | Visual character |
|------|-------------|----------------|------------------|
| `default` | Classic Hermes — gold and kawaii | `Hermes Agent` | Warm gold borders, cornsilk text, kawaii faces in spinners. The familiar caduceus banner. Clean and inviting. |
| `ares` | War-god theme — crimson and bronze | `Ares Agent` | Deep crimson borders with bronze accents. Aggressive spinner verbs ("forging", "marching", "tempering steel"). Custom sword-and-shield ASCII art banner. |
| `mono` | Monochrome — clean grayscale | `Hermes Agent` | All grays — no color. Borders are `#555555`, text is `#c9d1d9`. Ideal for minimal terminal setups or screen recordings. |
| `slate` | Cool blue — developer-focused | `Hermes Agent` | Royal blue borders (`#4169e1`), soft blue text. Calm and professional. No custom spinner — uses default faces. |
| `poseidon` | Ocean-god theme — deep blue and seafoam | `Poseidon Agent` | Deep blue to seafoam gradient. Ocean-themed spinners ("charting currents", "sounding the depth"). Trident ASCII art banner. |
| `sisyphus` | Sisyphean theme — austere grayscale with persistence | `Sisyphus Agent` | Light grays with stark contrast. Boulder-themed spinners ("pushing uphill", "resetting the boulder", "enduring the loop"). Boulder-and-hill ASCII art banner. |
| `charizard` | Volcanic theme — burnt orange and ember | `Charizard Agent` | Warm burnt orange to ember gradient. Fire-themed spinners ("banking into the draft", "measuring burn"). Dragon-silhouette ASCII art banner. |

## Complete list of configurable keys
### Colors (`colors:`)

Controls all color values throughout the CLI. Values are hex color strings.

| Key | Description | Default (`default` skin) |
|-----|-------------|--------------------------|
| `banner_border` | Panel border around the startup banner | `#CD7F32` (bronze) |
| `banner_title` | Title text color in the banner | `#FFD700` (gold) |
| `banner_accent` | Section headers in the banner (Available Tools, etc.) | `#FFBF00` (amber) |
| `banner_dim` | Muted text in the banner (separators, secondary labels) | `#B8860B` (dark goldenrod) |
| `banner_text` | Body text in the banner (tool names, skill names) | `#FFF8DC` (cornsilk) |
| `ui_accent` | General UI accent color (highlights, active elements) | `#FFBF00` |
| `ui_label` | UI labels and tags | `#4dd0e1` (teal) |
| `ui_ok` | Success indicators (checkmarks, completion) | `#4caf50` (green) |
| `ui_error` | Error indicators (failures, blocked) | `#ef5350` (red) |
| `ui_warn` | Warning indicators (caution, approval prompts) | `#ffa726` (orange) |
| `prompt` | Interactive prompt text color | `#FFF8DC` |
| `input_rule` | Horizontal rule above the input area | `#CD7F32` |
| `response_border` | Border around the agent's response box (ANSI escape) | `#FFD700` |
| `session_label` | Session label color | `#DAA520` |
| `session_border` | Session ID dim border color | `#8B8682` |
### Spinner (`spinner:`)

Controls the animated spinner shown while waiting for API responses.

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `waiting_faces` | list of strings | Faces cycled while waiting for API response | `["(⚔)", "(⛨)", "(▲)"]` |
| `thinking_faces` | list of strings | Faces cycled during model reasoning | `["(⚔)", "(⌁)", "(<>)"]` |
| `thinking_verbs` | list of strings | Verbs shown in spinner messages | `["forging", "plotting", "hammering plans"]` |
| `wings` | list of [left, right] pairs | Decorative brackets around the spinner | `[["⟪⚔", "⚔⟫"], ["⟪▲", "▲⟫"]]` |

When spinner values are empty (as in `default` and `mono`), hardcoded defaults from `display.py` are used.
### Branding (`branding:`)

Text strings used throughout the CLI interface.

| Key | Description | Default |
|-----|-------------|---------|
| `agent_name` | Name shown in banner title and status display | `Hermes Agent` |
| `welcome` | Welcome message shown at CLI startup | `Welcome to Hermes Agent! Type your message or /help for commands.` |
| `goodbye` | Message shown on exit | `Goodbye! ⚕` |
| `response_label` | Label on the response box header | ` ⚕ Hermes ` |
| `prompt_symbol` | Symbol before the user input prompt | `❯ ` |
| `help_header` | Header text for the `/help` command output | `(^_^)? Available Commands` |

### Other top-level keys

| Key | Type | Description | Default |
|-----|------|-------------|---------|
| `tool_prefix` | string | Character prefixed to tool output lines in the CLI | `┊` |
| `tool_emojis` | dict | Per-tool emoji overrides for spinners and progress (`{tool_name: emoji}`) | `{}` |
| `banner_logo` | string | Rich-markup ASCII art logo (replaces the default HERMES_AGENT banner) | `""` |
| `banner_hero` | string | Rich-markup hero art (replaces the default caduceus art) | `""` |
## Custom skins

Create YAML files under `~/.hermes/skins/`. User skins inherit missing values from the built-in `default` skin, so you only need to specify the keys you want to change.
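Conceptually, that inheritance is a recursive dict merge: your skin's values win, and anything missing falls through to `default`. A rough sketch of the idea (hypothetical helper, not Hermes's actual skin loader):

```python
def merge_skin(default, user):
    """Recursively overlay a user skin on the default skin.

    Keys present in `user` win; missing keys fall through to `default`.
    """
    merged = dict(default)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_skin(merged[key], value)
        else:
            merged[key] = value
    return merged

default_skin = {
    "colors": {"banner_title": "#FFD700", "ui_ok": "#4caf50"},
    "tool_prefix": "┊",
}
user_skin = {"colors": {"banner_title": "#00FFAA"}}

skin = merge_skin(default_skin, user_skin)
```

Note that the merge recurses into nested sections like `colors:`, so overriding one color does not wipe out the rest of the palette.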
### Full custom skin YAML template

```yaml
# ~/.hermes/skins/mytheme.yaml
# Complete skin template — all keys shown. Delete any you don't need;
# missing values automatically inherit from the 'default' skin.

name: mytheme
description: My custom theme

colors:
  banner_border: "#CD7F32"
  banner_title: "#FFD700"
  banner_accent: "#FFBF00"
  banner_dim: "#B8860B"
  banner_text: "#FFF8DC"
  ui_accent: "#FFBF00"
  ui_label: "#4dd0e1"
  ui_ok: "#4caf50"
  ui_error: "#ef5350"
  ui_warn: "#ffa726"
  prompt: "#FFF8DC"
  input_rule: "#CD7F32"
  response_border: "#FFD700"
  session_label: "#DAA520"
  session_border: "#8B8682"

spinner:
  waiting_faces:
    - "(⚔)"
    - "(⛨)"
    - "(▲)"
  thinking_faces:
    - "(⚔)"
    - "(⌁)"
    - "(<>)"
  thinking_verbs:
    - "processing"
    - "analyzing"
    - "computing"
    - "evaluating"
  wings:
    - ["⟪⚡", "⚡⟫"]
    - ["⟪●", "●⟫"]

branding:
  agent_name: "My Agent"
  welcome: "Welcome to My Agent! Type your message or /help for commands."
  goodbye: "See you later! ⚡"
  response_label: " ⚡ My Agent "
  prompt_symbol: "⚡ ❯ "
  help_header: "(⚡) Available Commands"

tool_prefix: "┊"

# Per-tool emoji overrides (optional)
tool_emojis:
  terminal: "⚔"
  web_search: "🔮"
  read_file: "📄"

# Custom ASCII art banners (optional, Rich markup supported)
# banner_logo: |
#   [bold #FFD700] MY AGENT [/]
# banner_hero: |
#   [#FFD700] Custom art here [/]
```

### Minimal custom skin example

Since everything inherits from `default`, a minimal skin only needs to change what's different:

```yaml
name: cyberpunk
```
- Built-in skins load from `hermes_cli/skin_engine.py`.
- Unknown skins automatically fall back to `default`.
- `/skin` updates the active CLI theme immediately for the current session.
- User skins in `~/.hermes/skins/` take precedence over built-in skins with the same name.
- Skin changes via `/skin` are session-only. To make a skin your permanent default, set it in `config.yaml`.
- The `banner_logo` and `banner_hero` fields support Rich console markup (e.g., `[bold #FF0000]text[/]`) for colored ASCII art.
---
sidebar_position: 3
sidebar_label: "Git Worktrees"
title: "Git Worktrees"
description: "Run multiple Hermes agents safely on the same repository using git worktrees and isolated checkouts"
---

Before setup, here's the part most people want to know: how Hermes behaves once connected.

| Scenario | Behavior |
|----------|----------|
| **Free-response channels** | You can make specific channels mention-free with `DISCORD_FREE_RESPONSE_CHANNELS`, or disable mentions globally with `DISCORD_REQUIRE_MENTION=false`. |
| **Threads** | Hermes replies in the same thread. Mention rules still apply unless that thread or its parent channel is configured as free-response. Threads stay isolated from the parent channel for session history. |
| **Shared channels with multiple users** | By default, Hermes isolates session history per user inside the channel for safety and clarity. Two people talking in the same channel do not share one transcript unless you explicitly disable that. |
| **Messages mentioning other users** | When `DISCORD_IGNORE_NO_MENTION` is `true` (the default), Hermes stays silent if a message @mentions other users but does **not** mention the bot. This prevents the bot from jumping into conversations directed at other people. Set to `false` if you want the bot to respond to all messages regardless of who is mentioned. This only applies in server channels, not DMs. |

:::tip
If you want a normal bot-help channel where people can talk to Hermes without tagging it every time, add that channel to `DISCORD_FREE_RESPONSE_CHANNELS`.
:::
```bash
DISCORD_ALLOWED_USERS=284102345871466496

# Optional: channels where bot responds without @mention (comma-separated channel IDs)
# DISCORD_FREE_RESPONSE_CHANNELS=1234567890,9876543210

# Optional: ignore messages that @mention other users but NOT the bot (default: true)
# DISCORD_IGNORE_NO_MENTION=true
```

Optional behavior settings in `~/.hermes/config.yaml`:
The integration supports both connection modes:

| Context | Behavior |
|---------|----------|
| Direct messages | Hermes responds to every message. |
| Group chats | Hermes responds only when the bot is @mentioned in the chat. |
| Shared group chats | By default, session history is isolated per user inside a shared chat. |

This shared-chat behavior is controlled by `config.yaml`:
Keep the App Secret private. Anyone with it can impersonate your app.

### Recommended: WebSocket mode

Use WebSocket mode when Hermes runs on your laptop, workstation, or a private server. No public URL is required. The official Lark SDK opens and maintains a persistent outbound WebSocket connection with automatic reconnection.

```bash
FEISHU_CONNECTION_MODE=websocket
```

**Requirements:** The `websockets` Python package must be installed. The SDK handles connection lifecycle, heartbeats, and auto-reconnection internally.

**How it works:** The adapter runs the Lark SDK's WebSocket client in a background executor thread. Inbound events (messages, reactions, card actions) are dispatched to the main asyncio loop. On disconnect, the SDK will attempt to reconnect automatically.

### Optional: Webhook mode

Use webhook mode only when you already run Hermes behind a reachable HTTP endpoint.
```bash
FEISHU_CONNECTION_MODE=webhook
```

In webhook mode, Hermes starts an HTTP server (via `aiohttp`) and serves a Feishu endpoint at:

```text
/feishu/webhook
```

**Requirements:** The `aiohttp` Python package must be installed.

You can customize the webhook server bind address and path:

```bash
FEISHU_WEBHOOK_HOST=127.0.0.1          # default: 127.0.0.1
FEISHU_WEBHOOK_PORT=8765               # default: 8765
FEISHU_WEBHOOK_PATH=/feishu/webhook    # default: /feishu/webhook
```

When Feishu sends a URL verification challenge (`type: url_verification`), the webhook responds automatically so you can complete the subscription setup in the Feishu developer console.

## Step 3: Configure Hermes

### Option A: Interactive Setup
## Security

### User Allowlist

For production use, set an allowlist of Feishu Open IDs:

```bash
FEISHU_ALLOWED_USERS=ou_xxx,ou_yyy
```

If you leave the allowlist empty, anyone who can reach the bot may be able to use it. In group chats, the allowlist is checked against the sender's `open_id` before the message is processed.
### Webhook Encryption Key

When running in webhook mode, set an encryption key to enable signature verification of inbound webhook payloads:

```bash
FEISHU_ENCRYPT_KEY=your-encrypt-key
```

This key is found in the **Event Subscriptions** section of your Feishu app configuration. When set, the adapter verifies every webhook request using the signature algorithm:

```text
SHA256(timestamp + nonce + encrypt_key + body)
```

The computed hash is compared against the `x-lark-signature` header using timing-safe comparison. Requests with invalid or missing signatures are rejected with HTTP 401.
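In Python, that check boils down to a few lines. This is an illustrative sketch of the algorithm above, not Hermes's exact code; header extraction and error handling are left out:

```python
import hashlib
import hmac

def verify_lark_signature(timestamp: str, nonce: str, encrypt_key: str,
                          body: bytes, signature: str) -> bool:
    """Recompute SHA256(timestamp + nonce + encrypt_key + body) and
    compare it to the x-lark-signature header in constant time."""
    digest = hashlib.sha256(
        timestamp.encode() + nonce.encode() + encrypt_key.encode() + body
    ).hexdigest()
    # hmac.compare_digest avoids leaking where the strings first differ.
    return hmac.compare_digest(digest, signature)

# A request signed with the right key verifies; a tampered body does not.
ts, nonce, key, body = "1700000000", "abc123", "secret", b'{"event": {}}'
good_sig = hashlib.sha256((ts + nonce + key).encode() + body).hexdigest()
ok = verify_lark_signature(ts, nonce, key, body, good_sig)
tampered = verify_lark_signature(ts, nonce, key, b'{"event": "x"}', good_sig)
```

Using `hmac.compare_digest` rather than `==` is what makes the comparison timing-safe.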
:::tip
In WebSocket mode, signature verification is handled by the SDK itself, so `FEISHU_ENCRYPT_KEY` is optional. In webhook mode, it is strongly recommended for production.
:::

### Verification Token

An additional layer of authentication that checks the `token` field inside webhook payloads:

```bash
FEISHU_VERIFICATION_TOKEN=your-verification-token
```

This token is also found in the **Event Subscriptions** section of your Feishu app. When set, every inbound webhook payload must contain a matching `token` in its `header` object. Mismatched tokens are rejected with HTTP 401.

Both `FEISHU_ENCRYPT_KEY` and `FEISHU_VERIFICATION_TOKEN` can be used together for defense in depth.
## Group Message Policy

The `FEISHU_GROUP_POLICY` environment variable controls whether and how Hermes responds in group chats:

```bash
FEISHU_GROUP_POLICY=allowlist   # default
```

| Value | Behavior |
|-------|----------|
| `open` | Hermes responds to @mentions from any user in any group. |
| `allowlist` | Hermes only responds to @mentions from users listed in `FEISHU_ALLOWED_USERS`. |
| `disabled` | Hermes ignores all group messages entirely. |

In all modes, the bot must be explicitly @mentioned (or @all) in the group before the message is processed. Direct messages bypass this gate.

### Bot Identity for @Mention Gating

For precise @mention detection in groups, the adapter needs to know the bot's identity. It can be provided explicitly:

```bash
FEISHU_BOT_OPEN_ID=ou_xxx
FEISHU_BOT_USER_ID=xxx
FEISHU_BOT_NAME=MyBot
```

If none of these are set, the adapter will attempt to auto-discover the bot name via the Application Info API on startup. For this to work, grant the `admin:app.info:readonly` or `application:application:self_manage` permission scope.
||||
|
||||
## Interactive Card Actions
|
||||
|
||||
When users click buttons or interact with interactive cards sent by the bot, the adapter routes these as synthetic `/card` command events:
|
||||
|
||||
- Button clicks become: `/card button {"key": "value", ...}`
|
||||
- The action's `value` payload from the card definition is included as JSON.
|
||||
- Card actions are deduplicated with a 15-minute window to prevent double processing.
|
||||
|
||||
Card action events are dispatched with `MessageType.COMMAND`, so they flow through the normal command processing pipeline.
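The translation from a card action payload to the synthetic command text can be sketched like this (a rough illustration of the shape described above, not the adapter's exact serialization):

```python
import json

def card_action_to_command(action: dict) -> str:
    """Turn a card action's `value` payload into the synthetic /card command."""
    value = action.get("value") or {}
    # sort_keys keeps the JSON stable, which helps the dedup window match repeats
    return f"/card button {json.dumps(value, sort_keys=True)}"
```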

To use this feature, enable the **Interactive Card** event in your Feishu app's event subscriptions (`card.action.trigger`).

## Media Support

### Inbound (receiving)

The adapter receives and caches the following media types from users:

| Type | Extensions | How it's processed |
|------|-----------|-------------------|
| **Images** | .jpg, .jpeg, .png, .gif, .webp, .bmp | Downloaded via Feishu API and cached locally |
| **Audio** | .ogg, .mp3, .wav, .m4a, .aac, .flac, .opus, .webm | Downloaded and cached; small text files are auto-extracted |
| **Video** | .mp4, .mov, .avi, .mkv, .webm, .m4v, .3gp | Downloaded and cached as documents |
| **Files** | .pdf, .doc, .docx, .xls, .xlsx, .ppt, .pptx, and more | Downloaded and cached as documents |

Media from rich-text (post) messages, including inline images and file attachments, is also extracted and cached.

For small text-based documents (.txt, .md), the file content is automatically injected into the message text so the agent can read it directly without needing tools.

### Outbound (sending)

| Method | What it sends |
|--------|--------------|
| `send` | Text or rich post messages (auto-detected based on markdown content) |
| `send_image` / `send_image_file` | Uploads image to Feishu, then sends as native image bubble (with optional caption) |
| `send_document` | Uploads file to Feishu API, then sends as file attachment |
| `send_voice` | Uploads audio file as a Feishu file attachment |
| `send_video` | Uploads video and sends as native media message |
| `send_animation` | GIFs are downgraded to file attachments (Feishu has no native GIF bubble) |

File upload routing is automatic based on extension:

- `.ogg`, `.opus` → uploaded as `opus` audio
- `.mp4`, `.mov`, `.avi`, `.m4v` → uploaded as `mp4` media
- `.pdf`, `.doc(x)`, `.xls(x)`, `.ppt(x)` → uploaded with their document type
- Everything else → uploaded as a generic stream file

## Markdown Rendering and Post Fallback

When outbound text contains markdown formatting (headings, bold, lists, code blocks, links, etc.), the adapter automatically sends it as a Feishu **post** message with an embedded `md` tag rather than as plain text. This enables rich rendering in the Feishu client.

If the Feishu API rejects the post payload (e.g., due to unsupported markdown constructs), the adapter automatically falls back to sending as plain text with markdown stripped. This two-stage fallback ensures messages are always delivered.

Plain text messages (no markdown detected) are sent as the simple `text` message type.
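The two-stage delivery can be sketched as below. The markdown heuristic and the helper names are assumptions for illustration; the real adapter's detection rules and API calls differ:

```python
import re

# Rough markdown detector: headings, bold, fences, list items, links.
_MD = re.compile(r"(^#{1,6}\s|\*\*|```|^[-*] |\[[^\]]+\]\([^)]+\))", re.M)

def deliver(text: str, send_post, send_text) -> str:
    """Plain text goes straight out; markdown-looking text is tried as a
    rich post first and falls back to plain text if the API rejects it."""
    if not _MD.search(text):
        return send_text(text)
    try:
        return send_post(text)
    except RuntimeError:       # stand-in for a Feishu API rejection
        return send_text(text)
```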

## ACK Emoji Reactions

When the adapter receives an inbound message, it immediately adds an ✅ (OK) emoji reaction to signal that the message was received and is being processed. This provides visual feedback before the agent completes its response.

The reaction is persistent — it remains on the message after the response is sent, serving as a receipt marker.

User reactions on bot messages are also tracked. If a user adds or removes an emoji reaction on a message sent by the bot, it is routed as a synthetic text event (`reaction:added:EMOJI_TYPE` or `reaction:removed:EMOJI_TYPE`) so the agent can respond to feedback.

## Burst Protection and Batching

The adapter includes debouncing for rapid message bursts to avoid overwhelming the agent:

### Text Batching

When a user sends multiple text messages in quick succession, they are merged into a single event before being dispatched:

| Setting | Env Var | Default |
|---------|---------|---------|
| Quiet period | `HERMES_FEISHU_TEXT_BATCH_DELAY_SECONDS` | 0.6s |
| Max messages per batch | `HERMES_FEISHU_TEXT_BATCH_MAX_MESSAGES` | 8 |
| Max characters per batch | `HERMES_FEISHU_TEXT_BATCH_MAX_CHARS` | 4000 |

### Media Batching

Multiple media attachments sent in quick succession (e.g., dragging several images) are merged into a single event:

| Setting | Env Var | Default |
|---------|---------|---------|
| Quiet period | `HERMES_FEISHU_MEDIA_BATCH_DELAY_SECONDS` | 0.8s |

### Per-Chat Serialization

Messages within the same chat are processed serially (one at a time) to maintain conversation coherence. Each chat has its own lock, so messages in different chats are processed concurrently.
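The per-chat lock pattern is a standard asyncio idiom; a minimal sketch (not the adapter's actual class):

```python
import asyncio
from collections import defaultdict

class PerChatSerializer:
    """One asyncio.Lock per chat id: serial within a chat, concurrent across chats."""
    def __init__(self) -> None:
        self._locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def run(self, chat_id: str, handler):
        async with self._locks[chat_id]:   # queues handlers for the same chat
            return await handler()

async def demo() -> list[str]:
    order: list[str] = []
    s = PerChatSerializer()
    async def handle(tag: str) -> None:
        order.append(tag + ":start")
        await asyncio.sleep(0)             # yield, as real I/O would
        order.append(tag + ":end")
    # Same chat -> strictly ordered, no interleaving.
    await asyncio.gather(s.run("a", lambda: handle("a1")),
                         s.run("a", lambda: handle("a2")))
    return order
```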

## Rate Limiting (Webhook Mode)

In webhook mode, the adapter enforces per-IP rate limiting to protect against abuse:

- **Window:** 60-second sliding window
- **Limit:** 120 requests per window per (app_id, path, IP) triple
- **Tracking cap:** Up to 4096 unique keys tracked (prevents unbounded memory growth)

Requests that exceed the limit receive HTTP 429 (Too Many Requests).

### Webhook Anomaly Tracking

The adapter tracks consecutive error responses per IP address. After 25 consecutive errors from the same IP within a 6-hour window, a warning is logged. This helps detect misconfigured clients or probing attempts.

Additional webhook protections:
- **Body size limit:** 1 MB maximum
- **Body read timeout:** 30 seconds
- **Content-Type enforcement:** Only `application/json` is accepted

## Deduplication

Inbound messages are deduplicated using message IDs with a 24-hour TTL. The dedup state is persisted across restarts to `~/.hermes/feishu_seen_message_ids.json`.

| Setting | Env Var | Default |
|---------|---------|---------|
| Cache size | `HERMES_FEISHU_DEDUP_CACHE_SIZE` | 2048 entries |

## All Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `FEISHU_APP_ID` | ✅ | — | Feishu/Lark App ID |
| `FEISHU_APP_SECRET` | ✅ | — | Feishu/Lark App Secret |
| `FEISHU_DOMAIN` | — | `feishu` | `feishu` (China) or `lark` (international) |
| `FEISHU_CONNECTION_MODE` | — | `websocket` | `websocket` or `webhook` |
| `FEISHU_ALLOWED_USERS` | — | _(empty)_ | Comma-separated open_id list for user allowlist |
| `FEISHU_HOME_CHANNEL` | — | — | Chat ID for cron/notification output |
| `FEISHU_ENCRYPT_KEY` | — | _(empty)_ | Encrypt key for webhook signature verification |
| `FEISHU_VERIFICATION_TOKEN` | — | _(empty)_ | Verification token for webhook payload auth |
| `FEISHU_GROUP_POLICY` | — | `allowlist` | Group message policy: `open`, `allowlist`, `disabled` |
| `FEISHU_BOT_OPEN_ID` | — | _(empty)_ | Bot's open_id (for @mention detection) |
| `FEISHU_BOT_USER_ID` | — | _(empty)_ | Bot's user_id (for @mention detection) |
| `FEISHU_BOT_NAME` | — | _(empty)_ | Bot's display name (for @mention detection) |
| `FEISHU_WEBHOOK_HOST` | — | `127.0.0.1` | Webhook server bind address |
| `FEISHU_WEBHOOK_PORT` | — | `8765` | Webhook server port |
| `FEISHU_WEBHOOK_PATH` | — | `/feishu/webhook` | Webhook endpoint path |
| `HERMES_FEISHU_DEDUP_CACHE_SIZE` | — | `2048` | Max deduplicated message IDs to track |
| `HERMES_FEISHU_TEXT_BATCH_DELAY_SECONDS` | — | `0.6` | Text burst debounce quiet period |
| `HERMES_FEISHU_TEXT_BATCH_MAX_MESSAGES` | — | `8` | Max messages merged per text batch |
| `HERMES_FEISHU_TEXT_BATCH_MAX_CHARS` | — | `4000` | Max characters merged per text batch |
| `HERMES_FEISHU_MEDIA_BATCH_DELAY_SECONDS` | — | `0.8` | Media burst debounce quiet period |

## Troubleshooting

| Problem | Fix |
|---------|-----|
| `lark-oapi not installed` | Install the SDK: `pip install lark-oapi` |
| `websockets not installed; websocket mode unavailable` | Install websockets: `pip install websockets` |
| `aiohttp not installed; webhook mode unavailable` | Install aiohttp: `pip install aiohttp` |
| `FEISHU_APP_ID or FEISHU_APP_SECRET not set` | Set both env vars or configure via `hermes gateway setup` |
| `Another local Hermes gateway is already using this Feishu app_id` | Only one Hermes instance can use the same app_id at a time. Stop the other gateway first. |
| Bot doesn't respond in groups | Ensure the bot is @mentioned, check `FEISHU_GROUP_POLICY`, and verify the sender is in `FEISHU_ALLOWED_USERS` if policy is `allowlist` |
| `Webhook rejected: invalid verification token` | Ensure `FEISHU_VERIFICATION_TOKEN` matches the token in your Feishu app's Event Subscriptions config |
| `Webhook rejected: invalid signature` | Ensure `FEISHU_ENCRYPT_KEY` matches the encrypt key in your Feishu app config |
| Post messages show as plain text | The Feishu API rejected the post payload; this is normal fallback behavior. Check logs for details. |
| Images/files not received by bot | Grant `im:message` and `im:resource` permission scopes to your Feishu app |
| Bot identity not auto-detected | Grant `admin:app.info:readonly` scope, or set `FEISHU_BOT_OPEN_ID` / `FEISHU_BOT_NAME` manually |
| `Webhook rate limit exceeded` | More than 120 requests/minute from the same IP. This is usually a misconfiguration or loop. |

## Toolset
@@ -10,6 +10,26 @@ Chat with Hermes from Telegram, Discord, Slack, WhatsApp, Signal, SMS, Email, Ho

For the full voice feature set — including CLI microphone mode, spoken replies in messaging, and Discord voice-channel conversations — see [Voice Mode](/docs/user-guide/features/voice-mode) and [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes).

## Platform Comparison

| Platform | Voice | Images | Files | Threads | Reactions | Typing | Streaming |
|----------|:-----:|:------:|:-----:|:-------:|:---------:|:------:|:---------:|
| Telegram | ✅ | ✅ | ✅ | ✅ | — | ✅ | ✅ |
| Discord | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Slack | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| WhatsApp | — | ✅ | ✅ | — | — | ✅ | ✅ |
| Signal | — | ✅ | ✅ | — | — | ✅ | ✅ |
| SMS | — | — | — | — | — | — | — |
| Email | — | ✅ | ✅ | ✅ | — | — | — |
| Home Assistant | — | — | — | — | — | — | — |
| Mattermost | ✅ | ✅ | ✅ | ✅ | — | ✅ | ✅ |
| Matrix | ✅ | ✅ | ✅ | ✅ | — | ✅ | ✅ |
| DingTalk | — | — | — | — | — | ✅ | ✅ |
| Feishu/Lark | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| WeCom | ✅ | ✅ | ✅ | — | — | ✅ | ✅ |

**Voice** = TTS audio replies and/or voice message transcription. **Images** = send/receive images. **Files** = send/receive file attachments. **Threads** = threaded conversations. **Reactions** = emoji reactions on messages. **Typing** = typing indicator while processing. **Streaming** = progressive message updates via editing.

## Architecture

```mermaid
@@ -352,3 +352,4 @@ For more information on securing your Hermes Agent deployment, see the [Security

- **Federation**: If you're on a federated homeserver, the bot can communicate with users from other servers — just add their full `@user:server` IDs to `MATRIX_ALLOWED_USERS`.
- **Auto-join**: The bot automatically accepts room invites and joins. It starts responding immediately after joining.
- **Media support**: Hermes can send and receive images, audio, video, and file attachments. Media is uploaded to your homeserver using the Matrix content repository API.
- **Native voice messages (MSC3245)**: The Matrix adapter automatically tags outgoing voice messages with the `org.matrix.msc3245.voice` flag. This means TTS responses and voice audio are rendered as **native voice bubbles** in Element and other clients that support MSC3245, rather than as generic audio file attachments. Incoming voice messages with the MSC3245 flag are also correctly identified and routed to speech-to-text transcription. No configuration is needed — this works automatically.
@@ -147,12 +147,16 @@ When you send a message in Open WebUI:

1. Open WebUI sends a `POST /v1/chat/completions` request with your message and conversation history
2. Hermes Agent creates an AIAgent instance with its full toolset
3. The agent processes your request — it may call tools (terminal, file operations, web search, etc.)
4. As tools execute, **inline progress messages stream to the UI** so you can see what the agent is doing (e.g. `` `💻 ls -la` ``, `` `🔍 Python 3.12 release` ``)
5. The agent's final text response streams back to Open WebUI
6. Open WebUI displays the response in its chat interface

Your agent has access to all the same tools and capabilities as when using the CLI or Telegram — the only difference is the frontend.

:::tip Tool Progress
With streaming enabled (the default), you'll see brief inline indicators as tools run — the tool emoji and its key argument. These appear in the response stream before the agent's final answer, giving you visibility into what's happening behind the scenes.
:::

## Configuration Reference

### Hermes Agent (API server)
@@ -237,6 +237,60 @@ Make sure the bot has been **invited to the channel** (`/invite @Hermes Agent`).

---

## Multi-Workspace Support

Hermes can connect to **multiple Slack workspaces** simultaneously using a single gateway instance. Each workspace is authenticated independently with its own bot user ID.

### Configuration

Provide multiple bot tokens as a **comma-separated list** in `SLACK_BOT_TOKEN`:

```bash
# Multiple bot tokens — one per workspace
SLACK_BOT_TOKEN=xoxb-workspace1-token,xoxb-workspace2-token,xoxb-workspace3-token

# A single app-level token is still used for Socket Mode
SLACK_APP_TOKEN=xapp-your-app-token
```

Or in `~/.hermes/config.yaml`:

```yaml
platforms:
  slack:
    token: "xoxb-workspace1-token,xoxb-workspace2-token"
```

### OAuth Token File

In addition to tokens in the environment or config, Hermes also loads tokens from an **OAuth token file** at:

```
~/.hermes/platforms/slack/slack_tokens.json
```

This file is a JSON object mapping team IDs to token entries:

```json
{
  "T01ABC2DEF3": {
    "token": "xoxb-workspace-token-here",
    "team_name": "My Workspace"
  }
}
```

Tokens from this file are merged with any tokens specified via `SLACK_BOT_TOKEN`. Duplicate tokens are automatically deduplicated.

### How it works

- The **first token** in the list is the primary token, used for the Socket Mode connection (AsyncApp).
- Each token is authenticated via `auth.test` on startup. The gateway maps each `team_id` to its own `WebClient` and `bot_user_id`.
- When a message arrives, Hermes uses the correct workspace-specific client to respond.
- The primary `bot_user_id` (from the first token) is used for backward compatibility with features that expect a single bot identity.
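The startup flow can be sketched as below. Here `auth_test` stands in for Slack's `auth.test` call (injected so the sketch stays testable); the function and return shapes are illustrative, not the gateway's actual code:

```python
def build_workspace_map(tokens, auth_test):
    """Map team_id -> {token, bot_user_id}; the first token's identity is primary."""
    seen, workspaces, primary = set(), {}, None
    for tok in tokens:
        if tok in seen:                 # duplicate tokens are dropped
            continue
        seen.add(tok)
        info = auth_test(tok)           # -> {"team_id": ..., "user_id": ...}
        workspaces[info["team_id"]] = {"token": tok, "bot_user_id": info["user_id"]}
        primary = primary or info["user_id"]
    return primary, workspaces
```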

---

## Voice Messages

Hermes supports voice on Slack:
@@ -1,5 +1,6 @@
---
sidebar_position: 8
sidebar_label: "SMS (Twilio)"
title: "SMS (Twilio)"
description: "Set up Hermes Agent as an SMS chatbot via Twilio"
---
@@ -112,6 +112,66 @@ hermes gateway

The bot should come online within seconds. Send it a message on Telegram to verify.

## Webhook Mode

By default, Hermes connects to Telegram using **long polling** — the gateway makes outbound requests to Telegram's servers to fetch new updates. This works well for local and always-on deployments.

For **cloud deployments** (Fly.io, Railway, Render, etc.), **webhook mode** is more cost-effective. These platforms can auto-wake suspended machines on inbound HTTP traffic, but not on outbound connections. Since polling is outbound, a polling bot can never sleep. Webhook mode flips the direction — Telegram pushes updates to your bot's HTTPS URL, enabling sleep-when-idle deployments.

| | Polling (default) | Webhook |
|---|---|---|
| Direction | Gateway → Telegram (outbound) | Telegram → Gateway (inbound) |
| Best for | Local, always-on servers | Cloud platforms with auto-wake |
| Setup | No extra config | Set `TELEGRAM_WEBHOOK_URL` |
| Idle cost | Machine must stay running | Machine can sleep between messages |

### Configuration

Add the following to `~/.hermes/.env`:

```bash
TELEGRAM_WEBHOOK_URL=https://my-app.fly.dev/telegram
# TELEGRAM_WEBHOOK_PORT=8443       # optional, default 8443
# TELEGRAM_WEBHOOK_SECRET=mysecret # optional, recommended
```

| Variable | Required | Description |
|----------|----------|-------------|
| `TELEGRAM_WEBHOOK_URL` | Yes | Public HTTPS URL where Telegram will send updates. The URL path is auto-extracted (e.g., `/telegram` from the example above). |
| `TELEGRAM_WEBHOOK_PORT` | No | Local port the webhook server listens on (default: `8443`). |
| `TELEGRAM_WEBHOOK_SECRET` | No | Secret token for verifying that updates actually come from Telegram. **Strongly recommended** for production deployments. |

When `TELEGRAM_WEBHOOK_URL` is set, the gateway starts an HTTP webhook server instead of polling. When unset, polling mode is used — no behavior change from previous versions.
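The path auto-extraction rule can be sketched with the standard library (the helper name and default are illustrative):

```python
from urllib.parse import urlparse

def webhook_path(url: str, default: str = "/telegram") -> str:
    """Extract the local route from TELEGRAM_WEBHOOK_URL."""
    path = urlparse(url).path
    return path if path else default
```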

### Cloud deployment example (Fly.io)

1. Add the env vars to your Fly.io app secrets:

```bash
fly secrets set TELEGRAM_WEBHOOK_URL=https://my-app.fly.dev/telegram
fly secrets set TELEGRAM_WEBHOOK_SECRET=$(openssl rand -hex 32)
```

2. Expose the webhook port in your `fly.toml`:

```toml
[[services]]
  internal_port = 8443
  protocol = "tcp"

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443
```

3. Deploy:

```bash
fly deploy
```

The gateway log should show: `[telegram] Connected to Telegram (webhook mode)`.

## Home Channel

Use the `/sethome` command in any Telegram chat (DM or group) to designate it as the **home channel**. Scheduled tasks (cron jobs) deliver their results to this channel.
@@ -258,6 +318,73 @@ Topics created outside of the config (e.g., by manually calling the Telegram API

- **Privacy policy:** Telegram now requires bots to have a privacy policy. Set one via BotFather with `/setprivacy_policy`, or Telegram may auto-generate a placeholder. This is particularly important if your bot is public-facing.
- **Message streaming:** Bot API 9.x added support for streaming long responses, which can improve perceived latency for lengthy agent replies.

## Webhook Mode

By default, the Telegram adapter connects via **long polling** — the gateway makes outbound connections to Telegram's servers. This works everywhere but keeps a persistent connection open.

**Webhook mode** is an alternative where Telegram pushes updates to your server over HTTPS. This is ideal for **serverless and cloud deployments** (Fly.io, Railway, etc.) where inbound HTTP can wake a suspended machine.

### Configuration

Set the `TELEGRAM_WEBHOOK_URL` environment variable to enable webhook mode:

```bash
# Required — your public HTTPS endpoint
TELEGRAM_WEBHOOK_URL=https://app.fly.dev/telegram

# Optional — local listen port (default: 8443)
TELEGRAM_WEBHOOK_PORT=8443

# Optional — secret token for update verification (auto-generated if not set)
TELEGRAM_WEBHOOK_SECRET=my-secret-token
```

Or in `~/.hermes/config.yaml`:

```yaml
telegram:
  webhook_mode: true
```

When `TELEGRAM_WEBHOOK_URL` is set, the gateway starts an HTTP server listening on `0.0.0.0:<port>` and registers the webhook URL with Telegram. The URL path is extracted from the webhook URL (defaults to `/telegram`).

:::warning
Telegram requires a **valid TLS certificate** on the webhook endpoint. Self-signed certificates will be rejected. Use a reverse proxy (nginx, Caddy) or a platform that provides TLS termination (Fly.io, Railway, Cloudflare Tunnel).
:::

## DNS-over-HTTPS Fallback IPs

In some restricted networks, `api.telegram.org` may resolve to an IP that is unreachable. The Telegram adapter includes a **fallback IP** mechanism that transparently retries connections against alternative IPs while preserving the correct TLS hostname and SNI.

### How it works

1. If `TELEGRAM_FALLBACK_IPS` is set, those IPs are used directly.
2. Otherwise, the adapter automatically queries **Google DNS** and **Cloudflare DNS** via DNS-over-HTTPS (DoH) to discover alternative IPs for `api.telegram.org`.
3. IPs returned by DoH that differ from the system DNS result are used as fallbacks.
4. If DoH is also blocked, a hardcoded seed IP (`149.154.167.220`) is used as a last resort.
5. Once a fallback IP succeeds, it becomes "sticky" — subsequent requests use it directly without retrying the primary path first.
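The selection step can be sketched as below. The function parses a JSON DoH answer (the shape used by endpoints such as `dns.google/resolve`) and keeps A records that differ from the system resolver's result; the helper itself is an assumption for illustration:

```python
def fallback_ips_from_doh(doh_answer: dict, system_ip: str) -> list:
    """Pick DoH-discovered A records that differ from the system DNS answer."""
    SEED_IP = "149.154.167.220"        # last-resort hardcoded fallback
    ips = [a["data"] for a in doh_answer.get("Answer", []) if a.get("type") == 1]
    candidates = [ip for ip in ips if ip != system_ip]
    return candidates or [SEED_IP]
```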

### Configuration

```bash
# Explicit fallback IPs (comma-separated)
TELEGRAM_FALLBACK_IPS=149.154.167.220,149.154.167.221
```

Or in `~/.hermes/config.yaml`:

```yaml
platforms:
  telegram:
    extra:
      fallback_ips:
        - "149.154.167.220"
```

:::tip
You usually don't need to configure this manually. The auto-discovery via DoH handles most restricted-network scenarios. The `TELEGRAM_FALLBACK_IPS` env var is only needed if DoH is also blocked on your network.
:::

## Troubleshooting

| Problem | Solution |

@@ -268,6 +395,7 @@ Topics created outside of the config (e.g., by manually calling the Telegram API

| Voice messages not transcribed | Verify STT is available: install `faster-whisper` for local transcription, or set `GROQ_API_KEY` / `VOICE_TOOLS_OPENAI_KEY` in `~/.hermes/.env`. |
| Voice replies are files, not bubbles | Install `ffmpeg` (needed for Edge TTS Opus conversion). |
| Bot token revoked/invalid | Generate a new token via `/revoke` then `/newbot` or `/token` in BotFather. Update your `.env` file. |
| Webhook not receiving updates | Verify `TELEGRAM_WEBHOOK_URL` is publicly reachable (test with `curl`). Ensure your platform/reverse proxy routes inbound HTTPS traffic from the URL's port to the local listen port configured by `TELEGRAM_WEBHOOK_PORT` (they do not need to be the same number). Ensure SSL/TLS is active — Telegram only sends to HTTPS URLs. Check firewall rules. |

## Exec Approval
@@ -13,6 +13,7 @@ Connect Hermes to [WeCom](https://work.weixin.qq.com/) (企业微信), Tencent's

- A WeCom organization account
- An AI Bot created in the WeCom Admin Console
- The Bot ID and Secret from the bot's credentials page
- Python packages: `aiohttp` and `httpx`

## Setup
@@ -56,10 +57,12 @@ hermes gateway start

- **WebSocket transport** — persistent connection, no public endpoint needed
- **DM and group messaging** — configurable access policies
- **Per-group sender allowlists** — fine-grained control over who can interact in each group
- **Media support** — images, files, voice, video upload and download
- **AES-encrypted media** — automatic decryption for inbound attachments
- **Quote context** — preserves reply threading
- **Markdown rendering** — rich text responses
- **Reply-mode streaming** — correlates responses to inbound message context
- **Auto-reconnect** — exponential backoff on connection drops

## Configuration Options
@@ -75,12 +78,187 @@ Set these in `config.yaml` under `platforms.wecom.extra`:

| `group_policy` | `open` | Group access: `open`, `allowlist`, `disabled` |
| `allow_from` | `[]` | User IDs allowed for DMs (when dm_policy=allowlist) |
| `group_allow_from` | `[]` | Group IDs allowed (when group_policy=allowlist) |
| `groups` | `{}` | Per-group configuration (see below) |

## Access Policies

### DM Policy

Controls who can send direct messages to the bot:

| Value | Behavior |
|-------|----------|
| `open` | Anyone can DM the bot (default) |
| `allowlist` | Only user IDs in `allow_from` can DM |
| `disabled` | All DMs are ignored |
| `pairing` | Pairing mode (for initial setup) |

```bash
WECOM_DM_POLICY=allowlist
```

### Group Policy

Controls which groups the bot responds in:

| Value | Behavior |
|-------|----------|
| `open` | Bot responds in all groups (default) |
| `allowlist` | Bot only responds in group IDs listed in `group_allow_from` |
| `disabled` | All group messages are ignored |

```bash
WECOM_GROUP_POLICY=allowlist
```

### Per-Group Sender Allowlists

For fine-grained control, you can restrict which users are allowed to interact with the bot within specific groups. This is configured in `config.yaml`:

```yaml
platforms:
  wecom:
    enabled: true
    extra:
      bot_id: "your-bot-id"
      secret: "your-secret"
      group_policy: "allowlist"
      group_allow_from:
        - "group_id_1"
        - "group_id_2"
      groups:
        group_id_1:
          allow_from:
            - "user_alice"
            - "user_bob"
        group_id_2:
          allow_from:
            - "user_charlie"
        "*":
          allow_from:
            - "user_admin"
```

**How it works:**

1. The `group_policy` and `group_allow_from` controls determine whether a group is allowed at all.
2. If a group passes the top-level check, the `groups.<group_id>.allow_from` list (if present) further restricts which senders within that group can interact with the bot.
3. A wildcard `"*"` group entry serves as a default for groups not explicitly listed.
4. Allowlist entries support the `*` wildcard to allow all users, and entries are case-insensitive.
5. Entries can optionally use the `wecom:user:` or `wecom:group:` prefix format — the prefix is stripped automatically.

If no `allow_from` is configured for a group, all users in that group are allowed (assuming the group itself passes the top-level policy check).
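The per-group resolution rules above can be sketched as a single lookup function. It assumes the group already passed the top-level `group_policy` check; the function and config shape are illustrative:

```python
def sender_allowed(group_id: str, sender: str, groups_cfg: dict) -> bool:
    """Per-group sender check with '*' wildcard group and entries."""
    cfg = groups_cfg.get(group_id) or groups_cfg.get("*") or {}
    allow = cfg.get("allow_from")
    if not allow:                       # no list -> everyone in the group
        return True
    # Case-insensitive match; strip an optional "wecom:user:" prefix.
    norm = {e.lower().removeprefix("wecom:user:") for e in allow}
    return "*" in norm or sender.lower() in norm
```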

## Media Support

### Inbound (receiving)

The adapter receives media attachments from users and caches them locally for agent processing:

| Type | How it's handled |
|------|-----------------|
| **Images** | Downloaded and cached locally. Supports both URL-based and base64-encoded images. |
| **Files** | Downloaded and cached. Filename is preserved from the original message. |
| **Voice** | Voice message text transcription is extracted if available. |
| **Mixed messages** | WeCom mixed-type messages (text + images) are parsed and all components extracted. |

**Quoted messages:** Media from quoted (replied-to) messages is also extracted, so the agent has context about what the user is replying to.

### AES-Encrypted Media Decryption

WeCom encrypts some inbound media attachments with AES-256-CBC. The adapter handles this automatically:

- When an inbound media item includes an `aeskey` field, the adapter downloads the encrypted bytes and decrypts them using AES-256-CBC with PKCS#7 padding.
- The AES key is the base64-decoded value of the `aeskey` field (must be exactly 32 bytes).
- The IV is derived from the first 16 bytes of the key.
- This requires the `cryptography` Python package (`pip install cryptography`).

No configuration is needed — decryption happens transparently when encrypted media is received.
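The key/IV derivation and unpadding rules above can be sketched in pure Python (the AES-CBC decryption step itself would use the `cryptography` package; these helper names are illustrative):

```python
import base64

def derive_key_iv(aeskey_b64: str):
    """Key/IV derivation for WeCom encrypted media, per the rules above."""
    key = base64.b64decode(aeskey_b64)
    if len(key) != 32:
        raise ValueError("aeskey must decode to exactly 32 bytes")
    return key, key[:16]                # IV = first 16 bytes of the key

def pkcs7_unpad(data: bytes) -> bytes:
    """Strip PKCS#7 padding from the AES-CBC plaintext."""
    pad = data[-1]
    if not 1 <= pad <= 16 or data[-pad:] != bytes([pad]) * pad:
        raise ValueError("bad PKCS#7 padding")
    return data[:-pad]
```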

### Outbound (sending)

| Method | What it sends | Size limit |
|--------|--------------|------------|
| `send` | Markdown text messages | 4000 chars |
| `send_image` / `send_image_file` | Native image messages | 10 MB |
| `send_document` | File attachments | 20 MB |
| `send_voice` | Voice messages (AMR format only for native voice) | 2 MB |
| `send_video` | Video messages | 10 MB |

**Chunked upload:** Files are uploaded in 512 KB chunks through a three-step protocol (init → chunks → finish). The adapter handles this automatically.

**Automatic downgrade:** When media exceeds the native type's size limit but is under the absolute 20 MB file limit, it is automatically sent as a generic file attachment instead:

- Images > 10 MB → sent as file
- Videos > 10 MB → sent as file
- Voice > 2 MB → sent as file
- Non-AMR audio → sent as file (WeCom only supports AMR for native voice)

Files exceeding the absolute 20 MB limit are rejected with an informational message sent to the chat.
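The downgrade rules can be summarized in one routing function (a sketch with the limits from the tables above; the function itself is not the adapter's code):

```python
MIB = 2 ** 20
NATIVE_LIMITS = {"image": 10 * MIB, "video": 10 * MIB, "voice": 2 * MIB}
FILE_LIMIT = 20 * MIB                     # absolute cap for any attachment

def route_outbound(kind: str, size: int, ext: str = "") -> str:
    """Decide the WeCom message type per the downgrade rules above."""
    if size > FILE_LIMIT:
        return "reject"                   # over the absolute 20 MB cap
    if kind == "voice" and ext != ".amr":
        return "file"                     # only AMR plays as native voice
    if kind in NATIVE_LIMITS and size > NATIVE_LIMITS[kind]:
        return "file"                     # downgrade oversized media
    return kind
```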
|
||||
|
||||
## Reply-Mode Stream Responses
|
||||
|
||||
When the bot receives a message via the WeCom callback, the adapter remembers the inbound request ID. If a response is sent while the request context is still active, the adapter uses WeCom's reply-mode (`aibot_respond_msg`) with streaming to correlate the response directly to the inbound message. This provides a more natural conversation experience in the WeCom client.
|
||||
|
||||
If the inbound request context has expired or is unavailable, the adapter falls back to proactive message sending via `aibot_send_msg`.
|
||||
|
||||
Reply-mode also works for media: uploaded media can be sent as a reply to the originating message.
|
||||
|
||||
## Connection and Reconnection
|
||||
|
||||
The adapter maintains a persistent WebSocket connection to WeCom's gateway at `wss://openws.work.weixin.qq.com`.
|
||||
|
||||
### Connection Lifecycle
|
||||
|
||||
1. **Connect:** Opens a WebSocket connection and sends an `aibot_subscribe` authentication frame with the bot_id and secret.
|
||||
2. **Heartbeat:** Sends application-level ping frames every 30 seconds to keep the connection alive.
|
||||
3. **Listen:** Continuously reads inbound frames and dispatches message callbacks.

### Reconnection Behavior

On connection loss, the adapter uses exponential backoff to reconnect:

| Attempt | Delay |
|---------|-------|
| 1st retry | 2 seconds |
| 2nd retry | 5 seconds |
| 3rd retry | 10 seconds |
| 4th retry | 30 seconds |
| 5th+ retry | 60 seconds |

After each successful reconnection, the backoff counter resets to zero. All pending request futures are failed on disconnect so callers don't hang indefinitely.
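
The schedule above amounts to a capped lookup table. A minimal sketch (hypothetical helper, not the adapter's code):

```python
# Backoff schedule from the table above; 5th and later retries cap at 60 s.
BACKOFF_DELAYS = [2, 5, 10, 30, 60]  # seconds, per retry attempt


def reconnect_delay(attempt: int) -> int:
    """Delay in seconds before the given retry attempt (1-indexed)."""
    return BACKOFF_DELAYS[min(attempt, len(BACKOFF_DELAYS)) - 1]


print([reconnect_delay(n) for n in (1, 2, 3, 4, 5, 9)])  # [2, 5, 10, 30, 60, 60]
```

On a successful reconnect, the caller would simply reset `attempt` to zero, matching the "backoff counter resets" behavior described above.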

### Deduplication

Inbound messages are deduplicated using message IDs with a 5-minute window and a maximum cache of 1000 entries. This prevents double-processing of messages during reconnection or network hiccups.
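
A TTL-plus-size-bounded cache like the one described can be sketched as follows. The 5-minute window and 1000-entry cap are from the docs; the class and its structure are hypothetical:

```python
import time
from collections import OrderedDict
from typing import Optional


# Hypothetical sketch of a dedup cache: entries expire after `ttl` seconds,
# and the oldest entry is evicted once `max_size` is exceeded.
class DedupCache:
    def __init__(self, ttl: float = 300.0, max_size: int = 1000):
        self.ttl, self.max_size = ttl, max_size
        self._seen: "OrderedDict[str, float]" = OrderedDict()

    def is_duplicate(self, msg_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop expired entries (insertion order == timestamp order here).
        while self._seen and next(iter(self._seen.values())) < now - self.ttl:
            self._seen.popitem(last=False)
        if msg_id in self._seen:
            return True
        self._seen[msg_id] = now
        # Enforce the size bound by evicting the oldest entries.
        while len(self._seen) > self.max_size:
            self._seen.popitem(last=False)
        return False
```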

## All Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `WECOM_BOT_ID` | ✅ | — | WeCom AI Bot ID |
| `WECOM_SECRET` | ✅ | — | WeCom AI Bot Secret |
| `WECOM_ALLOWED_USERS` | — | _(empty)_ | Comma-separated user IDs for the gateway-level allowlist |
| `WECOM_HOME_CHANNEL` | — | — | Chat ID for cron/notification output |
| `WECOM_WEBSOCKET_URL` | — | `wss://openws.work.weixin.qq.com` | WebSocket gateway URL |
| `WECOM_DM_POLICY` | — | `open` | DM access policy |
| `WECOM_GROUP_POLICY` | — | `open` | Group access policy |

## Troubleshooting

| Problem | Fix |
|---------|-----|
| `WECOM_BOT_ID and WECOM_SECRET are required` | Set both env vars or configure in setup wizard |
| `WeCom startup failed: aiohttp not installed` | Install aiohttp: `pip install aiohttp` |
| `WeCom startup failed: httpx not installed` | Install httpx: `pip install httpx` |
| `invalid secret (errcode=40013)` | Verify the secret matches your bot's credentials |
| `Timed out waiting for subscribe acknowledgement` | Check network connectivity to `openws.work.weixin.qq.com` |
| Bot doesn't respond in groups | Check `group_policy` setting and ensure the group ID is in `group_allow_from` |
| Bot ignores certain users in a group | Check per-group `allow_from` lists in the `groups` config section |
| Media decryption fails | Install `cryptography`: `pip install cryptography` |
| `cryptography is required for WeCom media decryption` | The inbound media is AES-encrypted. Install: `pip install cryptography` |
| Voice messages sent as files | WeCom only supports AMR format for native voice. Other formats are auto-downgraded to file. |
| `File too large` error | WeCom has a 20 MB absolute limit on all file uploads. Compress or split the file. |
| Images sent as files | Images > 10 MB exceed the native image limit and are auto-downgraded to file attachments. |
| `Timeout sending message to WeCom` | The WebSocket may have disconnected. Check logs for reconnection messages. |
| `WeCom websocket closed during authentication` | Network issue or incorrect credentials. Verify bot_id and secret. |

@@ -94,9 +94,20 @@ Add the following to your `~/.hermes/.env` file:

```
# Required
WHATSAPP_ENABLED=true
WHATSAPP_MODE=bot                    # "bot" or "self-chat"

# Access control — pick ONE of these options:
WHATSAPP_ALLOWED_USERS=15551234567   # Comma-separated phone numbers (with country code, no +)
# WHATSAPP_ALLOWED_USERS=*           # OR use * to allow everyone
# WHATSAPP_ALLOW_ALL_USERS=true      # OR set this flag instead (same effect as *)
```

:::tip Allow-all shorthand
Setting `WHATSAPP_ALLOWED_USERS=*` allows **all** senders (equivalent to `WHATSAPP_ALLOW_ALL_USERS=true`).
This is consistent with [Signal group allowlists](/docs/reference/environment-variables).
To use the pairing flow instead, remove both variables and rely on the
[DM pairing system](/docs/user-guide/security#dm-pairing-system).
:::
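
The allow-all semantics above can be sketched in a few lines. The environment variable names are real; the helper function is hypothetical and only illustrates the documented precedence (flag, then `*`, then the explicit list, with an empty list denying everyone):

```python
import os


# Hypothetical sketch of the documented access-control check.
def sender_allowed(sender: str) -> bool:
    if os.environ.get("WHATSAPP_ALLOW_ALL_USERS", "").lower() == "true":
        return True  # flag form of allow-all
    raw = os.environ.get("WHATSAPP_ALLOWED_USERS", "")
    allowed = {u.strip() for u in raw.split(",") if u.strip()}
    # "*" allows everyone; an empty allowlist denies everyone (fail-closed).
    return "*" in allowed or sender in allowed
```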

Optional behavior settings in `~/.hermes/config.yaml`:

```yaml

@@ -174,7 +185,7 @@ whatsapp:
| **Bridge crashes or reconnect loops** | Restart the gateway, update Hermes, and re-pair if the session was invalidated by a WhatsApp protocol change. |
| **Bot stops working after WhatsApp update** | Update Hermes to get the latest bridge version, then re-pair. |
| **macOS: "Node.js not installed" but node works in terminal** | launchd services don't inherit your shell PATH. Run `hermes gateway install` to re-snapshot your current PATH into the plist, then `hermes gateway start`. See the [Gateway Service docs](./index.md#macos-launchd) for details. |
| **Messages not being received** | Verify `WHATSAPP_ALLOWED_USERS` includes the sender's number (with country code, no `+` or spaces), or set it to `*` to allow everyone. Set `WHATSAPP_DEBUG=true` in `.env` and restart the gateway to see raw message events in `bridge.log`. |
| **Bot replies to strangers with a pairing code** | Set `whatsapp.unauthorized_dm_behavior: ignore` in `~/.hermes/config.yaml` if you want unauthorized DMs to be silently ignored instead. |

---

@@ -182,9 +193,10 @@ whatsapp:

## Security

:::warning
**Configure access control** before going live. Set `WHATSAPP_ALLOWED_USERS` with specific
phone numbers (including country code, without the `+`), use `*` to allow everyone, or set
`WHATSAPP_ALLOW_ALL_USERS=true`. Without any of these, the gateway **denies all incoming
messages** as a safety measure.
:::

By default, unauthorized DMs still receive a pairing code reply. If you want a private WhatsApp number to stay completely silent to strangers, set:

@@ -22,6 +22,61 @@ The security model has five layers:

Before executing any command, Hermes checks it against a curated list of dangerous patterns. If a match is found, the user must explicitly approve it.

### Approval Modes

The approval system supports three modes, configured via `approvals.mode` in `~/.hermes/config.yaml`:

```yaml
approvals:
  mode: manual  # manual | smart | off
  timeout: 60   # seconds to wait for user response (default: 60)
```

| Mode | Behavior |
|------|----------|
| **manual** (default) | Always prompt the user for approval on dangerous commands |
| **smart** | Use an auxiliary LLM to assess risk. Low-risk commands (e.g., `python -c "print('hello')"`) are auto-approved. Genuinely dangerous commands are auto-denied. Uncertain cases escalate to a manual prompt. |
| **off** | Disable all approval checks — equivalent to running with `--yolo`. All commands execute without prompts. |

:::warning
Setting `approvals.mode: off` disables all safety prompts. Use only in trusted environments (CI/CD, containers, etc.).
:::
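
The routing implied by the table can be sketched as a small dispatcher. The mode semantics follow the table above; the function and the verdict labels are hypothetical, not Hermes internals:

```python
from typing import Optional


# Hypothetical sketch: decide what happens to a command flagged as dangerous,
# given the configured approvals.mode and (for smart mode) an LLM risk verdict.
def route_flagged_command(mode: str, llm_verdict: Optional[str] = None) -> str:
    """Return 'prompt', 'auto_approve', or 'auto_deny'."""
    if mode == "off":
        return "auto_approve"   # --yolo equivalent: no prompts at all
    if mode == "smart":
        if llm_verdict == "low_risk":
            return "auto_approve"
        if llm_verdict == "dangerous":
            return "auto_deny"
        return "prompt"         # uncertain case escalates to the user
    return "prompt"             # manual (default): always ask
```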

### YOLO Mode

YOLO mode bypasses **all** dangerous command approval prompts for the current session. It can be activated three ways:

1. **CLI flag**: Start a session with `hermes --yolo` or `hermes chat --yolo`
2. **Slash command**: Type `/yolo` during a session to toggle it on/off
3. **Environment variable**: Set `HERMES_YOLO_MODE=1`

The `/yolo` command is a **toggle** — each use flips the mode on or off:

```
> /yolo
⚡ YOLO mode ON — all commands auto-approved. Use with caution.

> /yolo
⚠ YOLO mode OFF — dangerous commands will require approval.
```

YOLO mode is available in both CLI and gateway sessions. Internally, it sets the `HERMES_YOLO_MODE` environment variable which is checked before every command execution.

:::danger
YOLO mode disables **all** dangerous command safety checks for the session. Use only when you fully trust the commands being generated (e.g., well-tested automation scripts in disposable environments).
:::
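
The pre-execution check described above boils down to reading the environment variable. A minimal sketch — the docs only show `HERMES_YOLO_MODE=1`, so treating `"1"` as the sole truthy value is an assumption:

```python
import os


# Hypothetical sketch of the per-command gate: dangerous commands need
# approval unless YOLO mode is active for the session.
def yolo_enabled() -> bool:
    return os.environ.get("HERMES_YOLO_MODE", "") == "1"  # assumption: "1" only


def needs_approval(command_is_dangerous: bool) -> bool:
    return command_is_dangerous and not yolo_enabled()
```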

### Approval Timeout

When a dangerous command prompt appears, the user has a configurable amount of time to respond. If no response is given within the timeout, the command is **denied** by default (fail-closed).

Configure the timeout in `~/.hermes/config.yaml`:

```yaml
approvals:
  timeout: 60  # seconds (default: 60)
```
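
The fail-closed behavior can be sketched with `asyncio.wait_for`: if the user's answer never arrives before the deadline, the command is denied. A hypothetical helper, not the real Hermes prompt plumbing:

```python
import asyncio


# Hypothetical sketch: wait for the user's yes/no, denying on timeout.
async def wait_for_approval(user_reply: "asyncio.Future[bool]", timeout: float = 60.0) -> bool:
    try:
        return await asyncio.wait_for(user_reply, timeout)
    except asyncio.TimeoutError:
        return False  # no response within the window -> deny (fail-closed)


async def demo() -> bool:
    # A future that is never resolved simulates a user who never answers.
    never_answered: "asyncio.Future[bool]" = asyncio.get_running_loop().create_future()
    return await wait_for_approval(never_answered, timeout=0.05)


print(asyncio.run(demo()))  # False: the prompt timed out, command denied
```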

### What Triggers Approval

The following patterns trigger approval prompts (defined in `tools/approval.py`):

@@ -30,21 +85,32 @@ The following patterns trigger approval prompts (defined in tools/approval.py)

|---------|-------------|
| `rm -r` / `rm --recursive` | Recursive delete |
| `rm ... /` | Delete in root path |
| `chmod 777/666` / `o+w` / `a+w` | World/other-writable permissions |
| `chmod --recursive` with unsafe perms | Recursive world/other-writable (long flag) |
| `chown -R root` / `chown --recursive root` | Recursive chown to root |
| `mkfs` | Format filesystem |
| `dd if=` | Disk copy |
| `> /dev/sd` | Write to block device |
| `DROP TABLE/DATABASE` | SQL DROP |
| `DELETE FROM` (without WHERE) | SQL DELETE without WHERE |
| `TRUNCATE TABLE` | SQL TRUNCATE |
| `systemctl stop/disable/mask` | Stop/disable system services |
| `kill -9 -1` | Kill all processes |
| `pkill -9` | Force kill processes |
| Fork bomb patterns | Fork bombs |
| `bash -c` / `sh -c` / `zsh -c` / `ksh -c` | Shell command execution via `-c` flag (including combined flags like `-lc`) |
| `python -e` / `perl -e` / `ruby -e` / `node -c` | Script execution via `-e`/`-c` flag |
| `curl ... \| sh` / `wget ... \| sh` | Pipe remote content to shell |
| `bash <(curl ...)` / `sh <(wget ...)` | Execute remote script via process substitution |
| `tee` to `/etc/`, `~/.ssh/`, `~/.hermes/.env` | Overwrite sensitive file via tee |
| `>` / `>>` to `/etc/`, `~/.ssh/`, `~/.hermes/.env` | Overwrite sensitive file via redirection |
| `xargs rm` | xargs with rm |
| `find -exec rm` / `find -delete` | Find with destructive actions |
| `cp`/`mv`/`install` to `/etc/` | Copy/move file into system config |
| `sed -i` / `sed --in-place` on `/etc/` | In-place edit of system config |
| `pkill`/`killall` hermes/gateway | Self-termination prevention |
| `gateway run` with `&`/`disown`/`nohup`/`setsid` | Prevents starting gateway outside service manager |
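
A pattern-based gate of this kind is typically a list of compiled regexes scanned before execution. The regexes below are illustrative stand-ins — they are **not** the actual list in `tools/approval.py`:

```python
import re
from typing import Optional

# Toy stand-ins for a few of the documented patterns; the real list in
# tools/approval.py is larger and different.
DANGEROUS_PATTERNS = [
    (re.compile(r"\brm\s+(--recursive\b|-[a-zA-Z]*r)"), "Recursive delete"),
    (re.compile(r"\bchmod\s+(-R\s+)?(777|666)\b"), "World-writable permissions"),
    (re.compile(r"\bcurl\b.*\|\s*(ba)?sh\b"), "Pipe remote content to shell"),
]


def flag_command(cmd: str) -> Optional[str]:
    """Return the matched risk description, or None if the command looks safe."""
    for pattern, reason in DANGEROUS_PATTERNS:
        if pattern.search(cmd):
            return reason
    return None


print(flag_command("curl https://x.sh | sh"))  # Pipe remote content to shell
print(flag_command("ls -la"))                  # None
```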

:::info
**Container bypass**: When running in `docker`, `singularity`, `modal`, or `daytona` backends, dangerous command checks are **skipped** because the container itself is the security boundary. Destructive commands inside a container can't harm the host.

@@ -1,4 +1,6 @@

---
sidebar_position: 1
sidebar_label: "G0DM0D3 (Godmode)"
title: "G0DM0D3 — Godmode Jailbreaking"
description: "Automated LLM jailbreaking using G0DM0D3 techniques — system prompt templates, input obfuscation, and multi-model racing"
---