docs: comprehensive documentation update for recent features

New documentation:
- DingTalk messaging platform setup guide (dingtalk.md)

Updated existing docs:
- quickstart.md: add Alibaba Cloud, Kilo Code, Vercel AI Gateway to provider table
- configuration.md: add Alibaba Cloud provider, website blocklist config,
  light/dark theme mode, smart approvals (ask/smart/off)
- environment-variables.md: add Mattermost, Matrix, DingTalk, Browser Use,
  DashScope env vars
- browser.md: add Browser Use cloud provider, /browser connect CDP mode,
  multi-provider architecture, fix limitation section contradiction
- slash-commands.md: add /tools enable/disable/list, /browser connect/disconnect/status
- messaging/index.md: add DingTalk, Mattermost, Matrix to architecture diagram,
  platform toolset table, security allowlists, and Next Steps links
- security.md: add website access policy (blocklist) documentation
- sidebars.ts: add Mattermost, Matrix, DingTalk to Messaging Gateway sidebar
This commit is contained in:
Teknium 2026-03-17 03:42:02 -07:00 committed by GitHub
parent ba728f3e63
commit d9b9987ad3
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
9 changed files with 315 additions and 143 deletions

View file

@ -49,6 +49,9 @@ hermes setup # Or configure everything at once
| **Kimi / Moonshot** | Moonshot-hosted coding and chat models | Set `KIMI_API_KEY` |
| **MiniMax** | International MiniMax endpoint | Set `MINIMAX_API_KEY` |
| **MiniMax China** | China-region MiniMax endpoint | Set `MINIMAX_CN_API_KEY` |
| **Alibaba Cloud** | Qwen models via DashScope | Set `DASHSCOPE_API_KEY` |
| **Kilo Code** | KiloCode-hosted models | Set `KILOCODE_API_KEY` |
| **Vercel AI Gateway** | Vercel AI Gateway routing | Set `AI_GATEWAY_API_KEY` |
| **Custom Endpoint** | VLLM, SGLang, or any OpenAI-compatible API | Set base URL + API key |
:::tip

View file

@ -32,7 +32,8 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config
| `KILOCODE_BASE_URL` | Override Kilo Code base URL (default: `https://api.kilo.ai/api/gateway`) |
| `ANTHROPIC_API_KEY` | Anthropic Console API key ([console.anthropic.com](https://console.anthropic.com/)) |
| `ANTHROPIC_TOKEN` | Manual or legacy Anthropic OAuth/setup-token override |
| `DASHSCOPE_API_KEY` | Alibaba Cloud DashScope API key for Qwen models via Anthropic-compatible API ([modelstudio.console.alibabacloud.com](https://modelstudio.console.alibabacloud.com/)) |
| `DASHSCOPE_API_KEY` | Alibaba Cloud DashScope API key for Qwen models ([modelstudio.console.alibabacloud.com](https://modelstudio.console.alibabacloud.com/)) |
| `DASHSCOPE_BASE_URL` | Custom DashScope base URL (default: international endpoint) |
| `CLAUDE_CODE_OAUTH_TOKEN` | Explicit Claude Code token override if you export one manually |
| `HERMES_MODEL` | Preferred model name (checked before `LLM_MODEL`, used by gateway) |
| `LLM_MODEL` | Default model name (fallback when not set in config.yaml) |
@ -64,6 +65,8 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe
| `FIRECRAWL_API_URL` | Custom Firecrawl API endpoint for self-hosted instances (optional) |
| `BROWSERBASE_API_KEY` | Browser automation ([browserbase.com](https://browserbase.com/)) |
| `BROWSERBASE_PROJECT_ID` | Browserbase project ID |
| `BROWSER_USE_API_KEY` | Browser Use cloud browser API key ([browser-use.com](https://browser-use.com/)) |
| `BROWSER_CDP_URL` | Chrome DevTools Protocol URL for local browser (set via `/browser connect`, e.g. `ws://localhost:9222`) |
| `BROWSER_INACTIVITY_TIMEOUT` | Browser session inactivity timeout in seconds |
| `FAL_KEY` | Image generation ([fal.ai](https://fal.ai/)) |
| `GROQ_API_KEY` | Groq Whisper STT API key ([groq.com](https://groq.com/)) |
@ -173,6 +176,19 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe
| `EMAIL_ALLOW_ALL_USERS` | Allow all inbound email senders |
| `DINGTALK_CLIENT_ID` | DingTalk bot AppKey from developer portal ([open.dingtalk.com](https://open.dingtalk.com)) |
| `DINGTALK_CLIENT_SECRET` | DingTalk bot AppSecret from developer portal |
| `DINGTALK_ALLOWED_USERS` | Comma-separated DingTalk user IDs allowed to message the bot |
| `MATTERMOST_URL` | Mattermost server URL (e.g. `https://mm.example.com`) |
| `MATTERMOST_TOKEN` | Bot token or personal access token for Mattermost |
| `MATTERMOST_ALLOWED_USERS` | Comma-separated Mattermost user IDs allowed to message the bot |
| `MATTERMOST_HOME_CHANNEL` | Channel ID for proactive message delivery (cron, notifications) |
| `MATTERMOST_REPLY_MODE` | Reply style: `thread` (threaded replies) or `off` (flat messages, default) |
| `MATRIX_HOMESERVER` | Matrix homeserver URL (e.g. `https://matrix.org`) |
| `MATRIX_ACCESS_TOKEN` | Matrix access token for bot authentication |
| `MATRIX_USER_ID` | Matrix user ID (e.g. `@hermes:matrix.org`) — required for password login, optional with access token |
| `MATRIX_PASSWORD` | Matrix password (alternative to access token) |
| `MATRIX_ALLOWED_USERS` | Comma-separated Matrix user IDs allowed to message the bot (e.g. `@alice:matrix.org`) |
| `MATRIX_HOME_ROOM` | Room ID for proactive message delivery (e.g. `!abc123:matrix.org`) |
| `MATRIX_ENCRYPTION` | Enable end-to-end encryption (`true`/`false`, default: `false`) |
| `HASS_TOKEN` | Home Assistant Long-Lived Access Token (enables HA platform + tools) |
| `HASS_URL` | Home Assistant URL (default: `http://homeassistant.local:8123`) |
| `MESSAGING_CWD` | Working directory for terminal commands in messaging mode (default: `~`) |

View file

@ -52,8 +52,9 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in
| Command | Description |
|---------|-------------|
| `/tools` | List available tools |
| `/tools [list\|disable\|enable] [name...]` | Manage tools: list available tools, or disable/enable specific tools for the current session. Disabling a tool removes it from the agent's toolset and triggers a session reset. |
| `/toolsets` | List available toolsets |
| `/browser [connect\|disconnect\|status]` | Manage local Chrome CDP connection. `connect` attaches browser tools to a running Chrome instance (default: `ws://localhost:9222`). `disconnect` detaches. `status` shows current connection. Auto-launches Chrome if no debugger is detected. |
| `/skills` | Search, install, inspect, or manage skills from online registries |
| `/cron` | Manage scheduled tasks (list, add/create, edit, pause, resume, run, remove) |
| `/reload-mcp` | Reload MCP servers from config.yaml |
@ -118,7 +119,7 @@ The messaging gateway supports the following built-in commands inside Telegram,
## Notes
- `/skin`, `/tools`, `/toolsets`, `/config`, `/prompt`, `/cron`, `/skills`, `/platforms`, `/paste`, and `/verbose` are **CLI-only** commands.
- `/skin`, `/tools`, `/toolsets`, `/browser`, `/config`, `/prompt`, `/cron`, `/skills`, `/platforms`, `/paste`, and `/verbose` are **CLI-only** commands.
- `/status`, `/stop`, `/sethome`, `/resume`, and `/update` are **messaging-only** commands.
- `/background`, `/voice`, `/reload-mcp`, and `/rollback` work in **both** the CLI and the messaging gateway.
- `/voice join`, `/voice channel`, and `/voice leave` are only meaningful on Discord.

View file

@ -72,6 +72,7 @@ You need at least one way to connect to an LLM. Use `hermes model` to switch pro
| **MiniMax China** | `MINIMAX_CN_API_KEY` in `~/.hermes/.env` (provider: `minimax-cn`) |
| **Alibaba Cloud** | `DASHSCOPE_API_KEY` in `~/.hermes/.env` (provider: `alibaba`, aliases: `dashscope`, `qwen`) |
| **Kilo Code** | `KILOCODE_API_KEY` in `~/.hermes/.env` (provider: `kilocode`) |
| **Alibaba Cloud** | `DASHSCOPE_API_KEY` in `~/.hermes/.env` (provider: `alibaba`) |
| **Custom Endpoint** | `hermes model` (saved in `config.yaml`) or `OPENAI_BASE_URL` + `OPENAI_API_KEY` in `~/.hermes/.env` |
:::info Codex Note
@ -136,16 +137,20 @@ hermes chat --provider minimax --model MiniMax-Text-01
# MiniMax (China endpoint)
hermes chat --provider minimax-cn --model MiniMax-Text-01
# Requires: MINIMAX_CN_API_KEY in ~/.hermes/.env
# Alibaba Cloud / DashScope (Qwen models)
hermes chat --provider alibaba --model qwen-plus
# Requires: DASHSCOPE_API_KEY in ~/.hermes/.env
```
Or set the provider permanently in `config.yaml`:
```yaml
model:
provider: "zai" # or: kimi-coding, minimax, minimax-cn
provider: "zai" # or: kimi-coding, minimax, minimax-cn, alibaba
default: "glm-4-plus"
```
Base URLs can be overridden with `GLM_BASE_URL`, `KIMI_BASE_URL`, `MINIMAX_BASE_URL`, or `MINIMAX_CN_BASE_URL` environment variables.
Base URLs can be overridden with `GLM_BASE_URL`, `KIMI_BASE_URL`, `MINIMAX_BASE_URL`, `MINIMAX_CN_BASE_URL`, or `DASHSCOPE_BASE_URL` environment variables.
## Custom & Self-Hosted LLM Providers
@ -873,6 +878,7 @@ This controls both the `text_to_speech` tool and spoken replies in voice mode (`
display:
tool_progress: all # off | new | all | verbose
skin: default # Built-in or custom CLI skin (see user-guide/features/skins)
theme_mode: auto # auto | light | dark — color scheme for skin-aware rendering
personality: "kawaii" # Legacy cosmetic field still surfaced in some summaries
compact: false # Compact output mode (less whitespace)
resume_display: full # full (show previous messages on resume) | minimal (one-liner only)
@ -882,6 +888,18 @@ display:
background_process_notifications: all # all | result | error | off (gateway only)
```
### Theme mode
The `theme_mode` setting controls whether skins render in light or dark mode:
| Mode | Behavior |
|------|----------|
| `auto` (default) | Detects your terminal's background color automatically. Falls back to `dark` if detection fails. |
| `light` | Forces light-mode skin colors. Skins that define a `colors_light` override use those colors instead of the default dark-mode palette. |
| `dark` | Forces dark-mode skin colors. |
This works with any skin — built-in or custom. Skin authors can provide `colors_light` in their skin definition for optimal light-terminal appearance.
| Mode | What you see |
|------|-------------|
| `off` | Silent — just the final response |
@ -1056,6 +1074,54 @@ browser:
record_sessions: false # Auto-record browser sessions as WebM videos to ~/.hermes/browser_recordings/
```
The browser toolset supports multiple providers. See the [Browser feature page](/docs/user-guide/features/browser) for details on Browserbase, Browser Use, and local Chrome CDP setup.
## Website Blocklist
Block specific domains from being accessed by the agent's web and browser tools:
```yaml
website_blocklist:
enabled: false # Enable URL blocking (default: false)
domains: # List of blocked domain patterns
- "*.internal.company.com"
- "admin.example.com"
- "*.local"
shared_files: # Load additional rules from external files
- "/etc/hermes/blocked-sites.txt"
```
When enabled, any URL matching a blocked domain pattern is rejected before the web or browser tool executes. This applies to `web_search`, `web_extract`, `browser_navigate`, and any tool that accesses URLs.
Domain rules support:
- Exact domains: `admin.example.com`
- Wildcard subdomains: `*.internal.company.com` (blocks all subdomains)
- TLD wildcards: `*.local`
Shared files contain one domain rule per line (blank lines and `#` comments are ignored). Missing or unreadable files log a warning but don't disable other web tools.
The policy is cached for 30 seconds, so config changes take effect quickly without restart.
## Smart Approvals
Control how Hermes handles potentially dangerous commands:
```yaml
approval_mode: ask # ask | smart | off
```
| Mode | Behavior |
|------|----------|
| `ask` (default) | Prompt the user before executing any flagged command. In the CLI, shows an interactive approval dialog. In messaging, queues a pending approval request. |
| `smart` | Use an auxiliary LLM to assess whether a flagged command is actually dangerous. Low-risk commands are auto-approved with session-level persistence. Genuinely risky commands are escalated to the user. |
| `off` | Skip all approval checks. Equivalent to `HERMES_YOLO_MODE=true`. **Use with caution.** |
Smart mode is particularly useful for reducing approval fatigue — it lets the agent work more autonomously on safe operations while still catching genuinely destructive commands.
:::warning
Setting `approval_mode: off` disables all safety checks for terminal commands. Only use this in trusted, sandboxed environments.
:::
## Checkpoints
Automatic filesystem snapshots before destructive file operations. See the [Checkpoints feature page](/docs/user-guide/features/checkpoints) for details.

View file

@ -1,27 +1,30 @@
---
title: Browser Automation
description: Control cloud browsers with Browserbase integration for web interaction, form filling, scraping, and more.
description: Control browsers with multiple providers, local Chrome via CDP, or cloud browsers for web interaction, form filling, scraping, and more.
sidebar_label: Browser
sidebar_position: 5
---
# Browser Automation
Hermes Agent includes a full browser automation toolset that can run in two modes:
Hermes Agent includes a full browser automation toolset with multiple backend options:
- **Browserbase cloud mode** via [Browserbase](https://browserbase.com) for managed cloud browsers and anti-bot tooling
- **Browser Use cloud mode** via [Browser Use](https://browser-use.com) as an alternative cloud browser provider
- **Local Chrome via CDP** — connect browser tools to your own Chrome instance using `/browser connect`
- **Local browser mode** via the `agent-browser` CLI and a local Chromium installation
In both modes, the agent can navigate websites, interact with page elements, fill forms, and extract information.
In all modes, the agent can navigate websites, interact with page elements, fill forms, and extract information.
## Overview
The browser tools use the `agent-browser` CLI. In Browserbase mode, `agent-browser` connects to Browserbase cloud sessions. In local mode, it drives a local Chromium installation. Pages are represented as **accessibility trees** (text-based snapshots), making them ideal for LLM agents. Interactive elements get ref IDs (like `@e1`, `@e2`) that the agent uses for clicking and typing.
Pages are represented as **accessibility trees** (text-based snapshots), making them ideal for LLM agents. Interactive elements get ref IDs (like `@e1`, `@e2`) that the agent uses for clicking and typing.
Key capabilities:
- **Cloud execution** — no local browser needed
- **Built-in stealth** — random fingerprints, CAPTCHA solving, residential proxies
- **Multi-provider cloud execution** — Browserbase or Browser Use, no local browser needed
- **Local Chrome integration** — attach to your running Chrome via CDP for hands-on browsing
- **Built-in stealth** — random fingerprints, CAPTCHA solving, residential proxies (Browserbase)
- **Session isolation** — each task gets its own browser session
- **Automatic cleanup** — inactive sessions are closed after a timeout
- **Vision analysis** — screenshot + AI analysis for visual understanding
@ -40,9 +43,48 @@ BROWSERBASE_PROJECT_ID=your-project-id-here
Get your credentials at [browserbase.com](https://browserbase.com).
### Browser Use cloud mode
To use Browser Use as your cloud browser provider, add:
```bash
# Add to ~/.hermes/.env
BROWSER_USE_API_KEY=***
```
Get your API key at [browser-use.com](https://browser-use.com). Browser Use provides a cloud browser via its REST API. If both Browserbase and Browser Use credentials are set, Browserbase takes priority.
### Local Chrome via CDP (`/browser connect`)
Instead of a cloud provider, you can attach Hermes browser tools to your own running Chrome instance via the Chrome DevTools Protocol (CDP). This is useful when you want to see what the agent is doing in real-time, interact with pages that require your own cookies/sessions, or avoid cloud browser costs.
In the CLI, use:
```
/browser connect # Connect to Chrome at ws://localhost:9222
/browser connect ws://host:port # Connect to a specific CDP endpoint
/browser status # Check current connection
/browser disconnect # Detach and return to cloud/local mode
```
If Chrome isn't already running with remote debugging, Hermes will attempt to auto-launch it with `--remote-debugging-port=9222`.
:::tip
To start Chrome manually with CDP enabled:
```bash
# Linux
google-chrome --remote-debugging-port=9222
# macOS
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222
```
:::
When connected via CDP, all browser tools (`browser_navigate`, `browser_click`, etc.) operate on your live Chrome instance instead of spinning up a cloud session.
### Local browser mode
If you do **not** set Browserbase credentials, Hermes can still use the browser tools through a local Chromium install driven by `agent-browser`.
If you do **not** set any cloud credentials and don't use `/browser connect`, Hermes can still use the browser tools through a local Chromium install driven by `agent-browser`.
### Optional Environment Variables
@ -232,10 +274,8 @@ If paid features aren't available on your plan, Hermes automatically falls back
## Limitations
- **Requires Browserbase account** — no local browser fallback
- **Requires `agent-browser` CLI** — must be installed via npm
- **Text-based interaction** — relies on accessibility tree, not pixel coordinates
- **Snapshot size** — large pages may be truncated or LLM-summarized at 8000 characters
- **Session timeout**sessions expire based on your Browserbase plan settings
- **Cost**each session consumes Browserbase credits; use `browser_close` when done
- **Session timeout**cloud sessions expire based on your provider's plan settings
- **Cost**cloud sessions consume provider credits; use `browser_close` when done. Use `/browser connect` for free local browsing.
- **No file downloads** — cannot download files from the browser

View file

@ -1,178 +1,192 @@
---
sidebar_position: 10
title: "DingTalk"
description: "Set up Hermes Agent as a DingTalk bot using Stream Mode for real-time messaging"
description: "Set up Hermes Agent as a DingTalk chatbot"
---
# DingTalk Setup
Hermes connects to DingTalk through the [dingtalk-stream](https://pypi.org/project/dingtalk-stream/) SDK using Stream Mode — a WebSocket-based protocol that requires no public webhook URL. Messages arrive in real-time and responses are sent via the session webhook in markdown format.
Hermes Agent integrates with DingTalk (钉钉) as a chatbot, letting you chat with your AI assistant through direct messages or group chats. The bot connects via DingTalk's Stream Mode — a long-lived WebSocket connection that requires no public URL or webhook server — and replies using markdown-formatted messages through DingTalk's session webhook API.
DingTalk (钉钉) is Alibaba's enterprise communication platform with over 700 million registered users, making it the #1 business application in China. It combines messaging, video conferencing, task management, and workflow automation into a single platform used by millions of organizations.
Before setup, here's the part most people want to know: how Hermes behaves once it's in your DingTalk workspace.
:::info Dependencies
The DingTalk adapter requires additional Python packages:
## How Hermes Behaves
| Context | Behavior |
|---------|----------|
| **DMs (1:1 chat)** | Hermes responds to every message. No `@mention` needed. Each DM has its own session. |
| **Group chats** | Hermes responds when you `@mention` it. Without a mention, Hermes ignores the message. |
| **Shared groups with multiple users** | By default, Hermes isolates session history per user inside the group. Two people talking in the same group do not share one transcript unless you explicitly disable that. |
### Session Model in DingTalk
By default:
- each DM gets its own session
- each user in a shared group chat gets their own session inside that group
This is controlled by `config.yaml`:
```yaml
group_sessions_per_user: true
```
Set it to `false` only if you explicitly want one shared conversation for the entire group:
```yaml
group_sessions_per_user: false
```
This guide walks you through the full setup process — from creating your DingTalk bot to sending your first message.
## Prerequisites
Install the required Python packages:
```bash
pip install dingtalk-stream httpx
```
`httpx` is already a core Hermes dependency, so in practice you only need `dingtalk-stream`.
- `dingtalk-stream` — DingTalk's official SDK for Stream Mode (WebSocket-based real-time messaging)
- `httpx` — async HTTP client used for sending replies via session webhooks
## Step 1: Create a DingTalk App
1. Go to the [DingTalk Developer Console](https://open-dev.dingtalk.com/).
2. Log in with your DingTalk admin account.
3. Click **Application Development****Custom Apps****Create App via H5 Micro-App** (or **Robot** depending on your console version).
4. Fill in:
- **App Name**: e.g., `Hermes Agent`
- **Description**: optional
5. After creating, navigate to **Credentials & Basic Info** to find your **Client ID** (AppKey) and **Client Secret** (AppSecret). Copy both.
:::warning[Credentials shown only once]
The Client Secret is only displayed once when you create the app. If you lose it, you'll need to regenerate it. Never share these credentials publicly or commit them to Git.
:::
---
## Step 2: Enable the Robot Capability
## Prerequisites
- **DingTalk developer account** — register at [open-dev.dingtalk.com](https://open-dev.dingtalk.com)
- **An application created** on the DingTalk Open Platform with Robot (机器人) capability enabled
---
## Step 1: Create a DingTalk Application
1. Go to [open-dev.dingtalk.com](https://open-dev.dingtalk.com) and log in
2. Click **Create Application** (创建应用)
3. Fill in the application name and description
4. Under **Capabilities** (添加能力), enable **Robot** (机器人)
5. In the Robot configuration:
- Enable **Stream Mode** (Stream 模式) — this is critical, as it eliminates the need for a public webhook URL
- Set the bot name and avatar
6. Navigate to **Credentials & Basic Info** (凭证与基本信息) to find:
- **AppKey** — this is your `DINGTALK_CLIENT_ID`
- **AppSecret** — this is your `DINGTALK_CLIENT_SECRET`
7. Publish the application (发布)
1. In your app's settings page, go to **Add Capability****Robot**.
2. Enable the robot capability.
3. Under **Message Reception Mode**, select **Stream Mode** (recommended — no public URL needed).
:::tip
Stream Mode is strongly recommended over the legacy HTTP webhook approach. It works behind firewalls, NATs, and requires no public IP or domain — the SDK maintains a persistent WebSocket connection to DingTalk's servers.
Stream Mode is the recommended setup. It uses a long-lived WebSocket connection initiated from your machine, so you don't need a public IP, domain name, or webhook endpoint. This works behind NAT, firewalls, and on local machines.
:::
---
## Step 3: Find Your DingTalk User ID
## Step 2: Configure Hermes
Hermes Agent uses your DingTalk User ID to control who can interact with the bot. DingTalk User IDs are alphanumeric strings set by your organization's admin.
The easiest way:
To find yours:
1. Ask your DingTalk organization admin — User IDs are configured in the DingTalk admin console under **Contacts****Members**.
2. Alternatively, the bot logs the `sender_id` for each incoming message. Start the gateway, send the bot a message, then check the logs for your ID.
## Step 4: Configure Hermes Agent
### Option A: Interactive Setup (Recommended)
Run the guided setup command:
```bash
hermes gateway setup
```
Select **DingTalk** from the platform menu. The wizard will:
Select **DingTalk** when prompted, then paste your Client ID, Client Secret, and allowed user IDs when asked.
1. Check if `dingtalk-stream` is installed
2. Prompt for your AppKey (Client ID)
3. Prompt for your AppSecret (Client Secret)
4. Configure allowed users and access policies
### Option B: Manual Configuration
### Manual Configuration
Add to `~/.hermes/.env`:
Add the following to your `~/.hermes/.env` file:
```bash
# Required
DINGTALK_CLIENT_ID=your-app-key
DINGTALK_CLIENT_SECRET=your-app-secret
# Security (recommended)
DINGTALK_ALLOWED_USERS=user1_staff_id,user2_staff_id # Comma-separated DingTalk staff IDs
# Security: restrict who can interact with the bot
DINGTALK_ALLOWED_USERS=user-id-1
# Optional
DINGTALK_HOME_CHANNEL=user1_staff_id # Default delivery target for cron jobs
# Multiple allowed users (comma-separated)
# DINGTALK_ALLOWED_USERS=user-id-1,user-id-2
```
Then start the gateway:
Optional behavior settings in `~/.hermes/config.yaml`:
```yaml
group_sessions_per_user: true
```
- `group_sessions_per_user: true` keeps each participant's context isolated inside shared group chats
### Start the Gateway
Once configured, start the DingTalk gateway:
```bash
hermes gateway # Foreground
hermes gateway install # Install as a user service
sudo hermes gateway install --system # Linux only: boot-time system service
hermes gateway
```
---
The bot should connect to DingTalk's Stream Mode within a few seconds. Send it a message — either a DM or in a group where it's been added — to test.
## Access Control
### DM Access
DM access follows the same pattern as all other Hermes platforms:
1. **`DINGTALK_ALLOWED_USERS` set** → only those users can message
2. **No allowlist set** → unknown users get a DM pairing code (approve via `hermes pairing approve dingtalk CODE`)
3. **`DINGTALK_ALLOW_ALL_USERS=true`** → anyone can message (use with caution)
### Group Access
In group chats, the bot responds when @mentioned. Group access follows the same rules — only allowed users can trigger the bot, even in groups.
---
## Features
### Stream Mode (No Webhook URL)
Unlike traditional bot platforms that require a publicly accessible webhook endpoint, DingTalk's Stream Mode uses a persistent WebSocket connection initiated from your side. This means:
- **No public IP required** — works behind firewalls and NATs
- **No domain or SSL certificate needed** — the SDK handles the connection
- **Automatic reconnection** — if the connection drops, the adapter reconnects with exponential backoff (2s → 5s → 10s → 30s → 60s)
### Markdown Replies
Responses are sent in DingTalk's markdown format, which supports rich text formatting including headers, bold, italic, links, and code blocks.
### DM and Group Chat
The adapter supports both:
- **Direct Messages (1:1)** — private conversations with the bot
- **Group Chat** — the bot responds when @mentioned in a group
### Message Deduplication
The adapter tracks recently processed message IDs (up to 1,000 messages within a 5-minute window) to prevent duplicate processing if DingTalk redelivers a message.
### Auto-Reconnection
If the WebSocket connection drops, the adapter automatically reconnects using exponential backoff:
- Retry intervals: 2s, 5s, 10s, 30s, 60s
- Reconnection is transparent — no manual intervention needed
---
:::tip
You can run `hermes gateway` in the background or as a systemd service for persistent operation. See the deployment docs for details.
:::
## Troubleshooting
| Problem | Solution |
|---------|----------|
| **"dingtalk-stream not installed"** | Run `pip install dingtalk-stream httpx` in the Hermes environment |
| **"DINGTALK_CLIENT_ID not set"** | Set `DINGTALK_CLIENT_ID` and `DINGTALK_CLIENT_SECRET` in `~/.hermes/.env` |
| **Bot not responding** | Verify the application is published on open-dev.dingtalk.com and Stream Mode is enabled |
| **Connection keeps dropping** | Check network connectivity. The adapter will auto-reconnect with backoff. Check logs for specific error messages. |
| **Messages processed twice** | This is rare — the deduplication window handles most cases. If persistent, check that only one gateway instance is running. |
| **Bot responds to no one** | Configure `DINGTALK_ALLOWED_USERS`, use DM pairing, or explicitly allow all users through gateway policy if you want broader access. |
| **Group messages ignored** | Ensure the bot is @mentioned in group chats. Only @mentions trigger the bot in groups. |
### Bot is not responding to messages
---
**Cause**: The robot capability isn't enabled, or `DINGTALK_ALLOWED_USERS` doesn't include your User ID.
**Fix**: Verify the robot capability is enabled in your app settings and that Stream Mode is selected. Check that your User ID is in `DINGTALK_ALLOWED_USERS`. Restart the gateway.
### "dingtalk-stream not installed" error
**Cause**: The `dingtalk-stream` Python package is not installed.
**Fix**: Install it:
```bash
pip install dingtalk-stream httpx
```
### "DINGTALK_CLIENT_ID and DINGTALK_CLIENT_SECRET required"
**Cause**: The credentials aren't set in your environment or `.env` file.
**Fix**: Verify `DINGTALK_CLIENT_ID` and `DINGTALK_CLIENT_SECRET` are set correctly in `~/.hermes/.env`. The Client ID is your AppKey, and the Client Secret is your AppSecret from the DingTalk Developer Console.
### Stream disconnects / reconnection loops
**Cause**: Network instability, DingTalk platform maintenance, or credential issues.
**Fix**: The adapter automatically reconnects with exponential backoff (2s → 5s → 10s → 30s → 60s). Check that your credentials are valid and your app hasn't been deactivated. Verify your network allows outbound WebSocket connections.
### Bot is offline
**Cause**: The Hermes gateway isn't running, or it failed to connect.
**Fix**: Check that `hermes gateway` is running. Look at the terminal output for error messages. Common issues: wrong credentials, app deactivated, `dingtalk-stream` or `httpx` not installed.
### "No session_webhook available"
**Cause**: The bot tried to reply but doesn't have a session webhook URL. This typically happens if the webhook expired or the bot was restarted between receiving the message and sending the reply.
**Fix**: Send a new message to the bot — each incoming message provides a fresh session webhook for replies. This is a normal DingTalk limitation; the bot can only reply to messages it has received recently.
## Security
:::warning
**Always configure access controls.** The bot has terminal access by default. Without `DINGTALK_ALLOWED_USERS` or DM pairing, the gateway denies all incoming messages as a safety measure.
Always set `DINGTALK_ALLOWED_USERS` to restrict who can interact with the bot. Without it, the gateway denies all users by default as a safety measure. Only add User IDs of people you trust — authorized users have full access to the agent's capabilities, including tool use and system access.
:::
- Use DM pairing or explicit allowlists for safe onboarding of new users
- Keep your AppSecret confidential — treat it like a password
- The `DINGTALK_CLIENT_SECRET` in `~/.hermes/.env` should be readable only by the user running Hermes
- DingTalk's Stream Mode connection is encrypted via TLS
For more information on securing your Hermes Agent deployment, see the [Security Guide](../security.md).
---
## Notes
## Environment Variables Reference
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `DINGTALK_CLIENT_ID` | Yes | — | DingTalk application AppKey |
| `DINGTALK_CLIENT_SECRET` | Yes | — | DingTalk application AppSecret |
| `DINGTALK_ALLOWED_USERS` | No | — | Comma-separated DingTalk staff IDs |
| `DINGTALK_ALLOW_ALL_USERS` | No | `false` | Allow all users (not recommended) |
| `DINGTALK_HOME_CHANNEL` | No | — | Default delivery target for cron jobs |
- **Stream Mode**: No public URL, domain name, or webhook server needed. The connection is initiated from your machine via WebSocket, so it works behind NAT and firewalls.
- **Markdown responses**: Replies are formatted in DingTalk's markdown format for rich text display.
- **Message deduplication**: The adapter deduplicates messages with a 5-minute window to prevent processing the same message twice.
- **Auto-reconnection**: If the stream connection drops, the adapter automatically reconnects with exponential backoff.
- **Message length limit**: Responses are capped at 20,000 characters per message. Longer responses are truncated.

View file

@ -1,12 +1,12 @@
---
sidebar_position: 1
title: "Messaging Gateway"
description: "Chat with Hermes from Telegram, Discord, Slack, WhatsApp, Signal, SMS, Email, DingTalk, Home Assistant, or your browser — architecture and setup overview"
description: "Chat with Hermes from Telegram, Discord, Slack, WhatsApp, Signal, SMS, Email, Home Assistant, Mattermost, Matrix, DingTalk, or your browser — architecture and setup overview"
---
# Messaging Gateway
Chat with Hermes from Telegram, Discord, Slack, WhatsApp, Signal, SMS, Email, DingTalk, Home Assistant, or your browser. The gateway is a single background process that connects to all your configured platforms, handles sessions, runs cron jobs, and delivers voice messages.
Chat with Hermes from Telegram, Discord, Slack, WhatsApp, Signal, SMS, Email, Home Assistant, Mattermost, Matrix, DingTalk, or your browser. The gateway is a single background process that connects to all your configured platforms, handles sessions, runs cron jobs, and delivers voice messages.
For the full voice feature set — including CLI microphone mode, spoken replies in messaging, and Discord voice-channel conversations — see [Voice Mode](/docs/user-guide/features/voice-mode) and [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes).
@ -24,6 +24,8 @@ flowchart TB
sms[SMS]
em[Email]
ha[Home Assistant]
mm[Mattermost]
mx[Matrix]
dt[DingTalk]
end
@ -40,6 +42,8 @@ flowchart TB
sms --> store
em --> store
ha --> store
mm --> store
mx --> store
dt --> store
store --> agent
cron --> store
@ -133,9 +137,11 @@ Configure per-platform overrides in `~/.hermes/gateway.json`:
TELEGRAM_ALLOWED_USERS=123456789,987654321
DISCORD_ALLOWED_USERS=123456789012345678
SIGNAL_ALLOWED_USERS=+155****4567,+155****6543
DINGTALK_ALLOWED_USERS=staff_id_1,staff_id_2
SMS_ALLOWED_USERS=+155****4567,+155****6543
EMAIL_ALLOWED_USERS=trusted@example.com,colleague@work.com
MATTERMOST_ALLOWED_USERS=3uo8dkh1p7g1mfk49ear5fzs5c
MATRIX_ALLOWED_USERS=@alice:matrix.org
DINGTALK_ALLOWED_USERS=user-id-1
# Or allow
GATEWAY_ALLOWED_USERS=123456789,987654321
@ -296,8 +302,10 @@ Each platform has its own toolset:
| Signal | `hermes-signal` | Full tools including terminal |
| SMS | `hermes-sms` | Full tools including terminal |
| Email | `hermes-email` | Full tools including terminal |
| DingTalk | `hermes-dingtalk` | Full tools including terminal |
| Home Assistant | `hermes-homeassistant` | Full tools + HA device control (ha_list_entities, ha_get_state, ha_call_service, ha_list_services) |
| Mattermost | `hermes-mattermost` | Full tools including terminal |
| Matrix | `hermes-matrix` | Full tools including terminal |
| DingTalk | `hermes-dingtalk` | Full tools including terminal |
## Next Steps
@ -308,5 +316,7 @@ Each platform has its own toolset:
- [Signal Setup](signal.md)
- [SMS Setup (Twilio)](sms.md)
- [Email Setup](email.md)
- [DingTalk Setup](dingtalk.md)
- [Home Assistant Integration](homeassistant.md)
- [Mattermost Setup](mattermost.md)
- [Matrix Setup](matrix.md)
- [DingTalk Setup](dingtalk.md)

View file

@ -277,6 +277,25 @@ Error messages from MCP tools are sanitized before being returned to the LLM. Th
- Bearer tokens
- `token=`, `key=`, `API_KEY=`, `password=`, `secret=` parameters
### Website Access Policy
You can restrict which websites the agent can access through its web and browser tools. This is useful for preventing the agent from accessing internal services, admin panels, or other sensitive URLs.
```yaml
# In ~/.hermes/config.yaml
website_blocklist:
enabled: true
domains:
- "*.internal.company.com"
- "admin.example.com"
shared_files:
- "/etc/hermes/blocked-sites.txt"
```
When a blocked URL is requested, the tool returns an error explaining the domain is blocked by policy. The blocklist is enforced across `web_search`, `web_extract`, `browser_navigate`, and all URL-capable tools.
See [Website Blocklist](/docs/user-guide/configuration#website-blocklist) in the configuration guide for full details.
### Context File Injection Protection
Context files (AGENTS.md, .cursorrules, SOUL.md) are scanned for prompt injection before being included in the system prompt. The scanner checks for:

View file

@ -48,6 +48,9 @@ const sidebars: SidebarsConfig = {
'user-guide/messaging/signal',
'user-guide/messaging/email',
'user-guide/messaging/homeassistant',
'user-guide/messaging/mattermost',
'user-guide/messaging/matrix',
'user-guide/messaging/dingtalk',
],
},
{