docs: comprehensive documentation audit — fix 9 HIGH, 20+ MEDIUM gaps (#4087)

Reference docs fixes:
- cli-commands.md: remove non-existent --provider alibaba, add hermes
  profile/completion/plugins/mcp to top-level table, add --profile/-p
  global flag, add --source chat option
- slash-commands.md: add /yolo and /commands, fix /q alias conflict
  (resolves to /queue not /quit), add missing aliases (/bg, /set-home,
  /reload_mcp, /gateway)
- toolsets-reference.md: fix hermes-api-server (not same as hermes-cli,
  omits clarify/send_message/text_to_speech)
- profile-commands.md: fix show name required not optional, --clone-from
  not --from, add --remove/--name to alias, fix alias path, fix export/
  import arg types, remove non-existent fish completion
- tools-reference.md: add EXA_API_KEY to web tools requires_env
- mcp-config-reference.md: add auth key for OAuth, tool name sanitization
- environment-variables.md: add EXA_API_KEY, update provider values
- plugins.md: remove non-existent ctx.register_command(), add
  ctx.inject_message()

Feature docs additions:
- security.md: add /yolo mode, approval modes (manual/smart/off),
  configurable timeout, expanded dangerous patterns table
- cron.md: add wrap_response config, [SILENT] suppression
- mcp.md: add dynamic tool discovery, MCP sampling support
- cli.md: add Ctrl+Z suspend, busy_input_mode, tool_preview_length
- docker.md: add skills/credential file mounting

Messaging platform docs:
- telegram.md: add webhook mode, DoH fallback IPs
- slack.md: add multi-workspace OAuth support
- discord.md: add DISCORD_IGNORE_NO_MENTION
- matrix.md: add MSC3245 native voice messages
- feishu.md: expand from 129 to 365 lines (encrypt key, verification
  token, group policy, card actions, media, rate limiting, markdown,
  troubleshooting)
- wecom.md: expand from 86 to 264 lines (per-group allowlists, media,
  AES decryption, stream replies, reconnection, troubleshooting)

Configuration docs:
- quickstart.md: add DeepSeek, Copilot, Copilot ACP providers
- configuration.md: add DeepSeek provider, Exa web backend, terminal
  env_passthrough/images, browser.command_timeout, compression params,
  discord config, security/tirith config, timezone, auxiliary models

21 files changed, ~1000 lines added
Teknium 2026-03-30 17:15:21 -07:00 committed by GitHub
parent 3c8f910973
commit 7e0c2c3ce3
21 changed files with 1004 additions and 83 deletions


@ -19,6 +19,7 @@ Before setup, here's the part most people want to know: how Hermes behaves once
| **Free-response channels** | You can make specific channels mention-free with `DISCORD_FREE_RESPONSE_CHANNELS`, or disable mentions globally with `DISCORD_REQUIRE_MENTION=false`. |
| **Threads** | Hermes replies in the same thread. Mention rules still apply unless that thread or its parent channel is configured as free-response. Threads stay isolated from the parent channel for session history. |
| **Shared channels with multiple users** | By default, Hermes isolates session history per user inside the channel for safety and clarity. Two people talking in the same channel do not share one transcript unless you explicitly disable that. |
| **Messages mentioning other users** | When `DISCORD_IGNORE_NO_MENTION` is `true` (the default), Hermes stays silent if a message @mentions other users but does **not** mention the bot. This prevents the bot from jumping into conversations directed at other people. Set to `false` if you want the bot to respond to all messages regardless of who is mentioned. This only applies in server channels, not DMs. |
:::tip
If you want a normal bot-help channel where people can talk to Hermes without tagging it every time, add that channel to `DISCORD_FREE_RESPONSE_CHANNELS`.
@ -253,6 +254,9 @@ DISCORD_ALLOWED_USERS=284102345871466496
# Optional: channels where bot responds without @mention (comma-separated channel IDs)
# DISCORD_FREE_RESPONSE_CHANNELS=1234567890,9876543210
# Optional: ignore messages that @mention other users but NOT the bot (default: true)
# DISCORD_IGNORE_NO_MENTION=true
```
Optional behavior settings in `~/.hermes/config.yaml`:


@ -18,7 +18,7 @@ The integration supports both connection modes:
| Context | Behavior |
|---------|----------|
| Direct messages | Hermes responds to every message. |
| Group chats | Hermes responds when the bot is addressed in the chat. |
| Group chats | Hermes responds only when the bot is @mentioned in the chat. |
| Shared group chats | By default, session history is isolated per user inside a shared chat. |
This shared-chat behavior is controlled by `config.yaml`:
@ -46,12 +46,16 @@ Keep the App Secret private. Anyone with it can impersonate your app.
### Recommended: WebSocket mode
Use WebSocket mode when Hermes runs on your laptop, workstation, or a private server. No public URL is required.
Use WebSocket mode when Hermes runs on your laptop, workstation, or a private server. No public URL is required. The official Lark SDK opens and maintains a persistent outbound WebSocket connection with automatic reconnection.
```bash
FEISHU_CONNECTION_MODE=websocket
```
**Requirements:** The `websockets` Python package must be installed. The SDK handles connection lifecycle, heartbeats, and auto-reconnection internally.
**How it works:** The adapter runs the Lark SDK's WebSocket client in a background executor thread. Inbound events (messages, reactions, card actions) are dispatched to the main asyncio loop. On disconnect, the SDK will attempt to reconnect automatically.
### Optional: Webhook mode
Use webhook mode only when you already run Hermes behind a reachable HTTP endpoint.
@ -60,12 +64,24 @@ Use webhook mode only when you already run Hermes behind a reachable HTTP endpoi
FEISHU_CONNECTION_MODE=webhook
```
In webhook mode, Hermes serves a Feishu endpoint at:
In webhook mode, Hermes starts an HTTP server (via `aiohttp`) and serves a Feishu endpoint at:
```text
/feishu/webhook
```
**Requirements:** The `aiohttp` Python package must be installed.
You can customize the webhook server bind address and path:
```bash
FEISHU_WEBHOOK_HOST=127.0.0.1 # default: 127.0.0.1
FEISHU_WEBHOOK_PORT=8765 # default: 8765
FEISHU_WEBHOOK_PATH=/feishu/webhook # default: /feishu/webhook
```
When Feishu sends a URL verification challenge (`type: url_verification`), the webhook responds automatically so you can complete the subscription setup in the Feishu developer console.
## Step 3: Configure Hermes
### Option A: Interactive Setup
@ -116,13 +132,233 @@ FEISHU_HOME_CHANNEL=oc_xxx
## Security
For production use, set an allowlist:
### User Allowlist
For production use, set an allowlist of Feishu Open IDs:
```bash
FEISHU_ALLOWED_USERS=ou_xxx,ou_yyy
```
If you leave the allowlist empty, anyone who can reach the bot may be able to use it.
If you leave the allowlist empty, anyone who can reach the bot may be able to use it. In group chats, the allowlist is checked against the sender's open_id before the message is processed.
### Webhook Encryption Key
When running in webhook mode, set an encryption key to enable signature verification of inbound webhook payloads:
```bash
FEISHU_ENCRYPT_KEY=your-encrypt-key
```
This key is found in the **Event Subscriptions** section of your Feishu app configuration. When set, the adapter verifies every webhook request using the signature algorithm:
```
SHA256(timestamp + nonce + encrypt_key + body)
```
The computed hash is compared against the `x-lark-signature` header using timing-safe comparison. Requests with invalid or missing signatures are rejected with HTTP 401.
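The check described above can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: the function names are hypothetical, and it assumes the signature is the lowercase hex digest and that the timestamp and nonce are passed in by the caller from the request headers.

```python
import hashlib
import hmac


def compute_signature(timestamp: str, nonce: str, encrypt_key: str, body: bytes) -> str:
    """Return SHA256(timestamp + nonce + encrypt_key + body) as a hex string (assumed encoding)."""
    prefix = (timestamp + nonce + encrypt_key).encode("utf-8")
    return hashlib.sha256(prefix + body).hexdigest()


def is_valid_signature(header_value: str, timestamp: str, nonce: str,
                       encrypt_key: str, body: bytes) -> bool:
    """Compare the computed digest with the x-lark-signature value in constant time."""
    if not header_value:
        return False  # a missing signature is treated as invalid
    computed = compute_signature(timestamp, nonce, encrypt_key, body)
    # hmac.compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(computed.encode("utf-8"), header_value.encode("utf-8"))
```

A caller would reject the request with HTTP 401 whenever `is_valid_signature` returns `False`.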
:::tip
In WebSocket mode, signature verification is handled by the SDK itself, so `FEISHU_ENCRYPT_KEY` is optional. In webhook mode, it is strongly recommended for production.
:::
### Verification Token
An additional layer of authentication that checks the `token` field inside webhook payloads:
```bash
FEISHU_VERIFICATION_TOKEN=your-verification-token
```
This token is also found in the **Event Subscriptions** section of your Feishu app. When set, every inbound webhook payload must contain a matching `token` in its `header` object. Mismatched tokens are rejected with HTTP 401.
Both `FEISHU_ENCRYPT_KEY` and `FEISHU_VERIFICATION_TOKEN` can be used together for defense in depth.
## Group Message Policy
The `FEISHU_GROUP_POLICY` environment variable controls whether and how Hermes responds in group chats:
```bash
FEISHU_GROUP_POLICY=allowlist # default
```
| Value | Behavior |
|-------|----------|
| `open` | Hermes responds to @mentions from any user in any group. |
| `allowlist` | Hermes only responds to @mentions from users listed in `FEISHU_ALLOWED_USERS`. |
| `disabled` | Hermes ignores all group messages entirely. |
In the `open` and `allowlist` modes, the bot must be explicitly @mentioned (or @all) in the group before the message is processed. Direct messages bypass this gate.
### Bot Identity for @Mention Gating
For precise @mention detection in groups, the adapter needs to know the bot's identity. It can be provided explicitly:
```bash
FEISHU_BOT_OPEN_ID=ou_xxx
FEISHU_BOT_USER_ID=xxx
FEISHU_BOT_NAME=MyBot
```
If none of these are set, the adapter will attempt to auto-discover the bot name via the Application Info API on startup. For this to work, grant the `admin:app.info:readonly` or `application:application:self_manage` permission scope.
## Interactive Card Actions
When users click buttons or interact with interactive cards sent by the bot, the adapter routes these as synthetic `/card` command events:
- Button clicks become: `/card button {"key": "value", ...}`
- The action's `value` payload from the card definition is included as JSON.
- Card actions are deduplicated with a 15-minute window to prevent double processing.
Card action events are dispatched with `MessageType.COMMAND`, so they flow through the normal command processing pipeline.
To use this feature, enable the **Interactive Card** event in your Feishu app's event subscriptions (`card.action.trigger`).
## Media Support
### Inbound (receiving)
The adapter receives and caches the following media types from users:
| Type | Extensions | How it's processed |
|------|-----------|-------------------|
| **Images** | .jpg, .jpeg, .png, .gif, .webp, .bmp | Downloaded via Feishu API and cached locally |
| **Audio** | .ogg, .mp3, .wav, .m4a, .aac, .flac, .opus, .webm | Downloaded and cached locally |
| **Video** | .mp4, .mov, .avi, .mkv, .webm, .m4v, .3gp | Downloaded and cached as documents |
| **Files** | .pdf, .doc, .docx, .xls, .xlsx, .ppt, .pptx, and more | Downloaded and cached as documents; small text files are auto-extracted |
Media from rich-text (post) messages, including inline images and file attachments, is also extracted and cached.
For small text-based documents (.txt, .md), the file content is automatically injected into the message text so the agent can read it directly without needing tools.
### Outbound (sending)
| Method | What it sends |
|--------|--------------|
| `send` | Text or rich post messages (auto-detected based on markdown content) |
| `send_image` / `send_image_file` | Uploads image to Feishu, then sends as native image bubble (with optional caption) |
| `send_document` | Uploads file to Feishu API, then sends as file attachment |
| `send_voice` | Uploads audio file as a Feishu file attachment |
| `send_video` | Uploads video and sends as native media message |
| `send_animation` | GIFs are downgraded to file attachments (Feishu has no native GIF bubble) |
File upload routing is automatic based on extension:
- `.ogg`, `.opus` → uploaded as `opus` audio
- `.mp4`, `.mov`, `.avi`, `.m4v` → uploaded as `mp4` media
- `.pdf`, `.doc(x)`, `.xls(x)`, `.ppt(x)` → uploaded with their document type
- Everything else → uploaded as a generic stream file
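The routing rules above amount to a lookup on the file extension. A minimal sketch follows; the function name is hypothetical, and the document type labels (`pdf`, `doc`, `xls`, `ppt`) and the `stream` fallback label are assumptions about the upload type values, not taken from the adapter source.

```python
from pathlib import Path

_AUDIO_EXTS = {".ogg", ".opus"}
_VIDEO_EXTS = {".mp4", ".mov", ".avi", ".m4v"}
# Assumed mapping from document extensions to upload type labels
_DOC_TYPES = {
    ".pdf": "pdf",
    ".doc": "doc", ".docx": "doc",
    ".xls": "xls", ".xlsx": "xls",
    ".ppt": "ppt", ".pptx": "ppt",
}


def upload_file_type(filename: str) -> str:
    """Pick the upload type for a file based on its (case-insensitive) extension."""
    ext = Path(filename).suffix.lower()
    if ext in _AUDIO_EXTS:
        return "opus"
    if ext in _VIDEO_EXTS:
        return "mp4"
    # Anything that is not a known document type is sent as a generic stream
    return _DOC_TYPES.get(ext, "stream")
```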
## Markdown Rendering and Post Fallback
When outbound text contains markdown formatting (headings, bold, lists, code blocks, links, etc.), the adapter automatically sends it as a Feishu **post** message with an embedded `md` tag rather than as plain text. This enables rich rendering in the Feishu client.
If the Feishu API rejects the post payload (e.g., due to unsupported markdown constructs), the adapter automatically falls back to sending as plain text with markdown stripped. This two-stage fallback ensures messages are always delivered.
Plain text messages (no markdown detected) are sent as the simple `text` message type.
## ACK Emoji Reactions
When the adapter receives an inbound message, it immediately adds an ✅ (OK) emoji reaction to signal that the message was received and is being processed. This provides visual feedback before the agent completes its response.
The reaction is persistent — it remains on the message after the response is sent, serving as a receipt marker.
User reactions on bot messages are also tracked. If a user adds or removes an emoji reaction on a message sent by the bot, it is routed as a synthetic text event (`reaction:added:EMOJI_TYPE` or `reaction:removed:EMOJI_TYPE`) so the agent can respond to feedback.
## Burst Protection and Batching
The adapter includes debouncing for rapid message bursts to avoid overwhelming the agent:
### Text Batching
When a user sends multiple text messages in quick succession, they are merged into a single event before being dispatched:
| Setting | Env Var | Default |
|---------|---------|---------|
| Quiet period | `HERMES_FEISHU_TEXT_BATCH_DELAY_SECONDS` | 0.6s |
| Max messages per batch | `HERMES_FEISHU_TEXT_BATCH_MAX_MESSAGES` | 8 |
| Max characters per batch | `HERMES_FEISHU_TEXT_BATCH_MAX_CHARS` | 4000 |
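As a rough illustration of how the three settings interact, here is a debounce buffer with an injected clock. The class and method names are hypothetical, and joining messages with a newline is an assumption; the real adapter may merge text differently and is driven by asyncio timers rather than explicit polling.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TextBatcher:
    delay: float = 0.6        # quiet period in seconds
    max_messages: int = 8     # cap on messages per batch
    max_chars: int = 4000     # cap on characters per batch
    _parts: List[str] = field(default_factory=list)
    _last_at: float = 0.0

    def add(self, text: str, now: float) -> Optional[str]:
        """Buffer a message; return the previous batch if a cap forces an early flush."""
        flushed = None
        size = sum(len(p) for p in self._parts) + len(text)
        if self._parts and (len(self._parts) >= self.max_messages or size > self.max_chars):
            flushed = self._flush()
        self._parts.append(text)
        self._last_at = now
        return flushed

    def poll(self, now: float) -> Optional[str]:
        """Return the merged batch once the quiet period has elapsed, else None."""
        if self._parts and now - self._last_at >= self.delay:
            return self._flush()
        return None

    def _flush(self) -> str:
        merged = "\n".join(self._parts)
        self._parts = []
        return merged
```

Each new message restarts the quiet period, so a steady stream is only dispatched when it pauses or when one of the caps is reached.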
### Media Batching
Multiple media attachments sent in quick succession (e.g., dragging several images) are merged into a single event:
| Setting | Env Var | Default |
|---------|---------|---------|
| Quiet period | `HERMES_FEISHU_MEDIA_BATCH_DELAY_SECONDS` | 0.8s |
### Per-Chat Serialization
Messages within the same chat are processed serially (one at a time) to maintain conversation coherence. Each chat has its own lock, so messages in different chats are processed concurrently.
## Rate Limiting (Webhook Mode)
In webhook mode, the adapter enforces per-IP rate limiting to protect against abuse:
- **Window:** 60-second sliding window
- **Limit:** 120 requests per window per (app_id, path, IP) triple
- **Tracking cap:** Up to 4096 unique keys tracked (prevents unbounded memory growth)
Requests that exceed the limit receive HTTP 429 (Too Many Requests).
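A sliding-window limiter with these parameters could look roughly like the sketch below. The class name is hypothetical, and evicting the oldest-inserted key once the tracking cap is reached is an assumed policy; the source only states that at most 4096 keys are tracked.

```python
from collections import deque
from typing import Deque, Dict, Tuple

Key = Tuple[str, str, str]  # (app_id, path, ip)


class SlidingWindowLimiter:
    def __init__(self, limit: int = 120, window: float = 60.0, max_keys: int = 4096):
        self.limit = limit
        self.window = window
        self.max_keys = max_keys
        self._hits: Dict[Key, Deque[float]] = {}

    def allow(self, key: Key, now: float) -> bool:
        """Return True if the request is within the limit, False if it should get HTTP 429."""
        hits = self._hits.get(key)
        if hits is None:
            if len(self._hits) >= self.max_keys:
                # Assumed policy: drop the oldest-inserted key to bound memory
                self._hits.pop(next(iter(self._hits)))
            hits = self._hits[key] = deque()
        # Discard timestamps that have slid out of the window
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```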
### Webhook Anomaly Tracking
The adapter tracks consecutive error responses per IP address. After 25 consecutive errors from the same IP within a 6-hour window, a warning is logged. This helps detect misconfigured clients or probing attempts.
Additional webhook protections:
- **Body size limit:** 1 MB maximum
- **Body read timeout:** 30 seconds
- **Content-Type enforcement:** Only `application/json` is accepted
## Deduplication
Inbound messages are deduplicated using message IDs with a 24-hour TTL. The dedup state is persisted across restarts to `~/.hermes/feishu_seen_message_ids.json`.
| Setting | Env Var | Default |
|---------|---------|---------|
| Cache size | `HERMES_FEISHU_DEDUP_CACHE_SIZE` | 2048 entries |
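The TTL and cache-size behavior can be approximated with an ordered mapping from message ID to first-seen time. This is only a sketch under stated assumptions: the class name is hypothetical, timestamps are assumed to be non-decreasing, and persistence to the JSON file is left out.

```python
from collections import OrderedDict


class SeenCache:
    def __init__(self, ttl: float = 24 * 3600, max_size: int = 2048):
        self.ttl = ttl
        self.max_size = max_size
        self._seen: "OrderedDict[str, float]" = OrderedDict()

    def is_duplicate(self, message_id: str, now: float) -> bool:
        """Record message_id and report whether it was already seen within the TTL."""
        # Entries are stored in arrival order, so expired ones sit at the front
        while self._seen:
            _, first_seen = next(iter(self._seen.items()))
            if now - first_seen < self.ttl:
                break
            self._seen.popitem(last=False)
        if message_id in self._seen:
            return True
        self._seen[message_id] = now
        if len(self._seen) > self.max_size:
            self._seen.popitem(last=False)  # evict the oldest entry
        return False
```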
## All Environment Variables
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `FEISHU_APP_ID` | ✅ | — | Feishu/Lark App ID |
| `FEISHU_APP_SECRET` | ✅ | — | Feishu/Lark App Secret |
| `FEISHU_DOMAIN` | — | `feishu` | `feishu` (China) or `lark` (international) |
| `FEISHU_CONNECTION_MODE` | — | `websocket` | `websocket` or `webhook` |
| `FEISHU_ALLOWED_USERS` | — | _(empty)_ | Comma-separated open_id list for user allowlist |
| `FEISHU_HOME_CHANNEL` | — | — | Chat ID for cron/notification output |
| `FEISHU_ENCRYPT_KEY` | — | _(empty)_ | Encrypt key for webhook signature verification |
| `FEISHU_VERIFICATION_TOKEN` | — | _(empty)_ | Verification token for webhook payload auth |
| `FEISHU_GROUP_POLICY` | — | `allowlist` | Group message policy: `open`, `allowlist`, `disabled` |
| `FEISHU_BOT_OPEN_ID` | — | _(empty)_ | Bot's open_id (for @mention detection) |
| `FEISHU_BOT_USER_ID` | — | _(empty)_ | Bot's user_id (for @mention detection) |
| `FEISHU_BOT_NAME` | — | _(empty)_ | Bot's display name (for @mention detection) |
| `FEISHU_WEBHOOK_HOST` | — | `127.0.0.1` | Webhook server bind address |
| `FEISHU_WEBHOOK_PORT` | — | `8765` | Webhook server port |
| `FEISHU_WEBHOOK_PATH` | — | `/feishu/webhook` | Webhook endpoint path |
| `HERMES_FEISHU_DEDUP_CACHE_SIZE` | — | `2048` | Max deduplicated message IDs to track |
| `HERMES_FEISHU_TEXT_BATCH_DELAY_SECONDS` | — | `0.6` | Text burst debounce quiet period |
| `HERMES_FEISHU_TEXT_BATCH_MAX_MESSAGES` | — | `8` | Max messages merged per text batch |
| `HERMES_FEISHU_TEXT_BATCH_MAX_CHARS` | — | `4000` | Max characters merged per text batch |
| `HERMES_FEISHU_MEDIA_BATCH_DELAY_SECONDS` | — | `0.8` | Media burst debounce quiet period |
## Troubleshooting
| Problem | Fix |
|---------|-----|
| `lark-oapi not installed` | Install the SDK: `pip install lark-oapi` |
| `websockets not installed; websocket mode unavailable` | Install websockets: `pip install websockets` |
| `aiohttp not installed; webhook mode unavailable` | Install aiohttp: `pip install aiohttp` |
| `FEISHU_APP_ID or FEISHU_APP_SECRET not set` | Set both env vars or configure via `hermes gateway setup` |
| `Another local Hermes gateway is already using this Feishu app_id` | Only one Hermes instance can use the same app_id at a time. Stop the other gateway first. |
| Bot doesn't respond in groups | Ensure the bot is @mentioned, check `FEISHU_GROUP_POLICY`, and verify the sender is in `FEISHU_ALLOWED_USERS` if policy is `allowlist` |
| `Webhook rejected: invalid verification token` | Ensure `FEISHU_VERIFICATION_TOKEN` matches the token in your Feishu app's Event Subscriptions config |
| `Webhook rejected: invalid signature` | Ensure `FEISHU_ENCRYPT_KEY` matches the encrypt key in your Feishu app config |
| Post messages show as plain text | The Feishu API rejected the post payload; this is normal fallback behavior. Check logs for details. |
| Images/files not received by bot | Grant `im:message` and `im:resource` permission scopes to your Feishu app |
| Bot identity not auto-detected | Grant `admin:app.info:readonly` scope, or set `FEISHU_BOT_OPEN_ID` / `FEISHU_BOT_NAME` manually |
| `Webhook rate limit exceeded` | More than 120 requests/minute from the same IP. This is usually a misconfiguration or loop. |
## Toolset


@ -352,3 +352,4 @@ For more information on securing your Hermes Agent deployment, see the [Security
- **Federation**: If you're on a federated homeserver, the bot can communicate with users from other servers — just add their full `@user:server` IDs to `MATRIX_ALLOWED_USERS`.
- **Auto-join**: The bot automatically accepts room invites and joins. It starts responding immediately after joining.
- **Media support**: Hermes can send and receive images, audio, video, and file attachments. Media is uploaded to your homeserver using the Matrix content repository API.
- **Native voice messages (MSC3245)**: The Matrix adapter automatically tags outgoing voice messages with the `org.matrix.msc3245.voice` flag. This means TTS responses and voice audio are rendered as **native voice bubbles** in Element and other clients that support MSC3245, rather than as generic audio file attachments. Incoming voice messages with the MSC3245 flag are also correctly identified and routed to speech-to-text transcription. No configuration is needed — this works automatically.


@ -237,6 +237,60 @@ Make sure the bot has been **invited to the channel** (`/invite @Hermes Agent`).
---
## Multi-Workspace Support
Hermes can connect to **multiple Slack workspaces** simultaneously using a single gateway instance. Each workspace is authenticated independently with its own bot user ID.
### Configuration
Provide multiple bot tokens as a **comma-separated list** in `SLACK_BOT_TOKEN`:
```bash
# Multiple bot tokens — one per workspace
SLACK_BOT_TOKEN=xoxb-workspace1-token,xoxb-workspace2-token,xoxb-workspace3-token
# A single app-level token is still used for Socket Mode
SLACK_APP_TOKEN=xapp-your-app-token
```
Or in `~/.hermes/config.yaml`:
```yaml
platforms:
  slack:
    token: "xoxb-workspace1-token,xoxb-workspace2-token"
```
### OAuth Token File
In addition to tokens in the environment or config, Hermes also loads tokens from an **OAuth token file** at:
```
~/.hermes/platforms/slack/slack_tokens.json
```
This file is a JSON object mapping team IDs to token entries:
```json
{
  "T01ABC2DEF3": {
    "token": "xoxb-workspace-token-here",
    "team_name": "My Workspace"
  }
}
```
Tokens from this file are merged with any tokens specified via `SLACK_BOT_TOKEN`. Duplicate tokens are automatically deduplicated.
### How it works
- The **first token** in the list is the primary token, used for the Socket Mode connection (AsyncApp).
- Each token is authenticated via `auth.test` on startup. The gateway maps each `team_id` to its own `WebClient` and `bot_user_id`.
- When a message arrives, Hermes uses the correct workspace-specific client to respond.
- The primary `bot_user_id` (from the first token) is used for backward compatibility with features that expect a single bot identity.
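The token merge described above (comma-separated `SLACK_BOT_TOKEN` plus entries from the token file, with duplicates removed) might be sketched like this. The function name is hypothetical, and placing environment tokens ahead of file tokens is an assumption about ordering.

```python
from typing import Dict, List


def merge_slack_tokens(env_value: str, token_file: Dict[str, dict]) -> List[str]:
    """Combine tokens from the env var and the token file, keeping first-seen order."""
    tokens = [t.strip() for t in env_value.split(",") if t.strip()]
    tokens += [entry["token"] for entry in token_file.values() if entry.get("token")]
    # dict.fromkeys removes duplicates while preserving insertion order
    return list(dict.fromkeys(tokens))
```

Under this sketch, the first element of the result would play the role of the primary token used for the Socket Mode connection.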
---
## Voice Messages
Hermes supports voice on Slack:


@ -258,6 +258,73 @@ Topics created outside of the config (e.g., by manually calling the Telegram API
- **Privacy policy:** Telegram now requires bots to have a privacy policy. Set one via BotFather with `/setprivacy_policy`, or Telegram may auto-generate a placeholder. This is particularly important if your bot is public-facing.
- **Message streaming:** Bot API 9.x added support for streaming long responses, which can improve perceived latency for lengthy agent replies.
## Webhook Mode
By default, the Telegram adapter connects via **long polling** — the gateway makes outbound connections to Telegram's servers. This works everywhere but keeps a persistent connection open.
**Webhook mode** is an alternative where Telegram pushes updates to your server over HTTPS. This is ideal for **serverless and cloud deployments** (Fly.io, Railway, etc.) where inbound HTTP can wake a suspended machine.
### Configuration
Set the `TELEGRAM_WEBHOOK_URL` environment variable to enable webhook mode:
```bash
# Required — your public HTTPS endpoint
TELEGRAM_WEBHOOK_URL=https://app.fly.dev/telegram
# Optional — local listen port (default: 8443)
TELEGRAM_WEBHOOK_PORT=8443
# Optional — secret token for update verification (auto-generated if not set)
TELEGRAM_WEBHOOK_SECRET=my-secret-token
```
Or in `~/.hermes/config.yaml`:
```yaml
telegram:
  webhook_mode: true
```
When `TELEGRAM_WEBHOOK_URL` is set, the gateway starts an HTTP server listening on `0.0.0.0:<port>` and registers the webhook URL with Telegram. The URL path is extracted from the webhook URL (defaults to `/telegram`).
:::warning
Telegram requires a **valid TLS certificate** on the webhook endpoint. Self-signed certificates will be rejected. Use a reverse proxy (nginx, Caddy) or a platform that provides TLS termination (Fly.io, Railway, Cloudflare Tunnel).
:::
## DNS-over-HTTPS Fallback IPs
In some restricted networks, `api.telegram.org` may resolve to an IP that is unreachable. The Telegram adapter includes a **fallback IP** mechanism that transparently retries connections against alternative IPs while preserving the correct TLS hostname and SNI.
### How it works
1. If `TELEGRAM_FALLBACK_IPS` is set, those IPs are used directly.
2. Otherwise, the adapter automatically queries **Google DNS** and **Cloudflare DNS** via DNS-over-HTTPS (DoH) to discover alternative IPs for `api.telegram.org`.
3. IPs returned by DoH that differ from the system DNS result are used as fallbacks.
4. If DoH is also blocked, a hardcoded seed IP (`149.154.167.220`) is used as a last resort.
5. Once a fallback IP succeeds, it becomes "sticky" — subsequent requests use it directly without retrying the primary path first.
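Steps 1 to 4 above reduce to a small selection function once the DoH lookups have been performed. The sketch below uses hypothetical names and takes the lookup results as plain arguments; it also assumes that a DoH answer containing only the system-resolved IPs is treated the same as a blocked DoH lookup.

```python
from typing import Iterable, List

SEED_IP = "149.154.167.220"  # last-resort seed IP named in the steps above


def choose_fallback_ips(configured: Iterable[str],
                        doh_ips: Iterable[str],
                        system_ips: Iterable[str]) -> List[str]:
    """Pick fallback IPs: explicit config first, then DoH results, then the seed."""
    explicit = [ip.strip() for ip in configured if ip.strip()]
    if explicit:
        return explicit
    system = set(system_ips)
    # Keep DoH answers that differ from what system DNS returned, without duplicates
    discovered = [ip for ip in dict.fromkeys(doh_ips) if ip not in system]
    return discovered or [SEED_IP]
```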
### Configuration
```bash
# Explicit fallback IPs (comma-separated)
TELEGRAM_FALLBACK_IPS=149.154.167.220,149.154.167.221
```
Or in `~/.hermes/config.yaml`:
```yaml
platforms:
  telegram:
    extra:
      fallback_ips:
        - "149.154.167.220"
```
:::tip
You usually don't need to configure this manually. The auto-discovery via DoH handles most restricted-network scenarios. The `TELEGRAM_FALLBACK_IPS` env var is only needed if DoH is also blocked on your network.
:::
## Troubleshooting
| Problem | Solution |


@ -13,6 +13,7 @@ Connect Hermes to [WeCom](https://work.weixin.qq.com/) (企业微信), Tencent's
- A WeCom organization account
- An AI Bot created in the WeCom Admin Console
- The Bot ID and Secret from the bot's credentials page
- Python packages: `aiohttp` and `httpx`
## Setup
@ -56,10 +57,12 @@ hermes gateway start
- **WebSocket transport** — persistent connection, no public endpoint needed
- **DM and group messaging** — configurable access policies
- **Per-group sender allowlists** — fine-grained control over who can interact in each group
- **Media support** — images, files, voice, video upload and download
- **AES-encrypted media** — automatic decryption for inbound attachments
- **Quote context** — preserves reply threading
- **Markdown rendering** — rich text responses
- **Reply-mode streaming** — correlates responses to inbound message context
- **Auto-reconnect** — exponential backoff on connection drops
## Configuration Options
@ -75,12 +78,187 @@ Set these in `config.yaml` under `platforms.wecom.extra`:
| `group_policy` | `open` | Group access: `open`, `allowlist`, `disabled` |
| `allow_from` | `[]` | User IDs allowed for DMs (when dm_policy=allowlist) |
| `group_allow_from` | `[]` | Group IDs allowed (when group_policy=allowlist) |
| `groups` | `{}` | Per-group configuration (see below) |
## Access Policies
### DM Policy
Controls who can send direct messages to the bot:
| Value | Behavior |
|-------|----------|
| `open` | Anyone can DM the bot (default) |
| `allowlist` | Only user IDs in `allow_from` can DM |
| `disabled` | All DMs are ignored |
| `pairing` | Pairing mode (for initial setup) |
```bash
WECOM_DM_POLICY=allowlist
```
### Group Policy
Controls which groups the bot responds in:
| Value | Behavior |
|-------|----------|
| `open` | Bot responds in all groups (default) |
| `allowlist` | Bot only responds in group IDs listed in `group_allow_from` |
| `disabled` | All group messages are ignored |
```bash
WECOM_GROUP_POLICY=allowlist
```
### Per-Group Sender Allowlists
For fine-grained control, you can restrict which users are allowed to interact with the bot within specific groups. This is configured in `config.yaml`:
```yaml
platforms:
  wecom:
    enabled: true
    extra:
      bot_id: "your-bot-id"
      secret: "your-secret"
      group_policy: "allowlist"
      group_allow_from:
        - "group_id_1"
        - "group_id_2"
      groups:
        group_id_1:
          allow_from:
            - "user_alice"
            - "user_bob"
        group_id_2:
          allow_from:
            - "user_charlie"
        "*":
          allow_from:
            - "user_admin"
```
**How it works:**
1. The `group_policy` and `group_allow_from` controls determine whether a group is allowed at all.
2. If a group passes the top-level check, the `groups.<group_id>.allow_from` list (if present) further restricts which senders within that group can interact with the bot.
3. A wildcard `"*"` group entry serves as a default for groups not explicitly listed.
4. Allowlist entries support the `*` wildcard to allow all users, and entries are case-insensitive.
5. Entries can optionally use the `wecom:user:` or `wecom:group:` prefix format — the prefix is stripped automatically.
If no `allow_from` is configured for a group, all users in that group are allowed (assuming the group itself passes the top-level policy check).
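The two-level check can be sketched as a single function over the `platforms.wecom.extra` mapping. The function names are hypothetical and this is an approximation of the rules listed above, not the adapter's implementation; in particular, looking up `groups` keys by exact group ID is an assumption.

```python
from typing import Dict, List

_PREFIXES = ("wecom:user:", "wecom:group:")


def _normalize(entry: str) -> str:
    """Lowercase an entry and strip an optional wecom:user:/wecom:group: prefix."""
    value = entry.strip().lower()
    for prefix in _PREFIXES:
        if value.startswith(prefix):
            return value[len(prefix):]
    return value


def _matches(allowlist: List[str], candidate: str) -> bool:
    entries = {_normalize(e) for e in allowlist}
    return "*" in entries or _normalize(candidate) in entries


def is_sender_allowed(extra: Dict, group_id: str, sender_id: str) -> bool:
    """Apply the top-level group policy, then the per-group sender allowlist."""
    policy = extra.get("group_policy", "open")
    if policy == "disabled":
        return False
    if policy == "allowlist" and not _matches(extra.get("group_allow_from", []), group_id):
        return False
    groups = extra.get("groups", {})
    # Fall back to the "*" entry for groups that are not listed explicitly
    group_cfg = groups.get(group_id) or groups.get("*") or {}
    allow_from = group_cfg.get("allow_from")
    if not allow_from:
        return True  # no per-group list: every sender in the group is allowed
    return _matches(allow_from, sender_id)
```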
## Media Support
### Inbound (receiving)
The adapter receives media attachments from users and caches them locally for agent processing:
| Type | How it's handled |
|------|-----------------|
| **Images** | Downloaded and cached locally. Supports both URL-based and base64-encoded images. |
| **Files** | Downloaded and cached. Filename is preserved from the original message. |
| **Voice** | Voice message text transcription is extracted if available. |
| **Mixed messages** | WeCom mixed-type messages (text + images) are parsed and all components extracted. |
**Quoted messages:** Media from quoted (replied-to) messages is also extracted, so the agent has context about what the user is replying to.
### AES-Encrypted Media Decryption
WeCom encrypts some inbound media attachments with AES-256-CBC. The adapter handles this automatically:
- When an inbound media item includes an `aeskey` field, the adapter downloads the encrypted bytes and decrypts them using AES-256-CBC with PKCS#7 padding.
- The AES key is the base64-decoded value of the `aeskey` field (must be exactly 32 bytes).
- The IV is derived from the first 16 bytes of the key.
- This requires the `cryptography` Python package (`pip install cryptography`).
No configuration is needed — decryption happens transparently when encrypted media is received.
### Outbound (sending)
| Method | What it sends | Size limit |
|--------|--------------|------------|
| `send` | Markdown text messages | 4000 chars |
| `send_image` / `send_image_file` | Native image messages | 10 MB |
| `send_document` | File attachments | 20 MB |
| `send_voice` | Voice messages (AMR format only for native voice) | 2 MB |
| `send_video` | Video messages | 10 MB |
**Chunked upload:** Files are uploaded in 512 KB chunks through a three-step protocol (init → chunks → finish). The adapter handles this automatically.
**Automatic downgrade:** When media exceeds the native type's size limit but is under the absolute 20 MB file limit, it is automatically sent as a generic file attachment instead:
- Images > 10 MB → sent as file
- Videos > 10 MB → sent as file
- Voice > 2 MB → sent as file
- Non-AMR audio → sent as file (WeCom only supports AMR for native voice)
Files exceeding the absolute 20 MB limit are rejected with an informational message sent to the chat.
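The downgrade rules above can be summarized as a small decision function. The names are hypothetical, and treating 1 MB as 1024 × 1024 bytes is an assumption; the source does not say which definition the adapter uses.

```python
MB = 1024 * 1024  # assumed definition of a megabyte
ABSOLUTE_LIMIT = 20 * MB
NATIVE_LIMITS = {"image": 10 * MB, "video": 10 * MB, "voice": 2 * MB, "file": 20 * MB}


def resolve_send_type(kind: str, size: int, extension: str = "") -> str:
    """Return the message type to send: the native kind, 'file', or 'rejected'."""
    if size > ABSOLUTE_LIMIT:
        return "rejected"
    # Only AMR audio can be sent as a native voice message
    if kind == "voice" and extension.lower() != ".amr":
        return "file"
    if size > NATIVE_LIMITS.get(kind, ABSOLUTE_LIMIT):
        return "file"
    return kind
```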
## Reply-Mode Stream Responses
When the bot receives a message via the WeCom callback, the adapter remembers the inbound request ID. If a response is sent while the request context is still active, the adapter uses WeCom's reply-mode (`aibot_respond_msg`) with streaming to correlate the response directly to the inbound message. This provides a more natural conversation experience in the WeCom client.
If the inbound request context has expired or is unavailable, the adapter falls back to proactive message sending via `aibot_send_msg`.
Reply-mode also works for media: uploaded media can be sent as a reply to the originating message.
## Connection and Reconnection
The adapter maintains a persistent WebSocket connection to WeCom's gateway at `wss://openws.work.weixin.qq.com`.
### Connection Lifecycle
1. **Connect:** Opens a WebSocket connection and sends an `aibot_subscribe` authentication frame with the bot_id and secret.
2. **Heartbeat:** Sends application-level ping frames every 30 seconds to keep the connection alive.
3. **Listen:** Continuously reads inbound frames and dispatches message callbacks.
### Reconnection Behavior
On connection loss, the adapter reconnects with escalating backoff delays, capped at 60 seconds:
| Attempt | Delay |
|---------|-------|
| 1st retry | 2 seconds |
| 2nd retry | 5 seconds |
| 3rd retry | 10 seconds |
| 4th retry | 30 seconds |
| 5th+ retry | 60 seconds |
After each successful reconnection, the backoff counter resets to zero. All pending request futures are failed on disconnect so callers don't hang indefinitely.
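The delay schedule in the table can be expressed as a lookup that clamps at the last entry. The names below are hypothetical; the delays are the ones listed above.

```python
BACKOFF_DELAYS = (2, 5, 10, 30, 60)  # seconds, per retry attempt


def reconnect_delay(attempt: int) -> int:
    """Return the delay in seconds before the given 1-based retry attempt."""
    index = min(max(attempt, 1), len(BACKOFF_DELAYS)) - 1
    return BACKOFF_DELAYS[index]
```

A reconnect loop would call this with an attempt counter that is reset after each successful reconnection.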
### Deduplication
Inbound messages are deduplicated using message IDs with a 5-minute window and a maximum cache of 1000 entries. This prevents double-processing of messages during reconnection or network hiccups.
## All Environment Variables
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `WECOM_BOT_ID` | ✅ | — | WeCom AI Bot ID |
| `WECOM_SECRET` | ✅ | — | WeCom AI Bot Secret |
| `WECOM_ALLOWED_USERS` | — | _(empty)_ | Comma-separated user IDs for the gateway-level allowlist |
| `WECOM_HOME_CHANNEL` | — | — | Chat ID for cron/notification output |
| `WECOM_WEBSOCKET_URL` | — | `wss://openws.work.weixin.qq.com` | WebSocket gateway URL |
| `WECOM_DM_POLICY` | — | `open` | DM access policy |
| `WECOM_GROUP_POLICY` | — | `open` | Group access policy |
## Troubleshooting
| Problem | Fix |
|---------|-----|
| "WECOM_BOT_ID and WECOM_SECRET are required" | Set both env vars or configure in setup wizard |
| "invalid secret (errcode=40013)" | Verify the secret matches your bot's credentials |
| "Timed out waiting for subscribe acknowledgement" | Check network connectivity to `openws.work.weixin.qq.com` |
| Bot doesn't respond in groups | Check `group_policy` setting and group allowlist |
| `WECOM_BOT_ID and WECOM_SECRET are required` | Set both env vars or configure in setup wizard |
| `WeCom startup failed: aiohttp not installed` | Install aiohttp: `pip install aiohttp` |
| `WeCom startup failed: httpx not installed` | Install httpx: `pip install httpx` |
| `invalid secret (errcode=40013)` | Verify the secret matches your bot's credentials |
| `Timed out waiting for subscribe acknowledgement` | Check network connectivity to `openws.work.weixin.qq.com` |
| Bot doesn't respond in groups | Check `group_policy` setting and ensure the group ID is in `group_allow_from` |
| Bot ignores certain users in a group | Check per-group `allow_from` lists in the `groups` config section |
| Media decryption fails | Install `cryptography`: `pip install cryptography` |
| `cryptography is required for WeCom media decryption` | The inbound media is AES-encrypted. Install: `pip install cryptography` |
| Voice messages sent as files | WeCom only supports AMR format for native voice. Other formats are auto-downgraded to file. |
| `File too large` error | WeCom has a 20 MB absolute limit on all file uploads. Compress or split the file. |
| Images sent as files | Images > 10 MB exceed the native image limit and are auto-downgraded to file attachments. |
| `Timeout sending message to WeCom` | The WebSocket may have disconnected. Check logs for reconnection messages. |
| `WeCom websocket closed during authentication` | Network issue or incorrect credentials. Verify bot_id and secret. |