Follow-up for cherry-picked PR #8272:
- Add MATRIX_RECOVERY_KEY to module docstring header in matrix.py
- Register in OPTIONAL_ENV_VARS (config.py) with password=True, advanced=True
- Add to _NON_SETUP_ENV_VARS set
- Document cross-signing verification in matrix.md E2EE section
- Update migration guide with recovery key step (step 3)
- Add to environment-variables.md reference
After the PgCryptoStore migration in v0.8.0, the verify_with_recovery_key
call that previously ran after share_keys() was dropped. On any rotation
that uploads fresh device keys (fresh crypto.db, server had stale keys
from a prior install, etc.), the new device keys carry no valid self-
signing signature because the bot has no access to the self-signing
private key.
Peers like Element then refuse to share Megolm sessions with the
rotated device, so the bot silently stops decrypting incoming messages.
This restores the recovery-key bootstrap: on startup, if
MATRIX_RECOVERY_KEY is set, import the cross-signing private keys from
SSSS and sign_own_device(), producing a valid signature server-side.
Idempotent and gated on MATRIX_RECOVERY_KEY — no behavior change for
users who don't configure a recovery key.
Verified end-to-end by deleting crypto.db and restarting: the bot
rotates device identity keys, re-uploads, self-signs via recovery key,
and decrypts+replies to fresh messages from a paired Element client.
Add content-aware splitting to compact mode: short chat-like exchanges
(2-6 short lines without headings/lists/quotes) get separate message
bubbles for a natural chat feel, while structured content (tables,
headings with body, numbered lists) stays in a single message.
Cherry-picked from PR #7587 by bravohenry, adapted to the compact/legacy
split_per_line architecture from #7903.
Fixes#7952 — Matrix E2EE completely broken after mautrix migration.
- Replace MemoryCryptoStore + pickle/HMAC persistence with mautrix's
PgCryptoStore backed by SQLite via aiosqlite. Crypto state now
persists reliably across restarts without fragile serialization.
- Add handle_sync() call on initial sync response so to-device events
(queued Megolm key shares) are dispatched to OlmMachine instead of
being silently dropped.
- Add _verify_device_keys_on_server() after loading crypto state.
Detects missing keys (re-uploads), stale keys from migration
(attempts re-upload), and corrupted state (refuses E2EE).
- Add _CryptoStateStore adapter wrapping MemoryStateStore to satisfy
mautrix crypto's StateStore interface (is_encrypted,
get_encryption_info, find_shared_rooms).
- Remove redundant share_keys() call from sync loop — OlmMachine
already handles this via DEVICE_OTK_COUNT event handler.
- Fix datetime vs float TypeError in session.py suspend_recently_active()
that crashed gateway startup.
- Add aiosqlite and asyncpg to [matrix] extra in pyproject.toml.
- Update test mocks for PgCryptoStore/Database and add query_keys mock
for key verification. 174 tests pass.
- Add E2EE upgrade/migration docs to Matrix user guide.
Add a second WeCom integration mode for regular enterprise self-built
applications. Unlike the existing bot/websocket adapter (wecom.py),
this handles WeCom's standard callback flow: WeCom POSTs encrypted XML
to an HTTP endpoint, the adapter decrypts, queues for the agent, and
immediately acknowledges. The agent's reply is delivered proactively
via the message/send API.
Key design choice: always acknowledge immediately and use proactive
send — agent sessions take 3-30 minutes, so the 5-second inline reply
window is never useful. The original PR's Future/pending-reply
machinery was removed in favour of this simpler architecture.
Features:
- AES-CBC encrypt/decrypt (BizMsgCrypt-compatible)
- Multi-app routing scoped by corp_id:user_id
- Legacy bare user_id fallback for backward compat
- Access-token management with auto-refresh
- WECOM_CALLBACK_* env var overrides
- Port-in-use pre-check before binding
- Health endpoint at /health
Salvaged from PR #7774 by @chqchshj. Simplified by removing the
inline reply Future system and fixing: secrets.choice for nonce
generation, immediate plain-text acknowledgment (not encrypted XML
containing 'success'), and initial token refresh error handling.
When 'hermes claw migrate' copies Telegram/Discord/Slack bot tokens from
OpenClaw while the Hermes gateway is already polling with those same tokens,
the platforms conflict (e.g. Telegram 409). Add a pre-flight check that reads
gateway_state.json via get_running_pid() + read_runtime_status(), warns the
user, and lets them cancel or continue.
Also improve the Telegram polling conflict error message to mention OpenClaw
as a common cause and give the 'hermes start' restart command.
Refs #7907
When sending multi-chunk responses, individual chunks can fail due to
transient iLink API errors. Previously a single failure would abort the
entire message. Now each chunk is retried with linear backoff before
giving up, and the same client_id is reused across retries for
server-side deduplication.
Configurable via config.yaml (platforms.weixin.extra) or env vars:
- send_chunk_delay_seconds (default 0.35s) — pacing between chunks
- send_chunk_retries (default 2) — max retry attempts per chunk
- send_chunk_retry_delay_seconds (default 1.0s) — base retry delay
Replaces the hardcoded 0.3s inter-chunk delay from #7903.
Salvaged from PR #7899 by @corazzione. Fixes#7836.
WeCom AI Bot sends file attachments with msgtype="appmsg", not
msgtype="file". Previously only file content was discarded while
the text title reached the agent.
Changes:
- _extract_text(): Extract appmsg title (filename) for display
- _extract_media(): Handle appmsg type with file/image content
Fixes#7750
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Weixin adapter was splitting responses at every top-level newline,
causing notification spam (up to 70 API calls for a single long markdown
response). This salvages the best aspects of six contributor PRs:
Compact mode (new default):
- Messages under the 4000-char limit stay as a single bubble even with
multiple lines, paragraphs, and code blocks
- Only oversized messages get split at logical markdown boundaries
- Inter-chunk delay (0.3s) between chunks prevents WeChat rate-limit drops
Legacy mode (opt-in):
- Set split_multiline_messages: true in platforms.weixin.extra config
- Or set WEIXIN_SPLIT_MULTILINE_MESSAGES=true env var
- Restores the old per-line splitting behavior
Salvaged from PRs #7797 (guantoubaozi), #7792 (luoxiao6645),
#7838 (qyx596), #7825 (weedge), #7784 (sherunlock03), #7773 (JnyRoad).
Core fix unanimous across all six; config toggle from #7838; inter-chunk
delay from #7825.
Matrix gateway: fix sync loop never dispatching events (#5819)
- _sync_loop() called client.sync() but never called handle_sync()
to dispatch events to registered callbacks — _on_room_message was
registered but never fired for new messages
- Store next_batch token from initial sync and pass as since= to
subsequent incremental syncs (was doing full initial sync every time)
- 17 comments, confirmed by multiple users on matrix.org
Feishu docs: add interactive card configuration for approvals (#6893)
- Error 200340 is a Feishu Developer Console configuration issue,
not a code bug — users need to enable Interactive Card capability
and configure Card Request URL
- Added required 3-step setup instructions to feishu.md
- Added troubleshooting entry for error 200340
- 17 comments from Feishu users
Copilot provider drift: detect GPT-5.x Responses API requirement (#3388)
- GPT-5.x models are rejected on /v1/chat/completions by both OpenAI
and OpenRouter (unsupported_api_for_model error)
- Added _model_requires_responses_api() to detect models needing
Responses API regardless of provider
- Applied in __init__ (covers OpenRouter primary users) and in
_try_activate_fallback() (covers Copilot->OpenRouter drift)
- Fixed stale comment claiming gateway creates fresh agents per message
(it caches them via _agent_cache since the caching was added)
- 7 comments, reported on Copilot+Telegram gateway
* fix(matrix): pass required args to MemoryCryptoStore for mautrix ≥0.21
MemoryCryptoStore.__init__() now requires account_id and pickle_key
positional arguments as of mautrix 0.21. The migration from matrix-nio
(commit 1850747) didn't account for this, causing E2EE initialization
to fail with:
MemoryCryptoStore.__init__() missing 2 required positional arguments:
'account_id' and 'pickle_key'
Pass self._user_id as account_id and derive pickle_key from the same
user_id:device_id pair already used for the on-disk HMAC signature.
Update the test stub to accept the new parameters.
Fixes#7803
* fix: use consistent fallback for pickle_key derivation
Address review: _pickle_key now uses _acct_id (which has the 'hermes'
fallback) instead of raw self._user_id, so both values stay consistent
when user_id is empty.
---------
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
Follow-up fixes for the matrix-nio → mautrix migration:
1. Module-level mautrix.types import now wrapped in try/except with
proper stub classes. Without this, importing gateway.platforms.matrix
crashes the entire gateway when mautrix isn't installed — even for
users who don't use Matrix. The stubs mirror mautrix's real attribute
names so tests that exercise adapter methods (send, reactions, etc.)
work without the real SDK.
2. Removed _ensure_mautrix_mock() from test_matrix_mention.py — it
permanently installed MagicMock modules in sys.modules via setdefault(),
polluting later tests in the suite. No longer needed since the module
imports cleanly without mautrix.
3. Fixed thread persistence tests to use direct class reference in
monkeypatch.setattr() instead of string-based paths, which broke
when the module was reimported by other tests.
4. Moved the module-importability test to a subprocess to prevent it
from polluting sys.modules (reimporting creates a second module object
with different __dict__, breaking patch.object in subsequent tests).
The old nio code only handled RoomMessageText (m.text). The mautrix
rewrite dispatched both m.text and m.notice, which would cause infinite
loops between bots since m.notice is the conventional msgtype for bot
responses in the Matrix ecosystem.
- Add api.session.close() on E2EE dep check and E2EE setup failure
paths (two missing cleanup points from the mautrix migration)
- Replace raw pickle.load/dump with HMAC-SHA256 signed payloads to
prevent arbitrary code execution from a tampered store file
- Extract _resolve_message_context() to deduplicate ~40 lines of
mention/thread/DM gating logic between text and media handlers
- Move mautrix.types imports to module level (16 scattered local
imports consolidated)
- Parse mention/thread env vars once in __init__ instead of per-message
- Cache _is_bot_mentioned() result instead of calling 3x per event
- Consolidate send_emote/send_notice into shared _send_simple_message()
- Use _is_dm_room() in get_chat_info() instead of inline duplication
- Add _CRYPTO_PICKLE_PATH constant (was duplicated in 2 locations)
- Fix fragile event_ts extraction (double getattr, None safety)
- Clean up leaked aiohttp session on auth failure paths
- Remove redundant trailing _track_thread() calls
Address two bugs found by code review:
1. MemoryCryptoStore loses all E2EE keys on restart — now pickle the
store to disk on disconnect and restore on connect, preserving
Megolm sessions across restarts.
2. Encrypted events buffered for retry were silently dropped after
decryption because _on_encrypted_event registered the event ID
in the dedup set, then _on_room_message rejected it as a
duplicate. Now clear the dedup entry before routing decrypted
events.
Translate all nio SDK calls to mautrix equivalents while preserving the
adapter structure, business logic, and all features (E2EE, reactions,
threading, mention gating, text batching, media caching, voice MSC3245).
Key changes:
- nio.AsyncClient -> mautrix.client.Client + HTTPAPI + MemoryStateStore
- Manual E2EE key management -> OlmMachine with auto key lifecycle
- isinstance(resp, nio.XxxResponse) -> mautrix returns values directly
- add_event_callback per type -> single ROOM_MESSAGE handler with
msgtype dispatch
- Room state (member_count, display_name) via async state store lookups
- Upload/download return ContentURI/bytes directly (no wrapper objects)
Tool progress markers (e.g. `⏰ list`) were injected directly into
SSE delta.content chunks. OpenAI-compatible frontends (Open WebUI,
LobeChat, etc.) store delta.content verbatim as the assistant message
and send it back on subsequent requests. After enough turns, the model
learns to emit these markers as plain text instead of issuing real tool
calls — silently hallucinating tool results without ever running them.
Fix: Send tool progress as a custom `event: hermes.tool.progress` SSE
event instead of mixing it into delta.content. Per the SSE spec, clients
that don't understand a custom event type silently ignore it, so this is
backward-compatible. Frontends that want to render progress indicators
can listen for the custom event without persisting it to conversation
history.
The /v1/runs endpoint already uses structured events — this aligns the
/v1/chat/completions streaming path with the same principle.
Closes#6972
Add is_network_accessible() helper using Python's ipaddress module to
robustly classify bind addresses (IPv4/IPv6 loopback, wildcards,
mapped addresses, hostname resolution with DNS-failure-fails-closed).
The API server connect() now refuses to start when the bind address is
network-accessible and no API_SERVER_KEY is set, preventing RCE from
other machines on the network.
Co-authored-by: entropidelic <entropidelic@users.noreply.github.com>
When enabled, @mentioning the bot in a DM creates a thread (default:
false). Supports both env var and YAML config (matrix.dm_mention_threads).
6 new tests, docs updated.
From #6957
Bot-added and bot-removed events were silently dropped because
_on_bot_added_to_chat and _on_bot_removed_from_chat were not
registered in _build_event_handler().
From #6975
Replace the simple DISCORD_IGNORE_NO_MENTION check with bot-aware
multi-agent filtering. When multiple agents share a channel:
- If other bots are @mentioned but this bot is not → stay silent
- If only humans are mentioned but not this bot → stay silent
- Messages with no mentions still flow to _handle_message for the
existing DISCORD_REQUIRE_MENTION check
- DMs are unaffected (always handled)
This prevents both agents from responding when only one is addressed.
Telegram's Bot API only allows a specific set of emoji for bot reactions
(the ReactionEmoji enum). ✅ (U+2705) and ❌ (U+274C) are not in that
set, causing on_processing_complete reactions to silently fail with
REACTION_INVALID (caught at debug log level).
Replace with 👍 (U+1F44D) / 👎 (U+1F44E) which are always available in
Telegram's allowed reaction list. The 👀 (eyes) reaction used by
on_processing_start was already valid.
Based on the fix by @ppdng in PR #6685.
Fixes#6068
Simplified implementation of the feature from PR #6842 (RunzhouLi).
Allows Discord channels/forum threads to auto-bind skills via config:
discord:
channel_skill_bindings:
- id: "123456"
skills: ["skill-a", "skill-b"]
The run.py auto-skill loader now handles both str and list[str],
loading multiple skills in order and concatenating their payloads.
Forum threads inherit their parent channel's bindings.
Co-authored-by: RunzhouLi <RunzhouLi@users.noreply.github.com>
Add debug logging when eyes reaction redaction fails, and add tests
for the success=False path and the no-pending-reaction edge case.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The on_processing_complete handler was never removing the eyes reaction because
_send_reaction didn't return the reaction event_id.
Fix:
- _send_reaction returns Optional[str] event_id
- on_processing_start stores it in _pending_reactions dict
- on_processing_complete redacts the eyes reaction before adding completion emoji
When extra.base_url is set in the Telegram platform config, use it as
the base URL for all Telegram API requests instead of api.telegram.org.
This allows agents to route Telegram traffic through the credential
proxy, which injects the real bot token — the VM never sees it.
Also supports extra.base_file_url for file downloads (defaults to
base_url if not set separately).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Follow-up to Dusk1e's PR #7120 (Slack send_image redirect guard):
- Rename _safe_url_for_log -> safe_url_for_log (drop underscore) since
it is now imported cross-module by the Slack adapter
- Add _ssrf_redirect_guard httpx event hook to cache_image_from_url()
and cache_audio_from_url() in base.py — same pattern as vision_tools
and the Slack adapter fix
- Update url_safety.py docstring to reflect broader coverage
- Add regression tests for image/audio redirect blocking + safe passthrough
The API server's _run_agent() was not passing task_id to
run_conversation(), causing a fresh random UUID per request. This meant
every Open WebUI message spun up a new Docker container and tore it down
afterward — making persistent filesystem state impossible.
Two fixes:
1. Pass task_id="default" so all API server conversations share the same
Docker container (matching the design intent: one configured Docker
environment, always the same container).
2. Derive a stable session_id from the system prompt + first user message
hash instead of uuid4(). This stops hermes sessions list from being
polluted with single-message throwaway sessions.
Fixes#3438.
Slack may return an HTML sign-in/redirect page instead of actual media
bytes (e.g. expired token, restricted file access). This adds two layers
of defense:
1. Content-Type check in slack.py rejects text/html responses early
2. Magic-byte validation in base.py's cache_image_from_bytes() rejects
non-image data regardless of source platform
Also adds ValueError guards in wecom.py and email.py so the new
validation doesn't crash those adapters.
Closes#6829
- configure Telegram HTTPXRequest pool/timeouts with env-overridable defaults\n- use separate request/get_updates request objects to reduce pool contention\n- skip fallback-IP transport when proxy is configured (or explicitly disabled)\n\nThis mitigates recurrent pool-timeout failures during polling reconnect/bootstrap (delete_webhook).