When sending multi-chunk responses, individual chunks can fail due to
transient iLink API errors. Previously a single failure would abort the
entire message. Now each chunk is retried with linear backoff before
giving up, and the same client_id is reused across retries for
server-side deduplication.
Configurable via config.yaml (platforms.weixin.extra) or env vars:
- send_chunk_delay_seconds (default 0.35s) — pacing between chunks
- send_chunk_retries (default 2) — max retry attempts per chunk
- send_chunk_retry_delay_seconds (default 1.0s) — base retry delay
Replaces the hardcoded 0.3s inter-chunk delay from #7903.
Salvaged from PR #7899 by @corazzione. Fixes#7836.
WeCom AI Bot sends file attachments with msgtype="appmsg", not
msgtype="file". Previously only file content was discarded while
the text title reached the agent.
Changes:
- _extract_text(): Extract appmsg title (filename) for display
- _extract_media(): Handle appmsg type with file/image content
Fixes#7750
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cherry-picked from PR #7747 with follow-up fixes:
- Narrowed suspend_all_active() to suspend_recently_active() — only
suspends sessions updated within the last 2 minutes (likely in-flight),
not all sessions which would unnecessarily reset idle users
- /stop with no running agent no longer suspends the session; only
actual force-stops mark the session for reset
Background process watchers (notify_on_complete, check_interval) created
synthetic SessionSource objects without user_id/user_name. While the
internal=True bypass (1d8d4f28) prevented false pairing for agent-
generated notifications, the missing identity caused:
- Garbage entries in pairing rate limiters (discord:None, telegram:None)
- 'User None' in approval messages and logs
- No user identity available for future code paths that need it
Additionally, platform messages arriving without from_user (Telegram
service messages, channel forwards, anonymous admin actions) could still
trigger false pairing because they are not internal events.
Fix:
1. Propagate user_id/user_name through the full watcher chain:
session_context.py → gateway/run.py → terminal_tool.py →
process_registry.py (including checkpoint persistence/recovery)
2. Add None user_id guard in _handle_message() — silently drop
non-internal messages with no user identity instead of triggering
the pairing flow.
Salvaged from PRs #7664 (kagura-agent, ContextVar approach),
#6540 (MestreY0d4-Uninter, tests), and #7709 (guang384, None guard).
Closes#6341, #6485, #7643
Relates to #6516, #7392
The Weixin adapter was splitting responses at every top-level newline,
causing notification spam (up to 70 API calls for a single long markdown
response). This salvages the best aspects of six contributor PRs:
Compact mode (new default):
- Messages under the 4000-char limit stay as a single bubble even with
multiple lines, paragraphs, and code blocks
- Only oversized messages get split at logical markdown boundaries
- Inter-chunk delay (0.3s) between chunks prevents WeChat rate-limit drops
Legacy mode (opt-in):
- Set split_multiline_messages: true in platforms.weixin.extra config
- Or set WEIXIN_SPLIT_MULTILINE_MESSAGES=true env var
- Restores the old per-line splitting behavior
Salvaged from PRs #7797 (guantoubaozi), #7792 (luoxiao6645),
#7838 (qyx596), #7825 (weedge), #7784 (sherunlock03), #7773 (JnyRoad).
Core fix unanimous across all six; config toggle from #7838; inter-chunk
delay from #7825.
Matrix gateway: fix sync loop never dispatching events (#5819)
- _sync_loop() called client.sync() but never called handle_sync()
to dispatch events to registered callbacks — _on_room_message was
registered but never fired for new messages
- Store next_batch token from initial sync and pass as since= to
subsequent incremental syncs (was doing full initial sync every time)
- 17 comments, confirmed by multiple users on matrix.org
Feishu docs: add interactive card configuration for approvals (#6893)
- Error 200340 is a Feishu Developer Console configuration issue,
not a code bug — users need to enable Interactive Card capability
and configure Card Request URL
- Added required 3-step setup instructions to feishu.md
- Added troubleshooting entry for error 200340
- 17 comments from Feishu users
Copilot provider drift: detect GPT-5.x Responses API requirement (#3388)
- GPT-5.x models are rejected on /v1/chat/completions by both OpenAI
and OpenRouter (unsupported_api_for_model error)
- Added _model_requires_responses_api() to detect models needing
Responses API regardless of provider
- Applied in __init__ (covers OpenRouter primary users) and in
_try_activate_fallback() (covers Copilot->OpenRouter drift)
- Fixed stale comment claiming gateway creates fresh agents per message
(it caches them via _agent_cache since the caching was added)
- 7 comments, reported on Copilot+Telegram gateway
* fix(matrix): pass required args to MemoryCryptoStore for mautrix ≥0.21
MemoryCryptoStore.__init__() now requires account_id and pickle_key
positional arguments as of mautrix 0.21. The migration from matrix-nio
(commit 1850747) didn't account for this, causing E2EE initialization
to fail with:
MemoryCryptoStore.__init__() missing 2 required positional arguments:
'account_id' and 'pickle_key'
Pass self._user_id as account_id and derive pickle_key from the same
user_id:device_id pair already used for the on-disk HMAC signature.
Update the test stub to accept the new parameters.
Fixes#7803
* fix: use consistent fallback for pickle_key derivation
Address review: _pickle_key now uses _acct_id (which has the 'hermes'
fallback) instead of raw self._user_id, so both values stay consistent
when user_id is empty.
---------
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
Telegram flood control during streaming caused messages to be cut off
mid-response. The old behavior permanently disabled edits after a single
flood-control failure, losing the remainder of the response.
Changes:
- Adaptive backoff: on flood-control edit failures, double the edit interval
instead of immediately disabling edits. Only permanently disable after 3
consecutive failures (_MAX_FLOOD_STRIKES).
- Cursor strip: when entering fallback mode, best-effort edit to remove the
cursor (▉) from the last visible message so it doesn't appear stuck.
- Fallback send retry: _send_fallback_final retries each chunk once on
flood-control failures (3s delay) before giving up.
- Default edit_interval increased from 0.3s to 1.0s. Telegram rate-limits
edits at ~1/s per message; 0.3s was virtually guaranteed to trigger flood
control on any non-trivial response.
- _send_or_edit returns bool so the overflow split loop knows not to
truncate accumulated text when an edit fails (prevents content loss).
Fixes: messages cutting/stopping mid-response on Telegram, especially
with streaming enabled.
* feat: add watch_patterns to background processes for output monitoring
Adds a new 'watch_patterns' parameter to terminal(background=true) that
lets the agent specify strings to watch for in process output. When a
matching line appears, a notification is queued and injected as a
synthetic message — triggering a new agent turn, similar to
notify_on_complete but mid-process.
Implementation:
- ProcessSession gets watch_patterns field + rate-limit state
- _check_watch_patterns() in ProcessRegistry scans new output chunks
from all three reader threads (local, PTY, env-poller)
- Rate limited: max 8 notifications per 10s window
- Sustained overload (45s) permanently disables watching for that process
- watch_queue alongside completion_queue, same consumption pattern
- CLI drains watch_queue in both idle loop and post-turn drain
- Gateway drains after agent runs via _inject_watch_notification()
- Checkpoint persistence + crash recovery includes watch_patterns
- Blocked in execute_code sandbox (like other bg params)
- 20 new tests covering matching, rate limiting, overload kill,
checkpoint persistence, schema, and handler passthrough
Usage:
terminal(
command='npm run dev',
background=true,
watch_patterns=['ERROR', 'WARN', 'listening on port']
)
* refactor: merge watch_queue into completion_queue
Unified queue with 'type' field distinguishing 'completion',
'watch_match', and 'watch_disabled' events. Extracted
_format_process_notification() in CLI and gateway to handle
all event types in a single drain loop. Removes duplication
across both CLI drain sites and the gateway.
When the stream consumer has sent at least one message (already_sent=True),
the gateway skips sending the final response to avoid duplicates. But this
also suppressed error messages when the agent failed mid-loop — rate limit
exhaustion, context overflow, compression failure, etc.
The user would see the last streamed content and then nothing: no error
message, no explanation. The agent appeared to 'stop responding.'
Fix: check the 'failed' flag at both the producer (_run_agent marks
already_sent) and consumer (_handle_message_with_agent checks it) sites.
Error messages are always delivered regardless of streaming state.
- Add agent.close() call to _finalize_shutdown_agents() to prevent
zombie processes (terminal sandboxes, browser daemons, httpx clients)
- Global cleanup (process_registry, environments, browsers) preserved
in _stop_impl() during conflict resolution
- Move /restart CommandDef from 'Info' to 'Session' category to match
/stop and /status
* fix: circuit breaker stops CPU-burning restart loops on persistent errors
When a gateway session hits a non-retryable error (e.g. invalid model
ID → HTTP 400), the agent fails and returns. But if the session keeps
receiving messages (or something periodically recreates agents), each
attempt spawns a new AIAgent — reinitializing MCP server connections,
burning CPU — only to hit the same 400 error again. On a 4-core server,
this pegs an entire core per stuck session and accumulates 300+ minutes
of CPU time over hours.
Fix: add a per-session consecutive failure counter in the gateway runner.
- Track consecutive non-retryable failures per session key
- After 3 consecutive failures (_MAX_CONSECUTIVE_FAILURES), block
further agent creation for that session and notify the user:
'⚠️ This session has failed N times in a row with a non-retryable
error. Use /reset to start a new session.'
- Evict the cached agent when the circuit breaker engages to prevent
stale state from accumulating
- Reset the counter on successful agent runs
- Clear the counter on /reset and /new so users can recover
- Uses getattr() pattern so bare GatewayRunner instances (common in
tests using object.__new__) don't crash
Tests:
- 8 new tests in test_circuit_breaker.py covering counter behavior,
threshold, reset, session isolation, and bare-runner safety
Addresses #7130.
* Revert "fix: circuit breaker stops CPU-burning restart loops on persistent errors"
This reverts commit d848ea7109.
* fix: don't evict cached agent on failed runs — prevents MCP restart loop
When a run fails (e.g. invalid model ID → 400) and fallback activated,
the gateway was evicting the cached agent to 'retry primary next time.'
But evicting a failed agent forces a full AIAgent recreation on the next
message — reinitializing MCP server connections, spawning stdio
processes — only to hit the same 400 again. This created a CPU-burning
loop (91%+ for hours, #7130).
The fix: add `and not _run_failed` to the fallback-eviction check.
Failed runs keep the cached agent. The next message reuses it (no MCP
reinit), hits the same error, returns it to the user quickly. The user
can /reset or /model to fix their config.
Successful fallback runs still evict as before so the next message
retries the primary model.
Addresses #7130.
Follow-up fixes for the matrix-nio → mautrix migration:
1. Module-level mautrix.types import now wrapped in try/except with
proper stub classes. Without this, importing gateway.platforms.matrix
crashes the entire gateway when mautrix isn't installed — even for
users who don't use Matrix. The stubs mirror mautrix's real attribute
names so tests that exercise adapter methods (send, reactions, etc.)
work without the real SDK.
2. Removed _ensure_mautrix_mock() from test_matrix_mention.py — it
permanently installed MagicMock modules in sys.modules via setdefault(),
polluting later tests in the suite. No longer needed since the module
imports cleanly without mautrix.
3. Fixed thread persistence tests to use direct class reference in
monkeypatch.setattr() instead of string-based paths, which broke
when the module was reimported by other tests.
4. Moved the module-importability test to a subprocess to prevent it
from polluting sys.modules (reimporting creates a second module object
with different __dict__, breaking patch.object in subsequent tests).
The old nio code only handled RoomMessageText (m.text). The mautrix
rewrite dispatched both m.text and m.notice, which would cause infinite
loops between bots since m.notice is the conventional msgtype for bot
responses in the Matrix ecosystem.
- Add api.session.close() on E2EE dep check and E2EE setup failure
paths (two missing cleanup points from the mautrix migration)
- Replace raw pickle.load/dump with HMAC-SHA256 signed payloads to
prevent arbitrary code execution from a tampered store file
- Extract _resolve_message_context() to deduplicate ~40 lines of
mention/thread/DM gating logic between text and media handlers
- Move mautrix.types imports to module level (16 scattered local
imports consolidated)
- Parse mention/thread env vars once in __init__ instead of per-message
- Cache _is_bot_mentioned() result instead of calling 3x per event
- Consolidate send_emote/send_notice into shared _send_simple_message()
- Use _is_dm_room() in get_chat_info() instead of inline duplication
- Add _CRYPTO_PICKLE_PATH constant (was duplicated in 2 locations)
- Fix fragile event_ts extraction (double getattr, None safety)
- Clean up leaked aiohttp session on auth failure paths
- Remove redundant trailing _track_thread() calls
Address two bugs found by code review:
1. MemoryCryptoStore loses all E2EE keys on restart — now pickle the
store to disk on disconnect and restore on connect, preserving
Megolm sessions across restarts.
2. Encrypted events buffered for retry were silently dropped after
decryption because _on_encrypted_event registered the event ID
in the dedup set, then _on_room_message rejected it as a
duplicate. Now clear the dedup entry before routing decrypted
events.
Translate all nio SDK calls to mautrix equivalents while preserving the
adapter structure, business logic, and all features (E2EE, reactions,
threading, mention gating, text batching, media caching, voice MSC3245).
Key changes:
- nio.AsyncClient -> mautrix.client.Client + HTTPAPI + MemoryStateStore
- Manual E2EE key management -> OlmMachine with auto key lifecycle
- isinstance(resp, nio.XxxResponse) -> mautrix returns values directly
- add_event_callback per type -> single ROOM_MESSAGE handler with
msgtype dispatch
- Room state (member_count, display_name) via async state store lookups
- Upload/download return ContentURI/bytes directly (no wrapper objects)
Tool progress markers (e.g. `⏰ list`) were injected directly into
SSE delta.content chunks. OpenAI-compatible frontends (Open WebUI,
LobeChat, etc.) store delta.content verbatim as the assistant message
and send it back on subsequent requests. After enough turns, the model
learns to emit these markers as plain text instead of issuing real tool
calls — silently hallucinating tool results without ever running them.
Fix: Send tool progress as a custom `event: hermes.tool.progress` SSE
event instead of mixing it into delta.content. Per the SSE spec, clients
that don't understand a custom event type silently ignore it, so this is
backward-compatible. Frontends that want to render progress indicators
can listen for the custom event without persisting it to conversation
history.
The /v1/runs endpoint already uses structured events — this aligns the
/v1/chat/completions streaming path with the same principle.
Closes#6972
Six platforms (matrix, mattermost, dingtalk, feishu, wecom, homeassistant)
were missing from the session-based discovery loop, causing /channels and
send_message to return empty results on those platforms.
Instead of adding them to the hardcoded tuple (which would break again when
new platforms are added), derive the list dynamically from the Platform enum.
Only infrastructure entries (local, api_server, webhook) are excluded;
Discord and Slack are skipped automatically because their direct builders
already populate the platforms dict.
Reported by sprmn24 in PR #7416.
When two gateway messages arrived concurrently, _set_session_env wrote
HERMES_SESSION_PLATFORM/CHAT_ID/CHAT_NAME/THREAD_ID into the process-global
os.environ. Because asyncio tasks share the same process, Message B would
overwrite Message A's values mid-flight, causing background-task notifications
and tool calls to route to the wrong thread/chat.
Replace os.environ with Python's contextvars.ContextVar. Each asyncio task
(and any run_in_executor thread it spawns) gets its own copy, so concurrent
messages never interfere.
Changes:
- New gateway/session_context.py with ContextVar definitions, set/clear/get
helpers, and os.environ fallback for CLI/cron/test backward compatibility
- gateway/run.py: _set_session_env returns reset tokens, _clear_session_env
accepts them for proper cleanup in finally blocks
- All tool consumers updated: cronjob_tools, send_message_tool, skills_tool,
terminal_tool (both notify_on_complete AND check_interval blocks), tts_tool,
agent/skill_utils, agent/prompt_builder
- Tests updated for new contextvar-based API
Fixes#7358
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
Two fixes from PR review:
1. Session expiry was looking in _running_agents for the cached agent,
but idle expired sessions live in _agent_cache. Now checks
_agent_cache first, falls back to _running_agents.
2. Global cleanup in stop() was missing process_registry.kill_all(),
so background processes from agents evicted without close() (branch,
fallback) survived shutdown.
Use getattr guard for _agent_cache_lock in _handle_reset_command
because test fixtures may create GatewayRunner without calling
__init__, leaving the attribute unset.
Fixes e2e test failure: test_new_resets_session,
test_new_then_status_reflects_reset, test_new_is_idempotent.
Add 9 tests covering the full zombie process prevention chain:
- TestZombieReproduction: demonstrates that processes survive when
references are dropped without explicit cleanup (the original bug)
- TestAgentCloseMethod: verifies close() calls all cleanup functions,
is idempotent, propagates to children, and continues cleanup even
when individual steps fail
- TestGatewayCleanupWiring: verifies stop() calls close() and that
_evict_cached_agent() does NOT call close() (since it's also used
for non-destructive cache refreshes)
- TestDelegationCleanup: calls the real _run_single_child function and
verifies close() is called on the child agent
Ref: #7131
Wire AIAgent.close() into every gateway code path where an agent's
session is actually ending:
- stop(): close all running agents after interrupt + memory shutdown,
then call cleanup_all_environments() and cleanup_all_browsers() as
a global catch-all
- _session_expiry_watcher(): close agents when sessions expire after
the 5-minute idle timeout
- _handle_reset_command(): close the old agent before evicting it from
cache on /new or /reset
Note: _evict_cached_agent() intentionally does NOT call close() because
it is also used for non-destructive cache refreshes (model switch,
branch, fallback) where tool resources should persist.
Ref: #7131
Add is_network_accessible() helper using Python's ipaddress module to
robustly classify bind addresses (IPv4/IPv6 loopback, wildcards,
mapped addresses, hostname resolution with DNS-failure-fails-closed).
The API server connect() now refuses to start when the bind address is
network-accessible and no API_SERVER_KEY is set, preventing RCE from
other machines on the network.
Co-authored-by: entropidelic <entropidelic@users.noreply.github.com>
When enabled, @mentioning the bot in a DM creates a thread (default:
false). Supports both env var and YAML config (matrix.dm_mention_threads).
6 new tests, docs updated.
From #6957
Bot-added and bot-removed events were silently dropped because
_on_bot_added_to_chat and _on_bot_removed_from_chat were not
registered in _build_event_handler().
From #6975
Replace the simple DISCORD_IGNORE_NO_MENTION check with bot-aware
multi-agent filtering. When multiple agents share a channel:
- If other bots are @mentioned but this bot is not → stay silent
- If only humans are mentioned but not this bot → stay silent
- Messages with no mentions still flow to _handle_message for the
existing DISCORD_REQUIRE_MENTION check
- DMs are unaffected (always handled)
This prevents both agents from responding when only one is addressed.
- Remove sys.path.insert hack (leftover from standalone dev)
- Add token lock (acquire_scoped_lock/release_scoped_lock) in
connect()/disconnect() to prevent duplicate pollers across profiles
- Fix get_connected_platforms: WEIXIN check must precede generic
token/api_key check (requires both token AND account_id)
- Add WEIXIN_HOME_CHANNEL_NAME to _EXTRA_ENV_KEYS
- Add gateway setup wizard with QR login flow
- Add platform status check for partially configured state
- Add weixin.md docs page with full adapter documentation
- Update environment-variables.md reference with all 11 env vars
- Update sidebars.ts to include weixin docs page
- Wire all gateway integration points onto current main
Salvaged from PR #6747 by Zihan Huang.