hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-25 00:51:20 +00:00

Author	SHA1	Message	Date
Brooklyn Nicholson	aa583cb14e	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-17 17:51:40 -05:00
Teknium	d2206c69cc	fix(qqbot): add back-compat for env var rename; drop qrcode core dep Follow-up to WideLee's salvaged PR #11582. Back-compat for QQ_HOME_CHANNEL → QQBOT_HOME_CHANNEL rename: - gateway/config.py reads QQBOT_HOME_CHANNEL, falls back to QQ_HOME_CHANNEL with a one-shot deprecation warning so users on the old name aren't silently broken. - cron/scheduler.py: _HOME_TARGET_ENV_VARS['qqbot'] now maps to the new name; _get_home_target_chat_id falls back to the legacy name via a _LEGACY_HOME_TARGET_ENV_VARS table. - hermes_cli/status.py + hermes_cli/setup.py: honor both names when displaying or checking for missing home channels. - hermes_cli/config.py: keep legacy QQ_HOME_CHANNEL[_NAME] in _EXTRA_ENV_KEYS so .env sanitization still recognizes them. Scope cleanup: - Drop qrcode from core dependencies and requirements.txt (remains in messaging/dingtalk/feishu extras). _qqbot_render_qr already degrades gracefully when qrcode is missing, printing a 'pip install qrcode' tip and falling back to URL-only display. - Restore @staticmethod on QQAdapter._detect_message_type (it doesn't use self). Revert the test change that was only needed when it was converted to an instance method. - Reset uv.lock to origin/main; the PR's stale lock also included unrelated changes (atroposlib source URL, hermes-agent version bump, fastapi additions) that don't belong. Verified E2E: - Existing user (QQ_HOME_CHANNEL set): gateway + cron both pick up the legacy name; deprecation warning logs once. - Fresh user (QQBOT_HOME_CHANNEL set): gateway + cron use new name, no warning. - Both set: new name wins on both surfaces. Targeted tests: 296 passed, 4 skipped (qqbot + cron + hermes_cli).	2026-04-17 15:31:14 -07:00
WideLee	103beea7a6	fix(qqbot): fix test failures after package refactor - Re-export _ssrf_redirect_guard from __init__.py - Fix _parse_json @staticmethod using self._log_tag - Update test_detect_message_type to call as instance method - Fix mock.patch path for httpx.AsyncClient in adapter submodule	2026-04-17 15:31:14 -07:00
WideLee	6fd58e1e4a	refactor(qqbot): replace log tags with self._log_tag	2026-04-17 15:31:14 -07:00
WideLee	235e6ecc0e	refactor(qqbot): replace hardcoded log tags with self._log_tag and adjust STT log levels - Remove @staticmethod from _detect_message_type, _convert_silk_to_wav, _convert_raw_to_wav, _convert_ffmpeg_to_wav so they can use self._log_tag - Replace all remaining hardcoded "QQBot" log args with self._log_tag - Downgrade STT routine flow logs (download, convert, success) from info to debug - Keep warning level for actual failures (STT failed, ffmpeg error, empty transcript)	2026-04-17 15:31:14 -07:00
WideLee	02f5e3dc27	refactor(qqbot): use _log_tag with app_id in all logger calls for multi-instance disambiguation	2026-04-17 15:31:14 -07:00
WideLee	6358501915	refactor(qqbot): split qqbot.py into package & add QR scan-to-configure onboard flow - Refactor gateway/platforms/qqbot.py into gateway/platforms/qqbot/ package: - adapter.py: core QQAdapter (unchanged logic, constants from shared module) - constants.py: shared constants (API URLs, timeouts, message types) - crypto.py: AES-256-GCM key generation and secret decryption - onboard.py: QR-code scan-to-configure API (create_bind_task, poll_bind_result) - utils.py: User-Agent builder, HTTP headers, config helpers - __init__.py: re-exports all public symbols for backward compatibility - Add interactive QR-code setup flow in hermes_cli/gateway.py: - Terminal QR rendering via qrcode package (graceful fallback to URL) - Auto-refresh on QR expiry (up to 3 times) - AES-256-GCM encrypted credential exchange - DM security policy selection (pairing/allowlist/open) - Update hermes_cli/setup.py to delegate to gateway's _setup_qqbot() - Add qrcode>=7.4 dependency to pyproject.toml and requirements.txt	2026-04-17 15:31:14 -07:00
Teknium	31e7276474	fix(gateway): consolidate per-session cleanup; close SessionDB on shutdown (#11800 ) Three closely-related fixes for shutdown / lifecycle hygiene. 1. _release_running_agent_state(session_key) helper ---------------------------------------------------- Per-running-agent state lived in three dicts that drifted out of sync across cleanup sites: self._running_agents — AIAgent per session_key self._running_agents_ts — start timestamp per session_key self._busy_ack_ts — last busy-ack timestamp per session_key Inventory before this PR: 8 sites: del self._running_agents[key] — only 1 (stale-eviction) cleaned all three — 1 cleaned _running_agents + _running_agents_ts only — 6 cleaned _running_agents only Each missed entry was a (str, float) tuple per session per gateway lifetime — small, persistent, accumulates across thousands of sessions over months. Per-platform leaks compounded. This change adds a single helper that pops all three dicts in lockstep, and replaces every bare 'del self._running_agents[key]' site with it. Per-session state that PERSISTS across turns (_session_model_overrides, _voice_mode, _pending_approvals, _update_prompt_pending) is intentionally NOT touched here — those have their own lifecycles tied to user actions, not turn boundaries. 2. _running_agents_ts cleared in _stop_impl ---------------------------------------- Was being missed alongside _running_agents.clear(); now included. 3. SessionDB close() in _stop_impl --------------------------------- The SQLite WAL write lock stayed held by the old gateway connection until Python actually exited — causing 'database is locked' errors when --replace launched a new gateway against the same file. We now explicitly close both self._db and self.session_store._db inside _stop_impl, with try/except so a flaky close on one doesn't block the other. Tests ----- tests/gateway/test_session_state_cleanup.py — 10 cases covering: * helper pops all three dicts atomically * idempotent on missing/empty keys * preserves other sessions * tolerates older runners without _busy_ack_ts attribute * thread-safe under concurrent release * regression guard: scans gateway/run.py and fails if a future contributor reintroduces 'del self._running_agents[...]' outside docstrings * SessionDB close called on both holders during shutdown * shutdown tolerates missing session_store * shutdown tolerates close() raising on one db (other still closes) Broader gateway suite: 3108 passed (vs 3100 on baseline) — failure delta is +8 net passes; the 10 remaining failures are pre-existing cross-test pollution / missing optional deps (matrix needs olm, signal/telegram approval flake, dingtalk Mock wiring), all reproduce on stashed baseline.	2026-04-17 15:18:23 -07:00
Teknium	036dacf659	feat(telegram): auto-wrap markdown tables in code blocks (#11794 ) Telegram's MarkdownV2 has no table syntax — pipes get backslash-escaped and tables render as noisy unaligned text. format_message now detects GFM-style pipe tables (header row + delimiter row + optional body) and wraps them in ``` fences before the existing MarkdownV2 conversion runs. Telegram renders fenced code blocks as monospace preformatted text with columns intact. Tables already inside an existing code block are left alone. Plain prose with pipes, lone '---' horizontal rules, and non-table content are unaffected. Closes the recurring community request to stop having to ask the agent to re-render tables as code blocks manually.	2026-04-17 14:27:26 -07:00
Teknium	eb07c05646	fix(gateway): prune stale SessionStore entries to bound memory + disk (#11789 ) SessionStore._entries grew unbounded. Every unique (platform, chat_id, thread_id, user_id) tuple ever seen was kept in RAM and rewritten to sessions.json on every message. A Discord bot in 100 servers x 100 channels x ~100 rotating users accumulates on the order of 10^5 entries after a few months; each sessions.json write becomes an O(n) fsync. Nothing trimmed this — there was no TTL, no cap, no eviction path. Changes ------- * SessionStore.prune_old_entries(max_age_days) — drops entries whose updated_at is older than the cutoff. Preserves: - suspended entries (user paused them via /stop for later resume) - entries with an active background process attached Pruning is functionally identical to a natural reset-policy expiry: SQLite transcript stays, session_key -> session_id mapping dropped, returning user gets a fresh session. * GatewayConfig.session_store_max_age_days (default 90; 0 disables). Serialized in to_dict/from_dict, coerced from bad types / negatives to safe defaults. No migration needed — missing field -> 90 days. * _session_expiry_watcher calls prune_old_entries once per hour (first tick is immediate). Uses the existing watcher loop so no new background task is created. Why not more aggressive ----------------------- 90 days is long enough that legitimate long-idle users (seasonal, vacation, etc.) aren't surprised — pruning just means they get a fresh session on return, same outcome they'd get from any other reset-policy trigger. Admins can lower it via config; 0 disables. Tests ----- tests/gateway/test_session_store_prune.py — 17 cases covering: * entry age based on updated_at, not created_at * max_age_days=0 disables; negative coerces to 0 * suspended + active-process entries are skipped * _save fires iff something was removed * disk JSON reflects post-prune state * thread safety against concurrent readers * config field roundtrips + graceful fallback on bad values * watcher gate logic (first tick prunes, subsequent within 1h don't) 119 broader session/gateway tests remain green.	2026-04-17 13:48:49 -07:00
Brooklyn Nicholson	1f37ef2fd1	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-17 08:59:33 -05:00
Young Sherlock	8dcd08d8bb	Fix Weixin media uploads and refresh lockfile	2026-04-17 06:50:36 -07:00
shenuu	3a0ec1d935	fix(weixin): macOS SSL cert, QR data, and refresh rendering - Use certifi CA bundle for aiohttp SSL in qr_login(), start(), and send_weixin_direct() to fix SSL verification failures against Tencent's iLink server on macOS (Homebrew OpenSSL lacks system certs) - Fix QR code data: encode qrcode_img_content (full liteapp URL) instead of raw hex token — WeChat needs the full URL to resolve the scan - Render ASCII QR on refresh so the user can re-scan without restarting - Improve error message on QR render failure to show the actual exception Tested on macOS (Apple Silicon, Homebrew Python 3.13)	2026-04-17 06:50:36 -07:00
jinzheng8115	e105b7ac93	fix(weixin): retry send without context_token on iLink session expiry iLink context_token has a limited TTL. When no user message has arrived for an extended period (e.g. overnight), cron-initiated pushes fail with errcode -14 (session timeout). Tested that iLink accepts sends without context_token as a degraded fallback, so we now automatically strip the expired token and retry once. This keeps scheduled push messages (weather, digests, etc.) working reliably without requiring a user message to refresh the session first. Changes: - _send_text_chunk() catches iLinkDeliveryError with session-expired errcode (-14) and retries without context_token - Stale tokens are cleared from ContextTokenStore on session expiry - All 34 existing weixin tests pass	2026-04-17 06:50:36 -07:00
memosr	cedc95c100	fix(security): validate WeChat media URLs against CDN allowlist to prevent SSRF	2026-04-17 06:50:36 -07:00
Teknium	3f3d8a7b24	fix(discord): strip mention syntax from auto-thread names Previously a message like `<@&1490963422786093149> help` would spawn a thread literally named `<@&1490963422786093149> help`, exposing raw Discord mention markers in the thread list. Only user mentions (`<@id>`) were being stripped upstream — role mentions (`<@&id>`) and channel mentions (`<#id>`) leaked through. Fix: strip all three mention patterns in `_auto_create_thread` before building the thread name. Collapse runs of whitespace left by the removal. If the entire content was mention-only, fall back to 'Hermes' instead of an empty title. Fixes #6336. Tests: two new regression guards in test_discord_slash_commands.py covering mixed-mention content and mention-only content.	2026-04-17 06:46:52 -07:00
sgaofen	32a694ad5f	fix(discord): fall back when auto-thread creation fails	2026-04-17 06:46:52 -07:00
OwenYWT	f5dc4e905d	fix(discord): skip auto-threading reply messages	2026-04-17 06:46:52 -07:00
Matteo De Agazio	93fe4b357d	fix(discord): free-response channels skip auto-threading Free-response channels already bypassed the @mention gate so users could chat inline with the bot, but auto-threading still fired on every message — spinning off a thread per message and defeating the lightweight-chat purpose. Fix: fold `is_free_channel` into `skip_thread` so threading is skipped whenever the channel is in DISCORD_FREE_RESPONSE_CHANNELS (via env or discord.free_response_channels in config.yaml). Net change: one line in _handle_message + one regression test. Partially addresses #9399. Authored by @Hypn0sis (salvaged from PR #9650; the bundled 'smart' auto-thread mode from that PR was dropped in favor of deterministic true/false semantics).	2026-04-17 06:46:52 -07:00
Teknium	8d7b7feb0d	fix(gateway): bound _agent_cache with LRU cap + idle TTL eviction (#11565 ) * fix(gateway): bound _agent_cache with LRU cap + idle TTL eviction The per-session AIAgent cache was unbounded. Each cached AIAgent holds LLM clients, tool schemas, memory providers, and a conversation buffer. In a long-lived gateway serving many chats/threads, cached agents accumulated indefinitely — entries were only evicted on /new, /model, or session reset. Changes: - Cache is now an OrderedDict so we can pop least-recently-used entries. - _enforce_agent_cache_cap() pops entries beyond _AGENT_CACHE_MAX_SIZE=64 when a new agent is inserted. LRU order is refreshed via move_to_end() on cache hits. - _sweep_idle_cached_agents() evicts entries whose AIAgent has been idle longer than _AGENT_CACHE_IDLE_TTL_SECS=3600s. Runs from the existing _session_expiry_watcher so no new background task is created. - The expiry watcher now also pops the cache entry after calling _cleanup_agent_resources on a flushed session — previously the agent was shut down but its reference stayed in the cache dict. - Evicted agents have _cleanup_agent_resources() called on a daemon thread so the cache lock isn't held during slow teardown. Both tuning constants live at module scope so tests can monkeypatch them without touching class state. Tests: 7 new cases in test_agent_cache.py covering LRU eviction, move_to_end refresh, cleanup thread dispatch, idle TTL sweep, defensive handling of agents without _last_activity_ts, and plain-dict test fixture tolerance. * tweak: bump _AGENT_CACHE_MAX_SIZE 64 -> 128 * fix(gateway): never evict mid-turn agents; live spillover tests The prior commit could tear down an active agent if its session_key happened to be LRU when the cap was exceeded. AIAgent.close() kills process_registry entries for the task, tears down the terminal sandbox, closes the OpenAI client (sets self.client = None), and cascades .close() into any active child subagents — all fatal if the agent is still processing a turn. Changes: - _enforce_agent_cache_cap and _sweep_idle_cached_agents now look at GatewayRunner._running_agents and skip any entry whose AIAgent instance is present (identity via id(), so MagicMock doesn't confuse lookup in tests). _AGENT_PENDING_SENTINEL is treated as 'not active' since no real agent exists yet. - Eviction only considers the LRU-excess window (first size-cap entries). If an excess slot is held by a mid-turn agent, we skip it WITHOUT compensating by evicting a newer entry. A freshly inserted session (zero cache history) shouldn't be punished to protect a long-lived one that happens to be busy. - Cache may therefore stay transiently over cap when load spikes; a WARNING is logged so operators can see it, and the next insert re-runs the check after some turns have finished. New tests (TestAgentCacheActiveSafety + TestAgentCacheSpilloverLive): - Active LRU entry is skipped; no newer entry compensated - Mixed active/idle excess window: only idle slots go - All-active cache: no eviction, WARNING logged, all clients intact - _AGENT_PENDING_SENTINEL doesn't block other evictions - Idle-TTL sweep skips active agents - End-to-end: active agent's .client survives eviction attempt - Live fill-to-cap with real AIAgents, then spillover - Live: CAP=4 all active + 1 newcomer — cache grows to 5, no teardown - Live: 8 threads racing 160 inserts into CAP=16 — settles at 16 - Live: evicted session's next turn gets a fresh agent that works 30 tests pass (13 pre-existing + 17 new). Related gateway suites (model switch, session reset, proxy, etc.) all green. * fix(gateway): cache eviction preserves per-task state for session resume The prior commits called AIAgent.close() on cache-evicted agents, which tears down process_registry entries, terminal sandbox, and browser daemon for that task_id — permanently. Fine for session-expiry (session ended), wrong for cache eviction (session may resume). Real-world scenario: a user leaves a Telegram session open for 2+ hours, idle TTL evicts the cached AIAgent, user returns and sends a message. Conversation history is preserved via SessionStore, but their terminal sandbox (cwd, env vars, bg shells) and browser state were destroyed. Fix: split the two cleanup modes. close() Full teardown — session ended. Kills bg procs, tears down terminal sandbox + browser daemon, closes LLM client. Used by session-expiry, /new, /reset (unchanged). release_clients() Soft cleanup — session may resume. Closes LLM client only. Leaves process_registry, terminal sandbox, browser daemon intact for the resuming agent to inherit via shared task_id. Gateway cache eviction (_enforce_agent_cache_cap, _sweep_idle_cached_agents) now dispatches _release_evicted_agent_soft on the daemon thread instead of _cleanup_agent_resources. All session-expiry call sites of _cleanup_agent_resources are unchanged. Tests (TestAgentCacheIdleResume, 5 new cases): - release_clients does NOT call process_registry.kill_all - release_clients does NOT call cleanup_vm / cleanup_browser - release_clients DOES close the LLM client (agent.client is None after) - close() vs release_clients() — semantic contract pinned - Idle-evicted session's rebuild with same session_id gets same task_id Updated test_cap_triggers_cleanup_thread to assert the soft path fires and the hard path does NOT. 35 tests pass in test_agent_cache.py; 67 related tests green.	2026-04-17 06:36:34 -07:00
Ubuntu	5ca52bae5b	fix(gateway/weixin): split poll/send sessions, reuse live adapter for cron & send_message - gateway/platforms/weixin.py: - Split aiohttp.ClientSession into _poll_session and _send_session - Add _LIVE_ADAPTERS registry so send_weixin_direct() reuses the connected gateway adapter instead of creating a competing session - Fixes silent message loss when gateway is running (iLink token contention) - cron/scheduler.py: - Support comma-separated deliver values (e.g. 'feishu,weixin') for multi-target delivery - Delay pconfig/enabled check until standalone fallback so live adapters work even when platform is not in gateway config - tools/send_message_tool.py: - Synthesize PlatformConfig from WEIXIN_* env vars when gateway config lacks a weixin entry - Fall back to WEIXIN_HOME_CHANNEL env var for home channel resolution - tests/gateway/test_weixin.py: - Update mocks to include _send_session	2026-04-17 06:26:43 -07:00
Teknium	c60b6dc317	test(dingtalk): cover get_connected_platforms + null platform_toolsets Follow-ups to the salvaged commits in this PR: * gateway/config.py — strip trailing whitespace from youngDoo's diff (line 315 had ~140 trailing spaces). * hermes_cli/tools_config.py — replace `config.get("platform_toolsets", {})` with `config.get("platform_toolsets") or {}`. Handles the case where the YAML key is present but explicitly null (parses as None, previously crashed with AttributeError on the next line's .get(platform)). Cherry-picked from yyq4193's #9003 with attribution. * tests/gateway/test_config.py — 4 new tests for TestGetConnectedPlatforms covering DingTalk via extras, via env vars, disabled, and missing creds. * tests/hermes_cli/test_tools_config.py — regression test for the null platform_toolsets edge case. * scripts/release.py — add kagura-agent, youngDoo, yyq4193 to AUTHOR_MAP. Co-authored-by: yyq4193 <39405770+yyq4193@users.noreply.github.com>	2026-04-17 06:26:18 -07:00
kagura-agent	47a0dd1024	fix(dingtalk): fire-and-forget message processing & session_webhook fallback Fixes #11463: DingTalk channel receives messages but fails to reply with 'No session_webhook available'. Two changes: 1. Fire-and-forget message processing: process() now dispatches _on_message as a background task via asyncio.create_task instead of awaiting it. This ensures the SDK ACK is returned immediately, preventing heartbeat timeouts and disconnections when message processing takes longer than the SDK's ACK deadline. 2. session_webhook extraction fallback: If ChatbotMessage.from_dict() fails to map the sessionWebhook field (possible across SDK versions), the handler now falls back to extracting it directly from the raw callback data dict using both 'sessionWebhook' and 'session_webhook' key variants. Added 3 tests covering webhook extraction, fallback behavior, and fire-and-forget ACK timing.	2026-04-17 06:26:18 -07:00
youngDoo	91e7aff219	gateway cant add DingTalk platform gateway cant add DingTalk platform without key and secret	2026-04-17 06:26:18 -07:00
Teknium	e5b880264b	fix(discord): harden DISCORD_ALLOWED_ROLES and cover gateway layer Two follow-ups to the cherry-picked PR #9873 (`e3bcc819`): 1. `_is_allowed_user` now uses `getattr(self, '_allowed_*_ids', set())` so test fixtures that build the adapter via `object.__new__` (skipping __init__) don't crash with AttributeError. See AGENTS.md pitfall #17 — same pattern as gateway.run. 2. New 3-case regression coverage in test_discord_bot_auth_bypass.py: - role-only config bypasses the gateway 'no allowlists' branch - roles + users combined still authorizes user-allowlist matches - the role bypass does NOT leak to other platforms (Telegram, etc.) 3. Autouse fixture in test_discord_bot_auth_bypass.py clears all Discord auth env vars before each test so DISCORD_ALLOWED_ROLES leakage from a previous test in the session can't flip later 'should-reject' tests into false-pass. Required because the bare cherry-pick of #9873 only added the adapter- level role check — it didn't cover the gateway-level _is_user_authorized, which still rejected role-only setups via the 'no allowlists configured' branch.	2026-04-17 05:48:26 -07:00
0xyg3n	541a3e27d7	feat(discord): add DISCORD_ALLOWED_ROLES env var for role-based access control Adds a new DISCORD_ALLOWED_ROLES environment variable that allows filtering bot interactions by Discord role ID. Uses OR semantics with the existing DISCORD_ALLOWED_USERS - if a user matches either allowlist, they're permitted. Changes: - Parse DISCORD_ALLOWED_ROLES comma-separated role IDs on connect - Enable members intent when roles are configured (needed for role lookup) - Update _is_allowed_user() to accept optional author param for direct role check - Fallback to scanning mutual guilds when author object lacks roles (DMs, voice) - Fully backwards compatible: no behavior change when env var is unset	2026-04-17 05:48:26 -07:00
gnanam1990	0f4403346d	fix(discord): DISCORD_ALLOW_BOTS=mentions/all now works without DISCORD_ALLOWED_USERS Fixes #4466. Root cause: two sequential authorization gates both independently rejected bot messages, making DISCORD_ALLOW_BOTS completely ineffective. Gate 1 — `discord.py` `on_message`: _is_allowed_user ran BEFORE the bot filter, so bot senders were dropped before the DISCORD_ALLOW_BOTS policy was ever evaluated. Gate 2 — `gateway/run.py` _is_user_authorized: The gateway-level allowlist check rejected bot IDs with 'Unauthorized user: <bot_id>' even if they passed Gate 1. Fix: gateway/platforms/discord.py — reorder on_message so DISCORD_ALLOW_BOTS runs BEFORE _is_allowed_user. Bots permitted by the filter skip the user allowlist; non-bots are still checked. gateway/session.py — add is_bot: bool = False to SessionSource so the gateway layer can distinguish bot senders. gateway/platforms/base.py — expose is_bot parameter in build_source. gateway/platforms/discord.py _handle_message — set is_bot=True when building the SessionSource for bot authors. gateway/run.py _is_user_authorized — when source.is_bot is True AND DISCORD_ALLOW_BOTS is 'mentions' or 'all', return True early. Platform filter already validated the message at on_message; don't re-reject. Behavior matrix: \| Config \| Before \| After \| \| DISCORD_ALLOW_BOTS=none (default) \| Blocked \| Blocked \| \| DISCORD_ALLOW_BOTS=all \| Blocked \| Allowed \| \| DISCORD_ALLOW_BOTS=mentions + @mention \| Blocked \| Allowed \| \| DISCORD_ALLOW_BOTS=mentions, no mention \| Blocked \| Blocked \| \| Human in DISCORD_ALLOWED_USERS \| Allowed \| Allowed \| \| Human NOT in DISCORD_ALLOWED_USERS \| Blocked \| Blocked \| Co-authored-by: Hermes Maintainer <hermes@nousresearch.com>	2026-04-17 05:42:04 -07:00
Teknium	d7fb435e0e	fix(discord): flat /skill command with autocomplete — fits 8KB limit trivially (#11580 ) Closes #11321, closes #10259. ## Problem The nested /skill command group (category subcommand groups + skill subcommands) serialized to ~14KB with the default 75-skill catalog, exceeding Discord's ~8000-byte per-command registration payload. The entire tree.sync() rejected with error 50035 — ALL slash commands including the 27 base commands failed to register. ## Fix Replace the nested Group layout with a single flat Command: /skill name:<autocomplete> args:<optional string> Autocomplete options are fetched dynamically by Discord when the user types — they do NOT count against the per-command registration budget. So this single command registers at ~200 bytes regardless of how many skills exist. Scales to thousands of skills with no size calculations, no splitting, no hidden skills. UX improvements: - Discord live-filters by user's typed prefix against BOTH name and description, so '/skill pdf' finds 'ocr-and-documents' via its description. More discoverable than clicking through category menus. - Unknown skill name → ephemeral error pointing user at autocomplete. - Stable alphabetical ordering across restarts. ## Why not the other proposed approaches Three prior PRs tried to fit within the 8KB limit by modifying the nested layout: - #10214 (njiangk): truncated all descriptions to 'Run <name>' and category descriptions to 'Skills'. Works but destroys slash picker UX. - #11385 (LeonSGP43): 40-char description clamp + iterative trim-largest-category fallback. Works but HIDES skills the user can no longer invoke via slash — functional regression. - #10261 (zeapsu): adaptive split into /skill-<cat> top-level groups. Preserves all skills but pollutes the slash namespace with 20 top-level commands. All three work around the symptom. The flat autocomplete design dissolves the problem — there is no payload-size pressure to manage. ## Tests tests/gateway/test_discord_slash_commands.py — 5 new test cases replace the 3 old nested-structure tests: - flat-not-nested structure assertion - empty skills → no command registered - callback dispatches the right cmd_key by name - unknown name → ephemeral error, no dispatch - large-catalog regression guard (500 skills) — command payload stays under 500 bytes regardless E2E validated against real discord.py 2.7.1: - Command registers as discord.app_commands.Command (not Group). - Autocomplete filters by name AND description (verified across several queries including description-only matches like 'pdf' → OCR skill). - 500-skill catalog returns max 25 results per autocomplete query (Discord's hard cap), filtered correctly. - Choice labels formatted as 'name — description' clamped to 100 chars.	2026-04-17 05:19:14 -07:00
meng93	9deeee7bb7	feat(dingtalk): add QR code auth support and fix 3 critical bugs - feat: support one-click QR scan to create DingTalk bot and establish connection - fix(gateway): wrap blocking DingTalkStreamClient.start() with asyncio.to_thread() - fix(gateway): extract message fields from CallbackMessage payload instead of ChatbotMessage - fix(gateway): add oapi.dingtalk.com to allowed webhook URL domains	2026-04-17 05:08:07 -07:00
Berny Linville	6ee65b4d61	fix(weixin): preserve native markdown rendering - stop rewriting markdown tables, headings, and links before delivery - keep markdown table blocks and headings together during chunking - update Weixin tests and docs for native markdown rendering Closes #10308	2026-04-17 05:01:29 -07:00
Hedgeho9	498fc6780e	fix(weixin): extract and deliver MEDIA: attachments in normal send() path The Weixin adapter's send() method previously split and delivered the raw response text without first extracting MEDIA: tags or bare local file paths. This meant images, documents, and voice files referenced by the agent were silently dropped in normal (non-streaming, non-background) conversations. Changes: - In WeixinAdapter.send(), call extract_media() and extract_local_files() before formatting/splitting text. - Deliver extracted files via send_image_file(), send_document(), send_voice(), or send_video() prior to sending text chunks. - Also fix two minor typing issues in gateway/run.py where extract_media() tuples were not unpacked correctly in background and /btw task handlers. Fixes missing media delivery on Weixin personal accounts.	2026-04-17 05:01:29 -07:00
Patrick Wang	4ed6e4c1a5	refactor(weixin): drop pilk dependency from voice fallback	2026-04-17 05:01:29 -07:00
Patrick Wang	649f38390c	fix: force Weixin voice fallback to file attachments	2026-04-17 05:01:29 -07:00
Patrick Wang	678b69ec1b	fix(weixin): use Tencent SILK encoding for voice replies	2026-04-17 05:01:29 -07:00
Teknium	53da34a4fc	fix(discord): route attachment downloads through authenticated bot session (#11568 ) Three open issues — #8242, #6587, #11345 — all trace to the same root cause: the image / audio / document download paths in `DiscordAdapter._handle_message` used plain, unauthenticated HTTP to fetch `att.url`. That broke in three independent ways: #8242 cdn.discordapp.com attachment URLs increasingly require the bot session to download; unauthenticated httpx sees 403 Forbidden, image/voice analysis fail silently. #6587 Some user environments (VPNs, corporate DNS, tunnels) resolve cdn.discordapp.com to private-looking IPs. Our is_safe_url() guard correctly blocks them as SSRF risks, but the user environment is legitimate — image analysis and voice STT die. #11345 The document download path skipped is_safe_url() entirely — raw aiohttp.ClientSession.get(att.url) with no SSRF check, inconsistent with the image/audio branches. Unified fix: use `discord.Attachment.read()` as the primary download path on all three branches. `att.read()` routes through discord.py's own authenticated HTTPClient, so: - Discord CDN auth is handled (#8242 resolved). - Our is_safe_url() gate isn't consulted for the attachment path at all — the bot session handles networking internally (#6587 resolved). - All three branches now share the same code path, eliminating the document-path SSRF gap (#11345 resolved). Falls back to the existing cache_*_from_url helpers (image/audio) or an SSRF-gated aiohttp fetch (documents) when `att.read()` is unavailable or fails — preserves defense-in-depth for any future payload-schema drift that could slip a non-CDN URL into att.url. New helpers on DiscordAdapter: - _read_attachment_bytes(att) — safe att.read() wrapper - _cache_discord_image(att, ext) — primary + URL fallback - _cache_discord_audio(att, ext) — primary + URL fallback - _cache_discord_document(att, ext) — primary + SSRF-gated aiohttp fallback Tests: - tests/gateway/test_discord_attachment_download.py — 12 new cases covering all three helpers: primary path, fallback on missing .read(), fallback on validator rejection, SSRF guard on document fallback, aiohttp fallback happy-path, and an E2E case via _handle_message confirming cache_image_from_url is never invoked when att.read() succeeds. - All 11 existing document-handling tests continue to pass via the aiohttp fallback path (their SimpleNamespace attachments have no .read(), which triggers the fallback — now SSRF-gated). Closes #8242, closes #6587, closes #11345.	2026-04-17 04:59:03 -07:00
LehaoLin	504e7eb9e5	fix(gateway): wait for reconnection before dropping WebSocket sends When a WebSocket-based platform adapter (e.g. QQ Bot) temporarily loses its connection, send() now polls is_connected for up to 15s instead of immediately returning a non-retryable failure. If the auto-reconnect completes within the window, the message is delivered normally. On timeout, the SendResult is marked retryable=True so the base class retry mechanism can attempt re-delivery. Same treatment applied to _send_media(). Adds 4 async tests covering: - Successful send after simulated reconnection - Retryable failure on timeout - Immediate success when already connected - _send_media reconnection wait Fixes #11163	2026-04-17 04:22:40 -07:00
dieutx	995177d542	fix(gateway): honor QQ_GROUP_ALLOWED_USERS in runner auth	2026-04-17 04:22:40 -07:00
Pedro Gonzalez	590c9964e1	Fix QQ voice attachment SSRF validation	2026-04-17 04:22:40 -07:00
yule975	9039273ff0	feat(platforms): add require_mention + allowed_users gating to DingTalk DingTalk was the only messaging platform without group-mention gating or a per-user allowlist. Slack, Telegram, Discord, WhatsApp, Matrix, and Mattermost all support these via config.yaml + matching env vars; this change closes the gap for DingTalk using the same surface: Config: platforms.dingtalk.require_mention: bool (env: DINGTALK_REQUIRE_MENTION) platforms.dingtalk.mention_patterns: list (env: DINGTALK_MENTION_PATTERNS) platforms.dingtalk.free_response_chats: list (env: DINGTALK_FREE_RESPONSE_CHATS) platforms.dingtalk.allowed_users: list (env: DINGTALK_ALLOWED_USERS) Semantics mirror Telegram's implementation: - DMs are always accepted (subject to allowed_users). - Group messages are accepted only when the chat is allowlisted, mention is not required, the bot was @mentioned (dingtalk_stream sets is_in_at_list), or the text matches a configured regex wake-word. - allowed_users matches sender_id / sender_staff_id case-insensitively; a single "*" disables the check. Rationale: without this, any DingTalk user in a group chat can trigger the bot, which makes DingTalk less safe to deploy than the other platforms. A user's config.yaml already accepts require_mention for dingtalk but the value was silently ignored.	2026-04-17 04:21:49 -07:00
LeonSGP43	a448e7a04d	fix(discord): drop invalid reply references	2026-04-17 04:17:56 -07:00
Asunfly	7c932c5aa4	fix(dingtalk): close websocket on disconnect	2026-04-17 04:11:30 -07:00
赵晨飞	902d6b97d6	fix(weixin): correct send_image_file parameter name to match base class The send_image_file method in WeixinAdapter used 'path' as parameter name, but BasePlatformAdapter and gateway callers use 'image_path'. This mismatch caused image sending to fail when called through the gateway's extract_media path. Changed parameter name from 'path' to 'image_path' to match the interface defined in base.py and the calls in gateway/run.py.	2026-04-17 04:09:21 -07:00
Michel Belleau	efa6c9f715	fix(discord): default allowed_mentions to block @everyone and role pings discord.py does not apply a default AllowedMentions to the client, so any reply whose content contains @everyone/@here or a role mention would ping the whole server — including verbatim echoes of user input or LLM output that happens to contain those tokens. Set a safe default on commands.Bot: everyone=False, roles=False, users=True, replied_user=True. Operators can opt back in via four DISCORD_ALLOW_MENTION_* env vars or discord.allow_mentions.* in config.yaml. No behavior change for normal user/reply pings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 04:08:42 -07:00
Teknium	3438d274f6	fix(dingtalk): repair _extract_text for dingtalk-stream >= 0.20 SDK shape The cherry-picked SDK compat fix (previous commit) wired process() to parse CallbackMessage.data into a ChatbotMessage, but _extract_text() was still written against the pre-0.20 payload shape: * message.text changed from dict {content: ...} → TextContent object. The old code's str(text) fallback produced 'TextContent(content=...)' as the agent's input, so every received message came in mangled. * rich_text moved from message.rich_text (list) to message.rich_text_content.rich_text_list. This preserves legacy fallbacks (dict-shaped text, bare rich_text list) while handling the current SDK layout via hasattr(text, 'content'). Adds regression tests covering: * webhook domain allowlist (api., oapi., and hostile lookalikes) * _IncomingHandler.process is a coroutine function * _extract_text against TextContent object, dict, rich_text_content, legacy rich_text, and empty-message cases Also adds kevinskysunny to scripts/release.py AUTHOR_MAP (release CI blocks unmapped emails).	2026-04-17 00:52:35 -07:00
Kevin S. Sunny	c3d2895b18	fix(dingtalk): support dingtalk-stream 0.24+ and oapi webhooks	2026-04-17 00:52:35 -07:00
Fatty911	94168b7f60	fix: register missing Feishu event handlers for P2P chat entered and message recalled	2026-04-16 22:08:11 -07:00
Teknium	7af9bf3a54	fix(feishu): queue inbound events when adapter loop not ready (#5499 ) (#11372 ) Inbound Feishu messages arriving during brief windows when the adapter loop is unavailable (startup/restart transitions, network-flap reconnect) were silently dropped with a WARNING log. This matches the symptom in issue #5499 — and users have reported seeing only a subset of their messages reach the agent. Fix: queue pending events in a thread-safe list and spawn a single drainer thread that replays them once the loop becomes ready. Covers these scenarios: * Queue events instead of dropping when loop is None/closed * Single drainer handles the full queue (not thread-per-event) * Thread-safe with threading.Lock on the queue and schedule flag * Handles mid-drain bursts (new events arrive while drainer is working) * Handles RuntimeError if loop closes between check and submit * Depth cap (1000) prevents unbounded growth during extended outages * Drops queue cleanly on disconnect rather than holding forever * Safety timeout (120s) prevents infinite retention on broken adapters Based on the approach proposed in #4789 by milkoor, rewritten for thread-safety and correctness. Test plan: * 5 new unit tests (TestPendingInboundQueue) — all passing * E2E test with real asyncio loop + fake WS thread: 10-event burst before loop ready → all 10 delivered in order * E2E concurrent burst test: 20 events queued, 20 more arrive during drainer dispatch → all 40 delivered, no loss, no duplicates * All 111 existing feishu tests pass Related: #5499, #4789 Co-authored-by: milkoor <milkoor@users.noreply.github.com>	2026-04-16 20:36:59 -07:00
Brooklyn Nicholson	7f1204840d	test(tui): fix stale mocks + xdist flakes in TUI test suite All 61 TUI-related tests green across 3 consecutive xdist runs. tests/tui_gateway/test_protocol.py: - rename `get_messages` → `get_messages_as_conversation` on mock DB (method was renamed in the real backend, test was still stubbing the old name) - update tool-message shape expectation: `{role, name, context}` matches current `_history_to_messages` output, not the legacy `{role, text}` tests/hermes_cli/test_tui_resume_flow.py: - `cmd_chat` grew a first-run provider-gate that bailed to "Run: hermes setup" before `_launch_tui` was ever reached; 3 tests stubbed `_resolve_last_session` + `_launch_tui` but not the gate - factored a `main_mod` fixture that stubs `_has_any_provider_configured`, reused by all three tests tests/test_tui_gateway_server.py: - `test_config_set_personality_resets_history_and_returns_info` was flaky under xdist because the real `_write_config_key` touches `~/.hermes/config.yaml`, racing with any other worker that writes config. Stub it in the test.	2026-04-16 19:07:49 -05:00
Michel Belleau	c1c9ab534c	fix(discord): strip RTP padding before DAVE/Opus decode (#11267 ) The Discord voice receive path skipped RFC 3550 §5.1 padding handling, passing padding-contaminated payloads into DAVE E2EE decrypt and Opus decode. Symptoms in live VC sessions: deaf inbound speech, intermittent empty STT results, "corrupted stream" decode errors — especially on the first reply after join. When the P bit is set in the RTP header, the last payload byte holds the count of trailing padding bytes (including itself) that must be removed. Receive pipeline now follows the spec order: 1. RTP header parse 2. NaCl transport decrypt (aead_xchacha20_poly1305_rtpsize) 3. strip encrypted RTP extension data from start 4. strip RTP padding from end if P bit set ← was missing 5. DAVE inner media decrypt 6. Opus decode Drops malformed packets where pad_len is 0 or exceeds payload length. Adds 7 integration tests covering valid padded packets, the X+P combined case, padding under DAVE passthrough, and three malformed-padding paths. Closes #11267 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-16 16:50:15 -07:00
helix4u	5d7d574779	fix(gateway): let /queue bypass active-session guard	2026-04-16 16:36:40 -07:00

1 2 3 4 5 ...

1099 commits