hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-25 00:51:20 +00:00

Author	SHA1	Message	Date
Teknium	6cae0744f0	test: mock retry backoff and compression sleeps in slow tests Cuts ~65s off shard 3's local runtime (108s → 48s) by neutralizing real wall-clock waits in backoff/compression/retry paths. Tests assert behavior (retry count, final result, error handling), never timing. Changes: - tests/run_agent/conftest.py (NEW): autouse fixture mocks run_agent.jittered_backoff to 0.0 for all tests in the directory. Collapses the `while time.time() < sleep_end` busy-loop to a no-op. Does NOT mock time.sleep globally (breaks threading tests). - test_anthropic_error_handling.py: per-file fixture mocks time.sleep and asyncio.sleep for this test's retry paths (6 tests × 10s → ~2s each). - test_413_compression.py: mocks time.sleep for the 2s compression retry pauses (9 tests × 2s → millisecond range). - test_run_agent_codex_responses.py: mocks time.sleep for Codex retry path (6.8s → 0.24s on the empty-output retry test). - test_fallback_model.py: mocks time.sleep for transport-recovery path. - test_retaindb_plugin.py: caps retaindb module's time.sleep to 0.05s so background writer-thread sleeps don't block tests. Replaces arbitrary time.sleep(N) waits with polling loops. Validation: - tests/run_agent/ + tests/plugins/test_retaindb_plugin.py: 827 passed, 0 failed, 22.9s (was ~75s before). - Matrix shard 3 local: 3098 passed, 48.2s (was 108s). - No test's timing-assertion contract is changed (tests still verify retry happens, just don't wait 5s for it).	2026-04-17 13:19:00 -07:00
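The commit above replaces real waits with mocked ones while still asserting on retry behavior. A minimal sketch of that idea, assuming a hypothetical `call_with_retry` helper (not a function from hermes-agent) and using only `unittest.mock` from the standard library:

```python
import time
from unittest import mock


def call_with_retry(fn, attempts=3, delay=5.0):
    """Hypothetical retry helper: sleeps `delay` seconds after each failed attempt."""
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except RuntimeError as exc:
            last_error = exc
            time.sleep(delay)  # a real wall-clock wait unless patched
    raise last_error


def run_patched_retry():
    """Run the retry loop with time.sleep mocked; return (result, calls, sleeps)."""
    calls = {"n": 0}

    def flaky():
        # Fails twice, then succeeds on the third call.
        calls["n"] += 1
        if calls["n"] < 3:
            raise RuntimeError("transient failure")
        return "ok"

    # The patch is scoped to this block, so other code keeps the real sleep.
    with mock.patch("time.sleep") as fake_sleep:
        result = call_with_retry(flaky)
    return result, calls["n"], fake_sleep.call_count
```

The assertions a test would make are about the retry count and the final result, never about elapsed time, which is why the sleep can be removed without weakening the test.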
Teknium	d0e1388ca9	fix(tests): make AIAgent constructor calls self-contained (#11755) * fix(tests): make AIAgent constructor calls self-contained (no env leakage) Tests in tests/run_agent/ were constructing AIAgent() without passing both api_key and base_url, then relying on leaked state from other tests in the same xdist worker (or process-level env vars) to keep provider resolution happy. Under hermetic conftest + pytest-split, that state is gone and the tests fail with 'No LLM provider configured'. Fix: pass both api_key and base_url explicitly on 47 AIAgent() construction sites across 13 files. AIAgent.__init__ with both set takes the direct-construction path (line 960 in run_agent.py) and skips the resolver entirely. One call site (test_none_base_url_passed_as_none) left alone — that test asserts behavior for base_url=None specifically. This is a prerequisite for any future matrix-split or stricter isolation work, and lands cleanly on its own. Validation: - tests/run_agent/ full: 760 passed, 0 failed (local) - Previously relied on cross-test pollution; now self-contained * fix(tests): update opencode-go model order assertion to match kimi-k2.5-first commit `78a74bb` promoted kimi-k2.5 to first position in model suggestion lists but didn't update this test, which has been failing on main since. Reorder expected list to match the new canonical order.	2026-04-17 12:32:03 -07:00
kshitij	78a74bb097	feat: promote kimi-k2.5 to first position in all model suggestion lists (#11745) Move moonshotai/kimi-k2.5 to position #1 in every model picker list: - OPENROUTER_MODELS (with 'recommended' tag) - _PROVIDER_MODELS: nous, kimi-coding, opencode-zen, opencode-go, alibaba, huggingface - _model_flow_kimi() Coding Plan model list in main.py kimi-coding-cn and moonshot lists already had kimi-k2.5 first.	2026-04-17 12:05:22 -07:00
Teknium	6ea7386a6f	chore: map memosr, anthhub, shenuu, xiayh0107 emails to AUTHOR_MAP	2026-04-17 06:50:36 -07:00
Young Sherlock	8dcd08d8bb	Fix Weixin media uploads and refresh lockfile	2026-04-17 06:50:36 -07:00
shenuu	3a0ec1d935	fix(weixin): macOS SSL cert, QR data, and refresh rendering - Use certifi CA bundle for aiohttp SSL in qr_login(), start(), and send_weixin_direct() to fix SSL verification failures against Tencent's iLink server on macOS (Homebrew OpenSSL lacks system certs) - Fix QR code data: encode qrcode_img_content (full liteapp URL) instead of raw hex token — WeChat needs the full URL to resolve the scan - Render ASCII QR on refresh so the user can re-scan without restarting - Improve error message on QR render failure to show the actual exception Tested on macOS (Apple Silicon, Homebrew Python 3.13)	2026-04-17 06:50:36 -07:00
jinzheng8115	e105b7ac93	fix(weixin): retry send without context_token on iLink session expiry iLink context_token has a limited TTL. When no user message has arrived for an extended period (e.g. overnight), cron-initiated pushes fail with errcode -14 (session timeout). Tested that iLink accepts sends without context_token as a degraded fallback, so we now automatically strip the expired token and retry once. This keeps scheduled push messages (weather, digests, etc.) working reliably without requiring a user message to refresh the session first. Changes: - _send_text_chunk() catches iLinkDeliveryError with session-expired errcode (-14) and retries without context_token - Stale tokens are cleared from ContextTokenStore on session expiry - All 34 existing weixin tests pass	2026-04-17 06:50:36 -07:00
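The retry-once fallback described in this commit can be sketched as follows. `DeliveryError`, `send_with_token_fallback`, and the dict-based token store are hypothetical stand-ins for illustration; only the errcode value -14 comes from the commit message.

```python
class DeliveryError(Exception):
    """Hypothetical stand-in for the adapter's delivery error type."""

    def __init__(self, errcode):
        super().__init__(f"delivery failed with errcode {errcode}")
        self.errcode = errcode


SESSION_EXPIRED_ERRCODE = -14  # session-timeout errcode named in the commit message


def send_with_token_fallback(send, chat_id, text, token_store):
    """Send with the stored context token; on session expiry, retry once without it."""
    token = token_store.get(chat_id)
    try:
        return send(chat_id, text, context_token=token)
    except DeliveryError as exc:
        # Only the session-expired case is retried, and only if a token was used.
        if exc.errcode != SESSION_EXPIRED_ERRCODE or token is None:
            raise
        token_store.pop(chat_id, None)  # clear the stale token
        return send(chat_id, text, context_token=None)
```

Retrying exactly once keeps the degraded path bounded: any other errcode, or a failure on the token-less retry, still propagates to the caller.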
anthhub	4b1567f425	fix(packaging): include qrcode in messaging extra	2026-04-17 06:50:36 -07:00
memosr	cedc95c100	fix(security): validate WeChat media URLs against CDN allowlist to prevent SSRF	2026-04-17 06:50:36 -07:00
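A host allowlist check of the kind this commit title describes might look like the sketch below. The hostname in `ALLOWED_MEDIA_HOSTS` is a placeholder, since the commit message does not list the real CDN hosts, and `is_allowed_media_url` is a hypothetical name.

```python
from urllib.parse import urlparse

# Placeholder allowlist: the actual CDN hostnames are not given in the commit message.
ALLOWED_MEDIA_HOSTS = frozenset({"cdn.example.com"})


def is_allowed_media_url(url, allowed_hosts=ALLOWED_MEDIA_HOSTS):
    """Return True only for https URLs whose exact hostname is on the allowlist."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    # .hostname strips userinfo and port, so "allowed@evil.test" resolves to evil.test.
    host = (parsed.hostname or "").lower()
    return host in allowed_hosts
```

Matching the exact hostname, rather than a substring or suffix of the URL, is what rejects lookalikes such as `cdn.example.com.evil.test`.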
Teknium	c7334b4a50	chore(release): map @Hypn0sis and @OwenYWT to AUTHOR_MAP	2026-04-17 06:46:52 -07:00
Teknium	3f3d8a7b24	fix(discord): strip mention syntax from auto-thread names Previously a message like `<@&1490963422786093149> help` would spawn a thread literally named `<@&1490963422786093149> help`, exposing raw Discord mention markers in the thread list. Only user mentions (`<@id>`) were being stripped upstream — role mentions (`<@&id>`) and channel mentions (`<#id>`) leaked through. Fix: strip all three mention patterns in `_auto_create_thread` before building the thread name. Collapse runs of whitespace left by the removal. If the entire content was mention-only, fall back to 'Hermes' instead of an empty title. Fixes #6336. Tests: two new regression guards in test_discord_slash_commands.py covering mixed-mention content and mention-only content.	2026-04-17 06:46:52 -07:00
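The mention stripping described above can be illustrated with a single regular expression. This is a sketch, not the code in `_auto_create_thread`; `thread_name_from` is a hypothetical name, and the pattern also accepts the `<@!id>` nickname form as an assumption.

```python
import re

# User mentions (<@id>, <@!id>), role mentions (<@&id>), channel mentions (<#id>).
_MENTION_RE = re.compile(r"<(?:@[!&]?|#)\d+>")


def thread_name_from(content, fallback="Hermes"):
    """Strip Discord mention markers, collapse whitespace, fall back if empty."""
    cleaned = _MENTION_RE.sub(" ", content)
    cleaned = " ".join(cleaned.split())  # collapse runs of whitespace left behind
    return cleaned or fallback
```

For the example in the commit message, `thread_name_from("<@&1490963422786093149> help")` yields `"help"`, and mention-only content yields the fallback title.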
sgaofen	32a694ad5f	fix(discord): fall back when auto-thread creation fails	2026-04-17 06:46:52 -07:00
OwenYWT	f5dc4e905d	fix(discord): skip auto-threading reply messages	2026-04-17 06:46:52 -07:00
Matteo De Agazio	93fe4b357d	fix(discord): free-response channels skip auto-threading Free-response channels already bypassed the @mention gate so users could chat inline with the bot, but auto-threading still fired on every message — spinning off a thread per message and defeating the lightweight-chat purpose. Fix: fold `is_free_channel` into `skip_thread` so threading is skipped whenever the channel is in DISCORD_FREE_RESPONSE_CHANNELS (via env or discord.free_response_channels in config.yaml). Net change: one line in _handle_message + one regression test. Partially addresses #9399. Authored by @Hypn0sis (salvaged from PR #9650; the bundled 'smart' auto-thread mode from that PR was dropped in favor of deterministic true/false semantics).	2026-04-17 06:46:52 -07:00
Teknium	8d7b7feb0d	fix(gateway): bound _agent_cache with LRU cap + idle TTL eviction (#11565 ) * fix(gateway): bound _agent_cache with LRU cap + idle TTL eviction The per-session AIAgent cache was unbounded. Each cached AIAgent holds LLM clients, tool schemas, memory providers, and a conversation buffer. In a long-lived gateway serving many chats/threads, cached agents accumulated indefinitely — entries were only evicted on /new, /model, or session reset. Changes: - Cache is now an OrderedDict so we can pop least-recently-used entries. - _enforce_agent_cache_cap() pops entries beyond _AGENT_CACHE_MAX_SIZE=64 when a new agent is inserted. LRU order is refreshed via move_to_end() on cache hits. - _sweep_idle_cached_agents() evicts entries whose AIAgent has been idle longer than _AGENT_CACHE_IDLE_TTL_SECS=3600s. Runs from the existing _session_expiry_watcher so no new background task is created. - The expiry watcher now also pops the cache entry after calling _cleanup_agent_resources on a flushed session — previously the agent was shut down but its reference stayed in the cache dict. - Evicted agents have _cleanup_agent_resources() called on a daemon thread so the cache lock isn't held during slow teardown. Both tuning constants live at module scope so tests can monkeypatch them without touching class state. Tests: 7 new cases in test_agent_cache.py covering LRU eviction, move_to_end refresh, cleanup thread dispatch, idle TTL sweep, defensive handling of agents without _last_activity_ts, and plain-dict test fixture tolerance. * tweak: bump _AGENT_CACHE_MAX_SIZE 64 -> 128 * fix(gateway): never evict mid-turn agents; live spillover tests The prior commit could tear down an active agent if its session_key happened to be LRU when the cap was exceeded. 
AIAgent.close() kills process_registry entries for the task, tears down the terminal sandbox, closes the OpenAI client (sets self.client = None), and cascades .close() into any active child subagents — all fatal if the agent is still processing a turn. Changes: - _enforce_agent_cache_cap and _sweep_idle_cached_agents now look at GatewayRunner._running_agents and skip any entry whose AIAgent instance is present (identity via id(), so MagicMock doesn't confuse lookup in tests). _AGENT_PENDING_SENTINEL is treated as 'not active' since no real agent exists yet. - Eviction only considers the LRU-excess window (first size-cap entries). If an excess slot is held by a mid-turn agent, we skip it WITHOUT compensating by evicting a newer entry. A freshly inserted session (zero cache history) shouldn't be punished to protect a long-lived one that happens to be busy. - Cache may therefore stay transiently over cap when load spikes; a WARNING is logged so operators can see it, and the next insert re-runs the check after some turns have finished. New tests (TestAgentCacheActiveSafety + TestAgentCacheSpilloverLive): - Active LRU entry is skipped; no newer entry compensated - Mixed active/idle excess window: only idle slots go - All-active cache: no eviction, WARNING logged, all clients intact - _AGENT_PENDING_SENTINEL doesn't block other evictions - Idle-TTL sweep skips active agents - End-to-end: active agent's .client survives eviction attempt - Live fill-to-cap with real AIAgents, then spillover - Live: CAP=4 all active + 1 newcomer — cache grows to 5, no teardown - Live: 8 threads racing 160 inserts into CAP=16 — settles at 16 - Live: evicted session's next turn gets a fresh agent that works 30 tests pass (13 pre-existing + 17 new). Related gateway suites (model switch, session reset, proxy, etc.) all green. 
* fix(gateway): cache eviction preserves per-task state for session resume The prior commits called AIAgent.close() on cache-evicted agents, which tears down process_registry entries, terminal sandbox, and browser daemon for that task_id — permanently. Fine for session-expiry (session ended), wrong for cache eviction (session may resume). Real-world scenario: a user leaves a Telegram session open for 2+ hours, idle TTL evicts the cached AIAgent, user returns and sends a message. Conversation history is preserved via SessionStore, but their terminal sandbox (cwd, env vars, bg shells) and browser state were destroyed. Fix: split the two cleanup modes. close() Full teardown — session ended. Kills bg procs, tears down terminal sandbox + browser daemon, closes LLM client. Used by session-expiry, /new, /reset (unchanged). release_clients() Soft cleanup — session may resume. Closes LLM client only. Leaves process_registry, terminal sandbox, browser daemon intact for the resuming agent to inherit via shared task_id. Gateway cache eviction (_enforce_agent_cache_cap, _sweep_idle_cached_agents) now dispatches _release_evicted_agent_soft on the daemon thread instead of _cleanup_agent_resources. All session-expiry call sites of _cleanup_agent_resources are unchanged. Tests (TestAgentCacheIdleResume, 5 new cases): - release_clients does NOT call process_registry.kill_all - release_clients does NOT call cleanup_vm / cleanup_browser - release_clients DOES close the LLM client (agent.client is None after) - close() vs release_clients() — semantic contract pinned - Idle-evicted session's rebuild with same session_id gets same task_id Updated test_cap_triggers_cleanup_thread to assert the soft path fires and the hard path does NOT. 35 tests pass in test_agent_cache.py; 67 related tests green.	2026-04-17 06:36:34 -07:00
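The eviction rule from this commit series (evict only within the LRU-excess window, skip mid-turn agents, do not compensate with newer entries) can be sketched on a plain `OrderedDict`. `enforce_cap` and `active_ids` are hypothetical names; the real logic lives in `_enforce_agent_cache_cap` and is not reproduced here.

```python
from collections import OrderedDict


def enforce_cap(cache, max_size, active_ids):
    """Evict idle entries from the LRU-excess window; return the evicted keys.

    cache:      OrderedDict ordered oldest-first (refreshed via move_to_end on hits)
    active_ids: set of id(agent) for agents that are currently mid-turn
    """
    excess = len(cache) - max_size
    if excess <= 0:
        return []
    # Only the oldest `excess` entries are eviction candidates.
    candidates = list(cache.items())[:excess]
    evicted = []
    for key, agent in candidates:
        if id(agent) in active_ids:
            continue  # busy agent: skip it, and do not evict a newer entry instead
        del cache[key]
        evicted.append(key)
    return evicted
```

Because busy entries are skipped without compensation, the cache can stay over the cap for a while; the commit message notes that this is logged as a WARNING and re-checked on the next insert.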
Teknium	fc04f83062	chore(release): map jvcl author email for release notes	2026-04-17 06:33:21 -07:00
Jorge	fe0e7edd27	fix(cli): clear input buffer after /model picker selection The Enter handler that confirms a selection in the /model picker closed the picker but never reset event.app.current_buffer, leaving the user's original "/model" command lingering in the prompt. Match the ESC and Ctrl+C handlers (which already reset the buffer) so the prompt is empty after a successful switch.	2026-04-17 06:33:21 -07:00
Jorge	86f02d8d71	refactor(cli): align model picker viewport with PR #11260 vocabulary Match the row-budget naming introduced in PR #11260 for the approval and clarify panels: rename chrome_reserve=14 into reserved_below=6 (input chrome below the panel) + panel_chrome=6 (this panel's borders, blanks, and hint row) + min_visible=3 (floor on visible items). Same arithmetic as before, but a reviewer reading both files now sees the same handle. Compact-chrome mode is intentionally not adopted — that pattern fits the "fixed mandatory content might overflow" shape of approval/clarify (solved by truncating with a marker), whereas the picker's overflow is already handled by the scrolling viewport.	2026-04-17 06:33:21 -07:00
Jorge	5fbe16635b	fix(cli): scroll the /model picker viewport so long catalogs aren't clipped The /model picker rendered every choice into a prompt_toolkit Window with no max height. Providers with many models (e.g. Ollama Cloud's 36+) overflowed the terminal, clipping the bottom border and the last items. - Add HermesCLI._compute_model_picker_viewport() to slide a scroll offset that keeps the cursor on screen, sized from the live terminal rows minus chrome reserved for input/status/border. - Render only the visible slice in _get_model_picker_display() and persist the offset on _model_picker_state across redraws. - Bind ESC (eager) to close the picker, matching the Cancel button. - Cover the viewport math with 8 unit tests in tests/hermes_cli/test_model_picker_viewport.py.	2026-04-17 06:33:21 -07:00
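The scrolling behavior described in this commit reduces to a small piece of arithmetic. The sketch below is an assumption-laden illustration, not `HermesCLI._compute_model_picker_viewport()` itself; `compute_viewport` is a hypothetical name, and the default row budgets borrow the values named in the follow-up refactor (reserved_below=6, panel_chrome=6, min_visible=3).

```python
def compute_viewport(cursor, offset, total, terminal_rows,
                     reserved_below=6, panel_chrome=6, min_visible=3):
    """Return (offset, visible) such that the cursor row stays on screen.

    cursor: index of the highlighted item
    offset: index of the first item currently rendered
    total:  number of items in the picker
    """
    # Rows left for items after subtracting chrome, with a floor on visible items.
    visible = max(min_visible, terminal_rows - reserved_below - panel_chrome)
    visible = min(visible, total)
    if cursor < offset:
        offset = cursor                    # scrolled above the window: snap up
    elif cursor >= offset + visible:
        offset = cursor - visible + 1      # scrolled below the window: snap down
    # Never scroll past the end of the list.
    offset = max(0, min(offset, max(0, total - visible)))
    return offset, visible
```

Rendering then only needs the slice `items[offset:offset + visible]`, and the returned offset is what would be persisted across redraws.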
Teknium	fdf42d62a0	chore: map briandevans and LLQWQ emails to AUTHOR_MAP	2026-04-17 06:26:43 -07:00
Teknium	f64241ed90	feat(cron+tests): extend origin fallback to email/dingtalk/qqbot + fix Weixin test mocks Cron origin fallback extension (builds on #9193's _HOME_TARGET_ENV_VARS): adds the three remaining origin-fallback-eligible platforms that have home channel env vars configured in gateway/config.py but use non-generic env var names: - email → EMAIL_HOME_ADDRESS (non-standard suffix) - dingtalk → DINGTALK_HOME_CHANNEL - qqbot → QQ_HOME_CHANNEL (non-standard prefix: QQ_ not QQBOT_) Picks up the completeness intent of @Xowiek's PR #11317 using the architecturally-correct dict-based lookup from #9193, so platforms with non-standard env var names actually resolve instead of silently missing. Extended the parametrized regression test to cover the new three. Weixin test mock alignment (builds on #10091's _send_session split): Three test sites added in Batch 1 (TestWeixinSendImageFileParameterName) and Batch 3 (TestWeixinVoiceSending) mocked only adapter._session, but #10091 switched the send paths to check self._send_session. Added the companion setter so the tests stay green with the session split in place.	2026-04-17 06:26:43 -07:00
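The dict-based lookup this commit refers to can be sketched as below. The three mappings come from the commit message; `resolve_home_target` is a hypothetical helper, and falling back to a generic `<PLATFORM>_HOME_CHANNEL` name for unlisted platforms is an assumption made for the example.

```python
# Mappings taken from the commit message. The generic fallback pattern used for
# unlisted platforms is an assumption for this sketch.
HOME_TARGET_ENV_VARS = {
    "email": "EMAIL_HOME_ADDRESS",       # non-standard suffix
    "dingtalk": "DINGTALK_HOME_CHANNEL",
    "qqbot": "QQ_HOME_CHANNEL",          # non-standard prefix: QQ_, not QQBOT_
}


def resolve_home_target(platform, env):
    """Look up the home target for a platform in an env-like mapping."""
    var = HOME_TARGET_ENV_VARS.get(platform, f"{platform.upper()}_HOME_CHANNEL")
    value = env.get(var, "").strip()
    return value or None
```

An explicit table is what lets `qqbot` resolve: a purely name-derived lookup would read `QQBOT_HOME_CHANNEL` and silently find nothing.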
bde3249023	b46db048c3	fix(cron): align home target env lookup	2026-04-17 06:26:43 -07:00
bde3249023	f696b4745a	fix(cron): restore origin fallback for feishu home channels	2026-04-17 06:26:43 -07:00
Ubuntu	5ca52bae5b	fix(gateway/weixin): split poll/send sessions, reuse live adapter for cron & send_message - gateway/platforms/weixin.py: - Split aiohttp.ClientSession into _poll_session and _send_session - Add _LIVE_ADAPTERS registry so send_weixin_direct() reuses the connected gateway adapter instead of creating a competing session - Fixes silent message loss when gateway is running (iLink token contention) - cron/scheduler.py: - Support comma-separated deliver values (e.g. 'feishu,weixin') for multi-target delivery - Delay pconfig/enabled check until standalone fallback so live adapters work even when platform is not in gateway config - tools/send_message_tool.py: - Synthesize PlatformConfig from WEIXIN_* env vars when gateway config lacks a weixin entry - Fall back to WEIXIN_HOME_CHANNEL env var for home channel resolution - tests/gateway/test_weixin.py: - Update mocks to include _send_session	2026-04-17 06:26:43 -07:00
Teknium	c60b6dc317	test(dingtalk): cover get_connected_platforms + null platform_toolsets Follow-ups to the salvaged commits in this PR: * gateway/config.py — strip trailing whitespace from youngDoo's diff (line 315 had ~140 trailing spaces). * hermes_cli/tools_config.py — replace `config.get("platform_toolsets", {})` with `config.get("platform_toolsets") or {}`. Handles the case where the YAML key is present but explicitly null (parses as None, previously crashed with AttributeError on the next line's .get(platform)). Cherry-picked from yyq4193's #9003 with attribution. * tests/gateway/test_config.py — 4 new tests for TestGetConnectedPlatforms covering DingTalk via extras, via env vars, disabled, and missing creds. * tests/hermes_cli/test_tools_config.py — regression test for the null platform_toolsets edge case. * scripts/release.py — add kagura-agent, youngDoo, yyq4193 to AUTHOR_MAP. Co-authored-by: yyq4193 <39405770+yyq4193@users.noreply.github.com>	2026-04-17 06:26:18 -07:00
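The null-handling fix in `hermes_cli/tools_config.py` hinges on how `dict.get` treats a key that is present but `None`. A small illustration, with `toolsets_for` as a hypothetical wrapper and the empty-list default as an assumption:

```python
def toolsets_for(config, platform):
    """Return the toolset list for a platform, tolerating a null YAML value."""
    # dict.get's default only applies when the key is MISSING. A key that is
    # present with value None returns None, so `or {}` is needed to cover both.
    platform_toolsets = config.get("platform_toolsets") or {}
    return platform_toolsets.get(platform, [])


# What the old expression returns when the YAML key is explicitly null:
old_style = {"platform_toolsets": None}.get("platform_toolsets", {})
```

Here `old_style` is `None`, so a following `.get(platform)` call would raise `AttributeError`, which is the crash the commit describes.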
kagura-agent	47a0dd1024	fix(dingtalk): fire-and-forget message processing & session_webhook fallback Fixes #11463: DingTalk channel receives messages but fails to reply with 'No session_webhook available'. Two changes: 1. Fire-and-forget message processing: process() now dispatches _on_message as a background task via asyncio.create_task instead of awaiting it. This ensures the SDK ACK is returned immediately, preventing heartbeat timeouts and disconnections when message processing takes longer than the SDK's ACK deadline. 2. session_webhook extraction fallback: If ChatbotMessage.from_dict() fails to map the sessionWebhook field (possible across SDK versions), the handler now falls back to extracting it directly from the raw callback data dict using both 'sessionWebhook' and 'session_webhook' key variants. Added 3 tests covering webhook extraction, fallback behavior, and fire-and-forget ACK timing.	2026-04-17 06:26:18 -07:00
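The fire-and-forget pattern in change 1 can be shown with plain `asyncio`. This is a sketch under stated assumptions: `process`, `on_message`, and the `"OK"` return value are stand-ins, and the real handler's signature and ACK format are not shown in the commit message.

```python
import asyncio


async def run_demo():
    """Show that the ACK is recorded before the slow handler finishes."""
    events = []
    background = set()  # strong references so pending tasks are not garbage-collected

    async def on_message(payload):
        await asyncio.sleep(0.05)  # stands in for slow message processing
        events.append(f"processed {payload}")

    async def process(payload):
        # Dispatch the handler as a background task instead of awaiting it.
        task = asyncio.create_task(on_message(payload))
        background.add(task)
        task.add_done_callback(background.discard)
        events.append("ack")  # the acknowledgement is returned immediately
        return "OK"

    status = await process("hello")
    await asyncio.gather(*background)  # only so the demo can inspect the result
    return status, events
```

Keeping a reference to each task matters: the event loop holds only weak references to tasks, so an unreferenced background task may be collected before it completes.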
youngDoo	91e7aff219	gateway can't add DingTalk platform without key and secret	2026-04-17 06:26:18 -07:00
Teknium	d404849351	test: make test env hermetic; enforce CI parity via scripts/run_tests.sh (#11577 ) * test: make test env hermetic; enforce CI parity via scripts/run_tests.sh Fixes the recurring 'works locally, fails in CI' (and vice versa) class of flakes by making tests hermetic and providing a canonical local runner that matches CI's environment. ## Layer 1 — hermetic conftest.py (tests/conftest.py) Autouse fixture now unsets every credential-shaped env var before every test, so developer-local API keys can't leak into tests that assert 'auto-detect provider when key present'. Pattern: unset any var ending in _API_KEY, _TOKEN, _SECRET, _PASSWORD, _CREDENTIALS, _ACCESS_KEY, _PRIVATE_KEY, etc. Plus an explicit list of credential names that don't fit the suffix pattern (AWS_ACCESS_KEY_ID, FAL_KEY, GH_TOKEN, etc.) and all the provider BASE_URL overrides that change auto-detect behavior. Also unsets HERMES_* behavioral vars (HERMES_YOLO_MODE, HERMES_QUIET, HERMES_SESSION_, etc.) that mutate agent behavior. Also: - Redirects HOME to a per-test tempdir (not just HERMES_HOME), so code reading ~/.hermes/ directly can't touch the real dir. - Pins TZ=UTC, LANG=C.UTF-8, LC_ALL=C.UTF-8, PYTHONHASHSEED=0 to match CI's deterministic runtime. The old _isolate_hermes_home fixture name is preserved as an alias so any test that yields it explicitly still works. ## Layer 2 — scripts/run_tests.sh canonical runner 'Always use scripts/run_tests.sh, never call pytest directly' is the new rule (documented in AGENTS.md). The script: - Unsets all credential env vars (belt-and-suspenders for callers who bypass conftest — e.g. 
IDE integrations) - Pins TZ/LANG/PYTHONHASHSEED - Uses -n 4 xdist workers (matches GHA ubuntu-latest; -n auto on a 20-core workstation surfaces test-ordering flakes CI will never see, causing the infamous 'passes in CI, fails locally' drift) - Finds the venv in .venv, venv, or main checkout's venv - Passes through arbitrary pytest args Installs pytest-split on demand so the script can also be used to run matrix-split subsets locally for debugging. ## Remove 3 module-level dotenv stubs that broke test isolation tests/hermes_cli/test_{arcee,xiaomi,api_key}_provider.py each had a module-level: if 'dotenv' not in sys.modules: fake_dotenv = types.ModuleType('dotenv') fake_dotenv.load_dotenv = lambda a, kw: None sys.modules['dotenv'] = fake_dotenv This patches sys.modules['dotenv'] to a fake at import time with no teardown. Under pytest-xdist LoadScheduling, whichever worker collected one of these files first poisoned its sys.modules; subsequent tests in the same worker that imported load_dotenv transitively (e.g. test_env_loader.py via hermes_cli.env_loader) got the no-op lambda and saw their assertions fail. dotenv is a required dependency (python-dotenv>=1.2.1 in pyproject.toml), so the defensive stub was never needed. Removed. ## Validation - tests/hermes_cli/ alone: 2178 passed, 1 skipped, 0 failed (was 4 failures in test_env_loader.py before this fix) - tests/test_plugin_skills.py, tests/hermes_cli/test_plugins.py, tests/test_hermes_logging.py combined: 123 passed (the caplog regression tests from PR #11453 still pass) - Local full run shows no F/E clusters in the 0-55% range that were previously present before the conftest hardening ## Background See AGENTS.md 'Testing' section for the full list of drift sources this closes. Matrix split (closed as #11566) will be re-attempted once this foundation lands — cross-test pollution was the root cause of the shard-3 hang in that PR. 
fix(conftest): don't redirect HOME — it broke CI subprocesses PR #11577's autouse fixture was setting HOME to a per-test tempdir. CI started timing out at 97% complete with dozens of E/F markers and orphan python processes at cleanup — tests (or transitive deps) spawn subprocesses that expect a stable HOME, and the redirect broke them in non-obvious ways. Env-var unsetting and TZ/LANG/hashseed pinning (the actual CI-drift fixes) are unchanged and still in place. HERMES_HOME redirection is also unchanged — that's the canonical way to isolate tests from ~/.hermes/, not HOME. Any code in the codebase reading ~/.hermes/* via `Path.home() / ".hermes"` instead of `get_hermes_home()` is a bug to fix at the callsite, not something to paper over in conftest.	2026-04-17 06:09:09 -07:00
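The env-scrubbing layer described in this commit can be sketched as a pure function over a dict, which makes the rule easy to inspect. The real fixture works on the process environment through pytest; `hermetic_env` is a hypothetical name, and the suffix and name lists below include only the entries spelled out in the commit message.

```python
# Suffixes and explicit names quoted in the commit message (the message ends
# both lists with "etc.", so these are partial).
CREDENTIAL_SUFFIXES = ("_API_KEY", "_TOKEN", "_SECRET", "_PASSWORD",
                       "_CREDENTIALS", "_ACCESS_KEY", "_PRIVATE_KEY")
EXPLICIT_NAMES = frozenset({"AWS_ACCESS_KEY_ID", "FAL_KEY", "GH_TOKEN"})
PINNED = {"TZ": "UTC", "LANG": "C.UTF-8", "LC_ALL": "C.UTF-8", "PYTHONHASHSEED": "0"}


def hermetic_env(env):
    """Return a copy of env with credential-shaped vars dropped and locale pinned."""
    cleaned = {
        name: value
        for name, value in env.items()
        if not name.endswith(CREDENTIAL_SUFFIXES) and name not in EXPLICIT_NAMES
    }
    cleaned.update(PINNED)
    # HOME is deliberately left untouched, matching the follow-up fix above.
    return cleaned
```

In a pytest fixture the same rule would typically be applied with `monkeypatch.delenv` and `monkeypatch.setenv` so that changes are undone after each test.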
Teknium	ee95822e07	chore(release): map jz.pentest@gmail.com to @0xyg3n	2026-04-17 05:48:26 -07:00
Teknium	e5b880264b	fix(discord): harden DISCORD_ALLOWED_ROLES and cover gateway layer Two follow-ups to the cherry-picked PR #9873 (`e3bcc819`): 1. `_is_allowed_user` now uses `getattr(self, '_allowed_*_ids', set())` so test fixtures that build the adapter via `object.__new__` (skipping __init__) don't crash with AttributeError. See AGENTS.md pitfall #17 — same pattern as gateway.run. 2. New 3-case regression coverage in test_discord_bot_auth_bypass.py: - role-only config bypasses the gateway 'no allowlists' branch - roles + users combined still authorizes user-allowlist matches - the role bypass does NOT leak to other platforms (Telegram, etc.) 3. Autouse fixture in test_discord_bot_auth_bypass.py clears all Discord auth env vars before each test so DISCORD_ALLOWED_ROLES leakage from a previous test in the session can't flip later 'should-reject' tests into false-pass. Required because the bare cherry-pick of #9873 only added the adapter- level role check — it didn't cover the gateway-level _is_user_authorized, which still rejected role-only setups via the 'no allowlists configured' branch.	2026-04-17 05:48:26 -07:00
0xyg3n	541a3e27d7	feat(discord): add DISCORD_ALLOWED_ROLES env var for role-based access control Adds a new DISCORD_ALLOWED_ROLES environment variable that allows filtering bot interactions by Discord role ID. Uses OR semantics with the existing DISCORD_ALLOWED_USERS - if a user matches either allowlist, they're permitted. Changes: - Parse DISCORD_ALLOWED_ROLES comma-separated role IDs on connect - Enable members intent when roles are configured (needed for role lookup) - Update _is_allowed_user() to accept optional author param for direct role check - Fallback to scanning mutual guilds when author object lacks roles (DMs, voice) - Fully backwards compatible: no behavior change when env var is unset	2026-04-17 05:48:26 -07:00
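The OR semantics between the user and role allowlists can be sketched as below. `parse_ids` and `is_allowed` are hypothetical helpers, IDs are treated as strings for simplicity, and the case where neither allowlist is configured is intentionally left out because the commit message does not specify it at this layer.

```python
def parse_ids(raw):
    """Parse a comma-separated env value into a set of ID strings."""
    return {part.strip() for part in raw.split(",") if part.strip()}


def is_allowed(user_id, role_ids, allowed_users, allowed_roles):
    """OR semantics: a match on either allowlist is sufficient."""
    if user_id in allowed_users:
        return True
    # Any overlap between the member's roles and the allowed roles authorizes.
    return bool(set(role_ids) & allowed_roles)
```

For example, with `allowed_users = parse_ids("u1")` and `allowed_roles = parse_ids("r1, r2")`, a user `u2` holding role `r1` is permitted even though they are not on the user allowlist.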
Teknium	0741f22463	chore(release): map gnanasekaran.sekareee@gmail.com to @gnanam1990	2026-04-17 05:42:04 -07:00
Teknium	7d888ab49c	test(discord): regression guard for DISCORD_ALLOW_BOTS auth bypass Six test cases covering: - DISCORD_ALLOW_BOTS=mentions + bot not in DISCORD_ALLOWED_USERS → authorized - DISCORD_ALLOW_BOTS=all + bot not in DISCORD_ALLOWED_USERS → authorized - DISCORD_ALLOW_BOTS=none → bots still rejected (preserves security) - DISCORD_ALLOW_BOTS unset → same as 'none' - Humans still checked against allowlist even with allow_bots=all - Bot bypass is Discord-specific — doesn't leak to other platforms Guards against a regression where the is_bot bypass in _is_user_authorized gets moved, removed, or accidentally extended to other platforms.	2026-04-17 05:42:04 -07:00
gnanam1990	0f4403346d	fix(discord): DISCORD_ALLOW_BOTS=mentions/all now works without DISCORD_ALLOWED_USERS Fixes #4466. Root cause: two sequential authorization gates both independently rejected bot messages, making DISCORD_ALLOW_BOTS completely ineffective. Gate 1 — `discord.py` `on_message`: _is_allowed_user ran BEFORE the bot filter, so bot senders were dropped before the DISCORD_ALLOW_BOTS policy was ever evaluated. Gate 2 — `gateway/run.py` _is_user_authorized: The gateway-level allowlist check rejected bot IDs with 'Unauthorized user: <bot_id>' even if they passed Gate 1. Fix: gateway/platforms/discord.py — reorder on_message so DISCORD_ALLOW_BOTS runs BEFORE _is_allowed_user. Bots permitted by the filter skip the user allowlist; non-bots are still checked. gateway/session.py — add is_bot: bool = False to SessionSource so the gateway layer can distinguish bot senders. gateway/platforms/base.py — expose is_bot parameter in build_source. gateway/platforms/discord.py _handle_message — set is_bot=True when building the SessionSource for bot authors. gateway/run.py _is_user_authorized — when source.is_bot is True AND DISCORD_ALLOW_BOTS is 'mentions' or 'all', return True early. Platform filter already validated the message at on_message; don't re-reject. Behavior matrix:
| Config | Before | After |
| DISCORD_ALLOW_BOTS=none (default) | Blocked | Blocked |
| DISCORD_ALLOW_BOTS=all | Blocked | Allowed |
| DISCORD_ALLOW_BOTS=mentions + @mention | Blocked | Allowed |
| DISCORD_ALLOW_BOTS=mentions, no mention | Blocked | Blocked |
| Human in DISCORD_ALLOWED_USERS | Allowed | Allowed |
| Human NOT in DISCORD_ALLOWED_USERS | Blocked | Blocked |
Co-authored-by: Hermes Maintainer <hermes@nousresearch.com>	2026-04-17 05:42:04 -07:00
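The "After" column of the behavior matrix can be expressed as one decision function. This is a simplified sketch that merges the two gates into a single hypothetical `is_authorized` helper; the real checks are split between `on_message` and `_is_user_authorized`.

```python
def is_authorized(is_bot, mentioned, allow_bots, user_id, allowed_users):
    """Decide whether a Discord sender is authorized (post-fix behavior).

    allow_bots: value of DISCORD_ALLOW_BOTS ("none", "mentions", "all", or None)
    """
    if is_bot:
        # The bot policy is evaluated first; permitted bots skip the user allowlist.
        if allow_bots == "all":
            return True
        if allow_bots == "mentions":
            return mentioned
        return False  # "none" or unset
    # Humans are always checked against the user allowlist.
    return user_id in allowed_users
```

Each row of the matrix corresponds to one call, for example `is_authorized(True, False, "mentions", "bot1", set())` for "mentions, no mention".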
Teknium	d7fb435e0e	fix(discord): flat /skill command with autocomplete — fits 8KB limit trivially (#11580 ) Closes #11321, closes #10259. ## Problem The nested /skill command group (category subcommand groups + skill subcommands) serialized to ~14KB with the default 75-skill catalog, exceeding Discord's ~8000-byte per-command registration payload. The entire tree.sync() rejected with error 50035 — ALL slash commands including the 27 base commands failed to register. ## Fix Replace the nested Group layout with a single flat Command: /skill name:<autocomplete> args:<optional string> Autocomplete options are fetched dynamically by Discord when the user types — they do NOT count against the per-command registration budget. So this single command registers at ~200 bytes regardless of how many skills exist. Scales to thousands of skills with no size calculations, no splitting, no hidden skills. UX improvements: - Discord live-filters by user's typed prefix against BOTH name and description, so '/skill pdf' finds 'ocr-and-documents' via its description. More discoverable than clicking through category menus. - Unknown skill name → ephemeral error pointing user at autocomplete. - Stable alphabetical ordering across restarts. ## Why not the other proposed approaches Three prior PRs tried to fit within the 8KB limit by modifying the nested layout: - #10214 (njiangk): truncated all descriptions to 'Run <name>' and category descriptions to 'Skills'. Works but destroys slash picker UX. - #11385 (LeonSGP43): 40-char description clamp + iterative trim-largest-category fallback. Works but HIDES skills the user can no longer invoke via slash — functional regression. - #10261 (zeapsu): adaptive split into /skill-<cat> top-level groups. Preserves all skills but pollutes the slash namespace with 20 top-level commands. All three work around the symptom. The flat autocomplete design dissolves the problem — there is no payload-size pressure to manage. 
## Tests tests/gateway/test_discord_slash_commands.py — 5 new test cases replace the 3 old nested-structure tests: - flat-not-nested structure assertion - empty skills → no command registered - callback dispatches the right cmd_key by name - unknown name → ephemeral error, no dispatch - large-catalog regression guard (500 skills) — command payload stays under 500 bytes regardless E2E validated against real discord.py 2.7.1: - Command registers as discord.app_commands.Command (not Group). - Autocomplete filters by name AND description (verified across several queries including description-only matches like 'pdf' → OCR skill). - 500-skill catalog returns max 25 results per autocomplete query (Discord's hard cap), filtered correctly. - Choice labels formatted as 'name — description' clamped to 100 chars.	2026-04-17 05:19:14 -07:00
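The filtering side of the flat `/skill` command can be sketched without discord.py. `autocomplete` is a hypothetical function over a plain `name -> description` dict; substring matching and the `" - "` label separator are assumptions for this example, while the 25-result cap and 100-character label clamp are the limits cited in the commit message.

```python
def autocomplete(skills, current, limit=25, label_max=100):
    """Return up to `limit` (label, value) pairs matching the typed text.

    skills:  dict mapping skill name -> description
    current: text the user has typed so far
    """
    needle = current.lower()
    results = []
    for name in sorted(skills):  # stable alphabetical ordering
        description = skills[name]
        # Match against BOTH the name and the description.
        if needle in name.lower() or needle in description.lower():
            label = f"{name} - {description}"[:label_max]
            results.append((label, name))
            if len(results) >= limit:
                break
    return results
```

Because choices are computed per keystroke rather than registered up front, the catalog size affects only this function's runtime, not the command's registration payload.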
Teknium	13f2d997b0	test(dingtalk): cover QR device-flow auth + OpenClaw branding disclosure Adds 15 regression tests for hermes_cli/dingtalk_auth.py covering: * _api_post — network error mapping, errcode-nonzero mapping, success path * begin_registration — 2-step chain, missing-nonce/device_code/uri error cases * wait_for_registration_success — success path, missing-creds guard, on_waiting callback invocation * render_qr_to_terminal — returns False when qrcode missing, prints when available * Configuration — BASE_URL default + override, SOURCE default Also adds a one-line disclosure in dingtalk_qr_auth() telling users the scan page will be OpenClaw-branded. Interim measure: DingTalk's registration portal is hardcoded to route all sources to /openapp/ registration/openClaw, so users see OpenClaw branding regardless of what 'source' value we send. We keep 'openClaw' as the source token until DingTalk-Real-AI registers a Hermes-specific template. Also adds meng93 to scripts/release.py AUTHOR_MAP.	2026-04-17 05:08:07 -07:00
meng93	9deeee7bb7	feat(dingtalk): add QR code auth support and fix 3 critical bugs - feat: support one-click QR scan to create DingTalk bot and establish connection - fix(gateway): wrap blocking DingTalkStreamClient.start() with asyncio.to_thread() - fix(gateway): extract message fields from CallbackMessage payload instead of ChatbotMessage - fix(gateway): add oapi.dingtalk.com to allowed webhook URL domains	2026-04-17 05:08:07 -07:00
Teknium	08930a65ea	chore: map Patrick Wang, Hedgeho9, Berny Linville emails to AUTHOR_MAP	2026-04-17 05:01:29 -07:00
Berny Linville	6ee65b4d61	fix(weixin): preserve native markdown rendering

- stop rewriting markdown tables, headings, and links before delivery
- keep markdown table blocks and headings together during chunking
- update Weixin tests and docs for native markdown rendering

Closes #10308	2026-04-17 05:01:29 -07:00
Hedgeho9	498fc6780e	fix(weixin): extract and deliver MEDIA: attachments in normal send() path

The Weixin adapter's send() method previously split and delivered the raw response text without first extracting MEDIA: tags or bare local file paths. This meant images, documents, and voice files referenced by the agent were silently dropped in normal (non-streaming, non-background) conversations.

Changes:
- In WeixinAdapter.send(), call extract_media() and extract_local_files() before formatting/splitting text.
- Deliver extracted files via send_image_file(), send_document(), send_voice(), or send_video() prior to sending text chunks.
- Also fix two minor typing issues in gateway/run.py where extract_media() tuples were not unpacked correctly in background and /btw task handlers.

Fixes missing media delivery on Weixin personal accounts.	2026-04-17 05:01:29 -07:00
Patrick Wang	4ed6e4c1a5	refactor(weixin): drop pilk dependency from voice fallback	2026-04-17 05:01:29 -07:00
Patrick Wang	649f38390c	fix: force Weixin voice fallback to file attachments	2026-04-17 05:01:29 -07:00
Patrick Wang	678b69ec1b	fix(weixin): use Tencent SILK encoding for voice replies	2026-04-17 05:01:29 -07:00
Teknium	53da34a4fc	fix(discord): route attachment downloads through authenticated bot session (#11568)

Three open issues (#8242, #6587, #11345) all trace to the same root cause: the image / audio / document download paths in `DiscordAdapter._handle_message` used plain, unauthenticated HTTP to fetch `att.url`. That broke in three independent ways:

- #8242: cdn.discordapp.com attachment URLs increasingly require the bot session to download; unauthenticated httpx sees 403 Forbidden, and image/voice analysis fails silently.
- #6587: Some user environments (VPNs, corporate DNS, tunnels) resolve cdn.discordapp.com to private-looking IPs. Our is_safe_url() guard correctly blocks them as SSRF risks, but the user environment is legitimate, so image analysis and voice STT die.
- #11345: The document download path skipped is_safe_url() entirely: raw aiohttp.ClientSession.get(att.url) with no SSRF check, inconsistent with the image/audio branches.

Unified fix: use `discord.Attachment.read()` as the primary download path on all three branches. `att.read()` routes through discord.py's own authenticated HTTPClient, so:

- Discord CDN auth is handled (#8242 resolved).
- Our is_safe_url() gate isn't consulted for the attachment path at all; the bot session handles networking internally (#6587 resolved).
- All three branches now share the same code path, eliminating the document-path SSRF gap (#11345 resolved).

Falls back to the existing cache_*_from_url helpers (image/audio) or an SSRF-gated aiohttp fetch (documents) when `att.read()` is unavailable or fails. This preserves defense-in-depth for any future payload-schema drift that could slip a non-CDN URL into att.url.

New helpers on DiscordAdapter:

- _read_attachment_bytes(att): safe att.read() wrapper
- _cache_discord_image(att, ext): primary + URL fallback
- _cache_discord_audio(att, ext): primary + URL fallback
- _cache_discord_document(att, ext): primary + SSRF-gated aiohttp fallback

Tests:

- tests/gateway/test_discord_attachment_download.py: 12 new cases covering all three helpers: primary path, fallback on missing .read(), fallback on validator rejection, SSRF guard on document fallback, aiohttp fallback happy-path, and an E2E case via _handle_message confirming cache_image_from_url is never invoked when att.read() succeeds.
- All 11 existing document-handling tests continue to pass via the aiohttp fallback path (their SimpleNamespace attachments have no .read(), which triggers the fallback, now SSRF-gated).

Closes #8242, closes #6587, closes #11345.	2026-04-17 04:59:03 -07:00
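The "primary read, then URL fallback" pattern this commit describes can be sketched as follows. It is an illustration under stated assumptions, not the adapter's implementation: `read_attachment_bytes`, `fetch_attachment`, and the `url_fallback` callable are hypothetical names, and the attachment is any object that may or may not expose an awaitable `read()`.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, Awaitable, Callable, Optional


async def read_attachment_bytes(att: Any) -> Optional[bytes]:
    """Try the attachment's own read(); return None if it is missing or fails."""
    reader = getattr(att, "read", None)
    if reader is None:
        return None  # e.g. a test double without .read()
    try:
        return await reader()
    except Exception:
        return None  # any read failure falls through to the URL path


async def fetch_attachment(
    att: Any, url_fallback: Callable[[str], Awaitable[bytes]]
) -> bytes:
    """Prefer the authenticated read; otherwise call the (assumed SSRF-gated) fallback."""
    data = await read_attachment_bytes(att)
    if data is not None:
        return data
    return await url_fallback(att.url)
```

A quick way to exercise both paths is a `SimpleNamespace(url=...)` with and without a `read` coroutine, mirroring how the commit says the existing document tests trigger the fallback.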
Teknium	24342813fe	fix(qqbot): correct Authorization header format in send_message REST path (#11569)

The send_message tool's direct-REST QQBot path used "QQBotAccessToken {token}", which QQ's API rejects with 401. The correct format is "QQBot {token}". The gateway adapter at gateway/platforms/qqbot.py uses this format in all 5 header sites (lines 341, 551, 579, 1068, 1467); this was the one outlier.

Credit to @Quon for surfacing this in #10257 (that PR had unrelated issues in its media-upload logic and was closed; this salvages the genuine 1-line fix).	2026-04-17 04:25:47 -07:00
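As a small illustration of the header format named in this commit, the helper below builds the Authorization header with the "QQBot {token}" scheme. The function name `qqbot_auth_headers` is hypothetical; only the scheme string comes from the commit message.

```python
from typing import Dict


def qqbot_auth_headers(access_token: str) -> Dict[str, str]:
    """Build headers using the "QQBot {token}" scheme described in the commit."""
    # The rejected variant was "QQBotAccessToken {token}".
    return {"Authorization": f"QQBot {access_token}"}
```

Routing every call site through one helper like this is one way to avoid a single outlier drifting from the format used elsewhere.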
Teknium	ca03e80348	chore: map LehaoLin email to AUTHOR_MAP for release script	2026-04-17 04:22:40 -07:00
LehaoLin	504e7eb9e5	fix(gateway): wait for reconnection before dropping WebSocket sends

When a WebSocket-based platform adapter (e.g. QQ Bot) temporarily loses its connection, send() now polls is_connected for up to 15s instead of immediately returning a non-retryable failure. If the auto-reconnect completes within the window, the message is delivered normally. On timeout, the SendResult is marked retryable=True so the base class retry mechanism can attempt re-delivery. Same treatment applied to _send_media().

Adds 4 async tests covering:
- Successful send after simulated reconnection
- Retryable failure on timeout
- Immediate success when already connected
- _send_media reconnection wait

Fixes #11163	2026-04-17 04:22:40 -07:00
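The wait-for-reconnect behavior described here can be sketched as a bounded polling loop. This is a minimal sketch under assumptions: the `SendResult` dataclass below is a stand-in with guessed fields, and `send_when_connected`, `is_connected`, `do_send`, and `poll_interval` are hypothetical names; only the 15s window and the `retryable=True` outcome on timeout come from the commit message.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class SendResult:
    """Stand-in result type; the field names are assumptions."""
    success: bool
    retryable: bool = False
    error: Optional[str] = None


async def send_when_connected(
    is_connected: Callable[[], bool],
    do_send: Callable[[], Awaitable[None]],
    timeout: float = 15.0,
    poll_interval: float = 0.25,
) -> SendResult:
    """Poll is_connected() until it is true or the timeout elapses, then send."""
    deadline = time.monotonic() + timeout
    while not is_connected():
        if time.monotonic() >= deadline:
            # Timed out: report a retryable failure so a caller can retry later.
            return SendResult(False, retryable=True, error="not connected")
        await asyncio.sleep(poll_interval)
    await do_send()
    return SendResult(True)
```

Using `time.monotonic()` for the deadline keeps the window stable if the wall clock is adjusted, and tests can pass a small `timeout` and `poll_interval` to avoid real waits.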
dieutx	b594b30de4	fix(release): map dieutx email in author map	2026-04-17 04:22:40 -07:00
dieutx	995177d542	fix(gateway): honor QQ_GROUP_ALLOWED_USERS in runner auth	2026-04-17 04:22:40 -07:00
Pedro Gonzalez	590c9964e1	Fix QQ voice attachment SSRF validation	2026-04-17 04:22:40 -07:00

4494 commits