hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-19 15:18:03 +00:00

Author	SHA1	Message	Date
WideLee	cf55c738e7	refactor(qqbot): migrate qr onboard flow to sync + consolidate into onboard.py - Replace async create_bind_task/poll_bind_result with synchronous httpx.Client equivalents, eliminating manual event loop management - Move _render_qr and full qr_register() entry-point into onboard.py, mirroring the Feishu onboarding pattern - Remove _qqbot_render_qr and _qqbot_qr_flow from gateway.py (~90 lines); call site becomes a single qr_register() import - Fix potential segfault: previous code called loop.close() in the EXPIRED branch and again in the finally block (double-close crashed under uvloop)	2026-04-22 05:50:21 -07:00
Teknium	b43524ecab	fix(wecom): visible poll progress + clearer no-bot-info failure + docstring note Follow-ups on top of salvaged #13923 (@keifergu): - Print QR poll dot every 3s instead of every 18s so "Fetching configuration results..." doesn't look hung. - On "status=success but no bot_info" from the WeCom query endpoint, log the full payload at WARNING and tell the user we're falling back to manual entry (was previously a single opaque line). - Document in the qr_scan_for_bot_info() docstring that the work.weixin.qq.com/ai/qc/* endpoints are the admin-console web-UI flow, not the public developer API, and may change without notice. Also add keifergu@tencent.com to scripts/release.py AUTHOR_MAP so release notes attribute the feature correctly.	2026-04-22 05:15:32 -07:00
keifergu	8bcd77a9c2	feat(wecom): add QR scan flow and interactive setup wizard for bot credentials	2026-04-22 05:15:32 -07:00
Teknium	2aa983e2f2	feat(gateway): recognize .pdf in MEDIA: tag extraction (#13683) PDFs emitted by tools (report generators, document exporters, etc.) now deliver as native attachments when wrapped in MEDIA: — same as images, audio, and video. Bare .pdf paths are intentionally NOT added to extract_local_files(), so the agent can still reference PDFs in text without auto-sending them.	2026-04-21 13:48:10 -07:00
Teknium	16accd44bd	fix(telegram): require TELEGRAM_WEBHOOK_SECRET in webhook mode (#13527 ) When TELEGRAM_WEBHOOK_URL was set but TELEGRAM_WEBHOOK_SECRET was not, python-telegram-bot received secret_token=None and the webhook endpoint accepted any HTTP POST. Anyone who could reach the listener could inject forged updates — spoofed user IDs, spoofed chat IDs, attacker-controlled message text — and trigger handlers as if Telegram delivered them. The fix refuses to start the adapter in webhook mode without the secret. Polling mode (default, no webhook URL) is unaffected — polling is authenticated by the bot token directly. BREAKING CHANGE for webhook-mode deployments that never set TELEGRAM_WEBHOOK_SECRET. The error message explains remediation: export TELEGRAM_WEBHOOK_SECRET="$(openssl rand -hex 32)" and instructs registering it with Telegram via setWebhook's secret_token parameter. Release notes must call this out. Reported in GHSA-3vpc-7q5r-276h by @bupt-Yy-young. Hardening — not CVE per SECURITY.md §3 "Public Exposure: Deploying the gateway to the public internet without external authentication or network protection" covers the historical default, but shipping a fail-open webhook as the default was the wrong choice and the guard aligns us with the SECURITY.md threat model.	2026-04-21 06:23:09 -07:00
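The fail-closed startup guard described in 16accd44bd can be sketched as below. This is a minimal illustration, not the adapter's code: the function name `resolve_telegram_mode` and the exact error wording are assumptions; only the two environment variable names come from the commit message.

```python
from typing import Mapping


def resolve_telegram_mode(env: Mapping[str, str]) -> str:
    """Return "polling" or "webhook"; refuse webhook mode without a secret.

    Hypothetical helper for illustration only.
    """
    url = env.get("TELEGRAM_WEBHOOK_URL", "").strip()
    secret = env.get("TELEGRAM_WEBHOOK_SECRET", "").strip()
    if not url:
        # No webhook URL: default polling mode, authenticated by the bot token.
        return "polling"
    if not secret:
        # Fail closed: without a secret the endpoint would accept any POST.
        raise RuntimeError(
            "TELEGRAM_WEBHOOK_URL is set but TELEGRAM_WEBHOOK_SECRET is not; "
            'set one with: export TELEGRAM_WEBHOOK_SECRET="$(openssl rand -hex 32)"'
        )
    return "webhook"
```

Raising at startup, rather than logging a warning, is what makes the check fail-closed: a misconfigured deployment never opens the listener.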
unlinearity	155b619867	fix(agent): normalize socks:// env proxies for httpx/anthropic WSL2 / Clash-style setups often export ALL_PROXY=socks://127.0.0.1:PORT. httpx and the Anthropic SDK reject that alias and expect socks5://, so agent startup failed early with "Unknown scheme for proxy URL" before any provider request could proceed. Add shared normalize_proxy_url()/normalize_proxy_env_vars() helpers in utils.py and route all proxy entry points through them: - run_agent._get_proxy_from_env - agent.auxiliary_client._validate_proxy_env_urls - agent.anthropic_adapter.build_anthropic_client - gateway.platforms.base.resolve_proxy_url Regression coverage: - run_agent proxy env resolution - auxiliary proxy env normalization - gateway proxy URL resolution Verified with: PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 /home/nonlinear/.hermes/hermes-agent/venv/bin/pytest -o addopts='' -p pytest_asyncio.plugin tests/run_agent/test_create_openai_client_proxy_env.py tests/agent/test_proxy_and_url_validation.py tests/gateway/test_proxy_mode.py 39 passed.	2026-04-21 05:52:46 -07:00
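A rough sketch of the normalization described in 155b619867. The helper names match the commit message, but the bodies, the `PROXY_ENV_VARS` list, and the choice to pass a mapping explicitly are assumptions made for illustration.

```python
from typing import MutableMapping, Optional

# Assumed set of variables to inspect; the real helper may cover others.
PROXY_ENV_VARS = (
    "ALL_PROXY", "HTTPS_PROXY", "HTTP_PROXY",
    "all_proxy", "https_proxy", "http_proxy",
)


def normalize_proxy_url(url: Optional[str]) -> Optional[str]:
    """Rewrite the bare socks:// alias to socks5://; leave other URLs alone."""
    if not url:
        return url
    stripped = url.strip()
    if stripped.lower().startswith("socks://"):
        return "socks5://" + stripped[len("socks://"):]
    return stripped


def normalize_proxy_env_vars(env: MutableMapping[str, str]) -> None:
    """Normalize every known proxy variable in place."""
    for name in PROXY_ENV_VARS:
        value = env.get(name)
        if value:
            env[name] = normalize_proxy_url(value)
```

In a real process one would presumably pass `os.environ` before constructing any HTTP client, so every entry point sees the same rewritten value.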
pinion05	b0939d9210	fix: slash commands now respect require_mention in Telegram groups When require_mention is enabled, slash commands no longer bypass mention checks. Bare /command without @mention is filtered in groups, while /command@botname (bot menu) and @botname /command still pass. Commands still pass unconditionally when require_mention is disabled, preserving backward compatibility. Closes #6033	2026-04-21 03:06:56 -07:00
alt-glitch	28b3f49aaa	refactor: remove remaining redundant local imports (comprehensive sweep) Full AST-based scan of all .py files to find every case where a module or name is imported locally inside a function body but is already available at module level. This is the second pass — the first commit handled the known cases from the lint report; this one catches everything else. Files changed (19): cli.py — 16 removals: time as _time/_t/_tmod (×10), re / re as _re (×2), os as _os, sys, partial os from combo import, from model_tools import get_tool_definitions gateway/run.py — 8 removals: MessageEvent as _ME / MessageType as _MT (×3), os as _os2, MessageEvent+MessageType (×2), Platform, BasePlatformAdapter as _BaseAdapter run_agent.py — 6 removals: get_hermes_home as _ghh, partial (contextlib, os as _os), cleanup_vm, cleanup_browser, set_interrupt as _sif (×2), partial get_toolset_for_tool hermes_cli/main.py — 4 removals: get_hermes_home, time as _time, logging as _log, shutil hermes_cli/config.py — 1 removal: get_hermes_home as _ghome hermes_cli/runtime_provider.py — 1 removal: load_config as _load_bedrock_config hermes_cli/setup.py — 2 removals: importlib.util (×2) hermes_cli/nous_subscription.py — 1 removal: from hermes_cli.config import load_config hermes_cli/tools_config.py — 1 removal: from hermes_cli.config import load_config, save_config cron/scheduler.py — 3 removals: concurrent.futures, json as _json, from hermes_cli.config import load_config batch_runner.py — 1 removal: list_distributions as get_all_dists (kept print_distribution_info, not at top level) tools/send_message_tool.py — 2 removals: import os (×2) tools/skills_tool.py — 1 removal: logging as _logging tools/browser_camofox.py — 1 removal: from hermes_cli.config import load_config tools/image_generation_tool.py — 1 removal: import fal_client environments/tool_context.py — 1 removal: concurrent.futures gateway/platforms/bluebubbles.py — 1 removal: httpx as _httpx gateway/platforms/whatsapp.py — 1 
removal: import asyncio tui_gateway/server.py — 2 removals: from datetime import datetime, import time All alias references (_time, _t, _tmod, _re, _os, _os2, _json, _ghh, _ghome, _sif, _ME, _MT, _BaseAdapter, _load_bedrock_config, _httpx, _logging, _log, get_all_dists) updated to use the top-level names.	2026-04-21 00:50:58 -07:00
alt-glitch	1010e5fa3c	refactor: remove redundant local imports already available at module level Sweep ~74 redundant local imports across 21 files where the same module was already imported at the top level. Also includes type fixes and lint cleanups on the same branch.	2026-04-21 00:50:58 -07:00
Yukipukii1	3f10c27cc0	fix(gateway/api_server): deduplicate concurrent idempotent requests	2026-04-20 22:13:07 -07:00
Es1la	3821921ef7	fix(whatsapp): kill bridge process tree on Windows disconnect	2026-04-20 20:49:32 -07:00
Dylan Socolobsky	2008e997dc	fix(discord): properly handle /slash commands in channels	2026-04-20 14:56:04 -07:00
Dylan Socolobsky	11369a78f9	fix(telegram): handle parentheses in URLs during MarkdownV2 link conversion The link regex in format_message used [^)]+ for the URL portion, which stopped at the first ) character. URLs with nested parentheses (e.g. Wikipedia links like Python_(programming_language)) were improperly parsed. Use a better regex, which is the same the Slack adapter uses.	2026-04-20 14:56:04 -07:00
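To illustrate the parsing problem in 11369a78f9, the sketch below compares the `[^)]+` URL pattern with one that tolerates a single level of nested parentheses. The second pattern is an assumption written for this example; it is not necessarily the regex the Slack adapter uses.

```python
import re

# Pattern described in the commit: the URL part stops at the first ")".
NAIVE_LINK = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")

# Assumed replacement: URL characters may include balanced "(...)" groups,
# one level deep, which covers Wikipedia-style links.
NESTED_LINK = re.compile(r"\[([^\]]+)\]\(((?:[^()\s]|\([^()]*\))+)\)")


def extract_links(pattern: re.Pattern, text: str) -> list:
    """Return (label, url) pairs found by the given link pattern."""
    return pattern.findall(text)


TEXT = "See [Python](https://en.wikipedia.org/wiki/Python_(programming_language)) for details."
```

With `TEXT`, the naive pattern cuts the URL off before the closing parenthesis of `(programming_language)`, while the nested pattern keeps it whole.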
Teknium	b65f6ca7fe	fix(telegram): actionable error for DM topics when Topics mode not enabled (#13162) When createForumTopic fails with 'not a forum' in a private chat, the error now tells the user exactly what to do: enable Topics in the DM chat settings from the Telegram app. Also adds a Prerequisites callout to the docs explaining this client-side requirement before the config section.	2026-04-20 12:29:22 -07:00
MassiveMassimo	7972ff2a2c	feat(whatsapp): add dm_policy and group_policy parity with WeCom/Weixin/QQ adapters Add dm_policy and group_policy to the WhatsApp adapter, bringing parity with WeCom/Weixin/QQ. Allows independent control of DM and group access: disable DMs entirely, allowlist specific senders/groups, or keep open. - dm_policy: open (default) | allowlist | disabled - group_policy: open (default) | allowlist | disabled - Config bridging for YAML → env vars - 22 tests covering all policy combinations Backward compatible — defaults preserve existing behavior. Cherry-picked from PR #11597 by @MassiveMassimo. Dropped the run.py group auth bypass (would have skipped user auth for ALL platforms, not just WhatsApp).	2026-04-20 11:56:19 -07:00
JP Lew	9fdfb09aed	fix(telegram): cache inbound videos and accept mp4 uploads	2026-04-20 05:10:23 -07:00
sprmn24	ed76185c15	feat(whatsapp): implement send_voice for audio message delivery WhatsApp already receives incoming voice messages (audio/ogg via the bridge) but lacked a send_voice implementation, so TTS and audio responses fell back to the base class send_image path instead of being delivered as native audio messages. Route send_voice through the existing _send_media_to_bridge helper with media_type='audio', matching the pattern used by send_video and send_document.	2026-04-20 05:00:30 -07:00
Teknium	f683132c1d	feat(api-server): inline image inputs on /v1/chat/completions and /v1/responses (#12969 ) OpenAI-compatible clients (Open WebUI, LobeChat, etc.) can now send vision requests to the API server. Both endpoints accept the canonical OpenAI multimodal shape: Chat Completions: {type: text\|image_url, image_url: {url, detail?}} Responses: {type: input_text\|input_image, image_url: <str>, detail?} The server validates and converts both into a single internal shape that the existing agent pipeline already handles (Anthropic adapter converts, OpenAI-wire providers pass through). Remote http(s) URLs and data:image/* URLs are supported. Uploaded files (file, input_file, file_id) and non-image data: URLs are rejected with 400 unsupported_content_type. Changes: - gateway/platforms/api_server.py - _normalize_multimodal_content(): validates + normalizes both Chat and Responses content shapes. Returns a plain string for text-only content (preserves prompt-cache behavior on existing callers) or a canonical [{type:text\|image_url,...}] list when images are present. - _content_has_visible_payload(): replaces the bare truthy check so a user turn with only an image no longer rejects as 'No user message'. - _handle_chat_completions and _handle_responses both call the new helper for user/assistant content; system messages continue to flatten to text. - Codex conversation_history, input[], and inline history paths all share the same validator. No duplicated normalizers. - run_agent.py - _summarize_user_message_for_log(): produces a short string summary ('[1 image] describe this') from list content for logging, spinner previews, and trajectory writes. Fixes AttributeError when list user_message hit user_message[:80] + '...' / .replace(). - _chat_content_to_responses_parts(): module-level helper that converts chat-style multimodal content to Responses 'input_text'/'input_image' parts. Used in _chat_messages_to_responses_input for Codex routing. 
- _preflight_codex_input_items() now validates and passes through list content parts for user/assistant messages instead of stringifying. - tests/gateway/test_api_server_multimodal.py (new, 38 tests) - Unit coverage for _normalize_multimodal_content, including both part formats, data URL gating, and all reject paths. - Real aiohttp HTTP integration on /v1/chat/completions and /v1/responses verifying multimodal payloads reach _run_agent intact. - 400 coverage for file / input_file / non-image data URL. - tests/run_agent/test_run_agent_multimodal_prologue.py (new) - Regression coverage for the prologue no-crash contract. - _chat_content_to_responses_parts round-trip coverage. - website/docs/user-guide/features/api-server.md - Inline image examples for both endpoints. - Updated Limitations: files still unsupported, images now supported. Validated live against openrouter/anthropic/claude-opus-4.6: POST /v1/chat/completions → 200, vision-accurate description POST /v1/responses → 200, same image, clean output_text POST /v1/chat/completions [file] → 400 unsupported_content_type POST /v1/responses [input_file] → 400 unsupported_content_type POST /v1/responses [non-image data URL] → 400 unsupported_content_type Closes #5621, #8253, #4046, #6632. Co-authored-by: Paul Bergeron <paul@gamma.app> Co-authored-by: zhangxicen <zhangxicen@example.com> Co-authored-by: Manuel Schipper <manuelschipper@users.noreply.github.com> Co-authored-by: pradeep7127 <pradeep7127@users.noreply.github.com>	2026-04-20 04:16:13 -07:00
Roy-oss1	520edd3499	feat(feishu): show processing state via reactions on user messages Replaces the permanent "OK" receipt reaction with a 3-phase visual lifecycle: - Typing animation appears when the agent starts processing. - Cleared when processing succeeds — the reply message is the signal. - Replaced with CrossMark when processing fails. - Cleared when processing is cancelled or interrupted. When Feishu rejects the reaction-delete call, we keep the Typing in place and skip adding CrossMark. Showing both at once would leave the user seeing both "still working" and "done/failed" simultaneously, which is worse than a stuck Typing. A FEISHU_REACTIONS env var (default on) disables the whole lifecycle. User-added reactions with the same emoji still route through to the agent; only bot-origin reactions are filtered to break the feedback loop. Change-Id: I527081da31f0f9d59b451f45de59df4ddab522ba	2026-04-20 02:04:57 -07:00
Ruzzgar	f23123e7b4	fix(gateway): prevent scoped lock and resource leaks on connection failure	2026-04-20 01:44:36 -07:00
Junass1	4c50b4689e	fix(gateway): make Telegram DM topic config writes atomic	2026-04-20 00:57:53 -07:00
helix4u	e96758291b	fix(signal): normalize direct recipients to UUIDs	2026-04-20 00:35:55 -07:00
Teknium	e330112aa8	refactor(telegram): use entity-only mention detection Replaces the word-boundary regex scan with pure MessageEntity-based detection. Telegram's server emits MENTION entities for real @username mentions and TEXT_MENTION entities for @FirstName mentions; the text- scanning fallback was both redundant (entities are always present for real mentions) and broken (matched raw substrings like email addresses, URLs, code-block contents, and forwarded literal text). Entity-only detection: - Closes bug #12545 ("foo@hermes_bot.example" false positive). - Also fixes edge cases the regex fix would still miss: @handles inside URLs and code blocks, where Telegram does not emit mention entities. Tests rewritten to exercise realistic Telegram payloads (real mentions carry entities; substring false positives don't).	2026-04-20 00:10:22 -07:00
Tranquil-Flow	1e18e0503f	fix(telegram): use word-boundary matching for bot mention detection (#12545)	2026-04-20 00:10:22 -07:00
JackJin	6c0c625952	fix(gateway): accept finalize kwarg in all platform edit_message overrides stream_consumer._send_or_edit unconditionally passes finalize= to adapter.edit_message(), but only DingTalk's override accepted the kwarg. Streaming on Telegram/Discord/Slack/Matrix/Mattermost/Feishu/WhatsApp raised TypeError the first time a segment break or final edit fired. The REQUIRES_EDIT_FINALIZE capability flag only gates the redundant final edit (and the identical-text short-circuit), not the kwarg itself — so adapters that opt out of finalize still receive the keyword argument and must accept it. Add *, finalize: bool = False to the 7 non-DingTalk signatures; the body ignores the arg since those platforms treat edits as stateless (consistent with the base class contract in base.py). Add a parametrized signature check over every concrete adapter class so a future override cannot silently drop the kwarg — existing tests use MagicMock which swallows any kwarg and cannot catch this. Fixes #12579	2026-04-19 22:46:47 -07:00
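The signature check mentioned in 6c0c625952 could look roughly like this. The adapter classes and the positional parameter names are made up for the example; only the `finalize` keyword and the `edit_message` method name come from the commit message.

```python
import inspect


class BaseAdapter:
    async def edit_message(self, chat_id, message_id, content, *, finalize: bool = False):
        raise NotImplementedError


class GoodAdapter(BaseAdapter):
    # Accepts the kwarg even though the body ignores it.
    async def edit_message(self, chat_id, message_id, content, *, finalize: bool = False):
        return content


class BadAdapter(BaseAdapter):
    # Override that silently dropped the kwarg; calling it with finalize=
    # would raise TypeError.
    async def edit_message(self, chat_id, message_id, content):
        return content


def accepts_finalize(adapter_cls: type) -> bool:
    """True if edit_message can be called with a finalize= keyword."""
    params = inspect.signature(adapter_cls.edit_message).parameters
    if "finalize" in params:
        return True
    # A **kwargs catch-all would also accept it.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
```

Inspecting the real signature is what a `MagicMock`-based test cannot do, since a mock accepts any keyword.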
Tranquil-Flow	6a228d52f7	fix(webhook): validate HMAC signature before rate limiting (#12544)	2026-04-19 22:45:08 -07:00
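A sketch of the ordering named in 6a228d52f7: verify the HMAC first, and only then touch the rate limiter. The motivation assumed here is that unsigned requests should not be able to use up a route's rate-limit budget; the commit title does not state it. All names (`FixedWindowLimiter`, `handle_webhook`, `sign`) are hypothetical.

```python
import hashlib
import hmac
import time
from collections import deque


class FixedWindowLimiter:
    """Tiny sliding-window limiter, used only to make the ordering visible."""

    def __init__(self, limit: int, window_s: float = 60.0):
        self.limit = limit
        self.window_s = window_s
        self.hits = deque()

    def allow(self) -> bool:
        now = time.monotonic()
        while self.hits and now - self.hits[0] >= self.window_s:
            self.hits.popleft()
        if len(self.hits) >= self.limit:
            return False
        self.hits.append(now)
        return True


def sign(secret: bytes, body: bytes) -> str:
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def handle_webhook(secret: bytes, body: bytes, signature: str,
                   limiter: FixedWindowLimiter) -> int:
    """Return an HTTP status code; the signature is checked before the limiter."""
    expected = sign(secret, body)
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return 401  # rejected without consuming any rate-limit budget
    if not limiter.allow():
        return 429
    return 200
```

With a limit of one request, any number of forged requests still leaves room for the first correctly signed one.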
Teknium	014248567b	fix(feishu): hydrate bot open_id for manual-setup users Extends _hydrate_bot_identity() to also populate _bot_open_id (not just _bot_name) by probing /open-apis/bot/v3/info — the same endpoint the scan-to-create wizard uses. No extra scopes required beyond the tenant access token. Closes the manual-setup gap in #12450: users who configured Feishu without running the wizard, and never set FEISHU_BOT_OPEN_ID, now get a bot identity that _is_self_sent_bot_message() can actually use to filter the adapter's own bot-sent events. Each field is hydrated independently: - Env vars (FEISHU_BOT_OPEN_ID / FEISHU_BOT_USER_ID / FEISHU_BOT_NAME) still take precedence and skip their respective probe. - /bot/v3/info provides open_id + name. - Application-info endpoint remains as a best-effort fallback for bot_name only (needs admin:app.info:readonly scope). Tests: 5 new cases covering env-var precedence, probe success, probe failure fallback, and the end-to-end self-send filter gate after hydration.	2026-04-19 11:36:04 -07:00
Bingo	2d54e17b82	fix(feishu): allow bot-originated mentions from other bots	2026-04-19 11:36:04 -07:00
Teknium	7e3b356574	refactor(discord): slim down the race-polish fix (#12644 ) PR #12558 was heavy for what the fix actually is — essay-length comments, a dedicated helper method where a setdefault would do, and a source-inspection test with no real behavior coverage. The genuine code change is ~5 lines of new logic (1 field, 2 async with, an on_ready wait block). Trimmed: - Replaced the 12-line _voice_lock_for helper with a setdefault one-liner at each call site (join_voice_channel, leave_voice_channel). - Collapsed the 12-line comment on on_message's _ready_event wait to 3 lines. Dropped the warning log on timeout — pass-on-timeout is fine; if on_ready hangs that long, the bot is already broken and the log wouldn't help. - Dropped the source-inspection test (greps the module source for expected substrings). It was low-value scaffolding; the voice-serialization test covers actual behavior. Net: -73 lines vs PR #12558. Same two guarantees preserved, same test passes (verified by stashing the fix and confirming failure).	2026-04-19 11:08:10 -07:00
Teknium	a521005fe5	fix(discord): close two low-severity adapter races (#12558 ) Two small races in gateway/platforms/discord.py, bundled together since they're adjacent in the adapter and both narrow in impact. 1. on_message vs _resolve_allowed_usernames (startup window) DISCORD_ALLOWED_USERS accepts both numeric IDs and raw usernames. At connect-time, _resolve_allowed_usernames walks the bot's guilds (fetch_members can take multiple seconds) to swap usernames for IDs. on_message can fire during that window; _is_allowed_user compares the numeric author.id against a set that may still contain raw usernames — legitimate users get silently rejected for a few seconds after every reconnect. Fix: on_message awaits _ready_event (with a 30s timeout) when it isn't already set. on_ready sets the event after the resolve completes. In steady state this is a no-op (event already set); only the startup / reconnect window ever blocks. 2. join_voice_channel check-and-connect The existing-connection check at _voice_clients.get() and the channel.connect() call straddled an await boundary with no lock. Two concurrent /voice channel invocations could both see None and both call connect(); discord.py raises ClientException ("Already connected") on the loser. Same race class for leave running concurrently with _voice_timeout_handler. Fix: per-guild asyncio.Lock (_voice_locks dict with lazy alloc via _voice_lock_for). join_voice_channel and leave_voice_channel both run their body under the lock. Sequential within a guild, still fully concurrent across guilds. Both: LOW severity. The first only affects username-based allowlists on fast-follow-up messages at startup; the second is a narrow exception on simultaneous voice commands. Bundled so the adapter gets a single coherent polish pass. Tests (tests/gateway/test_discord_race_polish.py): 2 regression cases. 
- test_concurrent_joins_do_not_double_connect: two concurrent join_voice_channel calls on the same guild result in exactly one channel.connect() invocation. - test_on_message_blocks_until_ready_event_set: asserts the expected wait pattern is present in on_message (source inspection, since full discord.py client setup isn't practical here). Regression-guard validated: against unpatched gateway/platforms/discord.py both tests fail. With the fix they pass. Full Discord suite (118 tests) green.	2026-04-19 05:45:59 -07:00
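The per-guild lock from a521005fe5, in the `setdefault` form that 7e3b356574 later adopted, can be sketched as follows. `VoiceManager`, `_connect`, and the counter are stand-ins invented for the example; `_voice_clients` and `_voice_locks` follow the names in the commit message.

```python
import asyncio


class VoiceManager:
    def __init__(self):
        self._voice_clients = {}
        self._voice_locks = {}
        self.connect_calls = 0

    async def _connect(self, guild_id):
        # Stand-in for channel.connect(); the await is where the race used to open.
        self.connect_calls += 1
        await asyncio.sleep(0)
        return f"client-{guild_id}"

    async def join_voice_channel(self, guild_id):
        # Lazily allocate one lock per guild; other guilds stay concurrent.
        lock = self._voice_locks.setdefault(guild_id, asyncio.Lock())
        async with lock:
            existing = self._voice_clients.get(guild_id)
            if existing is not None:
                return existing
            client = await self._connect(guild_id)
            self._voice_clients[guild_id] = client
            return client


async def demo() -> int:
    vm = VoiceManager()
    await asyncio.gather(vm.join_voice_channel(1), vm.join_voice_channel(1))
    return vm.connect_calls
```

Because the check and the connect both happen under the lock, the second caller sees the stored client instead of connecting again.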
Teknium	206a449b29	feat(webhook): direct delivery mode for zero-LLM push notifications (#12473) External services can now push plain-text notifications to a user's chat via the webhook adapter without invoking the agent. Set deliver_only=true on a route and the rendered prompt template becomes the literal message body — dispatched directly to the configured target (Telegram, Discord, Slack, GitHub PR comment, etc.). Reuses all existing webhook infrastructure: HMAC-SHA256 signature validation, per-route rate limiting, idempotency cache, body-size limits, template rendering with dot-notation, home-channel fallback. No new HTTP server, no new auth scheme, no new port. Use cases: Supabase/Firebase webhooks → user notifications, monitoring alert forwarding, inter-agent pings, background job completion alerts. Changes: - gateway/platforms/webhook.py: new _direct_deliver() helper + early dispatch branch in _handle_webhook when deliver_only=true. Startup validation rejects deliver_only with deliver=log. - hermes_cli/main.py + hermes_cli/webhook.py: --deliver-only flag on subscribe; list/show output marks direct-delivery routes. - website/docs/user-guide/messaging/webhooks.md: new Direct Delivery Mode section with config example, CLI example, response codes. - skills/devops/webhook-subscriptions/SKILL.md: document --deliver-only with use cases (bumped to v1.1.0). - tests/gateway/test_webhook_deliver_only.py: 14 new tests covering agent bypass, template rendering, status codes, HMAC still enforced, idempotency still applies, rate limit still applies, startup validation, and direct-deliver dispatch. Validation: 78 webhook tests pass (64 existing + 14 new). E2E verified with real aiohttp server + real urllib POST — agent not invoked, target adapter.send() called with rendered template, duplicate delivery_id suppressed. Closes the gap identified in PR #12117 (thanks to @H1an1 / Antenna team) without adding a second HTTP ingress server.	2026-04-19 05:18:19 -07:00
kshitijk4poor	957ca79e8e	fix(feishu): drop dead helper and cover repeated fenced blocks	2026-04-19 03:30:36 -07:00
kshitijk4poor	a9debf10ff	fix(feishu): harden fenced post row splitting	2026-04-19 03:30:36 -07:00
sgaofen	cc59d133dc	fix(feishu): split fenced code blocks in post payload	2026-04-19 03:30:36 -07:00
kshitijk4poor	4b6ff0eb7f	fix: tighten gateway interrupt salvage follow-ups Follow-up on top of the helix4u #12388 cherry-picks: - make deferred post-delivery callbacks generation-aware end-to-end so stale runs cannot clear callbacks registered by a fresher run for the same session - bind callback ownership to the active session event at run start and snapshot that generation inside base adapter processing so later event mutation cannot retarget cleanup - pass run_generation through proxy mode and drop stale proxy streams / final results the same way local runs are dropped - centralize stop/new interrupt cleanup into one helper and replace the open-coded branches with shared logic - unify internal control interrupt reason strings via shared constants - remove the return from base.py's finally block so cleanup no longer swallows cancellation/exception flow - add focused regressions for generation forwarding, proxy stale suppression, and newer-callback preservation This addresses all review findings from the initial #12388 review while keeping the fix scoped to stale-output/typing-loop interrupt handling.	2026-04-19 03:03:57 -07:00
helix4u	8466268ca5	fix(gateway): keep typing loop overrides backward-compatible	2026-04-19 03:03:57 -07:00
helix4u	150382e8b7	fix(gateway): stop typing loops on session interrupt	2026-04-19 03:03:57 -07:00
kshitijk4poor	ff63e2e005	fix: tighten telegram docker-media salvage follow-ups Follow-up on top of the helix4u #6392 cherry-pick: - reuse one helper for actionable Docker-local file-not-found errors across document/image/video/audio local-media send paths - include /outputs/... alongside /output/... in the container-local path hint - soften the gateway startup warning so it does not imply custom host-visible mounts are broken; the warning now targets the specific risky pattern of emitting container-local MEDIA paths without an explicit export mount - add focused regressions for /outputs/... and non-document media hint coverage This keeps the salvage aligned with the actual MEDIA delivery problem on current main while reducing false-positive operator messaging.	2026-04-19 01:55:33 -07:00
helix4u	588333908c	fix(telegram): warn on docker-only media paths	2026-04-19 01:55:33 -07:00
Teknium	62ce6a38ae	fix(gateway): cancel_background_tasks must drain late-arrivals (#12471 ) During gateway shutdown, a message arriving while cancel_background_tasks is mid-await (inside asyncio.gather) spawns a fresh _process_message_background task via handle_message and adds it to self._background_tasks. The original implementation's _background_tasks.clear() at the end of cancel_background_tasks dropped the reference; the task ran untracked against a disconnecting adapter, logged send-failures, and lingered until it completed on its own. Fix: wrap the cancel+gather in a bounded loop (MAX_DRAIN_ROUNDS=5). If new tasks appeared during the gather, cancel them in the next round. The .clear() at the end is preserved as a safety net for any task that appeared after MAX_DRAIN_ROUNDS — but in practice the drain stabilizes in 1-2 rounds. Tests: tests/gateway/test_cancel_background_drain.py — 3 cases. - test_cancel_background_tasks_drains_late_arrivals: spawn M1, start cancel, inject M2 during M1's shielded cleanup, verify M2 is cancelled. - test_cancel_background_tasks_handles_no_tasks: no-op path still terminates cleanly. - test_cancel_background_tasks_bounded_rounds: baseline — single task cancels in one round, loop terminates. Regression-guard validated: against the unpatched implementation, the late-arrival test fails with exactly the expected message ('task leaked'). With the fix it passes. Blast radius is shutdown-only; the audit classified this as MED. Shipping because the fix is small and the hygiene is worth it. While investigating the audit's other MEDs (busy-handler double-ack, Discord ExecApprovalView double-resolve, UpdatePromptView double-resolve), I verified all three were false positives — the check-and-set patterns have no await between them, so they're atomic on single-threaded asyncio. No fix needed for those.	2026-04-19 01:48:42 -07:00
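The bounded drain loop from 62ce6a38ae can be approximated as below. `MAX_DRAIN_ROUNDS` and the function name come from the commit message; the free-function shape, the return value, and the demo that spawns a late task from a cancellation handler are assumptions for illustration.

```python
import asyncio

MAX_DRAIN_ROUNDS = 5


async def cancel_background_tasks(tasks: set) -> int:
    """Cancel tracked tasks, re-checking for late arrivals; return rounds used."""
    rounds = 0
    while tasks and rounds < MAX_DRAIN_ROUNDS:
        rounds += 1
        batch = list(tasks)
        for task in batch:
            task.cancel()
        # New tasks may be added to `tasks` while this gather is pending.
        await asyncio.gather(*batch, return_exceptions=True)
        tasks.difference_update(batch)
    tasks.clear()  # safety net for anything that appeared after the last round
    return rounds


async def demo():
    tasks = set()
    late = []

    async def sleeper():
        await asyncio.sleep(3600)

    async def first():
        try:
            await asyncio.sleep(3600)
        except asyncio.CancelledError:
            # Simulates a message arriving mid-shutdown: a new task is tracked.
            t = asyncio.ensure_future(sleeper())
            late.append(t)
            tasks.add(t)
            raise

    tasks.add(asyncio.ensure_future(first()))
    await asyncio.sleep(0)  # let first() reach its sleep
    rounds = await cancel_background_tasks(tasks)
    return rounds, late[0].cancelled()
```

In the demo the late task is picked up and cancelled in a second round rather than being dropped by the final `clear()`.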
Teknium	7c10761dd2	fix(discord): shield text-batch flush from follow-up cancel (#12444 ) When Discord splits a long message at 2000 chars, _enqueue_text_event buffers each chunk and schedules a _flush_text_batch task with a short delay. If another chunk lands while the prior flush task is already inside handle_message, _enqueue_text_event calls prior_task.cancel() — and without asyncio.shield, CancelledError propagates from the flush task into handle_message → the agent's streaming request, aborting the response the user was waiting on. Reproducer: user sends a 3000-char prompt (split by Discord into 2 messages). Chunk 1 lands, flush delay starts, chunk 2 lands during the brief window when chunk 1's flush has already committed to handle_message. Agent's current streaming response is cancelled with CancelledError, user sees a truncated or missing reply. Fix (gateway/platforms/discord.py): - Wrap the handle_message call in asyncio.shield so the inner dispatch is protected from the outer task's cancel. - Add an except asyncio.CancelledError clause so the outer task still exits cleanly when cancel lands during the sleep window (before the pop) — semantics for that path are unchanged. The new flush task spawned by the follow-up chunk still handles its own batch via the normal pending-message / active-session machinery in base.py, so follow-ups are not lost. Tests: tests/gateway/test_text_batching.py — test_shield_protects_handle_message_from_cancel. Tracks a distinct first_handle_cancelled event so the assertion fails cleanly when the shield is missing (verified by stashing the fix and re-running). Live E2E on the live-loaded DiscordAdapter: first_handle_cancelled: False (shield worked) first_handle_completed: True (handle_message ran to completion)	2026-04-19 00:09:38 -07:00
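A small sketch of the `asyncio.shield` pattern described in 7c10761dd2. The function names mirror the commit message loosely; the state dictionary, the timing, and keeping the inner task in a list are illustration-only choices.

```python
import asyncio


async def demo() -> dict:
    state = {"completed": False, "cancelled": False}
    inner_tasks = []

    async def handle_message():
        try:
            await asyncio.sleep(0.01)  # stand-in for the agent's streaming request
            state["completed"] = True
        except asyncio.CancelledError:
            state["cancelled"] = True
            raise

    async def flush_text_batch():
        inner = asyncio.ensure_future(handle_message())
        inner_tasks.append(inner)  # keep a reference so the task is not lost
        try:
            # Cancelling the flush task cancels only the shield wrapper,
            # not the inner dispatch.
            await asyncio.shield(inner)
        except asyncio.CancelledError:
            return  # outer task exits cleanly; inner keeps running

    flush = asyncio.ensure_future(flush_text_batch())
    await asyncio.sleep(0)  # let the flush task enter the shielded await
    flush.cancel()          # what a follow-up chunk's prior_task.cancel() does
    await flush
    await inner_tasks[0]
    return state
```

Without the `shield`, the same `cancel()` would propagate into `handle_message` and flip `cancelled` instead of `completed`.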
Teknium	3a6351454b	fix(gateway): close pending-drain and late-arrival races in base adapter (#12371 ) Two related race conditions in gateway/platforms/base.py that could produce duplicate agent runs or silently drop messages. Neither is specific to any one platform — all adapters inherit this logic. R5 (HIGH) — duplicate agent spawn on turn chain In _process_message_background, the pending-drain path deleted _active_sessions[session_key] before awaiting typing_task.cancel() and then recursively awaiting _process_message_background for the queued event. During the typing_task await, a fresh inbound message M3 could pass the Level-1 guard (entry now missing), set its own Event, and spawn a second _process_message_background for the same session_key — two agents running simultaneously, duplicate responses, duplicate tool calls. Fix: keep the _active_sessions entry populated and only clear() the Event. The guard stays live, so any concurrent inbound message takes the busy-handler path (queue + interrupt) as intended. R6 (MED-HIGH) — message dropped during finally cleanup The finally block has two await points (typing_task, stop_typing) before it deletes _active_sessions. A message arriving in that window passes the guard (entry still live), lands in _pending_messages via the busy-handler — and then the unconditional del removes the guard with that message still queued. Nothing drains it; the user never gets a reply. Fix: before deleting _active_sessions in finally, pop any late pending_messages entry and spawn a drain task for it. Only delete _active_sessions when no pending is waiting. Tests: tests/gateway/test_pending_drain_race.py — three regression cases. Validated: without the fix, two of the three fail exactly where the races manifest (duplicate-spawn guard loses identity, late-arrival 'LATE' message not in processed list).	2026-04-18 19:32:26 -07:00
Teknium	632a807a3e	fix(gateway): slash commands never interrupt a running agent (#12334 ) Any recognized slash command now bypasses the Level-1 active-session guard instead of queueing + interrupting. A mid-run /model (or /reasoning, /voice, /insights, /title, /resume, /retry, /undo, /compress, /usage, /provider, /reload-mcp, /sethome, /reset) used to interrupt the agent AND get silently discarded by the slash-command safety net — zero-char response, dropped tool calls. Root cause: - Discord registers 41 native slash commands via tree.command(). - Only 14 were in ACTIVE_SESSION_BYPASS_COMMANDS. - The other ~15 user-facing ones fell through base.py:handle_message to the busy-session handler, which calls running_agent.interrupt() AND queues the text. - After the aborted run, gateway/run.py:9912 correctly identifies the queued text as a slash command and discards it — but the damage (interrupt + zero-char response) already happened. Fix: - should_bypass_active_session() now returns True for any resolvable slash command. ACTIVE_SESSION_BYPASS_COMMANDS stays as the subset with dedicated Level-2 handlers (documentation + tests). - gateway/run.py adds a catch-all after the dedicated handlers that returns a user-visible "agent busy — wait or /stop first" response for any other resolvable command. - Unknown text / file-path-like messages are unchanged — they still queue. Also: - gateway/platforms/discord.py logs the invoker identity on every slash command (user id + name + channel + guild) so future ghost-command reports can be triaged without guessing. Tests: - 15 new parametrized cases in test_command_bypass_active_session.py cover every previously-broken Discord slash command. - Existing tests for /stop, /new, /approve, /deny, /help, /status, /agents, /background, /steer, /update, /queue still pass. - test_steer.py's ACTIVE_SESSION_BYPASS_COMMANDS check still passes. Fixes #5057. Related: #6252, #10370, #4665.	2026-04-18 18:53:22 -07:00
Nish	1a9a2d7fe8	fix(gateway/telegram): fall back to chat.id when from_user is None in DMs When `message.from_user` is None — which can happen for forwarded messages, anonymous admin mode in groups, or certain Telegram client edge cases — `_build_message_event` set `source.user_id` to None. This caused: 1. `_is_user_authorized()` to early-return False (`if not user_id: return False`) 2. The access check never compared against `TELEGRAM_ALLOWED_USERS` even when the user actually was in the allowlist 3. The pairing flow fired and generated a code for `user_id=None` 4. The pairing approval saved an entry under the literal string key "null" 5. The user was effectively locked out because their real user_id never matched the "null" key on subsequent messages For DMs (`chat_type == "dm"`), Telegram guarantees `chat.id == user.id` — they are the same numeric ID for private chats. Falling back to `chat.id` when `from_user` is None for DMs restores the expected access-control behavior without weakening it (group/channel chats correctly stay None). Also adds a parallel `user_name` fallback to `chat.full_name` so the display name still works in the same edge case.	2026-04-18 18:18:01 -07:00
Teknium	2edebedc9e	feat(steer): /steer <prompt> injects a mid-run note after the next tool call (#12116) * feat(steer): /steer <prompt> injects a mid-run note after the next tool call Adds a new slash command that sits between /queue (turn boundary) and interrupt. /steer <text> stashes the message on the running agent and the agent loop appends it to the LAST tool result's content once the current tool batch finishes. The model sees it as part of the tool output on its next iteration. No interrupt is fired, no new user turn is inserted, and no prompt cache invalidation happens beyond the normal per-turn tool-result churn. Message-role alternation is preserved — we only modify an existing role:"tool" message's content. Wiring ------ - hermes_cli/commands.py: register /steer + add to ACTIVE_SESSION_BYPASS_COMMANDS. - run_agent.py: add _pending_steer state, AIAgent.steer(), _drain_pending_steer(), _apply_pending_steer_to_tool_results(); drain at end of both parallel and sequential tool executors; clear on interrupt; return leftover as result['pending_steer'] if the agent exits before another tool batch. - cli.py: /steer handler — route to agent.steer() when running, fall back to the regular queue otherwise; deliver result['pending_steer'] as next turn. - gateway/run.py: running-agent intercept calls running_agent.steer(); idle-agent path strips the prefix and forwards as a regular user message. - tui_gateway/server.py: new session.steer JSON-RPC method. - ui-tui: SessionSteerResponse type + local /steer slash command that calls session.steer when ui.busy, otherwise enqueues for the next turn. Fallbacks --------- - Agent exits mid-steer → surfaces in run_conversation result as pending_steer so CLI/gateway deliver it as the next user turn instead of silently dropping it. - All tools skipped after interrupt → re-stashes pending_steer for the caller. - No active agent → /steer reduces to sending the text as a normal message.
Tests ----- - tests/run_agent/test_steer.py — accept/reject, concatenation, drain, last-tool-result injection, multimodal list content, thread safety, cleared-on-interrupt, registry membership, bypass-set membership. - tests/gateway/test_steer_command.py — running agent, pending sentinel, missing steer() method, rejected payload, empty payload. - tests/gateway/test_command_bypass_active_session.py — /steer bypasses the Level-1 base adapter guard. - tests/test_tui_gateway_server.py — session.steer RPC paths. 72/72 targeted tests pass under scripts/run_tests.sh. * feat(steer): register /steer in Discord's native slash tree Discord's app_commands tree is a curated subset of slash commands (not derived from COMMAND_REGISTRY like Telegram/Slack). /steer already works there as plain text (routes through handle_message → base adapter bypass → runner), but registering it here adds Discord's native autocomplete + argument hint UI so users can discover and type it like any other first-class command.	2026-04-18 04:17:18 -07:00
Teknium	9527707f80	fix(signal): back off sendTyping spam for unreachable recipients (#12118) base.py's _keep_typing refresh loop calls send_typing every ~2s while the agent is processing. If signal-cli returns NETWORK_FAILURE for the recipient (offline, unroutable, group membership lost), the unmitigated path was a WARNING log every 2 seconds for as long as the agent stayed busy — a user report showed 1048 warnings in 41 minutes for one offline contact, plus the matching volume of pointless RPC traffic to signal-cli. - _rpc() accepts log_failures=False so callers can route repeated expected failures (typing) to DEBUG while keeping send/receive at WARNING. - send_typing() tracks consecutive failures per chat. First failure still logs WARNING so transport issues remain visible; subsequent failures log at DEBUG. After three consecutive failures we skip the RPC during an exponential cooldown (16s, 32s, 60s cap) so we stop hammering signal-cli for a recipient it can't deliver to. A successful sendTyping resets the counters. - _stop_typing_indicator() clears the backoff state so the next agent turn starts fresh. E2E simulation against the reported 41-minute window: RPCs drop from 1230 to 45 (-96%), log lines from 1048 WARNINGs to 1 WARNING + 44 DEBUGs. Credits kshitijk4poor (#12056) for the _rpc log_failures kwarg idea; the broader restructure in that PR (nested per-chat loop inside send_typing) is avoided here in favour of stateful backoff that preserves base.py's existing _keep_typing architecture.	2026-04-18 04:13:32 -07:00
Teknium	45acd9beb5	fix(gateway): ignore redelivered /restart after PTB offset ACK fails (#11940) When a Telegram /restart fires and PTB's graceful-shutdown `get_updates` ACK call times out ("When polling for updates is restarted, updates may be received twice" in gateway.log), the new gateway receives the same /restart again and restarts a second time — a self-perpetuating loop. Record the triggering update_id in `.restart_last_processed.json` when handling /restart. On the next process, reject a /restart whose update_id <= the recorded one as a stale redelivery. 5-minute staleness guard so an orphaned marker can't block a legitimately new /restart. - gateway/platforms/base.py: add `platform_update_id` to MessageEvent - gateway/platforms/telegram.py: propagate `update.update_id` through _build_message_event for text/command/location/media handlers - gateway/run.py: write dedup marker in _handle_restart_command; _is_stale_restart_redelivery checks it before processing /restart - tests/gateway/test_restart_redelivery_dedup.py: 9 new tests covering fresh restart, redelivery, staleness window, cross-platform, malformed-marker resilience, and no-update_id (CLI) bypass Only active for Telegram today (the one platform with monotonic cross-session update ordering); other platforms return False from _is_stale_restart_redelivery and proceed normally.	2026-04-17 21:17:33 -07:00
Teknium	607be54a24	fix(discord): forum channel media + polish Extend forum support from PR #10145: - REST path (_send_discord): forum thread creation now uploads media files as multipart attachments on the starter message in a single call. Previously media files were silently dropped on the forum path. - Websocket media paths (_send_file_attachment, send_voice, send_image, send_animation — covers send_image_file, send_video, send_document transitively): forum channels now go through a new _forum_post_file helper that creates a thread with the file as starter content, instead of failing via channel.send(file=...) which forums reject. - _send_to_forum chunk follow-up failures are collected into raw_response['warnings'] so partial-send outcomes surface. - Process-local probe cache (_DISCORD_CHANNEL_TYPE_PROBE_CACHE) avoids GET /channels/{id} on every uncached send after the first. - Dedup of TestSendDiscordMedia that the PR merge-resolution left behind. - Docs: Forum Channels section under website/docs/user-guide/messaging/discord.md. Tests: 117 passed (22 new for forum+media, probe cache, warnings).	2026-04-17 20:25:48 -07:00
ChimingLiu	e5333e793c	feat(discord): support forum channels	2026-04-17 20:25:48 -07:00
pedh	4459913f40	feat(dingtalk): AI Cards streaming, emoji reactions, and media handling Cherry-picked from #10985 by pedh, adapted to current main: * Keeps main's full group-chat gating (require_mention + allowed_users + free_response_chats + mention_patterns) — PR's simpler subset dropped. * Keeps main's fire-and-forget process() dispatch + session_webhook fallback for SDK >= 0.24. * Picks up PR's REQUIRES_EDIT_FINALIZE capability flag on BasePlatformAdapter + finalize kwarg on edit_message(), plumbed through stream_consumer. Default False so Telegram/Slack/Discord/Matrix stay on the zero-overhead fast path. * DingTalk AI Card lifecycle: per-chat _message_contexts, two-card flow (tool-progress + final response) with sibling auto-close driven by reply_to, idempotent 🤔Thinking → 🥳Done swap, `alibabacloud-dingtalk` for media URL resolution (replaces raw HTTP that was 403-ing). * pyproject: dingtalk extra now dingtalk-stream>=0.20,<1 + alibabacloud-dingtalk>=2.0.0 + qrcode. Closes #10991 Co-authored-by: pedh	2026-04-17 19:26:53 -07:00
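
The stateful typing backoff described in commit 9527707f80 (first failure at WARNING, later ones at DEBUG, a cooldown after three consecutive failures, reset on success) can be sketched roughly as below. This is an illustration only, not the adapter's actual code: the class name TypingBackoff, its method names, and the doubling formula are assumptions chosen to reproduce the 16s, 32s, 60s-cap sequence the message quotes.

```python
import logging
from dataclasses import dataclass, field

# Values quoted in the commit message; the doubling rule itself is an assumption.
FAILURES_BEFORE_COOLDOWN = 3
BASE_COOLDOWN_S = 16.0
MAX_COOLDOWN_S = 60.0


@dataclass
class TypingBackoff:
    """Hypothetical per-chat failure tracker for typing-indicator RPCs."""

    failures: dict = field(default_factory=dict)    # chat_id -> consecutive failures
    skip_until: dict = field(default_factory=dict)  # chat_id -> time the cooldown ends

    def should_send(self, chat_id: str, now: float) -> bool:
        # Skip the RPC entirely while the chat is inside a cooldown window.
        return now >= self.skip_until.get(chat_id, 0.0)

    def record_failure(self, chat_id: str, now: float) -> int:
        """Count a failure and return the log level the caller should use."""
        count = self.failures.get(chat_id, 0) + 1
        self.failures[chat_id] = count
        if count >= FAILURES_BEFORE_COOLDOWN:
            # 3rd failure -> 16s, 4th -> 32s, 5th and later -> capped at 60s.
            cooldown = min(
                MAX_COOLDOWN_S,
                BASE_COOLDOWN_S * 2 ** (count - FAILURES_BEFORE_COOLDOWN),
            )
            self.skip_until[chat_id] = now + cooldown
        # Only the first failure in a streak stays visible at WARNING.
        return logging.WARNING if count == 1 else logging.DEBUG

    def reset(self, chat_id: str) -> None:
        # Called after a successful send, or when the typing indicator stops.
        self.failures.pop(chat_id, None)
        self.skip_until.pop(chat_id, None)
```

A caller would check should_send() before each refresh, log at the level returned by record_failure(), and call reset() after a successful send or when the typing indicator stops. Passing the clock in as `now` keeps the sketch deterministic to test.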


595 commits