hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-17 09:41:58 +00:00

Author	SHA1	Message	Date
Teknium	3e7e9b24d4	fix: harden salvaged session and browser improvements Polish salvaged contributor work before PR review: - read browser inactivity timeout from config with documented fallback - skip redundant v10 trigram backfill before v11 FTS rebuild - show delegate_task goals safely in progress previews - show gateway status model/context without redundant token wording - wire gateway /sessions to shared session-listing helpers - map Ravenwolf author emails for release attribution Co-authored-by: Wolfram Ravenwolf <github.com@wolfram.ravenwolf.de> Co-authored-by: Amy Ravenwolf <amy@ravenwolf.de>	2026-06-15 07:46:34 -07:00
Wolfram Ravenwolf	ead38107a2	feat(status): restore model and context in gateway status PROBLEM: The old public /status PR drifted out of the current Amy patch stack, leaving /status without the model/provider, context window, or explicit cumulative token label that Wolfram uses to monitor context pressure from chat. SOLUTION: Re-port the feature onto the current gateway status handler. Prefer live/cached agent runtime metadata, fall back to SessionDB + SessionStore state between turns, add localized status model/context lines, and keep token totals explicitly labeled cumulative. Verification: tests/gateway/test_status_command.py, tests/hermes_cli/test_commands.py	2026-06-15 07:46:34 -07:00
Teknium	0d82060c74	fix: harden WhatsApp target alias salvage Add a parser-only routing regression that proves raw WhatsApp group JIDs bypass channel-directory resolution and home-channel fallback, include channel_aliases.json in quick state snapshots, harden malformed alias handling, and map Keiron McCammon for release attribution.	2026-06-15 05:51:47 -07:00
Keiron McCammon	ea49a79633	fix(messaging): route WhatsApp group JIDs to the target, not the home DM send_message(target="whatsapp:<group-jid>") silently delivered to the configured home DM instead of the requested group. Two gaps: 1. _parse_target_ref had no WhatsApp branch. Group JIDs (<id>@g.us), user JIDs (<id>@s.whatsapp.net), linked-identity JIDs (<id>@lid), and broadcast/newsletter JIDs matched no pattern and fell through to `return None, None, False`, so the caller treated them as unresolvable and used the home channel. The bridge's /send endpoint accepts any chatId, so only the tool-side target parsing was at fault. Add a whatsapp branch that recognizes native JIDs as explicit targets. The pre-existing '+'-prefixed E.164 path is preserved. 2. WhatsApp groups have no human-friendly name — the channel directory is regenerated from session data on a timer, so a group shows up as its raw 18-digit JID and any hand-edit to channel_directory.json is clobbered on the next rebuild. Add a user-maintained alias overlay (~/.hermes/channel_aliases.json) re-applied on every build AND every load, giving durable friendly names and letting a freshly-created group be pre-named before its first message. Tests: TestParseTargetRefWhatsAppJID (7 cases) for the parser; TestChannelAliases (7 cases) for the overlay, plus an autouse fixture isolating CHANNEL_ALIASES_PATH so a real alias file can't leak into the existing directory tests.	2026-06-15 05:51:47 -07:00
Tharushka Dinujaya	ec05d2bc3e	fix(gateway): evict scoped lock when PID+start_time match but process is not a gateway On Linux, systemd spawns core services (cron, nginx, sshd) with deterministic PIDs and jiffy start_times across reboots. A service can land on the exact same PID and start_time as a previous gateway, causing acquire_scoped_lock to mistake it for a live gateway and block startup. The existing stale-detection paths only covered: - start_times both non-None and different (clear mismatch) - start_times both None (macOS/Windows fallback to cmdline check) The boot-time collision falls through both: times are non-None and equal, so neither branch fired. Add a third check: when both start_times are known and match but the live process fails _looks_like_gateway_process, read its cmdline. If the cmdline is readable (non-None), we have positive evidence of an impostor and mark the lock stale. Requiring a readable cmdline keeps the check conservative — if cmdline is unreadable we do not evict.	2026-06-15 05:25:07 -07:00
Teknium	a1f51feb72	fix(telegram): avoid rich final duplicate previews (#46206)	2026-06-14 11:13:38 -07:00
kshitijk4poor	3bc4a2ff78	fix(gateway): re-baseline agent-cache message_count after each turn The #45966 cross-process coherence guard snapshots a session's on-disk message_count next to the cached agent and rebuilds the agent when the count changes. But the snapshot is taken at agent-BUILD time — before the turn writes its own user + assistant (+ tool) rows — and the cache entry is never rewritten on a reuse. So this process's OWN turn grows message_count, and the very next turn sees a mismatch and rebuilds the agent. That happens every turn, for every conversation, silently destroying the per-conversation prompt caching the cache exists to protect (AGENTS.md: prompt caching is sacred). Add _refresh_agent_cache_message_count(): after a turn completes and the agent has flushed its rows to the SessionDB, re-baseline the stored count to the now-current value. The guard then fires ONLY when a DIFFERENT process changes the transcript — preserving the #45966 fix while keeping the cache warm for normal single-process operation. Tests drive the real SessionDB + the real guard condition: 5 consecutive same-process turns now all REUSE the cached agent (0 before the fix); a cross-process append still invalidates; and the re-baseline is fail-safe (no DB, falsy session_id, raising probe, legacy 2-tuple, pending sentinel all no-op).	2026-06-14 22:58:55 +05:30
Teknium	2c174bce24	fix(gateway): preserve new input on interrupted replay cleanup	2026-06-14 05:10:39 -07:00
Teknium	efbe1635dd	fix(gateway): include replied-to media attachments (#46107)	2026-06-14 04:51:50 -07:00
Teknium	9459057d7f	fix(telegram): guard rich details math crash (#46102)	2026-06-14 04:22:22 -07:00
Teknium	cf7d5932f8	fix(email): make IPv4 SMTP fallback use supported sockets	2026-06-14 04:16:26 -07:00
liuhao1024	04d4471d79	fix(email): use SMTP_SSL for port 465 and fall back to IPv4 on timeout Port 465 expects implicit TLS (SMTP_SSL) from the first byte. The email adapter always used SMTP() + starttls(), which is correct for port 587 but hangs/fails on port 465 providers (e.g., Swiss ISPs). Additionally, when the SMTP host has AAAA DNS records but IPv6 is unreachable, socket.create_connection() tries IPv6 first and hangs until timeout. Add an IPv4 fallback via AF_INET socket. Extract _connect_smtp() helper to consolidate the 4 duplicate SMTP connection sites into a single method with correct protocol selection and IPv6 fallback logic.	2026-06-14 04:16:26 -07:00
Teknium	5105c3651a	perf(api-server): normalize chat content linearly (#46079)	2026-06-14 03:25:49 -07:00
Aldo	293c04fef6	fix(gateway): suppress exact silence tokens without mutating history	2026-06-14 03:25:08 -07:00
Teknium	10bad2faf1	fix(gateway): serialize startup auto-resume before inbound (#46074) Gateway startup now queues real inbound messages until restart-interrupted auto-resume turns have completed, preventing duplicate agents for the same session after a restart.	2026-06-14 03:21:06 -07:00
Teknium	723c2331bd	fix: make profile subprocess HOME policy explicit	2026-06-14 03:20:21 -07:00
Teknium	afc8615509	perf(webhook): prune request caches incrementally (#46065)	2026-06-14 02:40:54 -07:00
Justin Sunseri	12682d96b9	feat(telegram): restore rich messages opt-out Salvages PR #45840's client-compatibility opt-out while keeping rich messages enabled by default via telegram.extra.rich_messages: true.	2026-06-13 21:45:49 -07:00
ITheEqualizer	57c2a55be4	fix(telegram): harden rich message fallback handling Carry forward focused follow-ups from PR #45741: treat PTB's raw Bot API 10.1 response shapes safely, recognize real missing-endpoint errors, preserve link preview settings on rich sends, and lock the rich limit to Telegram's character-based cap.	2026-06-13 14:34:53 -07:00
ITheEqualizer	7c0605bf22	fix(telegram): preserve rich formatting on stream final	2026-06-13 13:44:45 -07:00
kshitijk4poor	63097ee0d7	test(gateway): cover auto-resume full-path no-regression; clarify guard docstring The salvaged fix's two regression tests mock adapter.handle_message, so they only assert the pre-claimed sentinel is set/cleaned around a stub — they never drive the real dispatch chain. Add a full-path test that exercises _schedule_resume_pending_sessions -> _guarded_handle_message -> adapter.handle_message -> _process_message_background -> _handle_message and asserts the resumed session's agent runs EXACTLY ONCE: not zero (the pre-claim must not self-bounce the resume into a queued no-op) and not twice (the duplicate-agent bug #45456 the fix targets). Also assert no leaked sentinel and no orphaned pending event after the drain settles. Tighten the _guarded_handle_message docstring: on current main the real sentinel is taken over inside _handle_message (not _process_message_background), and note the `is _AGENT_PENDING_SENTINEL` guard only releases the slot we ourselves placed, never one a live run owns.	2026-06-13 23:39:35 +05:30
liuhao1024	6e2fd955ca	fix(gateway): claim session slot before auto-resume task to prevent duplicate agents When the gateway restarts and auto-resumes an interrupted session, an inbound message arriving in the window between `asyncio.create_task()` and the task's first await could spin up a second AIAgent for the same session. Both agents would then process messages concurrently, producing interleaved duplicate responses (#45456). Fix: set `_AGENT_PENDING_SENTINEL` in `_running_agents` immediately after the "already running" check, before creating the task. This closes the race window — any inbound message sees the slot as occupied and queues behind the auto-resume. A `_guarded_handle_message` wrapper ensures the pre-claimed sentinel is always released, even if `handle_message` raises before reaching `_process_message_background` (whose `finally` block handles normal cleanup). (cherry picked from commit `85150c976b`)	2026-06-13 23:36:51 +05:30
Teknium	ad7436a5d9	fix(gateway): preserve WeCom per-group sender allowlists Keep the own-policy fail-closed hardening from PR #45444, but still trust WeCom groups.<id>.allow_from because the adapter already checked that sender allowlist before dispatching to gateway auth.	2026-06-13 07:18:54 -07:00
Que0x	fc46354580	fix(security): fail closed when an own-policy gateway adapter has no allowlist Own-policy adapters (WhatsApp, WeCom, Weixin, QQBot, Yuanbao) default dm_policy/group_policy to "open", which forwards every sender. The gateway's adapter-trust shortcut in _is_user_authorized blanket-trusted those platforms when no env allowlist was set, so an operator who enabled one with only credentials authorized the entire external network -- the fail-open SECURITY.md section 2.6 forbids ("an allowlist is required for every enabled network-exposed adapter"). Trust the adapter only when its effective policy for the chat type is an actual "allowlist" restriction (the case #34515 was protecting). "open"/"pairing"/anything else falls through to default-deny, where {PLATFORM}_ALLOW_ALL_USERS / GATEWAY_ALLOW_ALL_USERS and the pairing flow remain the explicit opt-ins.	2026-06-13 07:18:54 -07:00
Teknium	1185dfd773	test: cover legacy Office document extensions	2026-06-13 07:18:37 -07:00
Sarvesh	45f9099e51	fix(matrix): preserve markdown table structure	2026-06-13 06:57:08 -07:00
konsisumer	16fb573bae	fix(gateway): clear bloated compression binding on compression-exhaustion auto-reset After compression exhaustion the auto-reset created a fresh session but discarded reset_session()'s return value and left the Telegram topic binding pointing at the oversized compressed child. The next inbound message in that topic healed the binding forward and switch_session'd the freshly-reset lane back onto the bloated transcript, re-triggering compression exhaustion in a loop with a new session id each time. Capture the fresh entry and re-sync the topic binding to it so the next message starts clean. No-op on non-topic lanes. Regression of the #9893/#10063 auto-reset fix. Fixes #35809	2026-06-13 06:38:29 -07:00
Teknium	a59d5e37e8	feat(telegram): make rich messages always on (#45584) Remove the rich_messages config toggle entirely so Telegram replies always try the Bot API 10.1 rich-message path first, with the existing MarkdownV2 fallback/latch behavior for unsupported endpoints and per-message failures. Restore the Telegram platform hint to encourage rich Markdown tables/task lists/math now that the rich path is the default, and remove the config/docs surface for the old toggle.	2026-06-13 05:45:11 -07:00
Black-Kylin	202e318cb1	fix(gateway): sync compression session splits before failures Salvages PR #25747 by preserving gateway session rotation even when a post-compression model call fails before returning final content. Co-authored-by: Hermes <127238744+teknium1@users.noreply.github.com>	2026-06-13 04:51:59 -07:00
Teknium	2a5dc0ef3d	fix(slack): make video attachments available to agents (#45512)	2026-06-13 03:33:27 -07:00
Teknium	197337cc47	fix(gateway): suppress duplicate final stream sends (#45517)	2026-06-13 03:23:44 -07:00
Teknium	dc467488a7	test: assert typing-stop-before-callback as an invariant, not a call count The shared _stop_typing_refresh cleanup makes up to two bounded stop_typing attempts; the old assertion pinned exactly one typing-stopped event before callback-start.	2026-06-12 12:02:41 -07:00
Flownium	331cb38e21	fix: stop Discord typing after replies	2026-06-12 12:02:41 -07:00
Teknium	652dd9c9f2	fix: rich messages follow-ups — reply_parameters, send latch, opt-in default - Use reply_parameters per the sendRichMessage spec instead of the undocumented reply_to_message_id scalar (silently ignored -> reply anchor quietly dropped). - Latch rich sends off after an endpoint-capability failure (old PTB / server without sendRichMessage) so every later reply doesn't pay a doomed extra roundtrip; per-message BadRequests do NOT latch. - Default rich_messages to OFF (opt-in) while the day-old Bot API 10.1 endpoint is validated live; revert the prompt-hint table guidance until the default flips on. - Tests: reply_parameters shape, send-latch behavior, BadRequest non-latch; rich tests opt in explicitly via extra.	2026-06-12 11:47:54 -07:00
ITheEqualizer	05b9c84ca4	Add Telegram Bot API 10.1 rich message support Introduce opportunistic support for Telegram Bot API 10.1 rich messages by sending raw agent Markdown via sendRichMessage and streaming previews via sendRichMessageDraft. Implements a rich-path fast‑path in gateway/platforms/telegram.py (RICH_MESSAGE_MAX_BYTES=32768, feature gate platforms.telegram.extra.rich_messages, bot capability checks, routing/thread handling, and conservative fallback rules: permanent/capability errors fall back to the legacy MarkdownV2 path, transient/network errors are surfaced without legacy-resend). Also add a latch for draft capability failures (_rich_draft_disabled) and preserve legacy chunking and draft behavior when needed. Update agent prompt hints (telegram encourages rich Markdown/tables), add CLI config example option, update English and Chinese docs to describe rich messages and fallbacks, and add/adjust tests for rich send and draft behavior.	2026-06-12 11:47:54 -07:00
ethernet	c41a6534cf	fix(tests): mock subprocess.Popen in all _handle_update_command tests	2026-06-12 13:42:42 -04:00
Teknium	0fd34e8c5a	fix(teams): cache document/video/audio attachments and classify as DOCUMENT (#44778 ) The Teams adapter only handled image/* attachments — documents (the application/vnd.microsoft.teams.file.download.info consent-free download payload and any direct-URL non-image attachment) never reached media_urls at all, so run.py's document-context injection had nothing to surface. Completes the class-wide sweep from PR #44695 (Signal/Email/SimpleX). - download.info attachments: fetch the pre-authed SharePoint downloadUrl (SSRF-guarded, same guard chain as base.py cache_*_from_url) and route through cache_media_bytes - direct-URL non-image attachments: same fetch + classify path - skip Teams' text/html message-body mirror and adaptive-card attachments - DOCUMENT > PHOTO > VIDEO > AUDIO precedence for mixed attachments, matching the Email precedence rationale from #44695	2026-06-12 02:05:41 -07:00
Siddharth Balyan	7ba5df0d52	feat(billing): /credits command — balance + portal top-up handoff (#44776 ) * feat(billing): /usage → portal top-up browser handoff Add the terminal side of the billing slice (phase 2a): start a top-up by throwing the user to the portal billing page with the top-up modal open. The terminal does not confirm, poll, or track payment — checkout completes in the browser and the next /usage shows the new balance. - nous_account.py: parse organisation.slug/name from /api/oauth/account into NousPortalAccountInfo; add nous_portal_topup_url() building the org-pinned {base}/orgs/{slug}/billing?topup=open with a null-slug fallback to the legacy {base}/billing?topup=open (never /orgs/None/...). - portal_cli.py: 'hermes portal topup' — fresh account fetch, identity line (Topping up as <email> / org <name>), browser open with printed-URL fallback, no-wait closing copy. No polling/confirmation (deferred to 2b). - account_usage.py: the shared /usage credits block now links the org-pinned top-up URL (auto-opens the modal) + points to the command. Depends on NAS #409 (organisation.slug/name + ?topup=open). Do not merge until that is live on the target env; until then /api/oauth/account returns organisation: { id } only and the URL falls back to legacy. * feat(billing): /credits command for balance + top-up handoff Replace the standalone `hermes portal topup` subcommand with an in-session /credits slash command — a focused money surface (balance in, top-up out) that works in the CLI, TUI, and every messaging platform from one registry entry. - commands.py: register /credits (Info category). Slack is at its 50-slash cap, so /credits is routed via /hermes credits on Slack only (new _SLACK_VIA_HERMES_ONLY set) to avoid clamping a canonical command off the native list and breaking Telegram parity; native everywhere else. 
- account_usage.py: build_credits_view() — one portal fetch → balance lines + identity line + org-pinned top-up URL + depleted flag, consumed by all surfaces. Reuses the same snapshot/URL builder as /usage so numbers match. - cli.py: _show_credits() — balance block + identity line + 3-button panel (Open top-up / Copy link / Cancel) via the existing prompt_toolkit modal. ASK, never auto-launch; headless falls back to printing the URL. - gateway/slash_commands.py: _handle_credits_command() — renders the block + tappable top-up URL + no-wait copy; works on button and plain-text platforms. - /usage credits line now points to /credits. - Retire `hermes portal topup` (portal_cli.py back to baseline); the engine (slug/name parse + nous_portal_topup_url) stays as the shared core. No polling, no payment confirmation (billing phase 2a). Depends on NAS #409. * fix(credits): /credits works in the TUI slash-worker (non-interactive) In the TUI, /credits runs in the slash-worker subprocess where there is no live prompt_toolkit app and stdin is the JSON-RPC pipe. _show_credits called the 3-button modal unconditionally, which fell back to reading stdin → exception → slash.exec rejected → the command produced no output (only the pre-existing 'Credit access paused' banner showed). - _show_credits: when self._app is None (TUI worker / piped / non-interactive), render the text variant — balance block + tappable top-up URL + no-wait line, same affordance as the messaging surfaces — and skip the modal entirely. The 3-button panel still renders in the interactive CLI. - Depleted banner copy: 'run /usage for balance' → 'run /credits to top up' now that /credits is the dedicated money surface (+ tests). - Regression tests: _show_credits with self._app=None renders text and never invokes the modal; logged-out path. 
* feat(tui): credits.view RPC for the /credits tappable top-up button Add a credits.view JSON-RPC method returning the structured CreditsView (logged_in, balance_lines, identity_line, topup_url, depleted) so the TUI can render a clickable <Link> top-up button instead of plain text. Account- independent (portal fetch gated on a logged-in Nous account), fail-open to {logged_in: false} on any hiccup. Mirrors session.usage's credits-block pattern. Frontend (TUI-local /credits command + Ink component) lands separately. * feat(tui): /credits command with keyboard-driven top-up confirm TUI-local /credits: fetches the structured balance via the credits.view RPC, prints the balance + identity + top-up URL, then arms the EXISTING confirm overlay (Enter = open top-up in browser via openExternalUrl, Esc = cancel). Reuses ConfirmReq — no new overlay component/state/input handler. Headless (openExternalUrl returns false) falls back to printing the URL. - gatewayTypes.ts: CreditsViewResponse. - commands/credits.ts: the command (mirrors /status's rpc+guarded pattern). - registry.ts: register creditsCommands. - test: balance+overlay armed, headless fallback, no-url, logged-out (4 cases). Matches the CLI /credits 'Enter to open' affordance. Phase 2a: no polling.	2026-06-12 08:51:10 +00:00
Teknium	74180ebf0b	fix(gateway): classify SimpleX non-image/non-audio files as DOCUMENT SimpleX tagged unknown files application/octet-stream in media_types but classification only handled audio/image, leaving msg_type TEXT — run.py never injected the document context. Same bug class as #12845.	2026-06-12 01:07:50 -07:00
Teknium	f03f161b39	fix(gateway): classify email document attachments as DOCUMENT Email cached document attachments and placed them in media_urls, but msg_type only flipped on image attachments — documents stayed TEXT and run.py's document-context injection (gated on MessageType.DOCUMENT) silently dropped them. Same bug class as Signal #12845. DOCUMENT wins over PHOTO for mixed attachments since image handling keys off per-path mime types while document injection gates strictly on message_type.	2026-06-12 01:07:50 -07:00
Teknium	1e29ab38c7	fix(gateway): classify Signal video attachments + catch-all DOCUMENT fallback Widen the salvaged #12851 fix to match the established classification pattern (WhatsApp/Slack/BlueBubbles/Mattermost): video/* -> VIDEO, and any remaining MIME type falls through to DOCUMENT instead of TEXT, so exotic types still trigger run.py's document-context injection.	2026-06-12 01:07:50 -07:00
Kyle Dunn	8e821cd2f5	test(gateway): verify Signal inbound text attachment sets MessageType.DOCUMENT	2026-06-12 01:07:50 -07:00
Kyle Dunn	ffef9da9b7	test(gateway): verify Signal inbound PDF attachment sets MessageType.DOCUMENT	2026-06-12 01:07:50 -07:00
Teknium	db7714d5f1	Merge pull request #44331 from NousResearch/hermes/hermes-6b48295e feat(whatsapp): WhatsApp Business Cloud API adapter (salvage #43921)	2026-06-11 22:48:06 -07:00
Kyssta	a942bfd9cc	fix(gateway): reset _last_flushed_db_idx when reusing cached agent (#44327) (#44518) Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>	2026-06-11 22:41:34 -07:00
Veritas-7	82d570165e	fix(slack): ack reaction lifecycle events Register no-op Slack event handlers for inbound reaction_added and reaction_removed events so Slack Bolt does not log unhandled-request warnings for events Hermes does not consume.	2026-06-12 10:54:07 +05:30
Brad Smith	08e8bedae8	fix(gateway): keep plugin action wrapper signature to (ack, body, action) The previous implementation captured loop vars via default arguments:: async def _wrapped(ack, body, action, _cb=_cb, _plugin_name=_plugin_name): slack_bolt's ``kwargs_injection`` introspects each listener's signature via ``inspect.signature`` and passes ``None`` for any parameter name it doesn't recognise (see ``slack_bolt/kwargs_injection/async_utils.py`` ``build_async_required_kwargs``). That clobbered ``_cb`` to ``None`` at dispatch time, so the wrapped plugin handler became ``NoneType`` — ``await _cb(...)`` then raised ``'NoneType' object is not callable`` and no plugin action handler ever fired. Replace the default-arg trick with a small closure factory so the wrapper's public signature is exactly ``(ack, body, action)``. Add a regression test that introspects the wrapped function's signature. Found via real Slack click on a Block Kit button registered through ``ctx.register_slack_action_handler`` — gateway log showed ``[Slack] Plugin 'None' action handler raised: 'NoneType' object is not callable`` despite the registration log line confirming the handler was wired. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-12 10:36:14 +05:30
Brad Smith	62e937bf2b	feat(plugins): expose register_slack_action_handler API Plugins that post Block Kit messages with interactive elements (buttons, overflow menus, datepickers, etc.) had no documented way to receive the resulting click events. The plugin API exposed register_tool, register_hook, register_command, register_platform, and register_context_engine, but nothing for slack_bolt action handlers. The only workaround was to monkey-patch SlackAdapter.connect from inside register(), which is fragile and breaks on every Hermes update. This change adds: * PluginContext.register_slack_action_handler(action_id, callback) — validates inputs and queues the handler on the PluginManager. action_id accepts whatever slack_bolt.App.action() accepts (literal string, compiled re.Pattern, or constraint dict). * PluginManager.get_slack_action_handlers() — accessor used by the Slack adapter at connect time. * SlackAdapter.connect — after wiring its built-in approval and slash-confirm buttons, iterates the plugin-registered handlers and registers each via self._app.action(matcher)(callback). Each callback is wrapped defensively so a misbehaving plugin cannot crash slack_bolt's dispatch loop, with a best-effort ack on exception so Slack stops retrying the click. * Defensive fallback when the plugin layer is unhealthy: a RuntimeError from get_plugin_manager() is logged and swallowed rather than blocking the gateway from starting. * Test coverage in tests/gateway/test_slack_plugin_action_handlers.py for input validation, multi-plugin registration, the connect-time wiring, defensive exception handling, and the plugin-loader- failure fallback path. * Documentation in website/docs/guides/build-a-hermes-plugin.md describing the new API alongside the existing register_command / dispatch_tool documentation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-12 10:36:14 +05:30
Austin Pickett	c3464ecf45	fix(discord): recover from runtime gateway task exits (#44383 ) * fix(discord): recover from runtime gateway task exits Salvaged from #39416 (AMEOBIUS) — cherry-picked only the task-exit recovery; the original PR was 1081 commits behind with 28 unrelated commits. A post-ready discord.py WebSocket crash left the gateway split-brained: producers stayed active while Discord stopped responding. After this fix the adapter calls _set_fatal_error(retryable=True) + _notify_fatal_error() so the existing GatewayRunner reconnect watcher replaces the dead adapter. Also adds _wait_for_ready_or_bot_exit() so startup failures (SOCKS/proxy errors, invalid tokens) surface fast instead of burning the full ready timeout. Because connect() no longer waits via asyncio.wait_for on that path, test_connect_releases_token_lock_on_timeout is updated to trigger the timeout through the new helper (same lock-release contract). 3 tests pass (2 new runtime-failure tests + the updated timeout test); test_discord_connect.py and test_discord_slash_commands.py green. Co-Authored-By: ameobius <ameobius@local.host> * fix(test): patch _wait_for_ready_or_bot_exit in timeout cancel test connect() no longer uses asyncio.wait_for for the ready handshake, so test_connect_timeout_cancels_bot_task was hanging for 30s in CI. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: ameobius <ameobius@local.host> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-11 15:39:01 -04:00
Dineth Hettiarachchi	020ef76cf1	fix(discord): cancel _bot_task on connect() timeout to prevent zombie client When connect() times out waiting for the Discord ready event, the background asyncio.Task running client.start() was not cancelled. discord.py's internal reconnect loop can ignore client.close() while a WebSocket handshake is in flight, so the orphaned task eventually completes and fires on_ready. A later successful reconnect then leaves two live Discord clients in the same process — each with its own on_message handler and MessageDeduplicator instance — so every @mention creates two threads because the per-adapter dedup caches cannot catch cross-client duplicates. Fix: explicitly cancel and await _bot_task in two places: 1. The asyncio.TimeoutError handler inside connect() — catches the case where the adapter's own inner wait_for fires before the gateway's outer timeout. 2. The start of disconnect() — the load-bearing path, always reached via _dispose_unused_adapter regardless of which timeout fired first. Root cause confirmed from production logs: a Jun 8 network outage caused three consecutive connect() timeouts. The first attempt's bot_task completed its handshake 4 minutes later ("Connected as") with no preceding watcher line, then the watcher's real reconnect also connected 90 seconds after that. The two clients ran continuously for 41+ hours, confirmed by the same user message appearing as two separate inbound events in two different thread IDs 357ms apart. Regression tests added to tests/gateway/test_discord_connect.py: - test_connect_timeout_cancels_bot_task: simulates a connect() timeout with a NeverReadyBot and asserts _bot_task is None afterward - test_disconnect_cancels_running_bot_task: injects a live zombie task, calls disconnect(), and asserts the task is cancelled and the attribute cleared	2026-06-11 12:09:18 -07:00
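
Note on commit 08e8bedae8 above: the difference between the two wrapper patterns it describes can be sketched in plain Python. This is an illustrative sketch only, not the repository's code. The names wrap_with_defaults, wrap_with_closure, param_names, and demo_handler are hypothetical, and no slack_bolt code is involved; the sketch only shows what inspect.signature reports for each pattern.

```python
import asyncio
import inspect


def wrap_with_defaults(cb, plugin_name):
    # Old pattern: values captured as default arguments.
    # The defaults become part of the wrapper's public signature.
    async def _wrapped(ack, body, action, _cb=cb, _plugin_name=plugin_name):
        await _cb(ack, body, action)

    return _wrapped


def wrap_with_closure(cb, plugin_name):
    # Closure factory: cb and plugin_name live in the enclosing scope,
    # so the wrapper's signature stays exactly (ack, body, action).
    async def _wrapped(ack, body, action):
        try:
            await cb(ack, body, action)
        except Exception as exc:
            print(f"plugin {plugin_name!r} handler raised: {exc}")

    return _wrapped


def param_names(func):
    # Parameter names as a signature-introspecting framework would see them.
    return list(inspect.signature(func).parameters)


async def demo_handler(ack, body, action):
    # Hypothetical plugin callback: acknowledge, then record the action.
    await ack()
    body["handled"] = action


async def _noop_ack():
    return None


body = {}
asyncio.run(wrap_with_closure(demo_handler, "demo")(_noop_ack, body, "clicked"))
print(param_names(wrap_with_defaults(demo_handler, "demo")))
print(param_names(wrap_with_closure(demo_handler, "demo")))
```

With the default-argument version, any dispatcher that fills parameters by name sees `_cb` and `_plugin_name` as injectable arguments; the closure version hides them, which is the property the commit's regression test is described as checking.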


1306 commits