hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-01 12:02:05 +00:00

Author	SHA1	Message	Date
Teknium	f03823014b	fix(telegram): kill 409 polling conflict loop by disarming PTB retry synchronously (#53941 ) Telegram polling entered a self-inflicted ~31s loop of 409 Conflict -> retry -> resume -> Conflict. The error_callback PTB invokes synchronously inside its internal network_retry_loop only scheduled our async recovery task (loop.create_task) and returned, so PTB kept polling getUpdates on its own while our handler concurrently ran stop -> sleep -> start_polling. The two polling sessions overlapped and Telegram returned a fresh 409. Fix: in the conflict branch of the error_callback, synchronously set PTB's private polling stop_event before scheduling recovery. PTB's loop exits on its next tick (it races that event in do_action), so our handler owns polling alone. The handler's await updater.stop() drains the task and PTB clears the event, so the subsequent start_polling() builds a fresh event and is not poisoned. Keeps the existing reconnect ladder intact (option B) — fixes only the race. Defensive: probes mangled + unmangled stop_event spellings and no-ops (prior behaviour) if neither exists; never flips _running, which would make the handler skip stop() and leave the loop wedged.	2026-06-27 20:46:08 -07:00
konsisumer	11b0be8d15	fix(gateway): avoid Matrix pending invite boot loops	2026-06-27 20:45:51 -07:00
xxxigm	6f1a176b33	fix(gateway/discord): REST liveness probe to detect zombie clients (#26656 ) The Discord adapter could enter a silent zombie state after a network outage / proxy stall: the process is alive, _client looks open, but the underlying socket is dead. discord.py's WebSocket reconnect never sees a RST through a wedged proxy/NAT, so client.start() spins forever without exiting — which means the bot-task done callback (which only fires on task completion) never trips either. The bot stays "offline" in Discord until a manual `hermes gateway restart`. Reported offline for 13-17h. Adds an out-of-band REST liveness probe in DiscordAdapter. Every `discord.liveness_interval_seconds` (default 60s) the adapter issues a cheap fetch_user(bot_id) — the same REST path as message delivery, so it fails when the proxy/NAT is wedged. After `discord.liveness_failure_threshold` consecutive failures (default 3) the probe closes the wedged client and surfaces a retryable fatal error, which trips the gateway's existing _platform_reconnect_watcher and rebuilds the adapter. Operators disable it by setting either knob to 0. Config lives in config.yaml (discord.liveness_) per the .env-is-secrets policy; _apply_yaml_config bridges it to internal env vars the adapter reads, matching the existing HERMES_DISCORD_TEXT_BATCH_ pattern. Co-authored-by: Hermes Agent <agent@nousresearch.com>	2026-06-27 19:30:32 -07:00
Teknium	db16854f34	fix(telegram): surface failed media downloads to user and agent, not a silent empty turn (#53912 ) When a Telegram attachment download/cache fails (typically a transient httpx.ConnectError to Telegram's CDN), the except handler logged a warning and fell through to handle_message() with empty media and no text — the user thought the file was delivered, the agent saw a content-less turn with no signal an attachment was attempted, and the only record was a buried log line. Adds _surface_media_cache_failure(): replies to the user in Telegram so they know to retry, and appends an agent-visible notice to event.text via the existing _append_observed_note channel so the agent knows an attachment was attempted and failed. No new event fields (structured-event refactor is out of scope per #23045). Wired into all five cache-failure sites — photo, voice, audio, video, document — since they shared the identical silent fall-through. Bug 1 from #23045 (unsupported types routed as fake user messages) no longer exists on main: the document handler now accepts any file type, so there is no rejection branch to fix. Closes #23045	2026-06-27 19:12:57 -07:00
bykim0119	851f75d4df	fix(discord): honor "" wildcard in DISCORD_ALLOWED_USERS (#22334 ) DISCORD_ALLOWED_USERS="" now means "allow everyone", matching the SIGNAL_ALLOWED_USERS / DISCORD_ALLOWED_CHANNELS wildcard convention and the value `claw migrate` emits. Previously _is_allowed_user did exact ID matching only, so "" matched no user and blocked every non-self sender — a P1 with no workaround. Three sites, all required for the fix to hold at runtime: - _is_allowed_user: short-circuit when "" is in the allowlist. - connect(): exclude "" from the intents.members trigger so the wildcard does not request the privileged Server Members intent (which can block the bot from coming online). - _resolve_allowed_usernames: preserve "" verbatim; otherwise it lands in the username-resolution bucket, matches no member, and is silently dropped from the set and env var on the first on_ready — quietly undoing the fix. Slash auth delegates to _is_allowed_user (auto-covered); component auth already honors "*" on main.	2026-06-27 19:11:30 -07:00
Teknium	d3d621f7c3	revert(windows): roll back terminal-popup PRs #53791 #53810 #53829 (#53853 ) * Revert "fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829)" This reverts commit `2ecca1e7d3`. * Revert "fix(windows): stop terminal-window popups from background spawns (#53810)" This reverts commit `5db1430af9`. * Revert "fix(windows): stop subprocess console-window popups + add CI guard (#53791)" This reverts commit `ef17cd204d`.	2026-06-27 15:59:00 -07:00
Teknium	2ecca1e7d3	fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829 ) Follow-up to #53791 addressing review feedback: the footgun checker treated capture_output=/stdout=/stderr=/check_output as proof a subprocess can't pop a Windows console. That invariant is false — stream redirection controls where a child's output goes, not whether a console is allocated. From a console-less parent (Desktop/Electron, pythonw.exe, detached gateway/cron) a console-subsystem child still flashes a window even when fully captured. - check-windows-footguns.py: capture/redirect/check_output is no longer a blanket safe-pass. Added _WINDOWS_FLASHING_PROGRAMS (git/gh/npm/node/python/uv/ffmpeg/ docker/powershell/…); calls to those are flagged even when captured. Non-flashing programs keep the capture exemption (no 271-site noise). _subprocess_compat.run/ popen calls are inherently safe (wrapper injects CREATE_NO_WINDOW). - Routed the 35 genuine flashing git/gh/npm/uv/ffmpeg/docker spawns through the _subprocess_compat.run/popen chokepoint (Brooklyn's wrapper from #53810) — the durable fix, not per-site annotations. cmd.exe /c start stays # ok (intentional). - Updated tests + CONTRIBUTING.md rule #17 to the corrected invariant.	2026-06-27 14:49:41 -07:00
brooklyn!	5db1430af9	fix(windows): stop terminal-window popups from background spawns (#53810 ) * fix(windows): stop terminal-window popups from background spawns Native-Windows desktop/gateway users saw cmd/conhost windows flash on gateway restart, image paste, the dashboard Projects tree, voice notes, and ~5 min after closing the app (detached cron). Two root causes: - Console-subsystem exes (taskkill, schtasks, wmic, netstat, tasklist, agent-browser, git, ffmpeg, powershell, git-bash) spawned via raw subprocess allocate a fresh console when the launching process has none (pythonw desktop backend / detached gateway) - even with output captured. - uv venv pythonw shims re-exec console python.exe, so Python children get a console regardless of how they're launched. Fixes: - Single hidden-spawn primitive (_subprocess_compat.run/.popen) that ORs CREATE_NO_WINDOW on Windows, no-op on POSIX. Route every Hermes-owned console-exe spawn through it. - FreeConsole() catch-all in hermes_bootstrap: any Python child that exclusively owns an auto-allocated console detaches it at startup (GetConsoleProcessList()==1 gate leaves shared interactive consoles untouched). - Replace PowerShell/wmic gateway PID scans with in-process psutil. - Skip schtasks queries on non-interactive desktop restarts. - Prefer native agent-browser .exe over .cmd shims. - Guard test bans raw subprocess spawns of the Windows-only console tools repo-wide so the popup class can't regress. * fix(windows): scope FreeConsole to background entry points; fix merge fallout Console detach review (per #53810 feedback): GetConsoleProcessList()==1 can't tell a uv pythonw->python phantom console apart from a user opening the interactive CLI/TUI in its own fresh console (double-click, shortcut, ConPTY) — both report a single attached process with a tty. Running FreeConsole() in the import-time bootstrap therefore risked detaching a legitimately-interactive terminal. - Extract FreeConsole into explicit hermes_bootstrap.detach_orphan_console(); remove it from apply_windows_utf8_bootstrap() (import side effect). - Call it only from known background mains: gateway run, dashboard backend (start_server, what the desktop spawns), cron standalone, tui_gateway entry, slash worker. Interactive CLI/TUI never calls it. - Behavior-contract tests: frees only when solo owner, leaves shared console, no-op without console / on POSIX, and asserts it's not an import side effect. Merge fallout from origin/main (#53791): - local.py: 3-way merge left a dangling *_popen_kwargs (NameError crashing every terminal init). _subprocess_compat.popen already hides the window, so drop it. - discord adapter: merge stacked an undefined windows_hide_flags() onto the primitive call; drop the redundant arg. - test_gateway: scan now goes psutil-first (zero spawn); rewrite the case-variant test to drive that production path. test(claw): mock _subprocess_compat.run seam for Windows process scan claw.py's Windows tasklist/powershell scan routes through the hidden-spawn primitive; the tests still patched claw_mod.subprocess, so on win32 the mock was never hit and real spawns returned nothing. Patch the actual seam.	2026-06-27 14:02:24 -07:00
Teknium	ef17cd204d	fix(windows): stop subprocess console-window popups + add CI guard (#53791 ) * fix(windows): stop subprocess console-window popups + add CI guard The single biggest source of Windows 'terminal popup' bug reports was bare subprocess.run/Popen calls spawning a console window. The compat helpers (windows_hide_flags / windows_detach_popen_kwargs) already existed but the footgun checker had no rule to stop new bare calls from reintroducing the flash. - scripts/check-windows-footguns.py: new AST-based rule flagging subprocess calls that can create a new console — output-redirection-aware (capture/ redirect/check_output exempt) and POSIX-only-program-aware (launchctl/ systemctl/brew/etc. exempt). Comprehensive on real popups, no annotation burden on calls that can't flash. - Swept all genuine window-spawning sites through windows_hide_flags()/ windows_detach_popen_kwargs(); marked intentionally-visible launches (editor/terminal/foreground re-exec) with '# windows-footgun: ok'. - tests/scripts/test_windows_footgun_subprocess_rule.py: behavior-contract tests + full-repo cleanliness invariant. - CONTRIBUTING.md: documents the rule + the helper pattern. * test: accept creationflags kwarg in psutil_android fake_subprocess_run The Windows no-window sweep added creationflags=windows_hide_flags() to install_psutil_android.py's subprocess.run call; the test's fake stub had a fixed (cmd) signature and raised TypeError on the new kwarg.	2026-06-27 13:03:51 -07:00
Teknium	cd592c105c	feat(send_message): native WhatsApp media delivery via Baileys bridge (#53598 ) send_message with MEDIA:/path to a WhatsApp target previously dropped the attachment: the WhatsApp branch never passed media_files, the plugin's _standalone_send accepted the param but only POSTed text, and WhatsApp was absent from the media-supported platform list. - send_message_tool: add a Platform.WHATSAPP media block (mirrors Feishu) that routes media_files through the whatsapp plugin's standalone_sender_fn, and add whatsapp to the supported-media list strings. - whatsapp adapter: _standalone_send now sends text first (skipped when the chunk is media-only), then uploads each file via the bridge /send-media endpoint with a mediaType derived from extension/is_voice/force_document, so images/videos/voice arrive as native bubbles instead of documents. - _bridge_media_type classifier maps ext -> image\|video\|audio\|document. Closes #19105 (remaining send_message gap). Other items in the report (inbound video paths, image_generate auto-deliver, history dedup, native gateway bubbles) already landed on main.	2026-06-27 04:40:05 -07:00
r266-tech	dbc925b755	Guard oversized Telegram video downloads	2026-06-27 04:39:48 -07:00
teknium1	7ee0b68973	fix(gateway,feishu): refuse executor resurrection during real shutdown Add an explicit _closing guard to both owned executors so the recreate-on-shutdown path only recovers from an external teardown of the loop default — never resurrects a pool the gateway/adapter itself stopped. _shutdown_executor() sets the flag; _get_executor() raises if closing; feishu connect() re-arms on reconnect. Updates the gateway recreate test to assert the refusal contract and adds feishu coverage.	2026-06-27 04:13:09 -07:00
teknium1	b296915c82	fix(feishu): route blocking SDK calls through an adapter-owned executor Feishu SDK calls ran on asyncio's shared default executor, so a torn-down default executor wedged every send with 'Executor shutdown has been called' and left the gateway a zombie (#10849). The adapter now owns a ThreadPoolExecutor recreated on demand if shut down, mirroring the gateway-owned executor change. Routes all 17 self._client SDK calls through _run_blocking; shuts the pool down on disconnect.	2026-06-27 04:13:09 -07:00
LeonSGP43	52a09d8faf	fix(byterover): honor auto extract config	2026-06-27 04:04:15 -07:00
teknium1	ab1f9b94c5	fix(telegram): accept @username chat_id in delivery paths (#13206 ) TELEGRAM_HOME_CHANNEL set to an @username (not a numeric chat ID) crashed all webhook/cron->Telegram home-channel delivery with 'ValueError: invalid literal for int()'. The Telegram Bot API accepts both a numeric chat_id and an @username string; Hermes was force-coercing every chat_id with int(). Add normalize_telegram_chat_id() (returns int for numeric values, passes @username strings through) and apply it at the Bot API send/edit sites in the Telegram adapter and the send_message tool. Username targets are now recognized as explicit targets in _parse_target_ref. Reapplies the approach from #13274 (season179), whose branch predated the gateway/platforms/telegram.py -> plugins/platforms/telegram/adapter.py relocation. Dupes: #13535 (Tranquil-Flow), #37572 (chewkaah). Co-authored-by: season179 <season.saw@gmail.com>	2026-06-27 04:01:58 -07:00
Sahil-SS9	6fb25f86ac	fix(telegram): filter out bot's own messages from inbound processing (#52363 )	2026-06-27 03:56:52 -07:00
Mahesh Sanikommu	1b75b3fd90	feat(memory): add Supermemory setup connection summary Add post_setup() and get_status_config() to the Supermemory memory provider so `hermes memory setup` and `hermes memory status` print a one-line connection summary (container, profile fact count, auto_recall/auto_capture). Point API-key onboarding at the Hermes connect URL (app.supermemory.ai/integrations?connect=hermes). Salvage of #52988. Two fixes folded in: - Test isolation: the new probe/status tests mocked _SupermemoryClient but not the __import__("supermemory") guard inside _probe_supermemory_connection, so they passed only where the optional supermemory package was installed and failed on a clean checkout / CI (the PR shipped with red CI). Added _stub_supermemory_importable() mirroring the existing test_is_available_false_when_import_missing pattern; the suite now passes with supermemory absent. - post_setup: `if api_key and api_key not in os.environ` checked whether the key's value named an env var (always false in practice). Fixed to compare the value: `os.environ.get("SUPERMEMORY_API_KEY") != api_key`. Verified: 38/38 in test_supermemory_provider.py and the full tests/plugins/memory/ suite green with supermemory not installed. Closes #52988	2026-06-27 15:07:34 +05:30
underthestars-zhy	8827300267	fix(photon): correlate tapbacks to bot message context Populate `reply_to_message_id`, `reply_to_text`, and `reply_to_is_own_message` on reaction events so the gateway injects `[Replying to your previous message: "..."]` when the agent receives a tapback. The sidecar now extracts a capped text preview from the hydrated reaction target (plain text and mixed group messages; null for attachment/voice-only targets), emitting it as `targetText` in the NDJSON reaction payload. The Python adapter reads this field and sets the reply correlation fields on the `MessageEvent`.	2026-06-27 00:51:34 -07:00
underthestars-zhy	4345b3e767	fix(photon): upgrade spectrum-ts sidecar to v8.0.0 v8 made `richlink` outbound-only; inbound rich links now arrive as plain `text`. Remove the `getBalloonBundleId`/`toRichlinkMessage` branches from the iMessage mapper patch and update the fixture, lockfile, and README accordingly.	2026-06-27 00:51:34 -07:00
underthestars-zhy	5636c22828	feat(photon): upgrade spectrum-ts sidecar to v7.0.0 Update the Photon platform plugin's Node.js sidecar from spectrum-ts 3.1.0 to 7.0.0, which splits the SDK into scoped `@spectrum-ts/*` packages with `spectrum-ts` as the umbrella re-export. - Bump exact pin in package.json/package-lock.json to 7.0.0 - Update mixed-attachments patch script to target the new `@spectrum-ts/imessage/dist/index.js` path and tab-indented output - Rewrite test fixture to match v7.x mapper shape (tab-indented, `const ... = async` declarations, single-line builder calls) and point at `@spectrum-ts/imessage/dist/index.js` - Update README upgrade guide to document the v5 package split and the postinstall patch validation step - Update comments in cli.py and index.mjs to reference v5/v7 changes	2026-06-27 00:51:34 -07:00
Ben Barclay	fbf748b282	fix(dashboard-auth): follow redirects on self-hosted OIDC discovery (#53399 ) The self-hosted OIDC provider fetched the discovery document with a bare httpx.get(). httpx defaults to follow_redirects=False (unlike curl -L or the requests library), so when an IDP answers GET /.well-known/openid-configuration with a 3xx — Authentik canonicalises the .well-known path, and any IDP behind a reverse proxy doing an http→https upgrade redirects too — the bare redirect (empty body) tripped the status != 200 guard and raised 'OIDC discovery returned 302', which routes.py maps to the provider_unreachable audit event and a 503. The browser surfaced 'Auth provider self-hosted unreachable'. The user's smoking gun (curl -o writing zero bytes from inside the container) is exactly a redirect with no body — the same wall the code hit. Add follow_redirects=True to the discovery GET only. It's safe: the issuer-pin check and _require_https_or_loopback still validate the resolved document and every endpoint, so a redirect can't smuggle in a bad issuer or a cleartext endpoint. The token/revocation POSTs deliberately keep the no-follow default (they carry an auth code / refresh token and the endpoint is already the canonical absolute URL). Existing discovery tests mocked httpx.get with a canned 200 and never exercised a real 3xx. Add a regression test that runs a real loopback server returning a 302 on the .well-known path — fails without the fix (ProviderError: discovery returned 302), passes with it.	2026-06-27 14:14:51 +10:00
Nacho Avecilla	dbe734beff	fix(dashboard-auth): exclude non-interactive providers from interactive login surfaces (#53239 ) * Return None instead of erroring on drain login failure * Fix login on drain * Remove login for drained endpoints flow and clean the code * chore: drop unrelated credits changes from this PR * Remove extra comments that were not really necessary	2026-06-27 10:08:13 +10:00
kshitijk4poor	6326d5c6f6	fix: remove duplicated table renderer from Telegram adapter The PR's original refactor commit only replaced the primitives (regex, is_table_row, split_markdown_table_row) with shared imports but left the verbatim-copied renderer (_render_table_block_for_telegram) and driver (_wrap_markdown_tables) in place. Both are logic-identical to the shared convert_table_to_bullets in gateway/platforms/helpers.py. Replace both with a direct import alias. _TABLE_SEPARATOR_RE is still imported separately because it's used by the rich-message routing logic (lines 1024, 1044) to detect whether content contains tables. Found by 3-agent parallel code-reuse review.	2026-06-27 03:57:24 +05:30
Yashiel Sookdeo	24a4df9cd1	refactor(telegram): import shared table-detection primitives from helpers.py Replace local _TABLE_SEPARATOR_RE, _is_table_row, and _split_markdown_table_row with imports from the shared module. Telegram-specific rendering stays local. Co-authored-by: Yashiel Sookdeo <yashiel@skyner.co.za>	2026-06-27 03:57:24 +05:30
Yashiel Sookdeo	cf7bf5bdc9	fix(discord): auto-convert markdown tables to bullet groups Discord does not render GFM pipe tables — raw pipe characters display as garbage text. format_message now rewrites tables into bold-heading + bullet groups using the shared helpers. Fixes #21168 Co-authored-by: Yashiel Sookdeo <yashiel@skyner.co.za>	2026-06-27 03:57:24 +05:30
liuhao1024	515192c4b9	fix(tools): use start_new_session instead of preexec_fn to prevent SIGSEGV in multi-threaded processes preexec_fn=os.setsid runs Python code in the forked child before exec, which is unsafe in multi-threaded processes (CPython docs). When the Desktop gateway loads native libraries (onnxruntime, BLAS, provider SDKs) with active thread pools, the fork can SIGSEGV before the child execs. Replace all preexec_fn usage with start_new_session=True, which provides the same setsid/process-group semantics without running Python in the fork. This is already the pattern used throughout hermes_cli/gateway.py and hermes_cli/_subprocess_compat.py. Fixes #46789	2026-06-27 03:08:41 +05:30
Ben	2e322466b1	feat(dashboard-auth): drain shared-bearer-secret provider plugin Task 2.0b: the concrete shared-bearer-secret auth provider, the FIRST consumer of the generic token-auth capability (Task 2.0a). Implements decisions.md Q-A. plugins/dashboard_auth/drain/ (bundled, discovered like dashboard_auth/basic): - DrainSecretProvider: non-interactive provider, supports_token=True. Verifies an inbound Authorization bearer token against a per-agent shared secret with hmac.compare_digest (constant-time, no timing oracle) and, on a match, vouches for the caller as the "drain-control" principal scoped to "drain". The five interactive ABC methods raise NotImplementedError; verify_session returns None (stacks harmlessly in the cookie-verify loop). - assess_secret_strength(): fail-closed entropy gate. Rejects secrets shorter than 43 url-safe-b64 chars (~256 bits), with < 16 distinct characters, or below 128 bits Shannon entropy — so a weak/structured/repeated secret can never be silently accepted. Enforced both at register() (friendly skip reason) and in __init__ (raises — defence in depth). - register(ctx): no-op + skip reason when HERMES_DASHBOARD_DRAIN_SECRET is unset; rejects a weak secret fail-closed (drain endpoint stays gated). On a strong secret, registers the provider AND opts /api/gateway/drain into the generic token-auth seam via register_token_route(). Config: the secret is a CREDENTIAL → carried via HERMES_DASHBOARD_DRAIN_SECRET (per-agent, provisioned by NAS at deploy). Behavioural knobs only (dashboard.drain_auth.{scope,min_secret_chars}) live in config.yaml — added to DEFAULT_CONFIG with the .env-is-for-secrets rationale documented inline. Tests: tests/plugins/dashboard_auth/test_drain_provider.py — entropy gate (strong pass; empty/short/repeated/few-distinct/custom-min reject), verify_token (match → scoped principal, wrong/empty → None, custom scope), protocol compliance, interactive-methods-raise, and register() (skip-no-secret, fail-closed-weak-secret, strong-env-secret registers + route opt-in, config scope + min_secret_chars). 21 new tests; drain + token-auth suites 44 passed. Verified the plugin is discovered as dashboard_auth/drain alongside basic/nous. Intentionally deferred: - The begin/cancel-drain endpoint handler itself — Task 2.1. - The dashboard→gateway control channel — Task 2.2. Build status: dashboard-auth + drain-plugin suites green.	2026-06-26 00:47:19 -07:00
teknium1	43b8ba4181	fix(telegram): preserve Bot API update queue on watcher reconnect After a prolonged outage the in-process network-error ladder escalates to fatal and GatewayRunner._platform_reconnect_watcher rebuilds a fresh adapter that reconnects through the bootstrap path. That path called start_polling(drop_pending_updates=True), discarding every update Telegram queued during the outage — all messages sent while the bot was down were silently lost. The in-process ladder and 409-conflict handler already passed drop_pending_updates=False; only bootstrap did not distinguish a cold first boot from a reconnect. Thread an is_reconnect signal from the watcher through _connect_adapter_with_timeout into adapter.connect(). The base BasePlatformAdapter.connect() gains a keyword-only is_reconnect=False so every adapter inherits a tolerant signature (no per-platform breakage when the runner forwards the kwarg). Telegram translates is_reconnect into drop_pending_updates=not is_reconnect on both the polling and webhook bootstrap calls. Cold boot still drops the stale queue; a watcher reconnect preserves it. Fixes #46621. Co-authored-by: annguyenNous <annguyen@nousresearch.com> Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com> Co-authored-by: Kewe63 <Kewe63@users.noreply.github.com>	2026-06-25 21:29:57 -07:00
teknium1	85e084d60d	fix(email): reject spoofed From: header for authorization (GHSA-rxqh-5572-8m77) The email adapter authorized senders entirely off the From: header, which is attacker-controlled and unauthenticated by IMAP. An attacker could forge From: an-allowlisted-address and pass both the adapter's EMAIL_ALLOWED_USERS pre-filter and the gateway's allowlist authz (both key on the same spoofable sender_addr), getting unauthorized commands executed by the agent. Verify the From: domain against the trusted Authentication-Results header the receiving mail server stamps (SPF/DKIM/DMARC) before trusting it for authorization. Enforced only when an allowlist is in effect and allow-all is off — fail-closed. Operators whose server does not stamp the header can opt out via platforms.email.require_authenticated_sender: false (or EMAIL_TRUST_FROM_HEADER=true).	2026-06-25 21:11:02 -07:00
Teknium	ce802e932c	fix(telegram): heartbeat loop exits cleanly when bot has no get_me CI shard test_telegram_conflict.py timed out (140s) because the new _polling_heartbeat_loop, started by connect(), busy-spun under those tests: they monkeypatch asyncio.sleep to instant and pass a bot double with no get_me(), so the probe raised AttributeError (swallowed) and the loop re-entered immediately with no real pacing, starving the event loop. Guard the loop to return when bot.get_me is not callable — a real PTB Bot always exposes it, so this only triggers on a torn-down app or a test double, where there is nothing to probe. Also cancel the heartbeat task in the conflict tests that call connect() without disconnect(), matching the production disconnect() teardown. Verified: test_telegram_conflict.py now runs in ~4.5s; the 22 heartbeat/reconnect tests still pass; E2E confirms a hanging get_me still fires the reconnect ladder while a missing get_me exits without spinning.	2026-06-25 18:50:11 -07:00
agt-user	8501caf51f	fix(telegram): persistent heartbeat loop to detect CLOSE-WAIT polling sockets When a Telegram long-poll TCP socket enters CLOSE-WAIT (remote sent FIN but httpx hasn't noticed), epoll still reports it readable so no exception is raised. PTB's error_callback never fires, the reconnect ladder never engages, and the gateway silently stops receiving messages while the process stays alive — until a manual systemctl restart. The existing recovery only covers two cases: error_callback-driven reconnects (which require an exception PTB never gets) and a one-shot _verify_polling_after_reconnect probe (which runs only right after an explicit reconnect). A socket that wedges during steady-state operation is never detected. Add _polling_heartbeat_loop: a background asyncio.Task started in connect() (polling mode only) that probes get_me() every 90s on the general request pool (not the getUpdates pool, so healthy long-polls are never interrupted). On asyncio.TimeoutError/OSError it hands off to the existing _handle_polling_network_error ladder; other errors are swallowed. disconnect() cancels and awaits the task. Worst-case detection window ~105s. Complementary to #51541 (general-pool keepalive limits / fd leak) — that recycles idle pooled connections; this detects a wedged active read. Fixes #48495 Co-authored-by: agt-user <267614622+agt-user@users.noreply.github.com>	2026-06-25 18:50:11 -07:00
infinitycrew39	9d225fbf4e	fix(telegram): auto-rich pipe tables and topic routing for sendRichMessage Pipe-only markdown tables now use sendRichMessage even when rich_messages is off, and resumed DM-topic sends route via direct_messages_topic_id without requiring a reply anchor. Rich finalize edits forward topic kwargs.	2026-06-25 13:10:54 -07:00
rob-maron	2c02583c2b	fix shape	2026-06-25 12:38:33 -07:00
rob-maron	525ee58b43	krea	2026-06-25 12:38:33 -07:00
xxxigm	0aea0c3654	fix(utils): unify YAML list indent across all config writers (#31999 ) atomic_yaml_write used default yaml.dump which emits indentless sequences (list items at column 0), while atomic_roundtrip_yaml_update (ruamel.yaml) emits 2-space-indented sequences. Cross-path writes to the same config.yaml toggled indentation on every save, eventually producing a mixed-indent file that js-yaml rejects with 'bad indentation of a mapping entry', silently dropping custom_providers and breaking model switching. Add IndentDumper SafeDumper subclass that forces indentless=False, route atomic_yaml_write through it. Route tui_gateway._save_cfg and the Telegram adapter's config writer through atomic_yaml_write so all paths emit the same 2-indent layout. Salvaged from #32034 by @xxxigm. Adapted to current main which already has allow_unicode=True (from #51356) but was missing IndentDumper. Closes #31999	2026-06-25 23:27:44 +05:30
Brooklyn Nicholson	7078d9d1e2	fix(pets): raise generation timeouts for the slow quality-first model path The quality-first default (OpenAI image via OpenRouter) is slow, and a full hatch fans out ~8 rows with up to 3 retries each (300s/call) across 2 parallel waves, so the absolute backend worst case is ~30 min. The old ceilings fired mid-run: - per-image HTTP call: 180s -> 300s (a single cold row can exceed 3 min) - drafts RPC: 240s -> 420s (single wave, no retries — 7 min is ample) - hatch RPC: 420s -> 1hr (sits above the ~30 min backend worst case) The hatch ceiling is intentionally well above the realistic max so the frontend never throws "request timed out" before the backend has exhausted its own retries. The background-resumable notification path remains the real UX safety net — the user can close the modal and get pinged on completion.	2026-06-25 00:34:52 -05:00
brooklyn!	0c442fa1d3	Merge pull request #52303 from NousResearch/bb/pets-gen-qa feat(pets): quality-first OpenRouter chain, stronger atlas gates, global pet-gen notifications	2026-06-24 23:16:40 -05:00
Brooklyn Nicholson	e92b5c6af8	feat(pets): quality-first OpenRouter model chain + stronger atlas gates + global pet-gen notifications OpenRouter/Nous image gen now runs a quality-first model chain by default: attempt the highest-fidelity OpenAI image model first, then fall back to Gemini 3 Pro Image when it's access-gated/unavailable/times out. An explicit OPENROUTER_IMAGE_MODEL / config model override pins one model with no fallback. Atlas validation rejects malformed model output instead of shipping it: adds a per-state collapse guard (a single sliver/fragment row no longer passes because other rows are healthy), on top of the existing postage-stamp + multi-pose checks. Desktop: pet-gen native notifications are now "global" (not tied to a chat session), so a background generation started from the command center fires an OS notification when the user is away even with no active session. Adds a neutral "This can take up to 5 minutes." banner on step 1, and lets the provider picker auto-size. Tests updated/added for the OpenRouter fallback chain, the collapse guard, and the global notification path.	2026-06-24 23:11:21 -05:00
helix4u	17beb55e3c	fix(telegram): gate rich draft previews separately	2026-06-24 18:11:14 -07:00
brooklyn!	7157b213f5	Merge pull request #47959 from NousResearch/bb/pets-gen Pet generation: frame-perfect hatch flow, backend picker, CPU-safe chroma, and CI-hardening	2026-06-24 19:41:34 -05:00
liuhao1024	404b06ac4f	fix(gateway): honor server retry_after in _send_with_retry for Telegram flood control (#46762 ) When Telegram's sendRichMessage returns a FloodWait/RetryAfter error, _try_send_rich() now extracts the server-provided retry_after value and propagates it through SendResult.retry_after. The base _send_with_retry() layer honors this value instead of using its default short exponential backoff (~2s, ~4s), preventing the retry budget from being exhausted against a server that demands a 25-37s wait. Salvaged from #46774 by @liuhao1024. Telegram adapter path moved from gateway/platforms/telegram.py to plugins/platforms/telegram/adapter.py since the original PR. Closes #46762	2026-06-25 02:43:47 +05:30
uzunkuyruk	489b85ee1e	fix(ddgs): bound DuckDuckGo search with a wall-clock timeout (#36776 ) A single ddgs (DuckDuckGo) search could hang indefinitely and block the shared agent loop — and therefore every platform (CLI, Telegram, Matrix...). The DDGS constructor's timeout only bounds individual HTTP requests; ddgs's multi-engine retry loop has no overall cap, so a slow/rate-limited response could spin for 20+ minutes with no output and no error. Run the synchronous ddgs call in a single-worker ThreadPoolExecutor and cap it with future.result(timeout=_SEARCH_TIMEOUT_SECS=30). On timeout, return a clear failure ("DuckDuckGo search timed out ... try a different provider") instead of blocking; the pool is shut down with cancel_futures so a hung worker is never awaited. Salvaged from #37422 by @uzunkuyruk (authorship preserved). Re-applied on current main (the PR's provider.py base had diverged). Added a load-bearing timeout regression test (the original PR only updated the fake's constructor and had no timeout-behavior test) — mutation-verified to fail without the cap. Closes #36776.	2026-06-25 01:45:06 +05:30
Brooklyn Nicholson	3faf768cde	feat(pets): OpenRouter + Nous Portal image backend Reference-grounded image provider over the OpenRouter-compatible chat-completions image protocol (Gemini Flash Image et al.). Nous Portal proxies OpenRouter, so one provider serves both — giving pet generation a reference-capable backend beyond OpenAI gpt-image.	2026-06-24 13:48:38 -05:00
kshitij	c42d44cb2f	revert(plugins): restore user dashboard plugin backend API auto-import (#43719 ) (#51950 ) * Revert "refactor(security): centralize non-bundled plugin sources in one constant" This reverts commit `e2bea0abe6`. * Revert "fix(security): restrict dashboard plugin backend import to bundled plugins (#43719)" This reverts commit `8845f3316c`.	2026-06-24 07:46:54 -07:00
kshitijk4poor	ab9134bf16	feat(openviking): add full recall prefetch policy Salvage of PR #48927 by @ehz0ah, which consolidates OpenViking recall work from #41706 (@huangxun375-stack), #33260, #49975, and #32444. Replaces stale background post-turn prefetch warming with synchronous current-query recall. The old queue_prefetch warmed the PREVIOUS user message while turn-start recall consumed the CURRENT one, so injected context was always about the wrong topic. Changes: - prefetch() now does session-aware /api/v1/search/search with the current query, falls back to /api/v1/search/find on failure - Contract-safe payloads: limit, score_threshold, context_type, session_id — no top_k, no search-body mode, no target_uri - L2 content reads for items with level=2 or empty abstracts, capped at full_read_limit (default 2) - Local ranking (score + query-token overlap + leaf boost), dedup, score threshold, and injected-char budget - queue_prefetch() is now a no-op (background warming removed) - Additive batched viking_read: uris param accepts up to 3 URIs - Per-request timeout support on _VikingClient.get/post/delete - Removes stale _prefetch_result/_prefetch_thread/_prefetch_generation state and _invalidate_prefetch_state() - Strengthened system_prompt_block guidance Salvage follow-up fixes: - Expose all 8 recall config knobs in get_config_schema() (PR #48927 had removed them; #41706 correctly exposed them). Env vars remain as internal mechanism but are now visible in setup wizard. - Lower default timeout 8s→4s, request_timeout 6s→3s, full_read_limit 3→2 to reduce per-turn blocking latency. Co-authored-by: Hao Zhe <haozhe4547@gmail.com> Co-authored-by: Eurekaxun <eurekaxun@163.com>	2026-06-24 18:53:49 +05:30
Teknium	a911bcda18	docs: stop recommending pip install; curl installer is the only supported path (#51743 ) * docs: stop recommending pip install hermes-agent; point to install script The install script is the only supported install path (it provisions a managed, isolated uv environment). Replace bare `pip install hermes-agent` primary-install recommendations with the curl install script, and rewrite optional-extra snippets (`pip install "hermes-agent[X]"`) to the managed-env form `cd ~/.hermes/hermes-agent && uv pip install -e ".[X]"` that matches the installer and the English quickstart. Covers English docs + zh-Hans mirrors, the achievements plugin README, and realigns the zh-Hans quickstart to the English Desktop-installer-first layout (dropping its stale "Method A — pip (simplest)" section). * docs: drop pip as a supported install/update method Removes the 'pip installs' supported-method sections from updating.md and cli-commands.md (EN + zh-Hans): the curl install script is the only supported way to install/update the Hermes CLI. The _cmd_update_pip pip/pipx branches remain in code as an undocumented safety net for users who already have such an install, but the docs no longer advertise pip as a path. Also normalizes a bare `pip install -e '.[acp]'` to the managed-env form. Leaves python-library.md untouched: importing AIAgent as a library dependency into your own project is a distinct use case where pip is correct.	2026-06-24 00:14:32 -07:00
Tranquil-Flow	73a20a6ad6	fix(telegram): clip mid-stream overflow instead of splitting (#48648 )	2026-06-24 00:00:46 -07:00
justemu	4aa793345e	fix(matrix): use member_count as DM signal for named DM rooms Most Matrix clients auto-set a room name when creating a DM (e.g. "Alice & Bot" from participant display names), so the old `is_direct and not has_explicit_name` heuristic classified virtually all client-created DM rooms as "room", forcing require_mention gating in legitimate one-on-one DMs. member_count is now the primary DM signal: <=2 members means the room is necessarily a 1:1 conversation, regardless of m.direct or an explicit name. A room that grew to 3+ members but is still in stale m.direct is still classified as a room (conflict flag set). Falls back to the m.direct + name heuristic when the count is unavailable. Also hardens _get_room_member_count with a joined_members API fallback when the cache-backed state_store is empty. Salvaged from #48554 by @justemu onto the current plugin adapter path (gateway/platforms/matrix.py -> plugins/platforms/matrix/adapter.py). Fixes #48551	2026-06-23 23:57:38 -07:00
liuhao1024	7ff48a6291	fix(discord): check pairing store for component button auth Component button interactions (approve/deny, slash confirm, model picker, clarify) were not checking the pairing store for authorization. Users approved via `hermes pairing approve` could send messages and use slash commands (which go through the gateway authz_mixin), but button clicks were rejected because `_component_check_auth` only checked env-var allowlists (DISCORD_ALLOWED_USERS, GATEWAY_ALLOW_ALL_USERS, etc.) and not the pairing store. This was a regression from commit `f6f363662` which intentionally made component auth fail-closed when no allowlist is set (security fix for GHSA-mc26-p6fw-7pp6), but did not account for pairing-based auth. Fix: add a `PairingStore.is_approved("discord", uid)` check to `_component_check_auth`, mirroring `authz_mixin._check_authorization`. The pairing store check runs after all allowlist checks, preserving the fail-closed behavior for non-paired, non-allowed users. Fixes #50627	2026-06-23 23:55:18 -07:00
teknium1	d4be583d98	fix(telegram): raise default command-menu cap to 60 so skills stay visible The 30-slot default could not fit Hermes's ~50 built-in commands, so every skill command (and 20 built-ins) were silently dropped from the Telegram \`/\` menu by default — they only worked when typed manually. Raising the default to 60 keeps all built-ins plus common skill commands visible out of the box while staying under Telegram's ~4KB payload limit. Users can still tune it via platforms.telegram.extra.command_menu.	2026-06-23 23:49:22 -07:00

1 2 3 4 5 ...

614 commits