hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-01 12:02:05 +00:00

Author	SHA1	Message	Date
Teknium	74541beb9c	fix(security): cap WeCom callback body size before pre-auth XML parse (#54615) The WeCom callback endpoint (internet-facing, 0.0.0.0) parsed untrusted request bodies before signature verification. defusedxml already guards the entity-expansion class on main, but there was no cap on raw body size, so an unauthenticated POST could still force unbounded read work pre-auth. Set client_max_size=64KB on the aiohttp app (413 at the framework layer) plus an explicit length guard in _handle_callback as defense in depth. WeCom callbacks are small encrypted XML envelopes — media is delivered out-of-band via MediaId, never inline — so 64KB is ample for legitimate traffic. Adds tests for oversized (413) and normal-sized (not 413) bodies. Salvaged from #10192 by @memosr (body-size limit half; defusedxml half already superseded on main).	2026-06-28 22:35:43 -07:00
aaronagent	d836b2bac4	fix(matrix,mattermost): invite auth check + API path traversal guard Two platform-security hardenings: - Matrix: _on_invite now checks the inviter against the existing allow-list (_allowed_user_ids / GATEWAY_ALLOW_ALL_USERS) before auto-joining. Without this any federated Matrix user could invite the bot into arbitrary rooms, exposing its presence and metadata. The message and reaction paths already enforce this allow-list; the invite path bypassed it. - Mattermost: _api_get / _api_post / _api_put reject any path containing '..'. WebSocket-event values (channel_id, post_id, file_id) are interpolated directly into API paths, so a malicious or compromised server could craft traversal payloads to make the bot issue authenticated requests to arbitrary endpoints with its bearer token. The configurable-E2EE-passphrase change from the original PR is dropped: the matrix adapter was rewritten onto mautrix and the passphrase-protected key-export file no longer exists.	2026-06-28 20:47:33 -07:00
teknium1	c648ecdca5	fix(telegram): reject unauthorized users before event construction (#40863) Removed/unauthorized Telegram users could inject prompt content before the per-user auth gate fired. The adapter ran `_should_process_message`, `_build_message_event`, and text/photo batching — and dispatched to the runner — before `_is_user_authorized()` (gateway/authz_mixin.py) rejected the sender. Unmentioned group chatter from a removed user was also persisted into the session transcript via `_observe_unmentioned_group_message`, leaking into the agent's observed context independent of dispatch. Add `_is_user_authorized_from_message()` as an intake prefilter that runs in `_handle_text_message`, `_handle_command`, `_handle_location_message`, and `_handle_media_message` BEFORE batching, event construction, and the unmentioned-group observe branch. It reuses the runner's `_is_user_authorized()` with a correctly-shaped SessionSource (group vs forum vs dm, real chat_id for TELEGRAM_GROUP_ALLOWED_* allowlists), falls back to env allowlists, and only rejects when an allowlist actually exists — unknown DMs with no allowlist still reach the pairing flow. Channel posts authorize via `sender_chat` identity when `from_user` is absent. Co-authored-by: liuhao1024 <sunsky.lau@gmail.com> Co-authored-by: Carlos Manuel Cejas <carlosmcejas@gmail.com>	2026-06-28 14:25:15 -07:00
Brooklyn Nicholson	eeca59f489	fix(windows): hide remaining backend console-flash legs missed on main main (`cb982ad99`) wired windows_hide_flags() into the auxiliary git/gh/wmic/bash/powershell/taskkill legs but left two it didn't reach, plus the Electron backend-launch leg it explicitly deferred. Cover them the same way: - apps/desktop/electron/main.cjs: getNoConsoleVenvPython resolves the BASE pythonw.exe instead of the venv Scripts\pythonw.exe shim, which re-execs a console python.exe and flashes a conhost the desktop backend can't suppress. Both backend creators put the venv site-packages on PYTHONPATH so imports still resolve under the base interpreter. (main's commit said this Electron leg "needs a Windows-tested change of its own".) - tools/tts_tool.py, tools/transcription_tools.py, plugins/platforms/discord: ffmpeg conversions (voice notes / TTS / STT) via windows_hide_flags(). - plugins/platforms/whatsapp: netstat + taskkill bridge-port cleanup via windows_hide_flags(). All no-ops on POSIX. Tests assert the base-pythonw preference and the ffmpeg legs pass CREATE_NO_WINDOW.	2026-06-28 10:19:21 -05:00
teknium1	d5ba374c03	fix(telegram): detect wedged getUpdates consumer via pending_update_count The merged CLOSE-WAIT heartbeat (#52744) only probes get_me(), which uses the general request path and stays healthy while PTB's getUpdates consumer is silently wedged (updater.running=True but the long-poll task is stuck, observed on WSL2). DMs then queue in the Bot API and never reach handlers (#42909). Augment the existing _polling_heartbeat_loop to also probe get_webhook_info().pending_update_count. After two consecutive probes that see a non-draining queue while the updater claims to be running, escalate into the existing _handle_polling_network_error recovery ladder — no new restart machinery. No-ops in webhook mode, when the updater is not running, or when a reconnect is already in flight. Credit to @gazzumatteo, whose PR #42959 identified the pending_update_count signal as the missing liveness probe. This reuses the existing heartbeat + recovery path rather than adding a parallel watchdog. Fixes #42909.	2026-06-28 02:44:17 -07:00
Teknium	7c0a5def58	fix(memory/holographic): close DB connection on shutdown instead of leaking to GC (#54133) HolographicMemoryProvider.shutdown() dropped its MemoryStore reference without calling the existing MemoryStore.close(). Since the connection is opened check_same_thread=False (one per session), its fd was released by refcount/GC at a non-deterministic time on a non-deterministic thread, churning a DB fd through the kernel free pool on every session teardown. Call close() so the fd is released deterministically. Reported by @alfranli123 (#44037), who pinpointed the exact code location. Note: the report's TLS-fd-recycle corruption attribution could not be reproduced from the code — dropping a sqlite connection flushes valid SQLite pages via the VFS, never TLS framing, and the provider is at most a releaser of DB fds, not a TLS-flushing socket owner. This change is correct resource hygiene that removes per-session fd churn regardless.	2026-06-28 02:41:52 -07:00
liuhao1024	14baeefe1d	fix(matrix): record DM rooms in m.direct on invite to prevent group misclassification Rebase onto plugins/platforms/matrix/adapter.py (code moved from gateway/platforms/matrix.py). Same logic: _on_invite checks is_direct on invite events and calls _record_dm_room to persist in m.direct account data. Fixes #44679	2026-06-28 02:37:52 -07:00
yungchentang	7e2ca7f68d	fix(telegram): reset send pool after pool timeouts	2026-06-28 02:34:17 -07:00
Teknium	2ecb6f7fe6	fix(telegram): clear send_path_degraded on successful reconnect (#35205) (#54076) * fix(telegram): clear send_path_degraded on successful reconnect _send_path_degraded was cleared only in _verify_polling_after_reconnect, 60s after reconnect and only if scheduled. A clean start_polling() reconnect left the flag stuck True, short-circuiting send() and blocking all outbound messages until the deferred probe ran (or forever if it never did). Clear the flag the moment start_polling() succeeds — that is the recovery signal. The deferred probe remains a defensive re-check that re-enters the reconnect ladder (re-setting the flag) if it detects a silent wedge. Fixes #35205. * docs: add infographic for #35205 telegram send-path fix	2026-06-28 01:38:17 -07:00
konsisumer	3f543229f2	fix(telegram): notify user when clarify button tap arrives after expiry	2026-06-28 01:07:53 -07:00
sweetcornna	fc70d023d8	fix(telegram): apply bot auth policy to Telegram sources	2026-06-28 00:57:03 -07:00
Teknium	f03823014b	fix(telegram): kill 409 polling conflict loop by disarming PTB retry synchronously (#53941) Telegram polling entered a self-inflicted ~31s loop of 409 Conflict -> retry -> resume -> Conflict. The error_callback PTB invokes synchronously inside its internal network_retry_loop only scheduled our async recovery task (loop.create_task) and returned, so PTB kept polling getUpdates on its own while our handler concurrently ran stop -> sleep -> start_polling. The two polling sessions overlapped and Telegram returned a fresh 409. Fix: in the conflict branch of the error_callback, synchronously set PTB's private polling stop_event before scheduling recovery. PTB's loop exits on its next tick (it races that event in do_action), so our handler owns polling alone. The handler's await updater.stop() drains the task and PTB clears the event, so the subsequent start_polling() builds a fresh event and is not poisoned. Keeps the existing reconnect ladder intact (option B) — fixes only the race. Defensive: probes mangled + unmangled stop_event spellings and no-ops (prior behaviour) if neither exists; never flips _running, which would make the handler skip stop() and leave the loop wedged.	2026-06-27 20:46:08 -07:00
konsisumer	11b0be8d15	fix(gateway): avoid Matrix pending invite boot loops	2026-06-27 20:45:51 -07:00
xxxigm	6f1a176b33	fix(gateway/discord): REST liveness probe to detect zombie clients (#26656) The Discord adapter could enter a silent zombie state after a network outage / proxy stall: the process is alive, _client looks open, but the underlying socket is dead. discord.py's WebSocket reconnect never sees a RST through a wedged proxy/NAT, so client.start() spins forever without exiting — which means the bot-task done callback (which only fires on task completion) never trips either. The bot stays "offline" in Discord until a manual `hermes gateway restart`. Reported offline for 13-17h. Adds an out-of-band REST liveness probe in DiscordAdapter. Every `discord.liveness_interval_seconds` (default 60s) the adapter issues a cheap fetch_user(bot_id) — the same REST path as message delivery, so it fails when the proxy/NAT is wedged. After `discord.liveness_failure_threshold` consecutive failures (default 3) the probe closes the wedged client and surfaces a retryable fatal error, which trips the gateway's existing _platform_reconnect_watcher and rebuilds the adapter. Operators disable it by setting either knob to 0. Config lives in config.yaml (discord.liveness_) per the .env-is-secrets policy; _apply_yaml_config bridges it to internal env vars the adapter reads, matching the existing HERMES_DISCORD_TEXT_BATCH_ pattern. Co-authored-by: Hermes Agent <agent@nousresearch.com>	2026-06-27 19:30:32 -07:00
Teknium	db16854f34	fix(telegram): surface failed media downloads to user and agent, not a silent empty turn (#53912) When a Telegram attachment download/cache fails (typically a transient httpx.ConnectError to Telegram's CDN), the except handler logged a warning and fell through to handle_message() with empty media and no text — the user thought the file was delivered, the agent saw a content-less turn with no signal an attachment was attempted, and the only record was a buried log line. Adds _surface_media_cache_failure(): replies to the user in Telegram so they know to retry, and appends an agent-visible notice to event.text via the existing _append_observed_note channel so the agent knows an attachment was attempted and failed. No new event fields (structured-event refactor is out of scope per #23045). Wired into all five cache-failure sites — photo, voice, audio, video, document — since they shared the identical silent fall-through. Bug 1 from #23045 (unsupported types routed as fake user messages) no longer exists on main: the document handler now accepts any file type, so there is no rejection branch to fix. Closes #23045	2026-06-27 19:12:57 -07:00
bykim0119	851f75d4df	fix(discord): honor "*" wildcard in DISCORD_ALLOWED_USERS (#22334) DISCORD_ALLOWED_USERS="*" now means "allow everyone", matching the SIGNAL_ALLOWED_USERS / DISCORD_ALLOWED_CHANNELS wildcard convention and the value `claw migrate` emits. Previously _is_allowed_user did exact ID matching only, so "*" matched no user and blocked every non-self sender — a P1 with no workaround. Three sites, all required for the fix to hold at runtime: - _is_allowed_user: short-circuit when "*" is in the allowlist. - connect(): exclude "*" from the intents.members trigger so the wildcard does not request the privileged Server Members intent (which can block the bot from coming online). - _resolve_allowed_usernames: preserve "*" verbatim; otherwise it lands in the username-resolution bucket, matches no member, and is silently dropped from the set and env var on the first on_ready — quietly undoing the fix. Slash auth delegates to _is_allowed_user (auto-covered); component auth already honors "*" on main.	2026-06-27 19:11:30 -07:00
Teknium	d3d621f7c3	revert(windows): roll back terminal-popup PRs #53791 #53810 #53829 (#53853) * Revert "fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829)" This reverts commit `2ecca1e7d3`. * Revert "fix(windows): stop terminal-window popups from background spawns (#53810)" This reverts commit `5db1430af9`. * Revert "fix(windows): stop subprocess console-window popups + add CI guard (#53791)" This reverts commit `ef17cd204d`.	2026-06-27 15:59:00 -07:00
Teknium	2ecca1e7d3	fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829) Follow-up to #53791 addressing review feedback: the footgun checker treated capture_output=/stdout=/stderr=/check_output as proof a subprocess can't pop a Windows console. That invariant is false — stream redirection controls where a child's output goes, not whether a console is allocated. From a console-less parent (Desktop/Electron, pythonw.exe, detached gateway/cron) a console-subsystem child still flashes a window even when fully captured. - check-windows-footguns.py: capture/redirect/check_output is no longer a blanket safe-pass. Added _WINDOWS_FLASHING_PROGRAMS (git/gh/npm/node/python/uv/ffmpeg/docker/powershell/…); calls to those are flagged even when captured. Non-flashing programs keep the capture exemption (no 271-site noise). _subprocess_compat.run/popen calls are inherently safe (wrapper injects CREATE_NO_WINDOW). - Routed the 35 genuine flashing git/gh/npm/uv/ffmpeg/docker spawns through the _subprocess_compat.run/popen chokepoint (Brooklyn's wrapper from #53810) — the durable fix, not per-site annotations. cmd.exe /c start stays # ok (intentional). - Updated tests + CONTRIBUTING.md rule #17 to the corrected invariant.	2026-06-27 14:49:41 -07:00
brooklyn!	5db1430af9	fix(windows): stop terminal-window popups from background spawns (#53810) * fix(windows): stop terminal-window popups from background spawns Native-Windows desktop/gateway users saw cmd/conhost windows flash on gateway restart, image paste, the dashboard Projects tree, voice notes, and ~5 min after closing the app (detached cron). Two root causes: - Console-subsystem exes (taskkill, schtasks, wmic, netstat, tasklist, agent-browser, git, ffmpeg, powershell, git-bash) spawned via raw subprocess allocate a fresh console when the launching process has none (pythonw desktop backend / detached gateway) - even with output captured. - uv venv pythonw shims re-exec console python.exe, so Python children get a console regardless of how they're launched. Fixes: - Single hidden-spawn primitive (_subprocess_compat.run/.popen) that ORs CREATE_NO_WINDOW on Windows, no-op on POSIX. Route every Hermes-owned console-exe spawn through it. - FreeConsole() catch-all in hermes_bootstrap: any Python child that exclusively owns an auto-allocated console detaches it at startup (GetConsoleProcessList()==1 gate leaves shared interactive consoles untouched). - Replace PowerShell/wmic gateway PID scans with in-process psutil. - Skip schtasks queries on non-interactive desktop restarts. - Prefer native agent-browser .exe over .cmd shims. - Guard test bans raw subprocess spawns of the Windows-only console tools repo-wide so the popup class can't regress. * fix(windows): scope FreeConsole to background entry points; fix merge fallout Console detach review (per #53810 feedback): GetConsoleProcessList()==1 can't tell a uv pythonw->python phantom console apart from a user opening the interactive CLI/TUI in its own fresh console (double-click, shortcut, ConPTY) — both report a single attached process with a tty. Running FreeConsole() in the import-time bootstrap therefore risked detaching a legitimately-interactive terminal.
- Extract FreeConsole into explicit hermes_bootstrap.detach_orphan_console(); remove it from apply_windows_utf8_bootstrap() (import side effect). - Call it only from known background mains: gateway run, dashboard backend (start_server, what the desktop spawns), cron standalone, tui_gateway entry, slash worker. Interactive CLI/TUI never calls it. - Behavior-contract tests: frees only when solo owner, leaves shared console, no-op without console / on POSIX, and asserts it's not an import side effect. Merge fallout from origin/main (#53791): - local.py: 3-way merge left a dangling *_popen_kwargs (NameError crashing every terminal init). _subprocess_compat.popen already hides the window, so drop it. - discord adapter: merge stacked an undefined windows_hide_flags() onto the primitive call; drop the redundant arg. - test_gateway: scan now goes psutil-first (zero spawn); rewrite the case-variant test to drive that production path. test(claw): mock _subprocess_compat.run seam for Windows process scan claw.py's Windows tasklist/powershell scan routes through the hidden-spawn primitive; the tests still patched claw_mod.subprocess, so on win32 the mock was never hit and real spawns returned nothing. Patch the actual seam.	2026-06-27 14:02:24 -07:00
Teknium	ef17cd204d	fix(windows): stop subprocess console-window popups + add CI guard (#53791) * fix(windows): stop subprocess console-window popups + add CI guard The single biggest source of Windows 'terminal popup' bug reports was bare subprocess.run/Popen calls spawning a console window. The compat helpers (windows_hide_flags / windows_detach_popen_kwargs) already existed but the footgun checker had no rule to stop new bare calls from reintroducing the flash. - scripts/check-windows-footguns.py: new AST-based rule flagging subprocess calls that can create a new console — output-redirection-aware (capture/redirect/check_output exempt) and POSIX-only-program-aware (launchctl/systemctl/brew/etc. exempt). Comprehensive on real popups, no annotation burden on calls that can't flash. - Swept all genuine window-spawning sites through windows_hide_flags()/windows_detach_popen_kwargs(); marked intentionally-visible launches (editor/terminal/foreground re-exec) with '# windows-footgun: ok'. - tests/scripts/test_windows_footgun_subprocess_rule.py: behavior-contract tests + full-repo cleanliness invariant. - CONTRIBUTING.md: documents the rule + the helper pattern. * test: accept creationflags kwarg in psutil_android fake_subprocess_run The Windows no-window sweep added creationflags=windows_hide_flags() to install_psutil_android.py's subprocess.run call; the test's fake stub had a fixed (cmd) signature and raised TypeError on the new kwarg.	2026-06-27 13:03:51 -07:00
Teknium	cd592c105c	feat(send_message): native WhatsApp media delivery via Baileys bridge (#53598) send_message with MEDIA:/path to a WhatsApp target previously dropped the attachment: the WhatsApp branch never passed media_files, the plugin's _standalone_send accepted the param but only POSTed text, and WhatsApp was absent from the media-supported platform list. - send_message_tool: add a Platform.WHATSAPP media block (mirrors Feishu) that routes media_files through the whatsapp plugin's standalone_sender_fn, and add whatsapp to the supported-media list strings. - whatsapp adapter: _standalone_send now sends text first (skipped when the chunk is media-only), then uploads each file via the bridge /send-media endpoint with a mediaType derived from extension/is_voice/force_document, so images/videos/voice arrive as native bubbles instead of documents. - _bridge_media_type classifier maps ext -> image|video|audio|document. Closes #19105 (remaining send_message gap). Other items in the report (inbound video paths, image_generate auto-deliver, history dedup, native gateway bubbles) already landed on main.	2026-06-27 04:40:05 -07:00
r266-tech	dbc925b755	Guard oversized Telegram video downloads	2026-06-27 04:39:48 -07:00
teknium1	7ee0b68973	fix(gateway,feishu): refuse executor resurrection during real shutdown Add an explicit _closing guard to both owned executors so the recreate-on-shutdown path only recovers from an external teardown of the loop default — never resurrects a pool the gateway/adapter itself stopped. _shutdown_executor() sets the flag; _get_executor() raises if closing; feishu connect() re-arms on reconnect. Updates the gateway recreate test to assert the refusal contract and adds feishu coverage.	2026-06-27 04:13:09 -07:00
teknium1	b296915c82	fix(feishu): route blocking SDK calls through an adapter-owned executor Feishu SDK calls ran on asyncio's shared default executor, so a torn-down default executor wedged every send with 'Executor shutdown has been called' and left the gateway a zombie (#10849). The adapter now owns a ThreadPoolExecutor recreated on demand if shut down, mirroring the gateway-owned executor change. Routes all 17 self._client SDK calls through _run_blocking; shuts the pool down on disconnect.	2026-06-27 04:13:09 -07:00
LeonSGP43	52a09d8faf	fix(byterover): honor auto extract config	2026-06-27 04:04:15 -07:00
teknium1	ab1f9b94c5	fix(telegram): accept @username chat_id in delivery paths (#13206) TELEGRAM_HOME_CHANNEL set to an @username (not a numeric chat ID) crashed all webhook/cron->Telegram home-channel delivery with 'ValueError: invalid literal for int()'. The Telegram Bot API accepts both a numeric chat_id and an @username string; Hermes was force-coercing every chat_id with int(). Add normalize_telegram_chat_id() (returns int for numeric values, passes @username strings through) and apply it at the Bot API send/edit sites in the Telegram adapter and the send_message tool. Username targets are now recognized as explicit targets in _parse_target_ref. Reapplies the approach from #13274 (season179), whose branch predated the gateway/platforms/telegram.py -> plugins/platforms/telegram/adapter.py relocation. Dupes: #13535 (Tranquil-Flow), #37572 (chewkaah). Co-authored-by: season179 <season.saw@gmail.com>	2026-06-27 04:01:58 -07:00
Sahil-SS9	6fb25f86ac	fix(telegram): filter out bot's own messages from inbound processing (#52363)	2026-06-27 03:56:52 -07:00
Mahesh Sanikommu	1b75b3fd90	feat(memory): add Supermemory setup connection summary Add post_setup() and get_status_config() to the Supermemory memory provider so `hermes memory setup` and `hermes memory status` print a one-line connection summary (container, profile fact count, auto_recall/auto_capture). Point API-key onboarding at the Hermes connect URL (app.supermemory.ai/integrations?connect=hermes). Salvage of #52988. Two fixes folded in: - Test isolation: the new probe/status tests mocked _SupermemoryClient but not the __import__("supermemory") guard inside _probe_supermemory_connection, so they passed only where the optional supermemory package was installed and failed on a clean checkout / CI (the PR shipped with red CI). Added _stub_supermemory_importable() mirroring the existing test_is_available_false_when_import_missing pattern; the suite now passes with supermemory absent. - post_setup: `if api_key and api_key not in os.environ` checked whether the key's value named an env var (always false in practice). Fixed to compare the value: `os.environ.get("SUPERMEMORY_API_KEY") != api_key`. Verified: 38/38 in test_supermemory_provider.py and the full tests/plugins/memory/ suite green with supermemory not installed. Closes #52988	2026-06-27 15:07:34 +05:30
underthestars-zhy	8827300267	fix(photon): correlate tapbacks to bot message context Populate `reply_to_message_id`, `reply_to_text`, and `reply_to_is_own_message` on reaction events so the gateway injects `[Replying to your previous message: "..."]` when the agent receives a tapback. The sidecar now extracts a capped text preview from the hydrated reaction target (plain text and mixed group messages; null for attachment/voice-only targets), emitting it as `targetText` in the NDJSON reaction payload. The Python adapter reads this field and sets the reply correlation fields on the `MessageEvent`.	2026-06-27 00:51:34 -07:00
underthestars-zhy	4345b3e767	fix(photon): upgrade spectrum-ts sidecar to v8.0.0 v8 made `richlink` outbound-only; inbound rich links now arrive as plain `text`. Remove the `getBalloonBundleId`/`toRichlinkMessage` branches from the iMessage mapper patch and update the fixture, lockfile, and README accordingly.	2026-06-27 00:51:34 -07:00
underthestars-zhy	5636c22828	feat(photon): upgrade spectrum-ts sidecar to v7.0.0 Update the Photon platform plugin's Node.js sidecar from spectrum-ts 3.1.0 to 7.0.0, which splits the SDK into scoped `@spectrum-ts/*` packages with `spectrum-ts` as the umbrella re-export. - Bump exact pin in package.json/package-lock.json to 7.0.0 - Update mixed-attachments patch script to target the new `@spectrum-ts/imessage/dist/index.js` path and tab-indented output - Rewrite test fixture to match v7.x mapper shape (tab-indented, `const ... = async` declarations, single-line builder calls) and point at `@spectrum-ts/imessage/dist/index.js` - Update README upgrade guide to document the v5 package split and the postinstall patch validation step - Update comments in cli.py and index.mjs to reference v5/v7 changes	2026-06-27 00:51:34 -07:00
Ben Barclay	fbf748b282	fix(dashboard-auth): follow redirects on self-hosted OIDC discovery (#53399) The self-hosted OIDC provider fetched the discovery document with a bare httpx.get(). httpx defaults to follow_redirects=False (unlike curl -L or the requests library), so when an IDP answers GET /.well-known/openid-configuration with a 3xx — Authentik canonicalises the .well-known path, and any IDP behind a reverse proxy doing an http→https upgrade redirects too — the bare redirect (empty body) tripped the status != 200 guard and raised 'OIDC discovery returned 302', which routes.py maps to the provider_unreachable audit event and a 503. The browser surfaced 'Auth provider self-hosted unreachable'. The user's smoking gun (curl -o writing zero bytes from inside the container) is exactly a redirect with no body — the same wall the code hit. Add follow_redirects=True to the discovery GET only. It's safe: the issuer-pin check and _require_https_or_loopback still validate the resolved document and every endpoint, so a redirect can't smuggle in a bad issuer or a cleartext endpoint. The token/revocation POSTs deliberately keep the no-follow default (they carry an auth code / refresh token and the endpoint is already the canonical absolute URL). Existing discovery tests mocked httpx.get with a canned 200 and never exercised a real 3xx. Add a regression test that runs a real loopback server returning a 302 on the .well-known path — fails without the fix (ProviderError: discovery returned 302), passes with it.	2026-06-27 14:14:51 +10:00
Nacho Avecilla	dbe734beff	fix(dashboard-auth): exclude non-interactive providers from interactive login surfaces (#53239) * Return None instead of erroring on drain login failure * Fix login on drain * Remove login for drained endpoints flow and clean the code * chore: drop unrelated credits changes from this PR * Remove extra comments that were not really necessary	2026-06-27 10:08:13 +10:00
kshitijk4poor	6326d5c6f6	fix: remove duplicated table renderer from Telegram adapter The PR's original refactor commit only replaced the primitives (regex, is_table_row, split_markdown_table_row) with shared imports but left the verbatim-copied renderer (_render_table_block_for_telegram) and driver (_wrap_markdown_tables) in place. Both are logic-identical to the shared convert_table_to_bullets in gateway/platforms/helpers.py. Replace both with a direct import alias. _TABLE_SEPARATOR_RE is still imported separately because it's used by the rich-message routing logic (lines 1024, 1044) to detect whether content contains tables. Found by 3-agent parallel code-reuse review.	2026-06-27 03:57:24 +05:30
Yashiel Sookdeo	24a4df9cd1	refactor(telegram): import shared table-detection primitives from helpers.py Replace local _TABLE_SEPARATOR_RE, _is_table_row, and _split_markdown_table_row with imports from the shared module. Telegram-specific rendering stays local. Co-authored-by: Yashiel Sookdeo <yashiel@skyner.co.za>	2026-06-27 03:57:24 +05:30
Yashiel Sookdeo	cf7bf5bdc9	fix(discord): auto-convert markdown tables to bullet groups Discord does not render GFM pipe tables — raw pipe characters display as garbage text. format_message now rewrites tables into bold-heading + bullet groups using the shared helpers. Fixes #21168 Co-authored-by: Yashiel Sookdeo <yashiel@skyner.co.za>	2026-06-27 03:57:24 +05:30
liuhao1024	515192c4b9	fix(tools): use start_new_session instead of preexec_fn to prevent SIGSEGV in multi-threaded processes preexec_fn=os.setsid runs Python code in the forked child before exec, which is unsafe in multi-threaded processes (CPython docs). When the Desktop gateway loads native libraries (onnxruntime, BLAS, provider SDKs) with active thread pools, the fork can SIGSEGV before the child execs. Replace all preexec_fn usage with start_new_session=True, which provides the same setsid/process-group semantics without running Python in the fork. This is already the pattern used throughout hermes_cli/gateway.py and hermes_cli/_subprocess_compat.py. Fixes #46789	2026-06-27 03:08:41 +05:30
Ben	2e322466b1	feat(dashboard-auth): drain shared-bearer-secret provider plugin Task 2.0b: the concrete shared-bearer-secret auth provider, the FIRST consumer of the generic token-auth capability (Task 2.0a). Implements decisions.md Q-A. plugins/dashboard_auth/drain/ (bundled, discovered like dashboard_auth/basic): - DrainSecretProvider: non-interactive provider, supports_token=True. Verifies an inbound Authorization bearer token against a per-agent shared secret with hmac.compare_digest (constant-time, no timing oracle) and, on a match, vouches for the caller as the "drain-control" principal scoped to "drain". The five interactive ABC methods raise NotImplementedError; verify_session returns None (stacks harmlessly in the cookie-verify loop). - assess_secret_strength(): fail-closed entropy gate. Rejects secrets shorter than 43 url-safe-b64 chars (~256 bits), with < 16 distinct characters, or below 128 bits Shannon entropy — so a weak/structured/repeated secret can never be silently accepted. Enforced both at register() (friendly skip reason) and in __init__ (raises — defence in depth). - register(ctx): no-op + skip reason when HERMES_DASHBOARD_DRAIN_SECRET is unset; rejects a weak secret fail-closed (drain endpoint stays gated). On a strong secret, registers the provider AND opts /api/gateway/drain into the generic token-auth seam via register_token_route(). Config: the secret is a CREDENTIAL → carried via HERMES_DASHBOARD_DRAIN_SECRET (per-agent, provisioned by NAS at deploy). Behavioural knobs only (dashboard.drain_auth.{scope,min_secret_chars}) live in config.yaml — added to DEFAULT_CONFIG with the .env-is-for-secrets rationale documented inline. 
Tests: tests/plugins/dashboard_auth/test_drain_provider.py — entropy gate (strong pass; empty/short/repeated/few-distinct/custom-min reject), verify_token (match → scoped principal, wrong/empty → None, custom scope), protocol compliance, interactive-methods-raise, and register() (skip-no-secret, fail-closed-weak-secret, strong-env-secret registers + route opt-in, config scope + min_secret_chars). 21 new tests; drain + token-auth suites 44 passed. Verified the plugin is discovered as dashboard_auth/drain alongside basic/nous. Intentionally deferred: - The begin/cancel-drain endpoint handler itself — Task 2.1. - The dashboard→gateway control channel — Task 2.2. Build status: dashboard-auth + drain-plugin suites green.	2026-06-26 00:47:19 -07:00
teknium1	43b8ba4181	fix(telegram): preserve Bot API update queue on watcher reconnect After a prolonged outage the in-process network-error ladder escalates to fatal and GatewayRunner._platform_reconnect_watcher rebuilds a fresh adapter that reconnects through the bootstrap path. That path called start_polling(drop_pending_updates=True), discarding every update Telegram queued during the outage — all messages sent while the bot was down were silently lost. The in-process ladder and 409-conflict handler already passed drop_pending_updates=False; only bootstrap did not distinguish a cold first boot from a reconnect. Thread an is_reconnect signal from the watcher through _connect_adapter_with_timeout into adapter.connect(). The base BasePlatformAdapter.connect() gains a keyword-only is_reconnect=False so every adapter inherits a tolerant signature (no per-platform breakage when the runner forwards the kwarg). Telegram translates is_reconnect into drop_pending_updates=not is_reconnect on both the polling and webhook bootstrap calls. Cold boot still drops the stale queue; a watcher reconnect preserves it. Fixes #46621. Co-authored-by: annguyenNous <annguyen@nousresearch.com> Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com> Co-authored-by: Kewe63 <Kewe63@users.noreply.github.com>	2026-06-25 21:29:57 -07:00
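The signal threading in 43b8ba4181 can be pictured with the sketch below. It assumes a simplified adapter hierarchy: `FakePoller` is a hypothetical stand-in for the real polling machinery, and only the keyword-only `is_reconnect=False` signature and the `drop_pending_updates=not is_reconnect` translation are taken from the commit message.

```python
import asyncio


class BasePlatformAdapter:
    async def connect(self, *, is_reconnect: bool = False) -> bool:
        # Keyword-only with a default, so adapters that ignore the flag keep working.
        return True


class FakePoller:
    """Hypothetical test double that records the flag it was started with."""

    def __init__(self) -> None:
        self.calls = []

    async def start_polling(self, drop_pending_updates: bool) -> None:
        self.calls.append(drop_pending_updates)


class TelegramAdapter(BasePlatformAdapter):
    def __init__(self, poller: FakePoller) -> None:
        self._poller = poller

    async def connect(self, *, is_reconnect: bool = False) -> bool:
        # Cold boot drops the stale queue; a watcher reconnect preserves it.
        await self._poller.start_polling(drop_pending_updates=not is_reconnect)
        return True


poller = FakePoller()
adapter = TelegramAdapter(poller)
asyncio.run(adapter.connect())                   # cold first boot
asyncio.run(adapter.connect(is_reconnect=True))  # watcher-driven reconnect
```

After these two calls `poller.calls` holds `True` for the cold boot and `False` for the reconnect.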
teknium1	85e084d60d	fix(email): reject spoofed From: header for authorization (GHSA-rxqh-5572-8m77) The email adapter authorized senders entirely off the From: header, which is attacker-controlled and unauthenticated by IMAP. An attacker could forge From: an-allowlisted-address and pass both the adapter's EMAIL_ALLOWED_USERS pre-filter and the gateway's allowlist authz (both key on the same spoofable sender_addr), getting unauthorized commands executed by the agent. Verify the From: domain against the trusted Authentication-Results header the receiving mail server stamps (SPF/DKIM/DMARC) before trusting it for authorization. Enforced only when an allowlist is in effect and allow-all is off — fail-closed. Operators whose server does not stamp the header can opt out via platforms.email.require_authenticated_sender: false (or EMAIL_TRUST_FROM_HEADER=true).	2026-06-25 21:11:02 -07:00
Teknium	ce802e932c	fix(telegram): heartbeat loop exits cleanly when bot has no get_me CI shard test_telegram_conflict.py timed out (140s) because the new _polling_heartbeat_loop, started by connect(), busy-spun under those tests: they monkeypatch asyncio.sleep to instant and pass a bot double with no get_me(), so the probe raised AttributeError (swallowed) and the loop re-entered immediately with no real pacing, starving the event loop. Guard the loop to return when bot.get_me is not callable — a real PTB Bot always exposes it, so this only triggers on a torn-down app or a test double, where there is nothing to probe. Also cancel the heartbeat task in the conflict tests that call connect() without disconnect(), matching the production disconnect() teardown. Verified: test_telegram_conflict.py now runs in ~4.5s; the 22 heartbeat/reconnect tests still pass; E2E confirms a hanging get_me still fires the reconnect ladder while a missing get_me exits without spinning.	2026-06-25 18:50:11 -07:00
agt-user	8501caf51f	fix(telegram): persistent heartbeat loop to detect CLOSE-WAIT polling sockets When a Telegram long-poll TCP socket enters CLOSE-WAIT (remote sent FIN but httpx hasn't noticed), epoll still reports it readable so no exception is raised. PTB's error_callback never fires, the reconnect ladder never engages, and the gateway silently stops receiving messages while the process stays alive — until a manual systemctl restart. The existing recovery only covers two cases: error_callback-driven reconnects (which require an exception PTB never gets) and a one-shot _verify_polling_after_reconnect probe (which runs only right after an explicit reconnect). A socket that wedges during steady-state operation is never detected. Add _polling_heartbeat_loop: a background asyncio.Task started in connect() (polling mode only) that probes get_me() every 90s on the general request pool (not the getUpdates pool, so healthy long-polls are never interrupted). On asyncio.TimeoutError/OSError it hands off to the existing _handle_polling_network_error ladder; other errors are swallowed. disconnect() cancels and awaits the task. Worst-case detection window ~105s. Complementary to #51541 (general-pool keepalive limits / fd leak) — that recycles idle pooled connections; this detects a wedged active read. Fixes #48495 Co-authored-by: agt-user <267614622+agt-user@users.noreply.github.com>	2026-06-25 18:50:11 -07:00
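A heartbeat loop of the kind described in 8501caf51f, including the missing-`get_me` guard from ce802e932c, might look like the sketch below. The function name, the `on_network_error` callback, and the 15-second probe timeout are assumptions; the 15 s value is chosen only so that the 90 s interval plus the timeout matches the stated ~105 s worst-case detection window.

```python
import asyncio


async def polling_heartbeat_loop(bot, on_network_error,
                                 interval: float = 90.0,
                                 probe_timeout: float = 15.0) -> None:
    get_me = getattr(bot, "get_me", None)
    if not callable(get_me):
        # Torn-down app or test double: nothing to probe, so exit instead of spinning.
        return
    while True:
        await asyncio.sleep(interval)
        try:
            # Bound the probe so a wedged socket surfaces as a timeout.
            await asyncio.wait_for(get_me(), timeout=probe_timeout)
        except (asyncio.TimeoutError, OSError) as exc:
            # Hand off to the existing reconnect ladder (hypothetical callback here).
            await on_network_error(exc)
        except Exception:
            # Other probe errors are swallowed; cancellation still propagates.
            pass
```

The task would be created in `connect()` and cancelled and awaited in `disconnect()`. Because `asyncio.CancelledError` is not an `Exception` subclass on Python 3.8+, the broad `except` does not block that teardown.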
infinitycrew39	9d225fbf4e	fix(telegram): auto-rich pipe tables and topic routing for sendRichMessage Pipe-only markdown tables now use sendRichMessage even when rich_messages is off, and resumed DM-topic sends route via direct_messages_topic_id without requiring a reply anchor. Rich finalize edits forward topic kwargs.	2026-06-25 13:10:54 -07:00
rob-maron	2c02583c2b	fix shape	2026-06-25 12:38:33 -07:00
rob-maron	525ee58b43	krea	2026-06-25 12:38:33 -07:00
xxxigm	0aea0c3654	fix(utils): unify YAML list indent across all config writers (#31999) atomic_yaml_write used default yaml.dump which emits indentless sequences (list items at column 0), while atomic_roundtrip_yaml_update (ruamel.yaml) emits 2-space-indented sequences. Cross-path writes to the same config.yaml toggled indentation on every save, eventually producing a mixed-indent file that js-yaml rejects with 'bad indentation of a mapping entry', silently dropping custom_providers and breaking model switching. Add IndentDumper SafeDumper subclass that forces indentless=False, route atomic_yaml_write through it. Route tui_gateway._save_cfg and the Telegram adapter's config writer through atomic_yaml_write so all paths emit the same 2-indent layout. Salvaged from #32034 by @xxxigm. Adapted to current main which already has allow_unicode=True (from #51356) but was missing IndentDumper. Closes #31999	2026-06-25 23:27:44 +05:30
Brooklyn Nicholson	7078d9d1e2	fix(pets): raise generation timeouts for the slow quality-first model path The quality-first default (OpenAI image via OpenRouter) is slow, and a full hatch fans out ~8 rows with up to 3 retries each (300s/call) across 2 parallel waves, so the absolute backend worst case is ~30 min. The old ceilings fired mid-run: - per-image HTTP call: 180s -> 300s (a single cold row can exceed 3 min) - drafts RPC: 240s -> 420s (single wave, no retries — 7 min is ample) - hatch RPC: 420s -> 1hr (sits above the ~30 min backend worst case) The hatch ceiling is intentionally well above the realistic max so the frontend never throws "request timed out" before the backend has exhausted its own retries. The background-resumable notification path remains the real UX safety net — the user can close the modal and get pinged on completion.	2026-06-25 00:34:52 -05:00
brooklyn!	0c442fa1d3	Merge pull request #52303 from NousResearch/bb/pets-gen-qa feat(pets): quality-first OpenRouter chain, stronger atlas gates, global pet-gen notifications	2026-06-24 23:16:40 -05:00
Brooklyn Nicholson	e92b5c6af8	feat(pets): quality-first OpenRouter model chain + stronger atlas gates + global pet-gen notifications OpenRouter/Nous image gen now runs a quality-first model chain by default: attempt the highest-fidelity OpenAI image model first, then fall back to Gemini 3 Pro Image when it's access-gated/unavailable/times out. An explicit OPENROUTER_IMAGE_MODEL / config model override pins one model with no fallback. Atlas validation rejects malformed model output instead of shipping it: adds a per-state collapse guard (a single sliver/fragment row no longer passes because other rows are healthy), on top of the existing postage-stamp + multi-pose checks. Desktop: pet-gen native notifications are now "global" (not tied to a chat session), so a background generation started from the command center fires an OS notification when the user is away even with no active session. Adds a neutral "This can take up to 5 minutes." banner on step 1, and lets the provider picker auto-size. Tests updated/added for the OpenRouter fallback chain, the collapse guard, and the global notification path.	2026-06-24 23:11:21 -05:00
helix4u	17beb55e3c	fix(telegram): gate rich draft previews separately	2026-06-24 18:11:14 -07:00

625 commits