hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-27 11:22:03 +00:00

Author	SHA1	Message	Date
teknium1	43b8ba4181	fix(telegram): preserve Bot API update queue on watcher reconnect After a prolonged outage the in-process network-error ladder escalates to fatal and GatewayRunner._platform_reconnect_watcher rebuilds a fresh adapter that reconnects through the bootstrap path. That path called start_polling(drop_pending_updates=True), discarding every update Telegram queued during the outage — all messages sent while the bot was down were silently lost. The in-process ladder and 409-conflict handler already passed drop_pending_updates=False; only bootstrap did not distinguish a cold first boot from a reconnect. Thread an is_reconnect signal from the watcher through _connect_adapter_with_timeout into adapter.connect(). The base BasePlatformAdapter.connect() gains a keyword-only is_reconnect=False so every adapter inherits a tolerant signature (no per-platform breakage when the runner forwards the kwarg). Telegram translates is_reconnect into drop_pending_updates=not is_reconnect on both the polling and webhook bootstrap calls. Cold boot still drops the stale queue; a watcher reconnect preserves it. Fixes #46621. Co-authored-by: annguyenNous <annguyen@nousresearch.com> Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com> Co-authored-by: Kewe63 <Kewe63@users.noreply.github.com>	2026-06-25 21:29:57 -07:00
teknium1	85e084d60d	fix(email): reject spoofed From: header for authorization (GHSA-rxqh-5572-8m77) The email adapter authorized senders entirely off the From: header, which is attacker-controlled and unauthenticated by IMAP. An attacker could forge From: an-allowlisted-address and pass both the adapter's EMAIL_ALLOWED_USERS pre-filter and the gateway's allowlist authz (both key on the same spoofable sender_addr), getting unauthorized commands executed by the agent. Verify the From: domain against the trusted Authentication-Results header the receiving mail server stamps (SPF/DKIM/DMARC) before trusting it for authorization. Enforced only when an allowlist is in effect and allow-all is off — fail-closed. Operators whose server does not stamp the header can opt out via platforms.email.require_authenticated_sender: false (or EMAIL_TRUST_FROM_HEADER=true).	2026-06-25 21:11:02 -07:00
Teknium	ce802e932c	fix(telegram): heartbeat loop exits cleanly when bot has no get_me CI shard test_telegram_conflict.py timed out (140s) because the new _polling_heartbeat_loop, started by connect(), busy-spun under those tests: they monkeypatch asyncio.sleep to instant and pass a bot double with no get_me(), so the probe raised AttributeError (swallowed) and the loop re-entered immediately with no real pacing, starving the event loop. Guard the loop to return when bot.get_me is not callable — a real PTB Bot always exposes it, so this only triggers on a torn-down app or a test double, where there is nothing to probe. Also cancel the heartbeat task in the conflict tests that call connect() without disconnect(), matching the production disconnect() teardown. Verified: test_telegram_conflict.py now runs in ~4.5s; the 22 heartbeat/reconnect tests still pass; E2E confirms a hanging get_me still fires the reconnect ladder while a missing get_me exits without spinning.	2026-06-25 18:50:11 -07:00
agt-user	8501caf51f	fix(telegram): persistent heartbeat loop to detect CLOSE-WAIT polling sockets When a Telegram long-poll TCP socket enters CLOSE-WAIT (remote sent FIN but httpx hasn't noticed), epoll still reports it readable so no exception is raised. PTB's error_callback never fires, the reconnect ladder never engages, and the gateway silently stops receiving messages while the process stays alive — until a manual systemctl restart. The existing recovery only covers two cases: error_callback-driven reconnects (which require an exception PTB never gets) and a one-shot _verify_polling_after_reconnect probe (which runs only right after an explicit reconnect). A socket that wedges during steady-state operation is never detected. Add _polling_heartbeat_loop: a background asyncio.Task started in connect() (polling mode only) that probes get_me() every 90s on the general request pool (not the getUpdates pool, so healthy long-polls are never interrupted). On asyncio.TimeoutError/OSError it hands off to the existing _handle_polling_network_error ladder; other errors are swallowed. disconnect() cancels and awaits the task. Worst-case detection window ~105s. Complementary to #51541 (general-pool keepalive limits / fd leak) — that recycles idle pooled connections; this detects a wedged active read. Fixes #48495 Co-authored-by: agt-user <267614622+agt-user@users.noreply.github.com>	2026-06-25 18:50:11 -07:00
infinitycrew39	9d225fbf4e	fix(telegram): auto-rich pipe tables and topic routing for sendRichMessage Pipe-only markdown tables now use sendRichMessage even when rich_messages is off, and resumed DM-topic sends route via direct_messages_topic_id without requiring a reply anchor. Rich finalize edits forward topic kwargs.	2026-06-25 13:10:54 -07:00
xxxigm	0aea0c3654	fix(utils): unify YAML list indent across all config writers (#31999) atomic_yaml_write used default yaml.dump which emits indentless sequences (list items at column 0), while atomic_roundtrip_yaml_update (ruamel.yaml) emits 2-space-indented sequences. Cross-path writes to the same config.yaml toggled indentation on every save, eventually producing a mixed-indent file that js-yaml rejects with 'bad indentation of a mapping entry', silently dropping custom_providers and breaking model switching. Add IndentDumper SafeDumper subclass that forces indentless=False, route atomic_yaml_write through it. Route tui_gateway._save_cfg and the Telegram adapter's config writer through atomic_yaml_write so all paths emit the same 2-indent layout. Salvaged from #32034 by @xxxigm. Adapted to current main which already has allow_unicode=True (from #51356) but was missing IndentDumper. Closes #31999	2026-06-25 23:27:44 +05:30
helix4u	17beb55e3c	fix(telegram): gate rich draft previews separately	2026-06-24 18:11:14 -07:00
liuhao1024	404b06ac4f	fix(gateway): honor server retry_after in _send_with_retry for Telegram flood control (#46762) When Telegram's sendRichMessage returns a FloodWait/RetryAfter error, _try_send_rich() now extracts the server-provided retry_after value and propagates it through SendResult.retry_after. The base _send_with_retry() layer honors this value instead of using its default short exponential backoff (~2s, ~4s), preventing the retry budget from being exhausted against a server that demands a 25-37s wait. Salvaged from #46774 by @liuhao1024. Telegram adapter path moved from gateway/platforms/telegram.py to plugins/platforms/telegram/adapter.py since the original PR. Closes #46762	2026-06-25 02:43:47 +05:30
Tranquil-Flow	73a20a6ad6	fix(telegram): clip mid-stream overflow instead of splitting (#48648)	2026-06-24 00:00:46 -07:00
justemu	4aa793345e	fix(matrix): use member_count as DM signal for named DM rooms Most Matrix clients auto-set a room name when creating a DM (e.g. "Alice & Bot" from participant display names), so the old `is_direct and not has_explicit_name` heuristic classified virtually all client-created DM rooms as "room", forcing require_mention gating in legitimate one-on-one DMs. member_count is now the primary DM signal: <=2 members means the room is necessarily a 1:1 conversation, regardless of m.direct or an explicit name. A room that grew to 3+ members but is still in stale m.direct is still classified as a room (conflict flag set). Falls back to the m.direct + name heuristic when the count is unavailable. Also hardens _get_room_member_count with a joined_members API fallback when the cache-backed state_store is empty. Salvaged from #48554 by @justemu onto the current plugin adapter path (gateway/platforms/matrix.py -> plugins/platforms/matrix/adapter.py). Fixes #48551	2026-06-23 23:57:38 -07:00
liuhao1024	7ff48a6291	fix(discord): check pairing store for component button auth Component button interactions (approve/deny, slash confirm, model picker, clarify) were not checking the pairing store for authorization. Users approved via `hermes pairing approve` could send messages and use slash commands (which go through the gateway authz_mixin), but button clicks were rejected because `_component_check_auth` only checked env-var allowlists (DISCORD_ALLOWED_USERS, GATEWAY_ALLOW_ALL_USERS, etc.) and not the pairing store. This was a regression from commit `f6f363662` which intentionally made component auth fail-closed when no allowlist is set (security fix for GHSA-mc26-p6fw-7pp6), but did not account for pairing-based auth. Fix: add a `PairingStore.is_approved("discord", uid)` check to `_component_check_auth`, mirroring `authz_mixin._check_authorization`. The pairing store check runs after all allowlist checks, preserving the fail-closed behavior for non-paired, non-allowed users. Fixes #50627	2026-06-23 23:55:18 -07:00
teknium1	d4be583d98	fix(telegram): raise default command-menu cap to 60 so skills stay visible The 30-slot default could not fit Hermes's ~50 built-in commands, so every skill command (and 20 built-ins) were silently dropped from the Telegram `/` menu by default — they only worked when typed manually. Raising the default to 60 keeps all built-ins plus common skill commands visible out of the box while staying under Telegram's ~4KB payload limit. Users can still tune it via platforms.telegram.extra.command_menu.	2026-06-23 23:49:22 -07:00
Thestral	dbe14ce35d	feat(gateway): configure Telegram command menu priority Adds a configurable Telegram BotCommand menu cap and priority list via platforms.telegram.extra.command_menu (max_commands clamped 1..100; priority_mode prepend|append|replace). Default cap stays 30; hidden commands remain invokable when typed and /commands lists the full set. Salvaged from PR #42021. Cherry-picked onto current main; the original edited gateway/platforms/telegram.py, now relocated to plugins/platforms/telegram/adapter.py.	2026-06-23 23:49:22 -07:00
Teknium	d539cd9004	fix(config): write config.yaml as UTF-8 to stop emoji/personality corruption (#51676) atomic_yaml_write (and two sibling config writers) called yaml.dump without allow_unicode=True. The default personalities shipped in cli.py contain emoji/kaomoji, so PyYAML escaped astral-plane chars as 8-digit \UXXXXXXXX sequences inside multi-line double-quoted strings wrapped with \ line-continuations. Stricter/non-PyYAML parsers, editors, and hand-edits break that structure into unclosed quotes, failing the whole config parse -> silent fallback to defaults -> custom_providers lost. Add allow_unicode=True to the canonical writer plus tui_gateway/server.py and the telegram adapter's atomic config write so config is written as readable UTF-8 with no escape/fold artifacts. Fixes #51356	2026-06-23 23:28:21 -07:00
teknium1	7f1c278db8	fix(photon): intercept console.log so 'stream interrupted' bursts escalate spectrum-ts routes stream telemetry through @photon-ai/otel's createLogger, which sends severity>=ERROR to console.error and WARN/INFO to console.log. The two lines the health monitor keys off land on different channels: log.error("stream persistently failing") -> console.error (caught), but log.warn("stream interrupted; reconnecting") -> console.log (was missed). The original interception patched console.error only, so the recovering->degraded escalation counter never saw the interrupt bursts that are the primary silent-inbound symptom. Verified live against spectrum-ts 3.1.0 + @photon-ai/otel: 3 real log.warn('stream interrupted') calls now escalate to degraded -> process.exit(75) -> adapter reconnect. Adds a shared classifyStreamLog() fed by both console.error and console.log, plus a regression test asserting both channels are intercepted.	2026-06-23 21:33:10 -07:00
XU SUN	0952acbf4d	fix(photon): label upstream CatchUpEvents failures	2026-06-23 21:33:10 -07:00
helix4u	06cbc3bae9	fix(photon): recover degraded upstream stream	2026-06-23 21:33:10 -07:00
manusjs	807bdc17f6	fix(gateway): prevent double dispatch of Discord messages via thread-starter dedup When _auto_create_thread() creates a thread from a user message via message.create_thread(), Discord fires a second MESSAGE_CREATE event for the 'thread starter message'. That starter message carries message.id == thread.id and may arrive with type=default instead of type=21 (thread_starter_message), so the existing type filter in on_message does not catch it — triggering a second call into _handle_message and thus a second agent run and response. Fix: after _auto_create_thread succeeds and returns a thread, pre-seed the dedup cache with str(thread.id) via self._dedup.is_duplicate(). The dedup cache is the same TTL-based MessageDeduplicator that already guards against Discord RESUME event replays. Calling is_duplicate() marks the ID as seen; when the duplicate thread-starter MESSAGE_CREATE arrives, on_message's guard returns True and the event is dropped. This is a minimal, targeted fix: - No new state: reuses the existing _dedup instance - No timing/race: the pre-seed happens synchronously inside the async _handle_message, before the thread-starter event can be dispatched - Scoped: only fires when auto-threading is enabled AND thread creation succeeds (thread object is not None) Also adds tests in tests/gateway/test_discord_double_dispatch.py covering the pre-seed behaviour, failure modes (thread creation fails, auto-thread disabled), and dedup cache integrity. Closes #51057	2026-06-24 03:25:33 +05:30
kshitijk4poor	4b7f3826c2	fix(telegram): wire platform_httpx_limits into general-pool HTTPXRequest (#31599) PTB's HTTPXRequest builds its httpx.AsyncClient with `limits = httpx.Limits(max_connections=connection_pool_size)` and no keepalive tuning, so httpx's default keepalive_expiry=5.0 applies. Behind an HTTP proxy (Cloudflare Warp etc.) a peer-initiated FIN can sit in CLOSE_WAIT longer than that, leaking fds in the general request pool (_request[1], which routes bot.send_message/set_my_commands) — the pool _drain_polling_connections never resets. Telegram was the lone holdout adapter not using the shared #18451 CLOSE_WAIT helper. Wire gateway.platforms._http_client_limits.platform_httpx_limits() into the httpx client across ALL THREE request-construction branches — fallback-transport, proxy, and plain — via httpx_kwargs["limits"], which PTB spreads last into its client kwargs so our tuned limits win. PTB's connection_pool_size (max_connections) is preserved; only keepalive behaviour is tightened (max_keepalive_connections + keepalive_expiry<5.0). The fix is macOS-import-safe: no Linux-only socket TCP_KEEPIDLE/INTVL/CNT constants at module scope (unlike the broken candidate which crashed on import on the reporter's OS), and it patches the actual proxy path the repro hits rather than TelegramFallbackTransport, which the proxy repro never instantiates. Adds a mutation-survivable behavior-contract test asserting every HTTPXRequest built by connect() receives httpx_kwargs["limits"] with keepalive_expiry < httpx's 5.0 default, across both the proxy and plain branches. Reverting the limits wiring fails the test. Co-authored-by: indigokarasu <mx.indigo.karasu@gmail.com>	2026-06-24 02:15:47 +05:30
kshitijk4poor	5ecf3bf0e0	fix(slack): report ext-matched audio mimetype for rerouted voice clips Follow-up to the salvaged voice-clip fix: the rerouted video/mp4 branch used {".m4a": "audio/mp4"}.get(ext, "audio/mp4"), whose sole key's value equals the default, so it always returned "audio/mp4" regardless of the cached extension (dead lookup + a throwaway dict per inbound voice clip). Replace it with a module-level _SLACK_EXT_TO_AUDIO_MIME map so the reported media_type matches the bytes we cached (e.g. a clip cached as .wav now reports audio/wav instead of audio/mp4). STT routing already keys on the audio/ prefix + cached filename extension, so behavior is unchanged; this just removes the dead construct and keeps the reported mimetype coherent.	2026-06-23 14:44:12 +05:30
Ben	2196584161	fix(slack): transcribe in-app voice messages (audio/mp4) instead of failing Slack in-app voice clips ("record a clip") arrive as MP4/AAC containers (mimetype audio/mp4, filename audio_message.mp4), and Slack sometimes labels them video/mp4. The inbound audio handler derived the cache extension from the mimetype and fell back to ".ogg" for anything not in {.ogg,.mp3,.wav,.webm,.m4a} — so audio/mp4 voice messages were cached as .ogg. OpenAI STT (whisper-1, gpt-4o-transcribe) sniffs the container from the FILENAME extension, so it received MP4 bytes named .ogg and rejected them. WhatsApp .ogg and uploaded .m4a worked only because their extension happened to match the bytes. Fix: - _resolve_slack_audio_ext(): pick the cache extension from the real filename first, then a mimetype map (audio/mp4 -> .m4a), defaulting to .m4a — never the bogus .ogg fallback. Mirrors the video branch and the audio map already in gateway/platforms/bluebubbles.py. - _is_slack_voice_clip(): detect audio-only clips mislabeled video/mp4 via the slack_audio subtype / audio_message filename, and route them through the audio path (cached as audio, reported as audio/*) so they reach STT instead of video understanding. Genuine videos (and slack_video screen recordings) are left on the video path. Verified end-to-end against a real audio-only MP4: old path cached it as .ogg (ffprobe shows MP4 bytes -> container mismatch -> OpenAI rejects); new path caches it as .mp4 (extension matches bytes -> accepted). Adds inbound-audio tests (previously none): helper unit tests plus _handle_slack_message E2E coverage for audio/mp4, video/mp4-mislabeled voice clips, and a real video staying on the video path. Confirmed the two voice-message tests fail without the fix (mutation check).	2026-06-23 14:44:12 +05:30
Teknium	e9b86f352f	fix(discord): delete obsolete slash commands before creating new ones Discord enforces a hard 100-command limit per app and rejects an upsert that would push the live total over 100 (error 30032), which silently breaks ALL slash commands. The sync deleted obsolete commands AFTER creating new ones, so an app already at the cap momentarily exceeded it and the whole sync failed. Reorder: delete no-longer-desired commands up front, then create/update. Removes the now-redundant trailing delete loop. Adapts @infinitycrew39 PR #50890 to current main (the original adapter diff no longer applied after the platform refactor); test commit cherry-picked with authorship preserved.	2026-06-22 13:58:33 -07:00
xxxigm	142a5751a2	gateway/telegram: prune stale DM topic binding on Thread-not-found (#31501) Both fallback sites that currently log "Thread X not found, retrying without message_thread_id" now also drop the ``telegram_dm_topic_bindings`` row keyed on ``(chat_id, thread_id)``: * The streaming send loop (``send`` body) — fires on the second failure, after the same-thread one-shot retry confirms the thread really is gone (the first attempt is left alone because Bot API has been observed to return a transient "Thread not found" that recovers on immediate retry). * The control-message helper ``_send_message_with_thread_fallback`` (approval prompts, model picker, update prompts) — single-shot retry, prune unconditionally on the BadRequest match. Without this prune, a user who deletes a Telegram DM topic in the client keeps getting their next inbound message recovered back to the dead thread by ``_recover_telegram_topic_thread_id`` in ``gateway/run.py``, which walks the per-user binding list newest-first and treats the deleted thread as authoritative. The reproduction in the bug report is exactly this: tool progress, approvals, activity messages and replies all land in the wrong place until the user manually runs DELETE on state.db. Cleanup is best-effort — we log at INFO when it succeeds, swallow any exception from the SessionDB call, and the user-facing send proceeds either way. Refs #31501	2026-06-22 12:29:05 -07:00
iaji	441bd6d8db	fix(slack): split csv mention pattern fallback	2026-06-22 09:44:52 -07:00
devorun	4966268764	fix(slack): honor documented `mention_patterns` wake words The Slack docs document `slack.mention_patterns` as custom wake words that trigger the bot alongside `@mention`, and the config layer bridges the key into the Slack adapter's `config.extra` — but the adapter never read it. With `require_mention` on, a channel message containing a configured wake word (and no literal `<@BOTUID>`) was silently ignored. Every other adapter that documents `mention_patterns` (Telegram, DingTalk, Mattermost, WhatsApp, BlueBubbles, Photon) implements it; Slack was the odd one out. Add `_slack_mention_patterns()` (compiled, cached; reads `slack.mention_patterns` as a list/string or `SLACK_MENTION_PATTERNS` as a JSON/CSV/newline list, invalid regexes warned and skipped) and `_slack_message_matches_mention_patterns()`, mirroring the existing adapters. Channel mention detection now also triggers on a wake-word match, so the documented field works as described. Adds tests for pattern compilation (list/string/env/invalid-regex) and for the channel-trigger gating with a wake word under require_mention.	2026-06-22 09:44:52 -07:00
teknium	e9cd8c5bf3	fix(delivery): drop env-var knob, flag all chunking adapters Follow-up to ScotterMonk's cron-truncation fix: - Remove HERMES_DELIVERY_MAX_PLATFORM_OUTPUT env var. Behavioral config belongs in config.yaml, not a new HERMES_* env var (.env is secrets only). The actual bug is fixed entirely by the adapter-aware skip; the configurable cap was unneeded scope. MAX_PLATFORM_OUTPUT is a constant again, collapsing the max_output=0 disable branch and the audit-vs-truncation threshold divergence. - Flag the remaining verified-chunking adapters (slack, matrix, feishu, mattermost, teams, whatsapp, whatsapp_cloud, weixin, bluebubbles, yuanbao) with splits_long_messages=True so the fix covers the whole bug class, not just Discord/Telegram. Each verified to chunk in its own send() via truncate_message(). - SMS deliberately left False: it chunks for normal replies but a multi-segment cron blast is cost-bearing; the 4000-cap + file save is the safer default there. - Update tests: drop the two env-override tests, add a test asserting a save failure during truncation (non-chunking) propagates.	2026-06-22 05:41:22 -07:00
ScotterMonk	86e4521cb1	fix(delivery): make cron output truncation configurable + adapter-aware Gateway-level truncation (MAX_PLATFORM_OUTPUT=4000) was pre-empting adapter-side message splitting. Discord and Telegram both chunk long content natively in their send() via truncate_message(), but the delivery router truncated to 3800 chars + footer before the adapter ever saw the full payload — so long cron output was cut short instead of being delivered as multiple messages (issue #50126). Changes: - HERMES_DELIVERY_MAX_PLATFORM_OUTPUT env var makes the cap configurable (default 4000, backward compatible). Set to 0 to disable truncation. - TRUNCATED_VISIBLE (3800) removed — visible portion now derived dynamically from max_output minus the actual footer length. - New BasePlatformAdapter.splits_long_messages capability flag (default False). Adapters that chunk in send() set True; delivery skips truncation for them but still saves full output to disk as audit. - Flagged Discord and Telegram (both verified to chunk in send()). Fixes #50126	2026-06-22 05:41:22 -07:00
teknium1	b5bd66eac9	fix(telegram): observed/replied group docs of any type are cached too Follow-up to the accept-any-file-type change. The observe-unmentioned and replied-media paths relied on cache_media_bytes() returning None for unsupported document types to emit an 'unsupported, not cached' note. Now that any file type is always cached, those docs are cached and surfaced with a path-pointing note — consistent with the main document path. The remaining cached-is-None branch is image-validation-failure only; its note is reworded accordingly. Updates the group-gating test to the new contract.	2026-06-21 22:43:45 -07:00
teknium1	4314d451ca	fix(gateway): accept any inbound file type across all messaging platforms Authorization to message the agent is the gate, not the file extension. Previously the inbound-attachment allowlist (SUPPORTED_DOCUMENT_TYPES) was opt-OUT on Discord (allow_any_attachment defaulted false) and had no bypass at all on Telegram/Slack — so an .html (or any non-allowlisted type) was dropped or hard-rejected before the agent saw it. Now every authorized upload is cached and surfaced to the agent regardless of type: - base.cache_media_bytes(): unknown types cache as octet-stream (or the caller-supplied MIME) instead of returning None — fixes the chokepoint that Teams/Telegram-media route through. - discord/telegram/slack adapters: removed the allowlist reject/skip; any non-media attachment is typed DOCUMENT and cached. Known types keep their precise MIME. - Text inlining now gates on a shared _TEXT_INJECT_EXTENSIONS set (text + code + config + markup) instead of a blind UTF-8 decode, so binary formats (PDF/zip/docx) with ASCII headers are never inlined. - gateway/run.py emits the path-pointing context note for every DOCUMENT, including non text/application MIME types. - discord.allow_any_attachment is now a documented no-op kept for config back-compat. Validation: 357 gateway tests pass; E2E confirms .html/.bin/custom types cache, known types stay precise, PDFs are not inlined.	2026-06-21 22:43:45 -07:00
teknium1	615a8e6516	fix(whatsapp): add missing re import + fix test import path after adapter relocation Follow-up to the salvaged #43846 commits: the WhatsApp adapter moved from gateway/platforms/whatsapp.py to plugins/platforms/whatsapp/adapter.py since the PR was authored. The cherry-pick brought _listener_pids_on_port's `re.finditer` ss-fallback and the new test's import, but the new module location doesn't import `re` (latent NameError on the lsof-absent fallback path) and the test imported the old module path. Add `import re` to the adapter and repoint the test import.	2026-06-21 17:23:33 -07:00
valentt	069ab40c5f	fix(whatsapp): only kill LISTENers when freeing the bridge port, never clients This is the bug that was actually closing Firefox. `_kill_port_process`, run on every bridge (re)start to free the port, used `lsof -ti :PORT` / `fuser PORT/tcp` — both of which match a process whose socket merely involves that port number in ANY state, including ESTABLISHED client connections. It then SIGTERMed every match. The bridge defaults to port 3000 — a ubiquitous local dev-server port. With a browser tab open on localhost:3000, `lsof -ti :3000` returned Firefox's PID, so each restart of the (crash-looping) WhatsApp bridge SIGTERMed Firefox, closing the whole browser at irregular intervals with no crash and no coredump. Proven live with the kernel `signal:signal_generate` tracepoint: hermes-gateway(3396516) -> sig=15 (code=0/SI_USER) -> comm=firefox pid=3371585 captured immediately after a gateway start, while Firefox held a socket on the bridge port. Demonstrated over-match: `lsof -ti :8080` returns the listener AND the gateway's own client connection; `lsof -ti tcp:8080 -sTCP:LISTEN` returns only the listener. Fix: `_listener_pids_on_port` resolves only LISTEN-state sockets (`lsof -ti tcp:PORT -sTCP:LISTEN`, with an `ss -ltnp` fallback) and `_kill_port_process` signals just those. A client whose connection happens to involve the port number is never touched — which is also more correct, since a client never blocks the new bridge from binding. Windows already filtered LISTENING; the broad `fuser -k` path is removed. Adds TestKillPortProcess: real-socket tests proving a separate client process is excluded from the listener lookup and survives port cleanup. 9 tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-21 17:23:33 -07:00
valentt	77fdbbfe81	fix(whatsapp): validate bridge PID identity before killing stale pidfile entry `_kill_stale_bridge_by_pidfile` SIGTERMed the PID recorded in `bridge.pid` after only a bare liveness check. Once the bridge exits and is reaped the kernel recycles that PID onto an unrelated process; because the WhatsApp bridge crash-loops ("Bridge process died (exit code 1)" repeating), this cleanup ran on every restart and could SIGTERM a recycled PID that had landed on the user's browser — closing Firefox at irregular intervals with no crash and no coredump (a clean kill of a stranger). Same PID-recycling class as the MCP reaper (`7bd1f8a2d`) and the process-registry host-PID guard (e6a99cef2); this was the third, and most actively-fired, path. Fix: `_write_bridge_pidfile` now also records the leader's kernel start time (line 2). `_kill_stale_bridge_by_pidfile` re-validates identity via `_bridge_pid_is_ours` before signalling — the (pid, start time) pair must match, or for legacy single-line pidfiles the live cmdline must name `node` + this session's unique path. A recycled PID (different start time / cmdline) is logged and skipped, never signalled. Legacy pidfiles stay readable. Adds TestWhatsappBridgePidfile: real-process tests proving a genuine bridge is reaped while a recycled PID (start-time mismatch, or non-bridge cmdline) is spared. 7 new + 108 gateway/registry tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-21 17:23:33 -07:00
teknium1	f79e0a7060	fix(email): mark missing-config as non-retryable + reject blank env vars (#40715) Fold in the #40715 blank-env OOM fix on top of the host-resolution change: - connect() now sets a non-retryable fatal error when required settings are missing, so the gateway stops reconnecting against an empty host instead of looping forever and leaking memory until the host OOM-kills. - check_email_requirements() treats blank/whitespace-only EMAIL_* values as missing, so an abandoned setup with empty keys no longer enables the platform. Credits the parallel fixes by zerone0x (#40745) and liuhao1024 (#40829).	2026-06-21 13:33:52 -07:00
devorun	b7f6cb9c8b	fix(email): resolve IMAP/SMTP host from config and validate before connecting The email adapter read address/host purely from env vars and never stripped them, so a missing or whitespace-padded EMAIL_IMAP_HOST reached imaplib.IMAP4_SSL("") and surfaced as the misleading "[Errno 8] nodename nor servname provided, or not known" — sending users down a DNS rabbit hole when the real problem was an empty/dirty host string. A config.yaml-only setup also left the host empty because __init__ ignored PlatformConfig.extra, even though the "connected" check, the send helper, and `hermes config show` already read address/imap_host/smtp_host from it. Resolve address/imap_host/smtp_host from the env var first, then fall back to config.extra, and strip surrounding whitespace — matching the send helper's existing pattern. Validate the required settings at the start of connect() and return False with an actionable message instead of attempting a connection with an empty host. Adds regression tests for whitespace stripping, config.extra fallback, and the no-IMAP-attempt-on-missing-host path.	2026-06-21 13:33:52 -07:00
sgaofen	a4b1554c73	fix(whatsapp): normalize bare phone targets to JIDs before bridge send Baileys' jidDecode crashes ("Cannot destructure property 'user' of jidDecode(...) as it is undefined") when handed a bare phone number, so sending a WhatsApp message to +50766715226 / 50766715226 returned HTTP 500 and never delivered (#8637). Add to_whatsapp_jid() to gateway/whatsapp_identity.py — the outbound inverse of normalize_whatsapp_identifier: it builds the JID a send must use (bare phone -> <digits>@s.whatsapp.net) and passes through already qualified JIDs (@g.us, @lid, status@broadcast, @newsletter) unchanged. Wire it at every outbound bridge call site in the WhatsApp adapter (send, edit, media, typing, get_chat_info, and the standalone cron / send_message sender). Co-authored-by: Hermes Agent <noreply@nousresearch.com>	2026-06-21 13:32:22 -07:00
natehale	565b7c8d9d	fix(telegram): stop typing indicator lingering after final reply After the agent's final response, the '...typing' bubble persisted ~5s. send() re-triggers send_typing() after every delivery so the bubble survives intermediate progress messages (Telegram clears typing on each delivered message). But that re-trigger also fired on the FINAL send, re-arming Telegram's ~5s timer AFTER the gateway had already torn down its typing-refresh loop — and Telegram exposes no stop-typing API, so nothing cancelled it. Gate the post-send re-trigger on the absence of metadata['notify'] (set only on the final user-visible reply via _mark_notify_metadata). Both the rich-message and legacy send paths are covered; intermediate progress sends still re-trigger so the bubble stays alive mid-response. Fixes #48678	2026-06-21 12:36:26 -07:00
Teknium	c0409a87ff	feat(gateway): typed send-error classification (SendResult.error_kind) (#50342) Add a platform-neutral send-failure vocabulary so consumers can branch on a typed category instead of substring-matching the raw provider message. - base.py: SEND_ERROR_KINDS + classify_send_error() (too_long / bad_format / forbidden / not_found / rate_limited / transient / unknown), and an optional SendResult.error_kind field (defaults None — fully backward compatible). - telegram.py: populate error_kind on send() failures; message_too_long keeps its existing error token plus error_kind='too_long'. Purely additive: no behavioral change to the existing degrade-and-deliver paths (MarkdownV2->plain-text fallback, overflow split, retry classification all untouched). 22 new tests + 210 adapter regression tests green.	2026-06-21 12:34:22 -07:00
joaomarcos	9578e52795	fix(photon): detect unexpected sidecar death and trigger reconnect When the Node spectrum-ts sidecar process exited mid-session (crash, OOM, upstream overflow escalation), _supervise_sidecar returned silently — readline hit EOF, the log-pump loop broke, and nothing notified the gateway. _inbound_loop entered an infinite retry loop against a dead port, _running stayed True, and the adapter remained in self.adapters with no path to self-recovery short of a manual gateway restart. Add a death-detection tail to _supervise_sidecar: after the log-pump exits (EOF or exception), guard on _inbound_running to distinguish unexpected death from a deliberate disconnect(). On unexpected exit, call _set_fatal_error("SIDECAR_CRASHED", retryable=True) followed by _notify_fatal_error() so the reconnect watcher picks up the platform within 30 s and retries with exponential backoff (30 s → 300 s cap) until the sidecar comes back up. All other platforms remain unaffected. The _inbound_running guard is safe against races: disconnect() sets _inbound_running = False before _stop_sidecar() cancels the supervisor task. CancelledError is BaseException, not Exception, so it bypasses the except clause and propagates normally — the detection block never runs during a clean shutdown.	2026-06-21 12:15:44 -07:00
joaomarcos	2a4542333e	fix(photon): classify Envoy overflow errors as retryable; add typing cooldown Closes #50185 Two independent gaps let a transient Photon/Spectrum upstream overflow degrade message delivery and amplify gRPC pressure: 1. _is_retryable_error did not recognise Photon- or Envoy-specific error strings ("internal sidecar error", "upstream connect error", "reset reason: overflow"), so _send_with_retry fell through to the plain-text fallback immediately instead of backing off and retrying. 2. send_typing had no rate gate, so a burst of typing-indicator calls during an overflow event kept hitting the upstream gRPC connection and widened the failure window. Fix: - Add _PHOTON_RETRYABLE_PATTERNS with the three high-specificity Envoy / sidecar substrings and override _is_retryable_error on PhotonAdapter to check them after delegating to the base-class patterns. base.py and all other adapters are untouched. - Add a 5 s per-chat cooldown in send_typing backed by _typing_last_sent. stop_typing clears the entry so the next start after a completed turn fires immediately — only rapid consecutive starts without a stop are suppressed. - Reduce PhotonAdapter._send_with_retry default max_retries from 2 to 1 (single 2 s back-off check) — enough to confirm whether the Envoy circuit-breaker has opened, without adding unnecessary latency. All changes are scoped to plugins/platforms/photon/adapter.py.	2026-06-21 12:15:44 -07:00
kn8-codes	6183e8ce1b	fix(telegram): make Bot API 10.1 rich messages opt-in (default off) Rich messages are not ready for primetime: current Telegram clients can render Bot API 10.1 rich messages as blank/unsupported bubbles and make them hard to copy as plain text, which is worse than the legacy MarkdownV2 path for command snippets and mobile handoffs. Default the rich_messages toggle to False so replies stay on the copyable legacy path; users opt in per bot via platforms.telegram.extra.rich_messages: true. Updates adapter, gateway config default, example config, English + zh-Hans docs, and the default/opt-in tests.	2026-06-21 12:03:24 -07:00
sgaofen	93ea9b04af	fix(gateway): cap inbound media download size to prevent memory exhaustion Inbound image/audio/video payloads were buffered fully into process memory before being written to the cache, with no size limit. A large upload (Discord Nitro allows 500 MB) or a remote media URL in an inbound message pointing at a huge file could spike RAM and OOM-kill the gateway. Enforce a configurable cap in the shared cache helpers (gateway/platforms/base.py) so the protection holds across every platform adapter, not one: - cache_image/audio/video_from_bytes reject oversized payloads before writing (video was the gap in the original report — now covered). - cache_image/audio_from_url stream the body, rejecting on an oversized Content-Length header and re-checking the running total per chunk so an absent/lying header can't smuggle an unbounded body past the cap. - Discord's _read_attachment_bytes checks att.size up front, so an oversized attachment is rejected before any bytes are pulled into memory. Configurable via gateway.max_inbound_media_bytes in config.yaml (default 128 MiB; 0 disables). No new env var — non-secret config lives in config.yaml. Salvaged and extended from @sgaofen's PR #13341 (the original report and the shared-helper approach). Reapplied onto current main (Discord adapter has since moved to plugins/platforms/discord/), the configurable knob moved from an env var to config.yaml, and the video cache helper added. Co-authored-by: Hermes Agent <noreply@nousresearch.com>	2026-06-21 11:56:46 -07:00
tt-a1i	ea056b0559	fix(telegram): avoid rich messages for CJK text Telegram Mac/Desktop Bot API 10.1 rich-message rendering leaves garbled overlapping draft/overlay glyphs for CJK text (#47653), affecting every message containing CJK characters. The legacy MarkdownV2 path renders the same text cleanly, so skip the rich send / draft / final-edit paths up front for content containing CJK (incl. astral-plane extensions) until affected clients age out. Non-CJK rich rendering is preserved. Fixes #47653	2026-06-21 11:10:37 -07:00
Teknium	a966932392	fix(telegram): exempt tables from rich newline hard-breaks The newline normalization is the shared chokepoint for every rich send (sendRichMessage, draft, and editMessageText). Injecting a Markdown hard break (two trailing spaces) into a GFM table row separator corrupts the natively-rendered table — the rich path's headline feature. Protect both fenced code blocks AND pipe-table blocks as bare regions; only prose between them gets hard breaks. Verified RICH_CONTENT and the existing rich-table tests stay byte-identical.	2026-06-21 08:26:28 -07:00
Tranquil-Flow	31e59fe44d	fix(telegram): preserve newlines in rich slash-command output (#46070) Bot API 10.1 sendRichMessage treats a lone newline as a soft break, so multi-line content joined with "\n".join(lines) — slash-command lists, etc. — collapses into a single paragraph. Normalize single newlines to Markdown hard breaks (two trailing spaces) in _rich_message_payload, leaving paragraph breaks and fenced code blocks untouched. Fixes #46070	2026-06-21 08:26:28 -07:00
miha	796f618f99	fix(telegram): keep chunk markers outside code fences When truncate_message appends a (N/M) chunk indicator to a chunk that had to close an in-progress fenced code block, the marker lands on the closing fence line (``` \(1/2\) after MarkdownV2 escaping). Telegram does not treat that as a clean closing fence and rejects the MarkdownV2, falling back to plain text. Move the indicator onto its own line right after the closing fence at all three legacy-send call sites. Fixes #48517	2026-06-21 07:25:37 -07:00
Teknium	c1f11f8c69	fix(telegram): index streamed rich finals via editMessageText too The native echo recovery handles replies to most rich messages, but messages sent before the bot's first rich send have no echo to read. record() was only called on the fresh-send path (_try_send_rich); a streamed final finalized via _try_edit_rich/editMessageText was never indexed, so a reply to it had neither a native echo nor an index entry. Mirror the fresh-send record() into the edit success path to close that gap.	2026-06-20 23:42:47 -07:00
izumi0uu	29e5e127c6	fix(telegram): recover reply text from native rich echo Telegram DOES echo a rich message's content back in reply_to_message.api_kwargs['rich_message']['blocks'] when a user replies to it. Read that native field first in _build_message_event, keeping the local send-time index only as a fallback. Duck-type api_kwargs via .get() since it is a mappingproxy, not a dict. Fixes #49534	2026-06-20 23:42:47 -07:00
teknium1	79f297834a	fix(gateway): widen cron namespace-collision fix to all migrated adapters #49431 corrected parents[2]->parents[3] for discord + raft only. The same bug existed in slack, whatsapp, and telegram adapters (migrated from gateway/platforms/ in `5600105478`): each inserts parents[2] = plugins/ onto sys.path[0], shadowing the real cron/ package with plugins/cron/ so 'import cron.scheduler_provider' raises ModuleNotFoundError on gateway start. Fixes #49410, #49824.	2026-06-20 20:45:12 -07:00
kyssta-exe	4c206b972d	fix(gateway): correct sys.path insertion in plugins to prevent cron namespace collision (#49410)	2026-06-20 20:45:12 -07:00
Zheng Tao	491579fa05	fix(whatsapp): resolve bridge dir with HERMES_HOME mirror in Docker In Docker the install tree (/opt/hermes) is read-only, so npm install for the WhatsApp bridge fails with EACCES. Add resolve_whatsapp_bridge_dir() in whatsapp_common.py: when the install dir is read-only, mirror the bridge source into a writable HERMES_HOME location and use that. Both the adapter and the 'hermes whatsapp' CLI resolve through the shared helper so the install and runtime paths agree. Fixes #49561	2026-06-20 17:05:27 -07:00
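
The phone-to-JID normalization described in commit a4b1554c73 above can be illustrated with a minimal Python sketch. This is not the repository's to_whatsapp_jid(); the name to_whatsapp_jid_sketch and the rule that any target containing "@" counts as already qualified are assumptions made for this example.

```python
import re

# Default user-JID domain named in the commit message.
USER_DOMAIN = "s.whatsapp.net"


def to_whatsapp_jid_sketch(target: str) -> str:
    """Hypothetical helper: build the JID an outbound send would use.

    Assumption: anything containing "@" (e.g. ...@g.us, ...@lid,
    status@broadcast, ...@newsletter) is already qualified and is
    returned unchanged.
    """
    target = target.strip()
    if "@" in target:
        return target
    # Bare phone number: keep digits only (drops "+", spaces, dashes).
    digits = re.sub(r"\D", "", target)
    if not digits:
        raise ValueError(f"not a phone number or JID: {target!r}")
    return f"{digits}@{USER_DOMAIN}"


if __name__ == "__main__":
    for raw in ("+50766715226", "50766715226", "12345-678@g.us", "status@broadcast"):
        print(raw, "->", to_whatsapp_jid_sketch(raw))
```

Normalizing at one helper and calling it from every outbound site, as the commit describes, means the bridge is never handed a bare number, whichever send path is used.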

180 commits