The Discord adapter silently dropped any attachment whose extension wasn't
in the SUPPORTED_DOCUMENT_TYPES allowlist (PDF, text family, zip, office).
Users uploading .wav / .bin / other unrecognized formats saw nothing in
their conversation — the file got logged as 'Unsupported document type'
and discarded before the agent ever saw it.
Add discord.allow_any_attachment (default false) to bypass the allowlist.
When on:
- Any file is downloaded, cached under ~/.hermes/cache/documents/, and
surfaced as a DOCUMENT-typed event with application/octet-stream MIME
- gateway/run.py already emits a context note with the cached path,
auto-translated via to_agent_visible_cache_path() for Docker/Modal
sandboxed terminals
- File body is NOT inlined — only the path — so binary uploads don't
blow up the context window
- Allowlisted text formats (.txt/.md/.log) keep their 100 KiB inline
behavior unchanged
Also adds discord.max_attachment_bytes (default 32 MiB matches the
historical hardcoded cap; 0 = unlimited) since users opting into arbitrary
types may want to raise the cap. The whole attachment is held in memory
while being cached, so unlimited carries a real memory cost.
Env overrides: DISCORD_ALLOW_ANY_ATTACHMENT, DISCORD_MAX_ATTACHMENT_BYTES.
Discord-only by deliberate scope. Telegram has hard 20 MB API limits and
Slack has its own caps — extending the same flag there is a separate
follow-up if/when requested.
Emit a grep-friendly '[MEMORY] rss=...MB ...' line in agent.log /
gateway.log every N minutes (default 5) so slow leaks in the long-lived
gateway process show up as a time series. Based on
https://github.com/cline/cline/pull/10343
(src/standalone/memory-monitor.ts).
- gateway/memory_monitor.py: new module. Daemon thread, baseline on
start, final snapshot on stop. Uses resource.getrusage() (stdlib)
first, falls back to psutil, disables itself with one WARNING if
neither is available.
- gateway/run.py: start monitor right after setup_logging() in
start_gateway(); stop it in the shutdown block next to MCP teardown.
- hermes_cli/config.py: logging.memory_monitor { enabled, interval_seconds }
defaults under the existing logging section.
- tests/gateway/test_memory_monitor.py: 10 unit tests covering format,
baseline/shutdown snapshots, double-start noop, periodic timer,
daemon thread invariant, and unavailable-RSS warn-and-skip path.
Adapted from TypeScript/Node to Python (threading.Event-based daemon
thread instead of setInterval/unref), added Python-specific gc + thread
counts to the log line (handier than ext/arrayBuffers for diagnosing
Python gateway leaks), and gated behind a config.yaml toggle so users
can silence the periodic line if they want.
No heap-snapshot-on-OOM equivalent — CPython doesn't have V8's
--heapsnapshot-near-heap-limit; tracemalloc would be the Python
equivalent but adds non-trivial overhead, so leaving that out.
Port from qwibitai/nanoclaw#1962: modern Signal V2-only groups surface on
dataMessage.groupV2.id, not groupInfo.groupId. signal-cli versions differ
in which field they expose for V2 groups — some forward the underlying
libsignal envelope verbatim (groupV2), others normalize everything into
groupInfo. Without a groupV2 read, V2-only groups appear as DMs because
groupInfo is undefined and the adapter misroutes them to the sender's
DM session.
Reads groupV2.id first, falls back to groupInfo.groupId. Also hardens
chat_name extraction against non-dict groupInfo payloads (crashed with
AttributeError under malformed envelopes).
6 new tests cover V2 routing, V1 legacy compatibility, V2-preferred
precedence, no-group DM path, allowlist enforcement, and malformed
payloads.
When the agent is running and the user sends multiple TEXT messages in
rapid succession, base.py's active-session branch stored the pending
event as a single-slot replacement:
self._pending_messages[session_key] = event
Three rapid messages A, B, C landed as: A (interrupts), B (replaces A
before consumer reads), C (replaces B). Only C reached the next turn —
A and B were silently dropped. This is the symptom in #4469.
Route the follow-up through merge_pending_message_event(..., merge_text=True)
so TEXT events accumulate into the existing pending event's text instead
of clobbering it. Photo and media bursts already merged through the same
helper; this just extends the merge_text path (already used by the
Telegram bursty-grace branch in gateway/run.py) to all platforms.
Test exercises BasePlatformAdapter.handle_message directly with the
session marked active and asserts three rapid TEXT events merge to
'part two\\npart three' rather than dropping the middle message.
Sanity-checked the test would fail without the fix.
Credits @devorun for the original investigation and analysis in #4491
that surfaced the underlying queue handling, though their fix targeted
GatewayRunner._pending_messages which is now dead state on main.
Stop the gateway from exiting (or systemd-restart-looping) when a single
messaging adapter fails at startup or runtime. A misconfigured WhatsApp
(npm install timeout, unpaired bridge, missing creds.json) used to take
the entire gateway down, killing cron jobs and any other connected
platforms with it.
Changes:
• Startup (gateway/run.py): when connected_count==0 but the only
errors are retryable, log a degraded-state warning and keep the
gateway alive instead of returning False. Reconnect watcher then
recovers platforms as their underlying problem clears.
• Runtime (gateway/run.py _handle_adapter_fatal_error): when the last
adapter goes down with a retryable error and is queued for
reconnection, stay alive instead of exit-with-failure. Previously
this triggered systemd Restart=on-failure, which created infinite
restart loops on persistent retryable failures (proxy outage,
repeated bridge crashes).
• Reconnect watcher (gateway/run.py _platform_reconnect_watcher):
replace the 20-attempt hard drop with a circuit-breaker pause.
After _PAUSE_AFTER_FAILURES (10) consecutive retryable failures, the
platform stays in _failed_platforms with paused=True so the watcher
skips it but the operator can still see and resume it. Non-retryable
errors still drop out of the queue immediately. Resolves#17063
(gateway giving up on Telegram after 20 attempts).
• WhatsApp preflight (gateway/platforms/whatsapp.py): refuse to start
the Node bridge when creds.json is missing. Sets a non-retryable
whatsapp_not_paired fatal error so the watcher drops it cleanly
with a single 'run hermes whatsapp' log line instead of paying the
30s bridge bootstrap timeout on every gateway start.
• WhatsApp setup ordering (hermes_cli/main.py cmd_whatsapp): only set
WHATSAPP_ENABLED=true once pairing actually succeeds. Previously
the wizard wrote the env var at step 2 (before npm install and QR
pairing), so any Ctrl+C left .env claiming WhatsApp was ready when
the bridge had no creds.json. Also propagate the env var when the
user keeps an existing pairing on a re-run.
• /platform slash command (hermes_cli/commands.py + gateway/run.py):
new gateway-only command for manual circuit-breaker control.
/platform list — show connected + failed/paused platforms
/platform pause <name> — silence a known-broken platform
/platform resume <name> — re-queue a paused platform
Tests:
• New: pause/resume helpers, /platform list|pause|resume command,
WhatsApp creds.json preflight, WhatsApp setup ordering.
• Updated: stale assertions that codified the old 'exit and let
systemd restart' behavior in test_runner_fatal_adapter.py,
test_runner_startup_failures.py, and test_platform_reconnect.py
(the 20-attempt give-up test became a circuit-breaker pause test).
5488 tests pass in tests/gateway/.
SimpleX Chat (https://simplex.chat) is a private, decentralised messenger
with no persistent user IDs — every contact is identified by an opaque
internal ID generated at connection time. This adds it as a Hermes
gateway platform via the plugin system.
The adapter connects to a local simplex-chat daemon via WebSocket,
listens for inbound messages, and sends replies. Originally proposed in
PR #2558 as a core-modifying integration; reshaped here as a self-
contained plugin under plugins/platforms/simplex/ with no edits to any
core file. Discovery is filesystem-based (scanned by gateway.config),
and the platform identity is resolved on demand via Platform("simplex").
Plugin contract:
- check_requirements() requires SIMPLEX_WS_URL AND the websockets package
- validate_config() / is_connected() accept env or config.yaml input
- _env_enablement() seeds PlatformConfig.extra (ws_url + home_channel)
- _standalone_send() supports out-of-process cron delivery
- interactive_setup() provides a stdin wizard for hermes gateway setup
- register() wires the adapter into the registry with required_env,
install_hint, cron_deliver_env_var, allowed_users_env, and a
platform_hint for the LLM.
Lazy dependency: the websockets Python package is imported inside the
functions that need it. The plugin is importable and discoverable even
when websockets is missing — check_requirements() simply returns False
until `pip install websockets` is run. No new pyproject extras are
introduced.
Environment variables:
SIMPLEX_WS_URL WebSocket URL of the daemon (required)
SIMPLEX_ALLOWED_USERS Comma-separated allowed contact IDs
SIMPLEX_ALLOW_ALL_USERS Set true to allow all contacts
SIMPLEX_HOME_CHANNEL Default contact for cron delivery
SIMPLEX_HOME_CHANNEL_NAME Human label for the home channel
Closes#2557.
ResponseStore.put() and .delete() now remove conversations rows that
reference evicted or deleted response IDs, preventing 404 errors when
a conversation name is reused after its backing response was purged.
Adds regression tests for delete, eviction, and handler-level reuse.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
was_auto_reset, auto_reset_reason, and reset_had_activity were not
included in SessionEntry.to_dict() / from_dict(), so a gateway restart
between session expiry and the user's next message would silently drop
the auto-reset notification and context note.
Add the three fields to the serialization roundtrip with safe defaults
(False / None / False) so existing sessions.json files load cleanly.
Add three roundtrip tests to test_session_reset_notify.py.
Follow-up to snav's PR #25463 contribution: flip default to on, broaden
scope so backfill fires whenever require_mention gates the bot (not just
shared-session channels).
Why:
- The mention-gate creates a session-transcript gap regardless of whether
the channel is shared or per-user. In per-user sessions, Alice's session
is still missing other participants' messages and her own pre-mention
messages — backfill fills both gaps.
- Threads naturally scope to thread-only history because discord.py's
channel.history() on a thread returns only that thread's messages.
- DMs still skip — every DM triggers the bot, so the session transcript
is already complete.
Changes:
- hermes_cli/config.py: discord.history_backfill default → true
- gateway/platforms/discord.py: drop the _is_shared gate, keep _is_dm
skip and _needed_mention gate; env var DISCORD_HISTORY_BACKFILL
default → 'true'
- cli-config.yaml.example + website docs: update defaults and prose;
add the DISCORD_HISTORY_BACKFILL / _LIMIT env var rows that were
documented in the PR description but missing from the env-var table
- tests/gateway/test_discord_free_response.py:
- flip test_discord_per_user_channel_does_not_backfill →
test_discord_per_user_channel_backfills_too (new behavior)
- add test_discord_dm_does_not_backfill (DM skip is invariant)
- give FakeThread a no-op history() so existing thread tests don't hit
a fake discord.Forbidden when backfill now fires on threads too
Tests: 160/160 in target files; 400/400 across all tests/gateway/ -k discord.
Adds optional channel-context backfill for Discord shared-channel sessions
so the agent can see recent messages it missed between its own turns
(typically when require_mention=true filters out most traffic).
Previously the agent only saw the @mention message that triggered it, which
led to disorienting replies in active multi-user channels where the
conversation context was invisible. With backfill enabled, a configurable
number of recent messages are fetched per-turn and prepended to the trigger
message as a context block, kept separate from sender-prefix logic so
attribution remains clean.
This re-opens the work from #13063 (approved by @OutThisLife on 2026-04-20,
closed when I closed the branch to address the simpolism:main head-branch
issue plus an ordering bug I caught later in live use). Filing against the
freshly-rewritten problem statement in #13054 so the design is grounded in
the failure mode rather than the implementation shape.
The implementation follows the **push-mode last-self-anchored** design from
the two options laid out in #13054. See the issue for the trade-off
discussion vs pull-mode (#13120 was an earlier closed PR using that shape).
Treating this as a reference implementation — happy to rewrite as
last-trigger anchoring or as a hybrid with #13120 if maintainers prefer.
Changes:
- gateway/platforms/discord.py:
- new `_discord_history_backfill()` / `_discord_history_backfill_limit()`
helpers (config.extra > env > default), mirroring the existing
`_discord_require_mention()` shape
- new `_fetch_channel_context()` that scans `channel.history()` backwards
from the trigger to the bot's last message (or limit), formats as
`[Recent channel messages] / [name] msg / ...`, respects DISCORD_ALLOW_BOTS,
skips system messages
- per-channel `_last_self_message_id` cache to narrow the fetch window
on hot paths (avoids full history scan when the bot has spoken recently)
- **IMPORTANT**: passes `oldest_first=False` explicitly to `channel.history()`.
discord.py 2.x silently flips the default to True when `after=` is supplied,
which would select the EARLIEST N messages after our last response instead
of the LATEST N before the trigger. In high-traffic windows this would
return stale tool traces and drop the actual final answer the user is
asking about. See regression test below. Caught in live use during a
Codex tool-trace burst on May 13 2026.
- gateway/config.py: discord_history_backfill + discord_history_backfill_limit
settings + yaml→env bridge
- gateway/platforms/base.py: channel_context field on MessageEvent
- gateway/run.py: prepend channel_context after sender-prefix so the
[sender name] tag applies to the trigger message alone, not to the backfill
- hermes_cli/config.py: defaults for new discord.history_backfill and
discord.history_backfill_limit keys
- cli-config.yaml.example: documented defaults
- tests/gateway/test_discord_free_response.py: 7 new tests covering
cold-start backfill, self-message stop boundary, other-bot filtering,
cache hot-path narrowing, stale-cache fallback, shared-channel +
per-user backfill paths, and the ordering regression test
(`test_fetch_channel_context_cache_uses_latest_window_when_after_set`)
- tests/gateway/test_config.py: yaml→env bridge tests
- tests/gateway/test_session.py: prefix-order edge cases
- website/docs/user-guide/messaging/discord.md: env vars + config keys +
usage docs
Tested on Ubuntu 24.04 — empirically validated in my own multi-bot Discord
research server for the past three weeks.
Fixes#13054
Supersedes #13063 (closed)
When the stream consumer's got_done handler successfully delivers the
final response content via _send_or_edit but the subsequent edit
(e.g. cursor removal) fails, final_response_sent remains False even
though the user has already received the final answer. The gateway's
fallback send path then re-delivers the same content, causing the
user to see the response twice on Telegram.
Introduce a new _final_content_delivered flag on the stream consumer,
set by the got_done handler when the final content has reached the
user. The _run_agent suppression logic now treats this flag as an
additional signal (alongside final_response_sent and
response_previewed) that final delivery is already complete.
This preserves the existing behavior for intermediate-text-only
streams (where already_sent=True but no final content has been
delivered) — those still receive the gateway's fallback send, matching
the test expectation in test_partial_stream_output_does_not_set_already_sent.
Adds TestFinalContentDeliveredSuppression with two cases covering
both the suppression (content delivered + edit failed) and the
non-suppression (intermediate text only) branches.
WhatsApp pseudo-chats (Status updates / Stories, Channels / Newsletters,
broadcast lists) were being routed through the full agent pipeline. A
user's gateway.log showed the agent replying to a contact's Story
('status@broadcast') with 345 chars plus title-generation cost, which
also shows up in the contact's status feed.
Drop these JIDs at _should_process_message() before the policy gate so
they're filtered regardless of dm_policy or allowlist state. Covers:
- status@broadcast (Stories)
- *@newsletter (Channels)
- *@broadcast (broadcast lists, future-proofing)
The bridge.js already filters these on the fromMe outbound path, but
inbound events on self-chat mode skipped that check.
Tests:
- status@broadcast dropped on open policy
- broadcast filter wins over allowlisted senders
- real DMs still pass through
- helper unit cases (case-insensitive, whitespace-tolerant)
26/26 tests/gateway/test_whatsapp_group_gating.py pass; 59/59 adjacent
WhatsApp test suites pass.
The cherry-picked PR over-indented the edit_message_text block for
the mm: (model selected → switch) success path so the confirmation
edit lived inside the preceding 'except Exception as exc' branch and
only fired when the callback raised. Dedent the try/except back to
12-space indent so it runs after the callback succeeds, restoring
the original flow that removes the inline buttons and shows the
'Switched to ...' confirmation.
Add a regression test (test_model_selected_edits_message_on_success)
that asserts edit_message_text is awaited and the result text is
routed through format_message (MARKDOWN_V2 + backtick survival).
Add phuongvm to scripts/release.py AUTHOR_MAP.
Use MarkdownV2 formatting for Telegram callback follow-ups and interactive prompts where dynamic names or user text can break legacy Markdown parsing. Add regression coverage for reload-mcp, model picker, approval callbacks, and update prompts.
Brings Discord to parity with Telegram on the clarify tool's interactive
UX. Overrides BasePlatformAdapter.send_clarify on DiscordAdapter to attach
a button view when choices are present.
- ClarifyChoiceView: one discord.ui.Button per choice (max 24, Discord's
25-component view cap leaves one slot for Other) plus a final
'Other (type answer)' button.
- Numeric click -> tools.clarify_gateway.resolve_gateway_clarify(
clarify_id, choice_text) using the canonical choice text from the
gateway entry (falls back to the button label if the entry vanished).
- Other click -> tools.clarify_gateway.mark_awaiting_text(clarify_id) so
the gateway's text-intercept captures the next user message in this
session as the response.
- Auth via the shared _component_check_auth helper (same OR-semantics as
ExecApprovalView / SlashConfirmView / UpdatePromptView / ModelPickerView).
- Open-ended (no choices) path renders the prompt as a plain embed and
relies on the existing text-intercept resolution.
- Single-use: first valid click disables every button and updates the
embed footer with who answered and what they chose.
No changes to BasePlatformAdapter.send_clarify or the gateway's
clarify_callback wiring -- the existing scaffolding already drives all
adapters; Discord just inherits the default text fallback today and gains
buttons by virtue of this override.
Test conftest extended: _FakeEmbed gains add_field() / set_footer() stubs
so tests can construct embedded views without monkey-patching per-test.
Original PR: #19249 by @LeonSGP43. This is a reshape of the contributor's
work onto current main's clarify infrastructure (clarify_id + entry-based
resolution shared with Telegram, instead of a parallel on_answer-closure
mechanism). The button view structure and UX shape are preserved.
Tests: 14 new tests in tests/gateway/test_discord_clarify_buttons.py.
391/391 existing Discord gateway tests still pass.
Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>
By default, once Hermes participates in a Discord thread (auto-created on
@mention or replied in once) it auto-responds to every subsequent message
in that thread without requiring further @mentions. That's the right default
for one-on-one conversations and isolated channel threads.
But it's a confirmed footgun in multi-bot threads. When a user invokes one
bot per turn — addressing Codex first, then Hermes — every other bot in the
thread also fires on every message, burning credits and spamming the channel.
Author has hit this personally in active multi-bot research-team threads.
Add a new `discord.thread_require_mention` config key (env:
`DISCORD_THREAD_REQUIRE_MENTION`), default `false` to preserve existing
behavior. When `true`, the in-thread mention shortcut is disabled and
threads are gated the same way channels are. Explicit @mentions still pass
through as expected.
Mirrors the existing helper shape (config.extra > env > default) and the
existing yaml→env bridge pattern used by `require_mention`.
Changes:
- gateway/platforms/discord.py: new `_discord_thread_require_mention()`
helper; in_bot_thread shortcut now AND's with `not _discord_thread_require_mention()`
- gateway/config.py: bridge `discord.thread_require_mention` from config.yaml
to `DISCORD_THREAD_REQUIRE_MENTION` env var (mirrors the existing
`require_mention` bridge two lines above)
- hermes_cli/config.py: add `thread_require_mention: False` default to
DEFAULT_CONFIG['discord']
- tests/gateway/test_discord_free_response.py: 4 new tests covering default
behaviour (in-thread shortcut still works), enabled behaviour (mention
required in threads), enabled+mentioned (mention still passes through),
and yaml-via-config.extra path. Also clears DISCORD_* env vars in the
`adapter` fixture so process-env state from the contributor's shell
doesn't leak into per-test behaviour.
- tests/gateway/test_config.py: 2 new tests covering the yaml→env bridge
(both the apply-from-yaml and env-precedence-over-yaml paths)
- website/docs/user-guide/messaging/discord.md: document the new env var
+ config key with multi-bot rationale; cross-link from `auto_thread`
section
Tested on Ubuntu 24.04.
Free-response channels are intended as lightweight chat surfaces — the bot
responds to every message without requiring an @mention. But the auto-thread
gate only checked DISCORD_NO_THREAD_CHANNELS, not DISCORD_FREE_RESPONSE_CHANNELS,
so every message in a free-response channel still spawned a brand-new thread.
That turns a chat channel into a thread-spawning machine: 1 thread per message.
The user-facing docs at website/docs/user-guide/messaging/discord.md already
describe the intended behavior ("Free-response channels also skip auto-threading
— the bot replies inline rather than spinning off a new thread per message"),
so this is a code-vs-docs gap, not a design change.
Fix: OR is_free_channel into skip_thread alongside the existing no_thread_channels
check. One-line production change.
Regression test added at tests/gateway/test_discord_free_response.py:
test_discord_free_response_channel_skips_auto_thread asserts that a message
in a free-response channel never calls _auto_create_thread. Reverting the
one-line fix causes the test to fail with 'Expected mock to not have been
awaited. Awaited 1 times.' — i.e. the test demonstrates the bug concretely.
Lets platform plugins own their YAML→env config bridge instead of forcing
core gateway/config.py to know every platform's schema.
The hook receives the full parsed config.yaml and the platform's own
sub-dict, may mutate os.environ (env > YAML precedence preserved via the
standard `not os.getenv(...)` guards), and may return a dict to merge
into PlatformConfig.extra. It runs during load_gateway_config() after
the existing generic shared-key loop and before _apply_env_overrides(),
mirroring the env_enablement_fn dispatch pattern (#21306, #21331).
Pure addition — no behavior change for existing platforms. Each of the
eight platforms with hardcoded YAML→env blocks today (discord, telegram,
whatsapp, slack, dingtalk, mattermost, matrix, feishu, ~252 LOC in
gateway/config.py) can migrate in independent follow-up PRs; the
hardcoded blocks remain functional in the meantime, and their
`not os.getenv(...)` guards make them no-ops for any env var the hook
already set.
Test coverage: 10 new tests in tests/gateway/test_platform_registry.py
covering field default, callable acceptance, env mutation, extras
merge, both signature args, exception swallowing, missing/non-dict
sections, and env > YAML precedence.
Refs #3823, #24356.
Closes#24836.
Slack platform-blocks native slash commands inside thread replies ("/queue
is not supported in threads. Sorry!") and there is no app-side setting to
re-enable them. As a workaround, rewrite a leading '!' to '/' for any known
gateway command before downstream processing — so '!queue', '!stop',
'!model gpt-5.4' etc. work inside Slack threads (and anywhere else).
Only the first token is checked against is_gateway_known_command(), so
casual messages like '!nice work' pass through to the agent unchanged.
Downstream pipeline (MessageType.COMMAND tagging, gateway dispatcher,
thread reply routing) is unchanged.
Adds 6 tests covering rewrite, args preservation, thread routing,
casual-message passthrough, '@bot' suffix, and plain '/' still-works.
Replace tenant-specific example text in the transcript offset regression with generic follow-up turns so the upstream test documents the bug without customer-specific wording.
Keep the outer history_offset when _run_agent drains queued follow-ups recursively so transcript persistence includes every queued turn in the chain instead of only the last one.
The test_restart_command_while_busy_requests_drain_without_interrupt test
was asserting against a hardcoded emoji string that was valid before the
i18n migration. After gateway/run.py switched to t("gateway.draining",
count=N), the test sees the translated output (or the raw key when the
locale catalog isn't resolved in xdist workers).
Fix by asserting against t("gateway.draining", count=1) — this produces
the correct expected value regardless of whether the locale file is
available in the test environment.
When the user runs /stop or a session is interrupted mid-flight, the
👀 in-progress reaction lingered on the user's message indefinitely.
Without another agent run to swap it for 👍/👎, the eyes stayed there
forever — visually misleading (looks like the agent is still working).
Fix: on ProcessingOutcome.CANCELLED, call set_message_reaction with
reaction=None to clear all reactions on the message. Documented Bot API
semantics (equivalent to Bot API 10.0's deleteMessageReaction, but works
on PTB 22.6 already without the version bump).
Test changes:
- Renamed test_on_processing_complete_cancelled_keeps_existing_reaction
→ test_on_processing_complete_cancelled_clears_reaction; updated
assertion to expect set_message_reaction(reaction=None).
- Added test_on_processing_complete_cancelled_skipped_when_disabled
(TELEGRAM_REACTIONS=false short-circuits).
- Added test_clear_reactions_handles_api_error_gracefully and
test_clear_reactions_returns_false_without_bot to cover the new
_clear_reactions helper.
The clarify tool returned 'not available in this execution context' for
every gateway-mode agent because gateway/run.py never passed
clarify_callback into the AIAgent constructor. Schema actively encouraged
calling it; users never saw the question.
Changes:
- tools/clarify_gateway.py — new event-based primitive mirroring
tools/approval.py: register/wait_for_response/resolve_gateway_clarify
with per-session FIFO, threading.Event blocking with 1s heartbeat
slices (so the inactivity watchdog keeps ticking), and
clear_session for boundary cleanup.
- gateway/platforms/base.py — abstract send_clarify with a numbered-text
fallback so every adapter (Discord, Slack, WhatsApp, Signal, Matrix,
etc.) gets a working clarify out of the box. Plus an active-session
bypass: when the agent is blocked on a text-awaiting clarify, the next
non-command message routes inline to the runner's intercept instead
of being queued + triggering an interrupt. Same shape as the /approve
deadlock fix from PR #4926.
- gateway/platforms/telegram.py — concrete send_clarify renders one
inline button per choice plus '✏️ Other (type answer)'. cl: callback
handler resolves numeric choices immediately, flips to text-capture
mode for Other, with the same authorization guards as exec/slash
approvals.
- gateway/run.py — clarify_callback wired at the cached-agent per-turn
callback assignment site (only the user-facing agent path; cron and
hygiene-compress agents have no human attached). Bridges sync→async
via run_coroutine_threadsafe, blocks with the configured timeout, and
returns a '[user did not respond within Xm]' sentinel on timeout so
the agent adapts rather than pinning the running-agent guard. Text-
intercept added to _handle_message before slash-confirm intercept
(skipping slash commands). clear_session called in the run's finally
to cancel any orphan entries.
- hermes_cli/config.py — agent.clarify_timeout default 600s.
- website/docs/user-guide/messaging/telegram.md — Interactive Prompts
section.
Tests:
- tests/tools/test_clarify_gateway.py (14 tests) — full primitive
coverage: button resolve, open-ended auto-await, Other flip, timeout
None, unknown-id idempotency, clear_session cancellation, FIFO
ordering, register/unregister notify, config default.
- tests/gateway/test_telegram_clarify_buttons.py (12 tests) — render
paths (multi-choice/open-ended/long-label/HTML-escape/not-connected),
callback dispatch (numeric resolve/Other flip/already-resolved/
unauthorized/invalid-token), and base-adapter text fallback.
Out of scope: bot-to-bot, guest mode, checklists, poll media, live
photos. Closes#24191.
PR #24500 introduced stale-lock detection that calls
`_looks_like_gateway_process` to confirm a running PID is not an
unrelated process that reused the slot. On Windows neither `/proc`
nor `ps` is available, so `_read_process_cmdline` always returns
`None` and `_looks_like_gateway_process` always returns `False` —
causing every valid Windows gateway lock to be marked stale and
immediately evicted.
Fix: after `_looks_like_gateway_process` returns `False`, call
`_read_process_cmdline` directly. If the result is non-`None` the
live cmdline was readable and confirms the PID is foreign → stale.
If it is `None` (cmdline unreadable, e.g. Windows without ps), fall
back to `_record_looks_like_gateway` which validates the stored
`argv` the gateway wrote into the lock file at startup. Both
oracles must say "not a gateway" before the lock is evicted — the
same two-oracle pattern already used in `get_running_pid` (line 941).
Adds a regression test that simulates a Windows host where
`_looks_like_gateway_process` returns `False` for every PID and
`_read_process_cmdline` returns `None`, confirming the lock is kept
when the record's argv identifies it as a gateway process.
Post-#21561 the liveness probe in acquire_scoped_lock() routes through
gateway.status._pid_exists (psutil-first, safe on Windows), not
os.kill(pid, 0). The two new macOS regression tests were patching
status.os.kill, which had no effect — the unmocked psutil call returned
False for PID 99999, marking the lock stale before the new code branch
ran. The 'replaces' test passed only because acquired=True was already
the expected outcome; the 'keeps' test failed in CI.
Switch both tests to monkeypatch status._pid_exists directly, matching
the existing test_acquire_scoped_lock_rejects_live_other_process pattern,
so they actually exercise the new start_time=None + cmdline-based
staleness branch.
On macOS (and Windows), /proc is unavailable so _get_process_start_time()
always returns None. When a gateway creates a scoped lock record with
start_time=None and then exits, macOS can reuse that PID for an unrelated
process. On restart, acquire_scoped_lock() sees:
1. os.kill(pid, 0) succeeds (PID is alive — but it's bluetoothuserd, not
the gateway)
2. existing.start_time is None and current_start is None, so the
start_time comparison is inconclusive
3. The lock is treated as active, blocking gateway startup with:
"Telegram bot token already in use (PID 873). Stop the other gateway
first."
Root cause: _read_process_cmdline() only reads /proc/<pid>/cmdline, which
doesn't exist on macOS. It always returns None, making
_looks_like_gateway_process() always return False, so the cmdline fallback
path in acquire_scoped_lock() was unreachable on macOS.
Fix (two parts):
1. _read_process_cmdline(): Add a ps(1) fallback for platforms without
/proc. When /proc/<pid>/cmdline doesn't exist, we now run
"ps -p <pid> -o command=" to retrieve the process command line. The
/proc path is tried first (preserving Linux performance); ps is only
invoked as a fallback.
2. acquire_scoped_lock(): When both the lock record's start_time and the
live process's start_time are None (the macOS case), fall back to
checking whether the live PID still looks like a Hermes gateway process
via _looks_like_gateway_process(). If it doesn't, the lock is stale.
Closes#16376
* Revert "fix(goals): force judge to use tool calls instead of JSON-text replies (#23547)"
This reverts commit a63a2b7c78.
* Revert "fix(goals): forward standing /goal state on auto-compression session rotation (#23530)"
This reverts commit 4a080b1d5a.
* Revert "feat(goals): /goal checklist + /subgoal user controls (#23456)"
This reverts commit 404640a2b7.
When the Discord typing API call fails (rate limit, network error, 403),
_typing_loop returns early but the stale task remains in _typing_tasks.
Subsequent send_typing calls see the stale entry and skip, leaving no
typing indicator for the rest of the agent invocation.
Add finally block to _typing_loop to always remove the task from
_typing_tasks on exit, whether from cancellation, error, or normal
completion. This allows send_typing to create a fresh task.
3 new tests in test_discord_send.py:
- Task removed after API error
- Typing restartable after failure
- stop_typing cleans up
Two new tests:
- tests/gateway/test_telegram_format.py
test_message_too_long_splits_into_continuations_not_silent_truncation:
asserts edit_message returns success=True with continuation_message_ids
populated and message_id pointing at the last continuation when
content exceeds MAX_MESSAGE_LENGTH (#19537). Replaces the original
fail-on-overflow assertion with the split-and-deliver contract.
- tests/gateway/test_stream_consumer.py
TestEditOverflowSplitAndDeliver.test_consumer_advances_message_id_on_split_and_deliver:
asserts the consumer side updates _message_id to the latest
continuation, clears _last_sent_text, and fires on_new_message when
the adapter reports a split-and-deliver result.
Added tests/gateway/test_stream_consumer_draft.py with 11 tests
covering:
- Transport selection: auto+dm-supported -> draft; auto+group -> edit;
explicit edit; explicit draft on unsupported adapter -> edit;
MagicMock adapter -> edit (back-compat for the existing test suite).
- Happy path: DM stream animates draft frames with a single shared
draft_id, then finalizes via a regular adapter.send.
- Group fallback: drafts entirely skipped in non-DM chats.
- Failure fallback: send_draft returning success=False disables drafts
for the rest of the response.
- Draft_id lifecycle: consecutive responses use distinct ids; tool
boundaries bump the id so post-tool text animates fresh below the
tool-progress bubble (the openclaw #32535 leak guard).
- _already_sent contract: drafts must NOT set the flag so the gateway's
fallback final-send still fires (drafts have no message_id).
Updated website/docs/user-guide/messaging/telegram.md with a
'Streaming transport' section explaining auto|draft|edit|off, the
DM-only constraint, and the per-response fallback behaviour.
Cherry-picked from PR #10371. Two-layer defense for the spurious-thread_id
issue (#3206):
1. _build_message_event filters DM thread_ids: only preserve thread_id
for real topic messages (is_topic_message=True). Telegram puts
message_thread_id on every DM that is a reply, but reply-chain ids
route to nonexistent threads on send.
2. _send_message_with_thread_fallback helper: control sends
(send_update_prompt, send_exec_approval / send_slash_confirm,
send_model_picker) retry once without message_thread_id when
Telegram returns BadRequest 'Message thread not found'. Mirrors
the pattern PR #3390 added for the streaming send path.
Salvage notes:
- Conflict 1 (line ~4099): merged the contributor's DM is_topic_message
filter with the existing forum General-topic default from #22423,
preserving both behaviors.
- Conflict 2 (line ~1664 / 1690): kept main's delete_message (PR #23416)
alongside the new helper. Tightened the helper's exception catch
from bare 'Exception' to use the existing _is_bad_request_error +
_is_thread_not_found_error helpers (line 484-496) for consistency
with the streaming send path.
- Widened the fix to send_update_prompt (was bare self._bot.send_message,
same bug class).
Authored by rahimsais via PR #10371 (re-attributed from donrhmexe@
local commit author).
* feat(goals): /goal checklist + /subgoal user controls
Two-phase judge for /goal — Phase A decomposes the goal into a detailed
checklist on first turn; Phase B evaluates each pending item harshly
against the agent's most recent response. The goal completes only when
every item is in a terminal status (completed or impossible). Adds
/subgoal so the user can append, complete, mark impossible, undo,
remove, or clear items the judge missed or got wrong.
Mechanics:
- GoalState gains `checklist` and `decomposed` fields, both backwards
compatible (old state_meta rows load unchanged).
- Phase A: aux call writes a harsh, exhaustive checklist; biased toward
more items not fewer. Falls through to legacy freeform judge when
decompose fails.
- Phase B: judge gets the checklist + last-response snippet + path to
a per-session conversation dump at <HERMES_HOME>/goals/<sid>.json.
A bounded read_file tool (max 5 calls per turn, restricted to that
one file) lets the judge inspect history when the snippet is
ambiguous. Stickiness in code: terminal items are frozen, only the
user can revert via /subgoal undo.
- Continuation prompt shows checklist progress when non-empty;
reverts to old prompt when empty.
- Status line shows M/N done counts.
CLI + gateway + TUI gateway all pass the agent reference into
evaluate_after_turn so the dump can be written. Gateway-side
/subgoal is allowed mid-run since it only modifies the checklist
the judge consults at turn boundaries.
Tests: 24 new cases — backcompat round-trip, Phase A decompose,
Phase B updates + new_items + stickiness, user override flows,
conversation dump (incl. unsafe-sid sanitization), judge read_file
restriction. Existing freeform-mode tests updated to patch the
renamed `judge_goal_freeform` and skip Phase A explicitly.
* fix(goals): off-by-one in judge index, message-list plumbing, prompt tuning
Three live-test findings from running /goal end-to-end against
gemini-3-flash-preview as the judge:
1. Off-by-one bug — the judge sees the checklist rendered with 1-based
indices ('1. [ ] foo, 2. [ ] bar') but the apply layer indexed
state.checklist as 0-based. Result: every judge update landed on
the wrong item, evidence got attached to neighbouring rows, and
the genuine 'first pending' item (usually #1) never got marked.
Fix: convert 1 → 0 in _parse_evaluate_response. Also tightened the
user prompt to call out the 1-based scheme explicitly. New tests
cover the parser conversion + an end-to-end fake-judge round-trip.
2. Conversation dump never happened — _extract_agent_messages tried
common AIAgent attribute names (.messages, .conversation_history,
etc.) but AIAgent doesn't expose the message list as an instance
attribute; it lives inside run_conversation()'s scope. Result: the
judge's read_file tool always saw history_path=unavailable. Fix:
added an explicit messages= kwarg to evaluate_after_turn that all
three call sites (CLI, gateway, TUI gateway) now pass directly.
Agent-attribute extraction kept as back-compat fallback.
3. Prompt was too harsh on simple goals. The original 'be HARSH,
default to leaving items pending' wording made the judge refuse
to mark 'file exists' completed even after the agent ran ls,
test -f, os.path.isfile, and find — burning the entire 8-turn
budget on a fizzbuzz task. Softened to 'strict but not absurd'
with explicit guidance on what counts as evidence and a directive
not to require re-proving items already established earlier.
Re-tested live with the same fizzbuzz goal: now terminates in 2
turns with all 8 checklist items correctly attributed to their
own evidence. /subgoal user-action flow (add / complete / undo /
impossible) verified live as well.
New TestUtf16OverflowDetection class covers two scenarios:
- test_emoji_text_exceeding_utf16_limit_triggers_overflow_split: feeds
2200 emoji codepoints (4400 UTF-16 units) — under Telegram's
codepoint-equivalent limit but over its UTF-16 limit. Asserts
truncate_message was called with len_fn=utf16_len, confirming the
consumer detected the overflow.
- test_codepoint_only_adapter_falls_back_to_len: documents that
adapters which don't subclass BasePlatformAdapter (or test MagicMocks)
fall back to plain len for backwards compat.
The contributor's PR shipped no tests for the UTF-16 path.
The contributor's regression test for Feishu fallback thread routing
asserted on attributes specific to the real lark SDK builder
(call_args.body, body.receive_id). In test environments without the
lark SDK installed, the in-tree fallback (gateway/platforms/feishu.py
_build_create_message_request) returns a SimpleNamespace using
.request_body instead of .body, causing AttributeError.
Now reads via getattr fallback and also verifies receive_id_type is
'thread_id' (not 'chat_id') as a stronger contract check.
When the first streamed message exceeds the platform length limit and
gets split into chunks, _send_new_chunk was called with self._message_id
(which is None on first send), dropping thread routing entirely.
Fallback to self._initial_reply_to_id so overflow chunks land in the
correct topic/thread.
Also fix a fragile test assertion that could be silently skipped.
Cherry-picked from PR #13077 commits:
- 5500c7d8 fix(gateway): stream consumer first message drops thread context
- e84403b9 test(gateway): add regression tests for stream consumer thread routing
Fixes: Streaming first message drops thread/topic context in Feishu group
topics, Slack threads, Telegram forum topics. Adds initial_reply_to_id
ctor arg to GatewayStreamConsumer, threaded through _send_or_edit and
_send_new_chunk. Also fixes Feishu _send_raw_message fallback path
(reply -> create) to use receive_id_type='thread_id' so the new message
lands in the correct topic instead of the main channel.
Authored by hrygo via PR #13077 (re-attributed from the bot-authored
salvage commit on the original branch).
The split-overflow path in _send_or_edit (gateway/stream_consumer.py) was
copying the cumulative _already_sent flag into _final_response_sent on the
done frame. _already_sent goes True on any successful prior edit (tool
progress) or on fallback-mode promotion when an edit fails — neither
proves the *current* chunked send delivered the final answer.
When the chunked send actually fails (network error, flood control), the
consumer would wrongly claim 'final delivered' and the gateway's
independent fallback delivery in run.py would be suppressed. User saw
only tool-progress bubbles and never got the answer.
Now we track per-chunk success locally: _send_new_chunk returns the new
message_id on success or returns the passed-in reply_to unchanged on
failure. If at least one returned id differs, chunks_delivered = True;
otherwise stays False, gateway fallback runs.
Adds two regression tests:
- test_split_overflow_failed_send_does_not_mark_final_sent — primes
_already_sent=True, then makes every send fail; asserts
_final_response_sent stays False.
- test_split_overflow_partial_send_marks_final_sent — happy path,
asserts _final_response_sent goes True.
Note: the companion bug at the CancelledError handler (issue cited
lines 417-418) was already fixed by 3b5572ded on 2026-04-16.
Closes#10748
Follow-up to the previous commit's notifier behavior change. Two test fixes:
1. `tests/gateway/test_kanban_notifier.py` gains
`test_notifier_redelivers_same_kind_on_dispatch_cycle` — pins the new
contract directly: a task that crashes, gets reclaimed, and crashes
again notifies the user BOTH times. Before #21398 the second crash
silently dropped because the subscription was already deleted.
2. `tests/hermes_cli/test_kanban_notify.py::
test_notifier_unsubs_after_abnormal_events[gave_up|crashed|timed_out]`
is flipped. Those tests were added in the salvage of #22941 and
asserted the OLD behavior (subscription deleted after gave_up /
crashed / timed_out). They're now obsolete — the new contract is
"subscription survives a non-final terminal event so retries reach
the user." Updated docstring + asserts; the cursor-advance check is
added to confirm the dedup mechanism still works.
The `test_notifier_unsubs_after_completed_event` test stays untouched
because `completed` IS still a terminal event that triggers unsub
(the task hits `done` status, which is handled by the `task_terminal`
branch in the notifier loop).
Follow-up to HuangYuChuh's #17384 cherry-pick:
- Use defensive getattr+logger.debug for delete_message lookup, mirroring
the sibling _try_send_fresh_final cleanup pattern at L820+. Platforms
that don't implement delete_message no longer raise AttributeError; the
failure path now logs at debug for diagnosability instead of silently
swallowing.
- Add three regression tests in tests/gateway/test_stream_consumer.py:
- delete_message awaited on happy-path exit with stale id
- delete_message NOT awaited when no fallback chunks reached the user
- no crash on adapters that lack delete_message (spec-restricted mock)
Two follow-up improvements to the previous commit's notifier dedup work.
1. Add a regression test for the send-exception rewind path. The
contributor's PR included a test for the adapter-disconnect path
(test_kanban_notifier_rewinds_claim_if_adapter_disconnects, where
adapter is None at delivery time), but not for the "adapter is
connected, send() raises" path that fires inside the inner try/except
at gateway/run.py:4314. The new test
(test_kanban_notifier_rewinds_claim_on_send_exception) uses a
FailingAdapter that always raises and confirms (a) send was actually
attempted, (b) the claim was rewound, (c) the next call to
unseen_events_for_sub still returns the event for retry.
2. Drop the per-delivery success log from INFO to DEBUG. A busy board
on a multi-platform gateway can produce hundreds of these per day;
that's gateway.log noise that obscures real warnings. Failure paths
stay at WARNING (where you'd want to look when something's wrong)
so we don't lose visibility into transient send issues.