* - make buffered streaming
- fix path naming to expand `~` for agent.
- fix stripping of matrix ID to not remove other mentions / localports.
* fix(matrix): register MembershipEventDispatcher for invite auto-join
The mautrix migration (#7518) broke auto-join because InternalEventType.INVITE
events are only dispatched when MembershipEventDispatcher is registered on the
client. Without it, _on_invite is dead code and the bot silently ignores all
room invites.
Closes #10094
Closes #10725
Refs: PR #10135 (digging-airfare-4u), PR #10732 (fxfitz)
* fix(matrix): preserve _joined_rooms reference for CryptoStateStore
connect() reassigned self._joined_rooms = set(...) after initial sync,
orphaning the reference captured by _CryptoStateStore at init time.
find_shared_rooms() returned [] forever, breaking Megolm session rotation
on membership changes.
Mutate in place with clear() + update() so the CryptoStateStore reference
stays valid.
Refs #8174, PR #8215
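The aliasing problem above can be reproduced in a few lines. This is a minimal sketch with hypothetical stand-in classes (`Store`, `Adapter`), not the real `_CryptoStateStore` or adapter code; it only shows why rebinding the attribute orphans the captured reference while `clear()` + `update()` does not.

```python
class Store:
    """Hypothetical stand-in for _CryptoStateStore: keeps a reference to the set."""

    def __init__(self, joined_rooms: set[str]) -> None:
        self._joined_rooms = joined_rooms

    def find_shared_rooms(self) -> list[str]:
        return sorted(self._joined_rooms)


class Adapter:
    """Hypothetical stand-in for the Matrix adapter."""

    def __init__(self) -> None:
        self._joined_rooms: set[str] = set()
        # The store captures the set object itself, not the attribute name.
        self.store = Store(self._joined_rooms)

    def connect_buggy(self, rooms: list[str]) -> None:
        # Rebinding creates a new set; the store still holds the old, empty one.
        self._joined_rooms = set(rooms)

    def connect_fixed(self, rooms: list[str]) -> None:
        # Mutating in place keeps the store's reference valid.
        self._joined_rooms.clear()
        self._joined_rooms.update(rooms)
```

With `connect_buggy()` the store keeps returning an empty list; with `connect_fixed()` it sees the synced rooms.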
* fix(matrix): remove dual ROOM_ENCRYPTED handler to fix dedup race
mautrix auto-registers DecryptionDispatcher when client.crypto is set.
The adapter also registered _on_encrypted_event for the same event type.
_on_encrypted_event had zero awaits and won the race to mark event IDs
in the dedup set, causing _on_room_message to drop successfully decrypted
events from DecryptionDispatcher. The retry loop masked this by re-decrypting
every message ~4 seconds later.
Remove _on_encrypted_event entirely. DecryptionDispatcher handles decryption;
genuinely undecryptable events are logged by mautrix and retried on next
key exchange.
Refs #8174, PR #8215
* fix(matrix): re-verify device keys after share_keys() upload
Matrix homeservers treat ed25519 identity keys as immutable per device.
share_keys() can return 200 but silently ignore new keys if the device
already exists with different identity keys. The bot would proceed with
shared=True while peers encrypt to the old (unreachable) keys.
Now re-queries the server after share_keys() and fails closed if keys
don't match, with an actionable error message.
Refs #8174, PR #8215
* fix(matrix): encrypt outbound attachments in E2EE rooms
_upload_and_send() uploaded raw bytes and used the 'url' key for all
rooms. In E2EE rooms, media must be encrypted client-side with
encrypt_attachment(), the ciphertext uploaded, and the 'file' key
(with key/iv/hashes) used instead of 'url'.
Now detects encrypted rooms via state_store.is_encrypted() and
branches to the encrypted upload path.
Refs: PR #9822 (charles-brooks)
* fix(matrix): add stop_typing to clear typing indicator after response
The adapter set a 30-second typing timeout but never cleared it.
The base class stop_typing() is a no-op, so the typing indicator
lingered for up to 30 seconds after each response.
Closes #6016
Refs: PR #6020 (r266-tech)
* fix(matrix): cache all media types locally, not just photos/voice
should_cache_locally only covered PHOTO, VOICE, and encrypted media.
Unencrypted audio/video/documents in plaintext rooms were passed as MXC
URLs that require authentication the agent doesn't have, resulting
in 401 errors.
Refs #3487, #3806
* fix(matrix): detect stale OTK conflict on startup and fail closed
When crypto state is wiped but the same device ID is reused, the
homeserver may still hold one-time keys signed with the previous
identity key. Identity key re-upload succeeds but OTK uploads fail
with "already exists" and a signature mismatch. Peers cannot
establish new Olm sessions, so all new messages are undecryptable.
Now proactively flushes OTKs via share_keys() during connect() and
catches the "already exists" error with an actionable log message
telling the operator to purge the device from the homeserver or
generate a fresh device ID.
Also documents the crypto store recovery procedure in the Matrix
setup guide.
Refs #8174
* docs(matrix): improve crypto recovery docs per review
- Put easy path (fresh access token) first, manual purge second
- URL-encode user ID in Synapse admin API example
- Note that device deletion may invalidate the access token
- Add "stop Synapse first" caveat for direct SQLite approach
- Mention the fail-closed startup detection behavior
- Add back-reference from upgrade section to OTK warning
* refactor(matrix): cleanup from code review
- Extract _extract_server_ed25519() and _reverify_keys_after_upload()
to deduplicate the re-verification block (was copy-pasted in two
places, three copies of ed25519 key extraction total)
- Remove dead code: _pending_megolm, _retry_pending_decryptions,
_MAX_PENDING_EVENTS, _PENDING_EVENT_TTL — all orphaned after
removing _on_encrypted_event
- Remove tautological TestMediaCacheGate (tested its own predicate,
not production code)
- Remove dead TestMatrixMegolmEventHandling and
TestMatrixRetryPendingDecryptions (tested removed methods)
- Merge duplicate TestMatrixStopTyping into TestMatrixTypingIndicator
- Trim comment to just the "why"
Users (Teknium) report that debug reports are already gone when they go to
read them, because the 1-hour auto-delete fires first. 6 hours gives enough
window for async bug-report triage without leaving sensitive log data on
public paste services indefinitely.
Applies to both the CLI (hermes debug share) and gateway (/debug) paths.
Initialize next_channel_prompt before the pending_event check and use
getattr with None default, matching the existing pattern for
next_source/next_message/next_message_id. Prevents AttributeError
when pending_event is None (interrupt path).
Cherry-picked from #10953 by @jackjin1997.
Switch from fragile Markdown V1 to HTML parse mode with html.escape()
for exec approval messages. Add fallback to text-based approval when
the formatted send fails.
Cherry-picked from #10999 by @danieldoderlein.
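As an illustration of the escaping approach, here is a minimal sketch of building an HTML parse mode message body. The function name `format_exec_approval` and the message layout are hypothetical, not the project's actual code; only `html.escape()` comes from the commit.

```python
import html


def format_exec_approval(command: str, description: str) -> str:
    """Build an HTML-parse-mode body with user-controlled text escaped.

    Hypothetical helper: the tag layout is an assumption for illustration.
    """
    return (
        "<b>Approval required</b>\n"
        f"<code>{html.escape(command)}</code>\n"
        f"{html.escape(description)}"
    )
```

Backticks and asterisks pass through untouched because they have no meaning in HTML parse mode; only `&`, `<`, `>` and quotes are rewritten.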
config.yaml terminal.cwd is now the single source of truth for working
directory. MESSAGING_CWD and TERMINAL_CWD in .env are deprecated with a
migration warning.
Changes:
1. config.py: Remove MESSAGING_CWD from OPTIONAL_ENV_VARS (setup wizard
no longer prompts for it). Add warn_deprecated_cwd_env_vars() that
prints a migration hint when deprecated env vars are detected.
2. gateway/run.py: Replace all MESSAGING_CWD reads with TERMINAL_CWD
(which is bridged from config.yaml terminal.cwd). MESSAGING_CWD is
still accepted as a backward-compat fallback with deprecation warning.
Config bridge skips cwd placeholder values so they don't clobber
the resolved TERMINAL_CWD.
3. cli.py: Guard against lazy-import clobbering — when cli.py is
imported lazily during gateway runtime (via delegate_tool), don't
let load_cli_config() overwrite an already-resolved TERMINAL_CWD
with os.getcwd() of the service's working directory. (#10817)
4. hermes_cli/main.py: Add 'hermes memory reset' command with
--target all/memory/user and --yes flags. Profile-scoped via
HERMES_HOME.
Migration path for users with .env settings:
Remove MESSAGING_CWD / TERMINAL_CWD from .env
Add to config.yaml:
terminal:
cwd: /your/project/path
Addresses: #10225, #4672, #10817, #7663
When execute_code times out, the result JSON had status="timeout" and an
error field, but the output field was empty. Many models treat empty
output as "nothing happened" and produce an empty/minimal response. The
gateway stream consumer then considers the response "already sent" (from
pre-tool streaming) and silently drops it — leaving the user staring at
silence.
Three changes:
1. Include the timeout message in the output field (both local and remote
paths) so the model always has visible content to relay to the user.
2. Add periodic activity callbacks to the local execution polling loop so
the gateway's inactivity monitor knows execute_code is alive during
long runs.
3. Fix stream_consumer._send_fallback_final to not silently drop content
when the continuation appears empty but the final text differs from
what was previously streamed (e.g. after a tool boundary reset).
When the LLM returns an empty completion, gateway/run.py replaced
final_response with the literal string '(No response generated)'.
This defeated cron/scheduler.py's empty-response skip guard, causing
the placeholder to be delivered to home channels.
Changes:
- gateway/run.py: return empty string instead of placeholder when
there is no error and no response content
- cron/scheduler.py: defensively strip the placeholder text in case
any upstream path still produces it
Fixes NousResearch/hermes-agent#9270
The cancellation handler previously promoted any partial send
(already_sent=True) to final_response_sent=True unconditionally.
This meant if intermediate text (e.g. 'Let me search…') was streamed
and the consumer was cancelled before delivering the actual answer,
the gateway's suppression check would still prevent the fallback send.
Now final_response_sent is only set in the cancellation path when:
- The best-effort send of accumulated content actually succeeded, OR
- It was already confirmed before cancellation
Companion fix for PR #11000's run.py changes — closes the
cancellation-path loophole that would otherwise let partial streams
suppress final delivery during queued follow-ups.
All 10 call sites in gateway/run.py and gateway/platforms/api_server.py
are inside async functions where a loop is guaranteed to be running.
Calling get_event_loop() without a running loop has been deprecated since
Python 3.10: it can silently create a new loop when none is running,
masking bugs. get_running_loop() raises RuntimeError instead, which is safer.
Surfaced during review of PRs #10533 and #10647.
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
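A minimal sketch of the difference, using hypothetical function names (`handler`, `blocking_work`, `called_without_loop`) rather than the actual call sites:

```python
import asyncio


def blocking_work(x: int) -> int:
    return x * 2


async def handler(x: int) -> int:
    # Inside a coroutine a loop is always running, so this cannot raise here.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_work, x)


def called_without_loop() -> bool:
    # Outside a running loop, get_running_loop() raises instead of creating one.
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return True
    return False
```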
Each top-level Slack DM now gets its own Hermes session, matching the
per-thread behavior channels already have. Previously all top-level DM
messages shared one continuous session because thread_ts was None,
causing context to accumulate across unrelated conversations.
The behavior is controlled by platforms.slack.extra.dm_top_level_threads_as_sessions
in config.yaml (default: true). Set to false to restore legacy behavior.
Based on PR #10789 by helix4u. Changes from original:
- Default flipped to true (was opt-in, now opt-out)
- Removed env var fallback (config.yaml only per project policy)
- Tests updated to cover both default and opt-out paths
Bump connect retry attempts from 3 to 8 and cap exponential backoff at
15 seconds. Old budget: 3 attempts, 1+2+4=7s total — insufficient for
cold boot on slow networks or embedded devices. New budget: 8 attempts,
1+2+4+8+15+15+15=~60s total.
Inspired by PR #5770 by @Bartok9 (re-implemented against current main
since original was 913 commits stale with conflicts).
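The new delay schedule can be checked with a short sketch. The helper name `backoff_delays` and its parameters are hypothetical, and it assumes a delay is slept only between attempts (so 8 attempts produce 7 delays).

```python
def backoff_delays(attempts: int, base: float = 1.0, cap: float = 15.0) -> list[float]:
    """Delays slept between attempts: base * 2**i, capped at `cap`.

    Hypothetical helper; no delay is counted after the final attempt.
    """
    return [min(base * 2**i, cap) for i in range(attempts - 1)]
```

Under that assumption, `backoff_delays(8)` yields 1, 2, 4, 8, 15, 15, 15, which sums to 60 seconds.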
Three targeted fixes for the 'agent stuck on terminal command' report:
1. **Concurrent tool wait loop now checks interrupts** (run_agent.py)
The sequential path checked _interrupt_requested before each tool call,
but the concurrent path's wait loop just blocked with 30s timeouts.
Now polls every 5s and cancels pending futures on interrupt, giving
already-running tools 3s to notice the per-thread interrupt signal.
2. **Cancelled concurrent tools get proper interrupt messages** (run_agent.py)
When a concurrent tool is cancelled or didn't return a result due to
interrupt, the tool result message says 'skipped due to user interrupt'
instead of a generic error.
3. **Typing indicator fires before follow-up turn** (gateway/run.py)
After an interrupt is acknowledged and the pending message dequeued,
the gateway now sends a typing indicator before starting the recursive
_run_agent call. This gives the user immediate visual feedback that
the system is processing their new message (closing the perceived
'dead air' gap between the interrupt ack and the response).
Reported by @_SushantSays.
Fixes 12 CI test failures:
1. test_cli_new_session (4): _FakeAgent missing commit_memory_session
attribute added in the memory provider refactoring. Added MagicMock.
2. test_run_progress_topics (1): already_sent detection only checked
stream consumer flags, missing the response_previewed path from
interim_assistant_callback. Restructured guard to check both paths.
3. test_timezone (1): HERMES_TIMEZONE leaked into child processes via
_SAFE_ENV_PREFIXES matching HERMES_*. The code correctly converts
it to TZ but didn't remove the original. Added child_env.pop().
4. test_session_env (1): contextvars baseline captured from a different
context couldn't be restored after clear. Changed assertion to verify
the test's value was removed rather than comparing to a fragile baseline.
5. test_discord_slash_commands (5): already fixed on current main.
Gateway executor work now inherits the active session contextvars via
copy_context() so background process watchers retain the correct
platform/chat/user/session metadata for routing completion events back
to the originating chat.
Cherry-picked from #10647 by @helix4u with:
- Use asyncio.get_running_loop() instead of deprecated get_event_loop()
- Strip trailing whitespace
- Add *args forwarding test
- Add exception propagation test
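A minimal sketch of propagating contextvars into executor work with copy_context(). The names `run_in_executor_with_context`, `session_id`, and `read_session` are hypothetical and only illustrate the technique, including `*args` forwarding.

```python
import asyncio
import contextvars
import functools

session_id: contextvars.ContextVar[str] = contextvars.ContextVar("session_id", default="")


async def run_in_executor_with_context(func, *args):
    """Run func in the default executor with the caller's contextvars visible."""
    loop = asyncio.get_running_loop()
    ctx = contextvars.copy_context()
    # ctx.run executes func inside the copied context on the worker thread;
    # exceptions raised by func propagate through the awaited future.
    return await loop.run_in_executor(None, functools.partial(ctx.run, func, *args))


def read_session(prefix: str) -> str:
    return prefix + session_id.get()


async def main() -> str:
    session_id.set("telegram:42")
    return await run_in_executor_with_context(read_session, "session=")
```

Without the copied context, the worker thread would see the default value of `session_id` instead of the value set by the caller.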
In Telegram forum-enabled groups, the General topic does not include
message_thread_id in incoming messages (it is None). This caused:
1. Messages in General losing thread context — replies went to wrong place
2. Typing indicator failing because thread_id=1 was rejected by Telegram
Fix: synthesize thread_id="1" for forum groups when message_thread_id
is None, then handle it correctly per operation:
- send: omit message_thread_id (Telegram rejects thread_id=1 for sends)
- typing: pass thread_id=1, retry without it on "thread not found"
Also centralizes thread_id extraction into _metadata_thread_id() across
all send methods (send, send_voice, send_image, send_document, send_video,
send_animation, send_photo), replacing ~10 duplicate patterns.
Salvaged from PR #7892 by @corazzione.
Closes #7877, closes #7519.
Pass platform_env_var="TELEGRAM_PROXY" to resolve_proxy_url() in both
telegram.py (main connect) and telegram_network.py (fallback transport),
so a Telegram-specific proxy takes priority over the generic HTTPS_PROXY.
Also bridge telegram.proxy_url from config.yaml to the TELEGRAM_PROXY
env var (env var takes precedence if both are set), add OPTIONAL_ENV_VARS
entry, docs, and tests.
Composite salvage of four community PRs:
- Core approach (both call sites): #9414 by @leeyang1990
- config.yaml bridging + docs: #6530 by @WhiteWorld
- Naming convention: #9074 by @brantzh6
- Earlier proxy work: #7786 by @ten-ltw
Closes #9414, closes #9074, closes #7786, closes #6530
Co-authored-by: WhiteWorld <WhiteWorld@users.noreply.github.com>
Co-authored-by: brantzh6 <brantzh6@users.noreply.github.com>
Co-authored-by: ten-ltw <ten-ltw@users.noreply.github.com>
The command preview and description were wrapped in Markdown v1 inline
code (backticks) without escaping, causing Telegram API parse errors
when the command itself contained backticks or asterisks.
Fixes: 'Can't parse entities: can't find end of the entity'
Telegram on iOS auto-converts double hyphens (--) to em dashes (—)
or en dashes (–) via autocorrect. This breaks /model flag parsing
since parse_model_flags() only recognizes literal '--provider' and
'--global'.
When the flag isn't parsed, the entire string (e.g. 'glm-5.1 —provider zai')
gets treated as the model name and fails with 'Model names cannot
contain spaces.'
Fix: normalize Unicode dashes (U+2012-U+2015) to '--' when they
appear before flag keywords (provider, global), before flag extraction.
The existing test suite in test_model_switch_provider_routing.py
already covers all four dash variants — this commit adds the code
that makes them pass.
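One way the normalization could look, as a sketch only: the helper name `normalize_flag_dashes` is hypothetical, and the real change lives in or near parse_model_flags().

```python
import re

# U+2012 figure dash, U+2013 en dash, U+2014 em dash, U+2015 horizontal bar,
# matched only when immediately followed by a known flag keyword.
_UNICODE_DASH_FLAG = re.compile(r"[\u2012-\u2015](?=(?:provider|global)\b)")


def normalize_flag_dashes(text: str) -> str:
    """Rewrite a Unicode dash directly before a flag keyword as '--'."""
    return _UNICODE_DASH_FLAG.sub("--", text)
```

The lookahead keeps dashes elsewhere in the message untouched, so ordinary punctuation in the text is not rewritten.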
Replace inline Path.home() / '.hermes' / 'profiles' detection in both CLI
and gateway /profile handlers with the existing get_active_profile_name()
from hermes_cli.profiles — which already handles custom-root deployments,
standard profiles, and Docker layouts.
Fixes /profile incorrectly reporting 'default' when HERMES_HOME points to
a custom-root profile path like /opt/data/profiles/coder.
Based on PR #10484 by Xowiek.
Background review notifications ("💾 Skill created", "💾 Memory updated")
could race ahead of the main assistant reply in chat, making it look like
the agent stopped after creating a skill.
Gate bg-review notifications behind a threading.Event + pending queue.
Register a release callback on the adapter's _post_delivery_callbacks dict
so base.py's finally block fires it after the main response is delivered.
The queued-message path in _run_agent pops and calls the callback directly
to prevent double-fire.
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
Closes #10541
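A minimal sketch of the gating idea, assuming a hypothetical `NotificationGate` class and `send` callable; the real change wires this through the adapter's _post_delivery_callbacks instead.

```python
import threading


class NotificationGate:
    """Hypothetical gate: hold notifications until the main reply is delivered."""

    def __init__(self, send) -> None:
        self._send = send
        self._released = threading.Event()
        self._pending: list[str] = []
        self._lock = threading.Lock()

    def notify(self, text: str) -> None:
        with self._lock:
            if not self._released.is_set():
                # Main reply not delivered yet: queue instead of sending.
                self._pending.append(text)
                return
        self._send(text)

    def release(self) -> None:
        with self._lock:
            if self._released.is_set():
                return  # idempotent: guards against double-fire
            self._released.set()
            pending, self._pending = self._pending, []
        for text in pending:
            self._send(text)
```

Calling `release()` a second time is a no-op, which mirrors the double-fire protection described above.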
WecomCallbackAdapter declared a _seen_messages dict and
MESSAGE_DEDUP_TTL_SECONDS constant but never actually checked
them in _handle_callback(). WeCom retries callback deliveries
on timeout, and each retry with the same MsgId was treated as
a fresh message and queued for processing.
Fix: check _seen_messages before enqueuing. Uses the same TTL-
based pattern as MessageDeduplicator (fixed in #10306) — check
age before returning duplicate, prune on overflow.
Closes #10305
Extract resolve_channel_prompt() shared helper into
gateway/platforms/base.py. Refactor Discord to use it.
Wire channel_prompts into Telegram (groups + forum topics),
Slack (channels), and Mattermost (channels).
Config bridging now applies to all platforms (not just Discord).
Added channel_prompts defaults to telegram/slack/mattermost
config sections.
Docs added to all four platform pages with platform-specific
examples (topic inheritance for Telegram, channel IDs for Slack,
etc.).
- Remove double str() normalization in _resolve_channel_prompt since
config bridging already handles numeric YAML key conversion
- Remove dead prompts.get(str(key)) fallback that could never match
after keys were already normalized to strings
- Replace getattr(event, "channel_prompt", None) with direct attribute
access since channel_prompt is a declared dataclass field
- Update test to verify normalization responsibility lives in config bridging
_parse_session_key() blindly assigned parts[5] as thread_id for all
chat types. For group sessions with per-user isolation, parts[5] is
a user_id, not a thread_id. This could cause shutdown notifications
to route with incorrect thread metadata.
Only return thread_id for chat types where the 6th element is
unambiguous: dm and thread. For group/channel sessions, omit
thread_id since the suffix may be a user_id.
Based on the approach from PR #9938 by @Ruzzgar.
Commentary messages (interim assistant status updates like "Using browser
tool...") are sent via _send_commentary(), which was incorrectly setting
_already_sent = True on success. This caused the final response to be
suppressed when there were multiple tool calls, because the gateway checks
already_sent to decide whether to skip re-sending the response.
The fix: commentary messages are interim status updates, not the final
response, so _already_sent should not be set when they succeed. This
ensures the final response is always delivered regardless of how many
commentary messages were sent during the turn.
Fixes: #10454
After clear_session_vars() reset contextvars to their default (''),
get_session_env() treated the empty string as falsy and fell through
to os.environ — resurrecting stale HERMES_SESSION_* values from CLI
startup, cron, or previous sessions. This broke session isolation
in the gateway where concurrent messages could see each other's
stale environment values.
Fix: use a sentinel (_UNSET) as the contextvar default instead of ''.
get_session_env() now checks 'value is not _UNSET' instead of
truthiness. Three states are cleanly distinguished:
- _UNSET (never set): fall back to os.environ (CLI/cron compat)
- '' (explicitly cleared): return '' — no os.environ fallback
- 'telegram' (actively set): return the value
clear_session_vars() now uses var.set('') instead of var.reset(token)
to mark vars as explicitly cleared rather than reverting to _UNSET.
Closes #10304
When a model (e.g. mimo-v2-pro) streams intermediate text alongside tool
calls ("Let me search for that") but then returns empty after processing
tool results, the stream consumer already_sent flag is True from the
earlier text delivery. The gateway suppression check
(already_sent=True, failed=False → return None) would swallow the final
response, leaving the user staring at silence after the search.
Two changes:
1. gateway/run.py return path: skip already_sent suppression when the
final_response is "(empty)" or empty — the user needs to know the
agent finished even if streaming sent partial content earlier.
2. gateway/run.py response handler: convert the internal "(empty)"
sentinel to a user-friendly warning instead of delivering the raw
sentinel string.
Tests added for all empty/None/sentinel cases plus preserved existing
suppression behavior for normal non-empty responses.
Discord's _register_slash_commands() had a hardcoded list of ~27 commands
while COMMAND_REGISTRY defines 34+ gateway-available commands. Missing
commands (debug, branch, rollback, snapshot, profile, yolo, fast, reload,
commands) were invisible in Discord's / autocomplete — users couldn't
discover them.
Add a dynamic catch-all loop after the explicit registrations that
iterates COMMAND_REGISTRY, skips already-registered commands, and
auto-registers the rest using discord.app_commands.Command(). Commands
with args_hint get an optional string parameter; parameterless commands
get a simple callback.
This ensures any future commands added to COMMAND_REGISTRY automatically
appear on Discord without needing a manual entry in discord.py.
Telegram and Slack already derive dynamically from COMMAND_REGISTRY
via telegram_bot_commands() and slack_subcommand_map() — no changes
needed there.
- Pastes uploaded by /debug now auto-delete after 1 hour via a detached
background process that sends DELETE to paste.rs
- CLI: shows privacy notice listing what data will be uploaded
- Gateway: only uploads summary report (system info + log tails), NOT
full log files containing conversation content
- Added 'hermes debug delete <url>' for immediate manual deletion
- 16 new tests covering auto-delete scheduling, paste deletion, privacy
notices, and the delete subcommand
Addresses user privacy concern where /debug uploaded full conversation
logs to a public paste service with no warning or expiry.
Two gateway fixes:
1. MessageDeduplicator.is_duplicate() now checks TTL at query time (#10306)
Previously, is_duplicate() returned True for any previously seen ID
without checking its age — expired entries were only purged when cache
size exceeded max_size. On normal workloads that never overflow, message
IDs stayed deduplicated forever instead of expiring after the TTL.
Fix: check `now - timestamp < ttl` before returning True. Expired
entries are removed and treated as new messages.
2. Gateway --config flag now uses yaml.safe_load() (#10216)
The --config CLI flag in gateway/run.py main() used json.load() to
parse config files. YAML is the only documented config format and
every other config loader uses yaml.safe_load(). A YAML config file
passed via --config would crash with json.JSONDecodeError.
Closes #10306
Closes #10216
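A sketch of the query-time TTL check from fix 1. `TTLDeduplicator` is a hypothetical class written for illustration, not the project's MessageDeduplicator; the injectable `clock` parameter is an assumption added to make the behavior easy to test.

```python
import time


class TTLDeduplicator:
    """Hypothetical sketch of dedup with the TTL checked on every lookup."""

    def __init__(self, ttl: float = 300.0, max_size: int = 1000, clock=time.monotonic) -> None:
        self._ttl = ttl
        self._max_size = max_size
        self._clock = clock
        self._seen: dict[str, float] = {}

    def is_duplicate(self, msg_id: str) -> bool:
        now = self._clock()
        ts = self._seen.get(msg_id)
        if ts is not None and now - ts < self._ttl:
            return True
        # New or expired: (re)record and treat as a fresh message.
        self._seen[msg_id] = now
        if len(self._seen) > self._max_size:
            # Prune expired entries only when the cache overflows.
            self._seen = {k: v for k, v in self._seen.items() if now - v < self._ttl}
        return False
```

An ID seen once is reported as a duplicate only while it is younger than the TTL, regardless of whether the cache ever overflows.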
_parse_session_key() now extracts the optional 6th part (thread_id) from
session keys, and _notify_active_sessions_of_shutdown uses _parsed.get()
instead of the removed 'parts' variable. Without this, shutdown notifications
silently failed (NameError caught by try/except) and forum topic routing
was lost.
- Populate watcher_* routing fields for watch-only processes (not just
notify_on_complete), so watch-pattern events carry direct metadata
instead of relying solely on session_key parsing fallback
- Extract _parse_session_key() helper to dedupe session key parsing
at two call sites in gateway/run.py
- Add negative test proving cross-thread leakage doesn't happen
- Add edge-case tests for _build_process_event_source returning None
(empty evt, invalid platform, short session_key)
- Add unit tests for _parse_session_key helper
Three fixes for the duplicate reply bug affecting all gateway platforms:
1. base.py: Suppress stale response when the session was interrupted by a
new message that hasn't been consumed yet. Checks both interrupt_event
and _pending_messages to avoid false positives. (#8221, #2483)
2. run.py (return path): Remove response_previewed guard from already_sent
check. Stream consumer's already_sent alone is authoritative — if
content was delivered via streaming, the duplicate send must be
suppressed regardless of the agent's response_previewed flag. (#8375)
3. run.py (queued-message path): Same fix — already_sent without
response_previewed now correctly marks the first response as already
streamed, preventing re-send before processing the queued message.
The response_previewed field is still produced by the agent (run_agent.py)
but is no longer required as a gate for duplicate suppression. The stream
consumer's already_sent flag is the delivery-level truth about what the
user actually saw.
Concepts from PR #8380 (konsisumer). Closes #8375, #8221, #2483.
Three independent fixes:
1. Reset activity timestamp on cached agent reuse (#9051)
When the gateway reuses a cached AIAgent for a new turn, the
_last_activity_ts from the previous turn (possibly hours ago)
carried over. The inactivity timeout handler immediately saw
the agent as idle for hours and killed it.
Fix: reset _last_activity_ts, _last_activity_desc, and
_api_call_count when retrieving an agent from the cache.
2. Detect uv-managed virtual environments (#8620 sub-issue 1)
The systemd unit generator fell back to sys.executable (uv's
standalone Python) when running under 'uv run', because
sys.prefix == sys.base_prefix (uv doesn't set up traditional
venv activation). The generated ExecStart pointed to a Python
binary without site-packages, crashing the service on startup.
Fix: check VIRTUAL_ENV env var before falling back to
sys.executable. uv sets VIRTUAL_ENV even when sys.prefix
doesn't reflect the venv.
3. Nudge model to continue after empty post-tool response (#9400)
Weaker models (GLM-5, mimo-v2-pro) sometimes return empty
responses after tool calls instead of continuing to the next
step. The agent silently abandoned the remaining work with
'(empty)' or used prior-turn fallback text.
Fix: when the model returns empty after tool calls AND there's
no prior-turn content to fall back on, inject a one-time user
nudge message telling the model to process the tool results and
continue. The flag resets after each successful tool round so it
can fire again on later rounds.
Test plan: 97 gateway + CLI tests pass, 9 venv detection tests pass
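The venv detection in fix 2 could be sketched like this. `resolve_python` is a hypothetical helper, and the `bin/python` layout assumes a POSIX virtual environment; the actual unit generator may differ.

```python
import os
import sys
from pathlib import Path


def resolve_python(environ=os.environ) -> str:
    """Prefer the interpreter inside VIRTUAL_ENV; fall back to sys.executable.

    Hypothetical helper: assumes a POSIX venv layout (bin/python).
    """
    venv = environ.get("VIRTUAL_ENV")
    if venv:
        candidate = Path(venv) / "bin" / "python"
        if candidate.exists():
            return str(candidate)
    return sys.executable
```

Checking that the candidate exists keeps the fallback safe when VIRTUAL_ENV points at a stale or removed directory.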
When a user sends a message while the agent is executing a task on the
gateway, the agent is now interrupted immediately — not silently queued.
Previously, messages were stored in _pending_messages with zero feedback
to the user, potentially leaving them waiting 1+ hours.
Root cause: Level 1 guard (base.py) intercepted all messages for active
sessions and returned with no response. Level 2 (gateway/run.py) which
calls agent.interrupt() was never reached.
Fix: Expand _handle_active_session_busy_message to handle the normal
(non-draining) case:
1. Call running_agent.interrupt(text) to abort in-flight tool calls
and exit the agent loop at the next check point
2. Store the message as pending so it becomes the next turn once the
interrupted run returns
3. Send a brief ack: 'Interrupting current task (10 min elapsed,
iteration 21/60, running: terminal). I'll respond shortly.'
4. Debounce acks to once per 30s to avoid spam on rapid messages
Reported by @Lonely__MH.