Address Copilot review on PR #16666:
1. **Duplicate event on every tool start** — both ``tool_progress_callback``
and ``tool_start_callback`` fire side-by-side in ``run_agent.py``, so
wiring both into chat completions emitted *two* ``hermes.tool.progress``
events per real tool call. Drop the legacy ``_on_tool_progress`` emit
entirely; ``_on_tool_start`` now produces a single unified event that
carries the legacy ``tool``/``emoji``/``label`` fields plus the new
``toolCallId``/``status`` correlation fields. Label is computed inline
via ``build_tool_preview`` so callers do not need to pre-format it.
2. **Weak per-event correlation in the regression test** — the previous
assertion checked that a ``toolCallId`` appeared *somewhere* in the
aggregate, which would have passed even if ``running`` lacked the id.
Collect ``(status, toolCallId)`` per event and assert each event
carries the correct pair, plus exactly two events on the wire (no
silent duplication regression).
The two existing chat-completions tool-progress tests are updated to fire
``tool_start_callback`` instead of ``tool_progress_callback``, matching
production reality where ``run_agent`` always pairs them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds Vercel Sandbox as a supported Hermes terminal backend alongside
existing providers (Local, Docker, Modal, SSH, Daytona, Singularity).
Uses the Vercel Python SDK to create/manage cloud microVMs, supports
snapshot-based filesystem persistence keyed by task_id, and integrates
with the existing BaseEnvironment shell contract and FileSyncManager
for credential/skill syncing.
Based on #17127 by @scotttrinh, cherry-picked onto current main.
Adds two API server endpoints for external UIs and orchestrators:
- GET /v1/capabilities — machine-readable feature discovery so clients
can detect which Runs API / SSE / auth features this Hermes version
supports before depending on them.
- GET /v1/runs/{run_id} — pollable run status so dashboards can check
queued/running/completed/failed/cancelled/stopping state without
holding an SSE connection open.
Also moves request validation ahead of run allocation so invalid
payloads no longer leave orphaned entries in _run_streams waiting for
the TTL sweep.
task_id is intentionally kept as "default" for the Runs API to
preserve the shared-sandbox model used by CLI, gateway, and the
existing _run_agent_with_callbacks path. session_id is surfaced in
run status for external-UI correlation only.
Salvage of PR #17085 by @Magaav.
Regression test for the ret=-2 / errmsg='unknown error' disambiguation:
- ret=-2 or errcode=-2 with 'unknown error' → stale session (True)
- ret=-2 with 'freq limit' or other errmsg → rate limit (False)
- ret=-14 → not matched here (handled by SESSION_EXPIRED_ERRCODE path)
- Success codes and missing errmsg → False
Wrap each adapter.connect() in asyncio.wait_for() so one platform hanging
during startup or reconnect cannot block the others. Telegram's 8-retry
connect loop (~140s worst case) previously prevented Feishu from ever
starting when Telegram was network-restricted — common for users in
regions where Telegram is blocked.
Default timeout is 30s; override via HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT
(0 disables). Applied to both startup and the reconnect watcher so a
platform that hangs mid-retry also does not stall retries for others.
Fixes#17242
Fixes#6672
Memory providers now receive on_session_switch() whenever AIAgent.session_id
rotates mid-process — /resume, /branch, /reset, /new, and context
compression. Before this, providers that cached per-session state in
initialize() (Hindsight's _session_id, _document_id, accumulated
_session_turns, _turn_counter) kept writing into the old session's
record after the agent had moved on.
MemoryProvider ABC
------------------
- New optional hook on_session_switch(new_session_id, *,
parent_session_id='', reset=False, **kwargs) with no-op default for
backward compat. reset=True signals /reset or /new — providers should
flush accumulated per-session buffers. reset=False for /resume,
/branch, compression where the logical conversation continues.
MemoryManager
-------------
- on_session_switch() fans the hook out to every registered provider.
Isolated try/except per provider — one bad provider can't block others.
- Empty/None new_session_id is a no-op to avoid corrupting provider state
during shutdown paths.
run_agent.py
------------
- _sync_external_memory_for_turn now passes session_id=self.session_id
into sync_all() and queue_prefetch_all(). Providers with defensive
session_id updates in sync_turn (Hindsight already had this at
plugins/memory/hindsight/__init__.py:1199) now actually receive the
current id.
- Compression block at ~L8884 already notified the context engine of
the rollover; now also calls
_memory_manager.on_session_switch(reason='compression').
cli.py
------
- new_session() fires reset=True, reason='new_session' so providers
flush buffers.
- _handle_resume_command fires reset=False, reason='resume' with the
previous session as parent_session_id.
- _handle_branch_command fires reset=False, reason='branch' with the
parent session_id already captured for the DB parent link.
gateway/run.py
--------------
- _handle_resume_command now evicts the cached AIAgent, mirroring
/branch and /reset. The next message rebuilds a fresh agent whose
memory provider initialize() runs with the correct session_id —
matches the pattern the gateway already uses for provider state
cross-session transitions.
Hindsight reference implementation
----------------------------------
- plugins/memory/hindsight/__init__.py adds on_session_switch that:
updates _session_id, mints a fresh _document_id (prevents
vectorize-io/hindsight#1303 overwrite), and clears _session_turns /
_turn_counter / _turn_index so in-flight batches don't flush under
the new document id. parent_session_id only overwritten when provided
(avoids clobbering on a bare switch).
Tests
-----
- tests/agent/test_memory_session_switch.py: new dedicated file. ABC
default no-op, manager fan-out, failure isolation, empty-id no-op,
session_id propagation through sync_all/queue_prefetch_all, Hindsight
state transitions for every reset/non-reset case, parent preservation.
- tests/cli/test_branch_command.py: new test verifying /branch fires
the hook with correct parent_session_id + reset=False + reason.
- tests/gateway/test_resume_command.py: new test verifying /resume
evicts the cached agent.
- tests/run_agent/test_memory_sync_interrupted.py: updated existing
assertions to account for the session_id kwarg on sync_all and
queue_prefetch_all.
E2E verified (real imports, tmp HERMES_HOME):
- /resume: session_id updates, doc_id fresh, buffers cleared, parent set
- /branch: session_id forks, parent links to original
- /new: reset=True clears accumulated state
- compression: reason='compression' propagated, lineage preserved
- Empty id: no-op, state preserved
- Legacy provider without on_session_switch: no crash
Reported by @nicoloboschi (Hindsight maintainer); related scope-widening
comment by @kidonng extending coverage to compression.
Three Signal adapter improvements that depend on the no-edit-mode
plumbing from the previous commit.
1. Native formatting (markdown -> Signal bodyRanges)
Signal renders markdown as literal characters (**bold**, `code`, #
heading), which looks broken. Added _markdown_to_signal(text) that
strips markdown syntax and emits Signal-native bodyRanges as
start:length:STYLE entries. Offsets are computed in UTF-16 code
units so non-BMP emoji stay aligned. Supports BOLD, ITALIC, STRIKE,
MONO, and headings mapped to BOLD. Fenced code and inline code are
handled; link syntax is unwrapped to visible text + URL.
Includes edge-case fixes reported previously:
- Bullet lists ("* item") no longer misidentified as italics
- URLs containing underscores no longer italicized around the dot
2. Reply-quote context
Parses dataMessage.quote on inbound messages and populates
MessageEvent.raw_message with sender + timestamp_ms. This lets the
gateway's existing [Replying to: "..."] injector (gateway/run.py)
work on Signal, matching Telegram/Matrix behavior.
3. Processing reactions
Overrides on_processing_start -> hourglass and on_processing_complete
-> checkmark via the sendReaction JSON-RPC using targetAuthor and
targetTimestamp pulled from raw_message. Uses the ProcessingOutcome
enum introduced in the previous commit.
Also sets SUPPORTS_MESSAGE_EDITING = False on SignalAdapter so the
no-edit streaming path activates.
Tests: 40+ new tests in tests/gateway/test_signal_format.py covering
markdown conversion, UTF-16 offset correctness with non-BMP emoji,
bullet-list and URL false-positive regressions, reply-quote extraction,
and reaction payload shape. Regression extensions to test_signal.py.
Commit 3c42064e made config.yaml the single source of truth for
TERMINAL_CWD, but the config bridge passes cwd values verbatim to
os.environ. When a user sets terminal.cwd: ~/ in config.yaml, the
literal string '~/'' reaches subprocess.Popen, which the kernel
rejects because it does not expand shell tilde syntax.
This patch adds three defensive layers:
1. gateway/run.py — expanduser at config bridge time so TERMINAL_CWD
is always an absolute path.
2. tools/terminal_tool.py — expanduser when reading TERMINAL_CWD in
_get_env_config(), guarding against stale or manually-set env vars.
3. tools/environments/local.py — expanduser in LocalEnvironment before
passing cwd to subprocess.Popen, the final safety net.
Includes regression tests in test_config_cwd_bridge.py for nested
terminal.cwd, top-level cwd alias, and precedence ordering.
Refs: 3c42064e
After PR #7885 (97b0cd51e) added content-side segment breaks for
natural mid-turn assistant messages, the tool-progress task in
gateway/run.py was not updated to match. progress_msg_id and
progress_lines persisted for the whole run, so after a tool batch
produced bubble B1 followed by content bubble C1, the next tool.started
kept editing the OLD bubble B1 above C1 — making the chat appear out
of order on Telegram, Discord, and Slack.
Add on_new_message callback to GatewayStreamConsumer, fired at the
four sites where a fresh content bubble lands on the platform:
- _send_or_edit first-send branch (NOT edits)
- _send_commentary
- _send_new_chunk (overflow split)
- each successful chunk of _send_fallback_final
Gateway supplies a lambda that enqueues ('__reset__',) into the
progress_queue. send_progress_messages() handles the marker in both
the main loop and the CancelledError drain path, clearing
progress_msg_id, progress_lines, and the dedup state so the next
tool.started opens a fresh bubble below the new content.
Result: each tool batch appears in chronological order below the
preceding content. When no content appears between tool batches,
tools still group in one bubble (CLI-style compactness).
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Append a compact 'model · 68% · ~/projects/hermes' footer to the FINAL
message of each turn, disabled by default (display.runtime_footer.enabled).
Answers the Telegram-side parity ask: runtime context that the CLI status
bar already shows is now available in messaging replies when enabled.
Wiring:
- gateway/runtime_footer.py: resolve_footer_config + format_runtime_footer +
build_footer_line. Pure-function renderer; per-platform overrides under
display.platforms.<platform>.runtime_footer.
- gateway/run.py: appends footer to response right after reasoning prepend
so it lands only on the final message (never tool progress or streaming
chunks). When streaming already delivered the body (already_sent), the
footer is sent as a small trailing message instead.
- agent_result now exposes context_length alongside last_prompt_tokens so
the footer can compute the pct; both gateway return paths updated.
- /footer [on|off|status] slash command, wired in CLI (cli.py) and gateway
(gateway/run.py both running-agent bypass and main dispatch). Global
toggle only; per-platform overrides via config.yaml.
Graceful degradation:
- Missing context_length (unknown model) → pct field silently dropped
(no '?%' artifact).
- Empty final_response → no footer appended.
- Unknown field names in config → silently ignored.
Tests: 25-case unit suite (tests/gateway/test_runtime_footer.py) plus E2E
harness covering streaming vs non-streaming branches, per-platform override,
and the exact argument contract gateway/run.py uses.
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
The gateway caches one AIAgent per session to preserve prompt-cache hits,
keyed by _agent_config_signature(). The signature previously only
fingerprinted model/credentials/toolsets/ephemeral-prompt — NOT the
compression or context_length config. As a result, users who edited
model.context_length or compression.threshold in config.yaml on a
long-lived gateway saw no effect until they triggered an unrelated
cache eviction (/model switch, /reset, gateway restart).
Add a new cache_keys parameter to _agent_config_signature and a
_CACHE_BUSTING_CONFIG_KEYS registry listing config values the agent
bakes in at construction time. Call sites read the current config and
pass it through — next gateway message with an edited config
rebuilds the agent.
Keys registered:
- model.context_length
- compression.enabled
- compression.threshold
- compression.target_ratio
- compression.protect_last_n
Reported by @OP (Apr 26 feedback bundle).
## Changes
- gateway/run.py: new _CACHE_BUSTING_CONFIG_KEYS tuple,
_extract_cache_busting_config classmethod, cache_keys kwarg on
_agent_config_signature, call site passes the extracted dict
- tests/gateway/test_agent_cache.py: 11 new tests
(5 on _agent_config_signature behavior, 6 on _extract_cache_busting_config)
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Network errors through proxies (e.g. sing-box) can leave httpx
connections in a half-closed state occupying pool slots. After enough
reconnect cycles the 256-connection default fills up entirely, causing
Pool timeout: All connections in the connection pool are occupied.
Fix: cycle only the getUpdates request object (_request[0]) via
shut-down + re-initialize before restarting polling. This drains stale
connections without touching the general request (_request[1]) that
concurrent send_message / edit_message calls rely on.
The drain is applied to both _handle_polling_network_error and
_handle_polling_conflict reconnect paths via a shared
_drain_polling_connections() helper. Failures in the drain are
swallowed so reconnect always proceeds.
Based on #16466 by @Mirac1eSky.
The gateway session-hygiene pre-compression safety valve had a hardcoded
400-message threshold. On long-lived sessions with short turns this was
either too high (users with aggressive compression preferences) or too
low (users with very large context models who want to keep more history
in-flight).
Add compression.hygiene_hard_message_limit (default 400) so it can be
tuned without forking the gateway.
Reported by @OP (Apr 26 feedback bundle).
## Changes
- hermes_cli/config.py: new DEFAULT_CONFIG key with 400 default
- gateway/run.py: read compression.hygiene_hard_message_limit at
hygiene-time, fall back to 400 if missing/invalid
- tests/gateway/test_session_hygiene.py: two tests — override fires at
the configured limit, default does not fire below 400
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Follow-up to PR #16802 (BeliefanX). The original fix read
`agent_history[-1].get("timestamp")` for the tool-tail freshness gate,
but `gateway/run.py` strips the `timestamp` field off all tool/tool_call
rows when building `agent_history` from the raw transcript (see
`clean_msg = {k: v for k, v in msg.items() if k != "timestamp"}`). At
runtime the tool-tail branch always saw `None` and silently took the
legacy-fresh path — the stale-guard never fired for the tool-tail case
it was supposed to cover.
Changes:
- Read the freshness signal from the RAW `history` list (via new
`_last_transcript_timestamp()` helper) BEFORE the strip. Both the
resume_pending branch and the tool-tail branch use this single signal,
replacing the two divergent ones.
- Default window bumped 15 min → 1 hour via new
`_AUTO_CONTINUE_FRESHNESS_SECS_DEFAULT`. The 15-minute default was
shorter than the default `gateway_timeout` of 30 min, so a legitimate
long-running turn interrupted near its timeout boundary and resumed
shortly after would have been misclassified as stale.
- Configurable via `config.yaml` `agent.gateway_auto_continue_freshness`
(bridged to `HERMES_AUTO_CONTINUE_FRESHNESS` at gateway startup — same
pattern as `gateway_timeout`). Set to 0 to disable the gate.
- `_coerce_gateway_timestamp` now explicitly rejects bool (which is a
subclass of int and would otherwise coerce to 0.0/1.0).
- Tests rewritten to exercise the real production data shape: raw
`history` → `_build_agent_history` strip → freshness decision. A
regression guard (`test_stale_tool_tail_with_production_data_shape`)
asserts `agent_history` tool rows carry NO timestamp, protecting
against someone "fixing" the original bug by re-adding the stripped
field (which would break the OpenAI tool-result message contract).
Add BeliefanX to scripts/release.py AUTHOR_MAP.
E2E verified: config.yaml → env var bridge → helper returns configured
value; default 1h window; malformed/empty env var falls back to default;
ISO-Z timestamps parse; ms-epoch coerced; bool rejected.
MatrixAdapter._is_self_sender returns True defensively when _user_id is empty
(whoami not yet resolved) to prevent echo loops — see #15763. The reaction
approval test must therefore initialize a user_id so _on_reaction does not
drop the inbound test event before reaching the approval handler.
A misconfigured auxiliary.compression.model is a user-fixable problem that silent recovery would hide. The previous retry-on-main logic transparently swallowed aux-model failures whenever the fallback succeeded, leaving the user's broken config in place and racking up future failures.
Track the aux-model failure on the compressor alongside the existing fallback-placeholder fields:
- _last_aux_model_failure_model: str | None
- _last_aux_model_failure_error: str | None
Both are set at the moment the aux model errors (captured before summary_model is cleared for retry), regardless of whether the retry succeeds. Cleared at compress() start and on on_session_reset() so a clean run doesn't leak stale warnings.
Surface at three places:
- gateway hygiene auto-compress: ℹ note to the platform adapter (thread_id preserved)
- gateway /compress command: ℹ line appended to the reply
- CLI via _emit_warning: deduped on (model, error) so repeat compactions don't spam
Distinct from the existing ⚠️ dropped-turns warning — different severity, different emoji, explicit 'context is intact' reassurance.
PR #16333 added a warning to the manual /compress reply when the
auxiliary summariser fails and the static fallback placeholder is
used, but only the gateway-hygiene path had a test
(test_session_hygiene_warns_user_when_summary_generation_fails).
The /compress branch in _handle_compress_command was uncovered.
New test test_compress_command_appends_warning_when_summary_generation_fails
mocks the compressor's _last_summary_fallback_used /
_last_summary_dropped_count / _last_summary_error fields and
verifies the /compress reply contains the ⚠️ marker, the underlying
error string, the dropped message count, and the 'historical
message(s) were removed' wording — i.e. the same contract the
hygiene-path test enforces.
Address review feedback on PR #16333:
1. The hygiene-path warning send was missing metadata=_hyg_meta. On
Telegram topics / Slack threads / Discord threads the warning would
land in the main channel instead of the originating thread. Now
reuses the same _hyg_meta dict already computed for the hygiene
compaction itself.
2. New gateway-level test
test_session_hygiene_warns_user_when_summary_generation_fails
verifies end-to-end:
- When the compressor's _last_summary_fallback_used flag is True,
the gateway invokes adapter.send() exactly once.
- The warning message includes the dropped count and the underlying
error string.
- metadata={'thread_id': ...} is propagated so the warning lands
in the originating topic/thread.
Tests: 20 gateway hygiene + 54 context_compressor — all pass.
The typing-indicator refresh loop in BasePlatformAdapter._keep_typing
awaited each send_typing call unconditionally. Each call is an HTTP
round-trip to the platform API (Telegram/Discord), normally ~100ms. When
the same network instability that causes upstream provider timeouts
(e.g. Anthropic capacity blips slowing first-token latency past the
120s stream-read timeout) also slows the platform typing API to
multi-second response times, the refresh loop stalls inside the await.
Platform-side typing expires at ~5s, so the bubble dies and stays dead
until the stuck send_typing call returns — right when the user most
needs the 'still working' signal and instead sees a bot that looks
dead, then asks 'wtf are you doing' which itself interrupts the
eventually-recovering turn.
Bound each send_typing with asyncio.wait_for (1.5s cap, derived from
interval so it's always below the 2s cadence). Slow calls get abandoned
so the next scheduled tick fires a fresh send_typing on schedule. As
long as any one of them reaches the platform within its ~5s
typing-expiry window, the bubble stays visible across the stall.
Also catches non-timeout send_typing exceptions (transient HTTP errors)
so one bad tick doesn't terminate the whole loop.
Tests: 4 new in tests/gateway/test_keep_typing_timeout.py covering
slow-send non-blocking, fast-send still-awaited, exception resilience,
and paused-chat regression guard.
Reviewer pushback on the original boundary-hardening commits — three
overreach points pulled plugin-specific policy into shared core paths:
1. gateway/run.py hardcoded a '## Honcho Context' literal split for
vision-LLM output. Plugin-format heading in framework code; could
truncate legitimate output naturally containing that header.
Drop the literal split; keep generic sanitize_context (the wrapper
strip is plugin-agnostic). Plugin-specific cleanup belongs at the
provider boundary, not the shared gateway path.
2. run_agent.run_conversation scrubbed user_message and
persist_user_message before the conversation loop. User text is
sacred — if a user types a literal <memory-context> tag we must
not silently delete it. The producer (build_memory_context_block)
is the only legitimate emitter; user input should never need the
reverse op.
3. _build_assistant_message scrubbed model output before persistence.
Same hazard: would silently mutate legitimate documentation/code
the model emits containing the literal markers. The streaming
scrubber catches real leaks delta-by-delta before content is
concatenated; persist-time scrub was redundant belt-and-suspenders.
4. _fire_stream_delta stripped leading newlines from every delta unless
a paragraph break flag was set. Mid-stream '\n' is legitimate
markdown — lists, code fences, paragraph breaks — and chunk
boundaries are arbitrary. Narrow lstrip to the very first delta
of the stream only (so stale provider preamble still gets cleaned
on turn start, but mid-stream formatting survives).
Plus: build_memory_context_block now logs a warning when its defensive
sanitize_context strips something — surfaces buggy providers returning
pre-wrapped text instead of silently double-fencing.
Net architectural change: scrub surface collapses from 8 sites to 3
(StreamingContextScrubber on output deltas, plugin→backend send,
build_memory_context_block input-validation). Plugin-specific strings
stay out of shared runtime paths. User input and persisted assistant
output are no longer mutated.
Tests: rescoped TestMemoryContextSanitization (helper-correctness only,
no source-inspection of removed call sites), updated vision tests to
drop '## Honcho Context' literal-split assertions, updated
_build_assistant_message persistence test to assert preservation.
Added: cross-turn scrubber reset, build_memory_context_block warn-on-
violation, mid-stream newline preservation (plain + code fence).
fixes#5719
The auxiliary vision LLM called by gateway._enrich_message_with_vision
can echo its injected Honcho system prompt back into the image
description. That description gets embedded verbatim into the enriched
user message, so recalled memory (personal facts, dialectic output)
surfaces into a user-visible bubble.
Strips both forms of leak before embedding:
- <memory-context>...</memory-context> fenced blocks (sanitize_context)
- trailing '## Honcho Context' sections (header + everything after)
Plus regression tests:
- tests/agent/test_streaming_context_scrubber.py — 13 tests on the
stateful scrubber (whole block, split tags, false-positive partial
tags, unterminated span, reset, case-insensitivity)
- tests/run_agent/test_run_agent_codex_responses.py — 2 new tests on
_fire_stream_delta covering the realistic 7-chunk leak scenario and
the cross-turn scrubber reset
- tests/gateway/test_vision_memory_leak.py — 4 tests covering the
vision auto-analysis boundary (clean pass-through, '## Honcho Context'
header, fenced block, both patterns together)
* fix: clean gateway auxiliary client caches on teardown
* fix(gateway): recover from stale pid files and close cron agents
Two issues were keeping the gateway from surviving long runs:
1. `_cleanup_invalid_pid_path` delegated to `remove_pid_file`, which
refuses to unlink when the file's pid differs from our own. That
safety check exists for the --replace atexit handoff, but it also
applied to stale-record cleanup, so after a crashy exit the pid
file was orphaned: `write_pid_file()`'s O_EXCL create then failed
with `FileExistsError`, and systemd looped on "PID file race lost
to another gateway instance". Unlink unconditionally from this
helper since the caller has already verified the record is dead.
2. The cron scheduler never closed the ephemeral `AIAgent` it creates
per tick, and never swept the process-global auxiliary-client
cache. Over days of 10-minute ticks this leaked subprocesses and
async httpx transports until the gateway hit EMFILE. Release the
agent and call `cleanup_stale_async_clients()` in `run_job`'s
outer `finally`, matching the gateway's own per-turn cleanup.
* chore(release): map bloodcarter@gmail.com -> bloodcarter
---------
Co-authored-by: bloodcarter <bloodcarter@gmail.com>
``_cleanup_agent_resources`` previously invoked
``agent.shutdown_memory_provider()`` with no arguments, so every memory
provider's ``on_session_end`` hook received an empty list. Providers
with an early-return guard on empty input (Holographic, Hindsight) never
extracted facts from the conversation, and users hit
"抱歉,找不到相關的對話記錄" on the first turn after any gateway
restart, session reset, or idle expiry.
Forward ``agent._session_messages`` — the transcript the agent itself
maintains and refreshes every turn via ``_persist_session`` — so
providers see the actual conversation. Falls back to the legacy no-arg
call whenever the attribute is absent or not a list (test stubs built
via ``object.__new__`` or ``MagicMock``) to preserve backward
compatibility with existing suites. ``AIAgent.shutdown_memory_provider``
already accepts ``messages: list = None`` (run_agent.py:4126), so this
is a pure caller-side fix.
Paths that use ``skip_memory=True`` temporary agents (memory flush,
hygiene auto-compress, ``/compress``) are no-ops inside
``shutdown_memory_provider`` because ``self._memory_manager`` is None —
no behaviour change for them.
Covers Part A of the bug report. Part B (adding ``on_session_end`` to
the Hindsight plugin) is a separate concern that would benefit from
this fix landing first.
Regression test added at
``tests/gateway/test_shutdown_memory_provider_messages.py`` covering:
populated messages forwarded, empty list still forwarded, attribute
missing falls back, non-list (MagicMock) falls back, provider
exceptions don't block ``close()``, None agent no-op, and agent
without ``shutdown_memory_provider`` tolerated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Telegram groups emit a single bot_command entity covering the whole
/cmd@botname span with no accompanying mention entity, so the existing
mention gate in _message_mentions_bot dropped slash commands sent via
the bot-menu autocomplete whenever require_mention is enabled.
Recognise bot_command entities whose @botname suffix matches the bot
username (case-insensitive) as a direct mention, and keep rejecting
commands addressed at other bots. Fixes#15415.
Harden the Matrix adapter's sender-drop guards so bot-self events and
appservice/bridge identities never reach the gateway's pairing flow or
the agent loop.
Two filters, applied as early as possible in _on_room_message (and
_on_reaction for the self-filter):
1. _is_self_sender(sender) — case-insensitive + whitespace-trimmed
equality with self._user_id. When self._user_id is still empty
(whoami has not resolved, or login failed), returns True
defensively: an unidentified bot dropping its own events is always
preferable to falling into an echo loop. The previous byte-for-byte
equality check let differently-cased copies of the bot's MXID slip
through, and an unresolved self-ID silently disabled the guard.
2. _is_system_or_bridge_sender(sender) — drops appservice namespace
puppets (conventional @_bridge_...:server form) and malformed
senders with an empty localpart. These identities used to fall
through to the gateway's unauthorized-user path, trigger a pairing
code, and — once an operator approved the bridge — every outbound
message the bridge relayed would loop back as an authorized user
message. This was the root of the 'hall of mirrors' symptom.
Fixes#15763
Test plan
---------
scripts/run_tests.sh tests/gateway/test_matrix.py
scripts/run_tests.sh tests/gateway/test_matrix_mention.py tests/gateway/test_matrix_voice.py
All 182 tests pass. 14 new regression tests cover exact / case-insensitive
/ whitespace / unresolved-self-id matches, bridge prefix detection, empty
sender, and the full _on_room_message drop path.
Four independent session-UX bugs reported by an external user (#16294).
/save wrote hermes_conversation_<ts>.json to CWD — invisible to
'hermes sessions browse' and easy to lose. Snapshots now write under
~/.hermes/sessions/saved/ and the command prints the absolute path plus
a 'hermes --resume <id>' hint for the live DB-indexed session.
'hermes sessions browse' default --limit raised from 50 to 500. With the
old ceiling, users with moderately long histories saw only the most
recent 50 rows and assumed older sessions had been lost.
TUI session.list (`/resume` picker) switched from a hardcoded allow-list
of 13 gateway source names to a deny-list of just { 'tool' }. Sessions
tagged acp / webhook / user-defined HERMES_SESSION_SOURCE values and
any newly-added platform now surface. Default limit 20 → 200.
ollama-cloud provider setup passes force_refresh=True to
fetch_ollama_cloud_models() so a user entering their API key sees the
fresh catalog (e.g. deepseek v4 flash, kimi k2.6) immediately instead
of waiting up to an hour for the disk cache TTL to expire.
Closes#16294.
When the gateway intercepts a pending /update prompt and the user sends
a recognized slash command (/new, /help, ...), the command now dispatches
normally AND the detached update subprocess is unblocked by writing a
blank .update_response. _gateway_prompt reads '' → strips → returns the
prompt's default (typically a safe 'n' / skip), so the update process
exits cleanly instead of blocking on stdin until the 30-minute watcher
timeout.
Also clears _update_prompt_pending[session_key] on this path so stray
future input for the same session isn't re-intercepted.
Extends PR #15849 with tests for the new cancel-write + a regression
test pinning the legacy behavior of unrecognized /foo slash commands
still being consumed as the response.
Slack Bolt posts are not editable like CLI spinners; medium-tier new still emitted a permanent line per tool start (issue #14663).
- Built-in slack default: off; other tier-2 platforms unchanged.
- Adjust /verbose isolation test for off to new cycle.
- Migration tests: read/write config.yaml as UTF-8 (Windows locale).
Extends the existing channel_skill_bindings mechanism (previously
Discord-only) to Slack, so a channel or DM can auto-load one or more
skills at session start without relying on the model's skill selector
for every short reply.
Motivation: Mats's German flashcards DM pushes a cron-driven card
5x/day; he responds with one-word guesses like 'work'. Previously each
reply required the main agent to decide whether to load german-flashcards
(full opus turn just to pick a skill). With the binding configured per
Slack channel, the skill is injected at session start and grading runs
directly.
Changes:
- Extract resolve_channel_skills() from DiscordAdapter._resolve_channel_skills
into gateway.platforms.base (now shared across adapters).
- DiscordAdapter._resolve_channel_skills delegates to the shared helper
(behavior preserved — existing test suite still passes unchanged).
- SlackAdapter: resolve channel_skill_bindings on each message and attach
auto_skill to MessageEvent. gateway/run.py already handles auto-skill
injection on new sessions; this just wires Slack through it.
- gateway/config.py: accept channel_skill_bindings in slack: block of
config.yaml (was Discord-only).
- Tests: new tests/gateway/test_slack_channel_skills.py with 11 cases
covering DM/thread/parent resolution, single-vs-list skills, dedup,
malformed entries. Discord suite unchanged.
- Docs: add 'Per-Channel Skill Bindings' section to Slack user guide.
Config example:
slack:
channel_skill_bindings:
- id: "D0ATH9TQ0G6"
skills: ["german-flashcards"]
Enter while the agent is busy can now inject the typed text via /steer —
arriving at the agent after the next tool call — instead of interrupting
(current default) or queueing for the next turn.
Changes:
- cli.py: keybinding honors busy_input_mode='steer' by calling
agent.steer(text) on the UI thread (thread-safe), with automatic
fallback to 'queue' when the agent is missing, steer() is unavailable,
images are attached, or steer() rejects the payload. /busy accepts
'steer' as a fourth argument alongside queue/interrupt/status.
- gateway/run.py: busy-message handler and the PRIORITY running-agent
path both route through running_agent.steer() when the mode is 'steer',
with the same fallback-to-queue safety net. Ack wording tells users
their message was steered into the current run. Restart-drain queueing
now also activates for 'steer' so messages aren't lost across restarts.
- agent/onboarding.py: first-touch hint has a steer branch for both
CLI and gateway.
- hermes_cli/commands.py: /busy args_hint updated to include steer,
and 'steer' is registered as a subcommand (completions).
- hermes_cli/web_server.py: dashboard select widget offers steer.
- hermes_cli/config.py, cli-config.yaml.example, hermes_cli/tips.py:
inline docs updated.
- website/docs/user-guide/cli.md + messaging/index.md: documented.
- Tests: steer set/status path for /busy; onboarding hints;
_load_busy_input_mode accepts steer; busy-session ack exercises
steer success + two fallback-to-queue branches.
Requested on X by @CodingAcct.
Default is unchanged (interrupt).
Multiple overlapping Slack attachment improvements:
1. Upload retry with backoff on transient errors (429, 5xx, connection
reset, rate_limited, service unavailable). New _is_retryable_upload_error
helper covers three upload paths: _upload_file, send_video,
send_document. Up to 3 attempts with 1.5s * attempt backoff.
2. Thread participation tracking: successful file uploads now add the
thread_ts to _bot_message_ts, mirroring how text replies are tracked.
This lets follow-up thread messages auto-trigger the bot (same
engagement rules as replied threads).
3. Thread metadata preservation in the image redirect-guard fallback
(send_image → send text fallback) and in two gateway.run.py send
paths (image + document fallback calls).
4. HTML response rejection in _download_slack_file_bytes. Parallels
the existing check in _download_slack_file. Guards against Slack
returning a sign-in / redirect page as document bytes when scopes
are missing, so the agent doesn't get HTML-as-a-PDF.
5. File lifecycle event acks (file_shared / file_created / file_change).
These events arrive around snippet uploads. Acking them silences the
slack_bolt 'Unhandled request' 404 warnings without changing behavior.
6. Post-loop message type classification so a mixed image+document upload
classifies as PHOTO (or VOICE if no image), falling back to DOCUMENT.
Previously, the per-file classification in the inbound loop could be
overwritten unpredictably.
7. Expanded text-inject whitelist in inbound document handling to cover
.csv, .json, .xml, .yaml, .yml, .toml, .ini, .cfg (up to 100KB) so
snippets and config files are directly visible to the agent, not just
cached as opaque uploads. Paired with new MIME entries in
SUPPORTED_DOCUMENT_TYPES in base.py.
Squashed from two commits in #11819 so the single commit carries the
contributor's GitHub attribution (the original commits were authored
under a local dev hostname).