Adds opt-in auto-deletion for slash-command reply messages like
"New session started!", "Restarting gateway…", "Stopped.", and
YOLO toggles. After the TTL elapses the gateway calls the adapter's
delete_message; on platforms without a delete API (everything except
Telegram today) the TTL is silently ignored and the message stays.
Requested on Twitter by @charlesmcdowell — tool-call bubbles are useful
real-time, but system notices clutter the thread once the agent finishes.
Implementation:
- EphemeralReply(str) sentinel in gateway/platforms/base.py. Subclasses
str so existing 'X' in response / response.startswith(...) checks in
tests and call sites keep working unchanged; isinstance() still
distinguishes it for the send path.
- _process_message_background and both busy-session bypass paths
(in base.py) call _unwrap_ephemeral() on the handler return, send
the unwrapped text, and schedule a detached delete task when the
TTL > 0 AND the adapter class overrides delete_message.
- display.ephemeral_system_ttl (default 0 = disabled) in DEFAULT_CONFIG.
Handler can pass ttl_seconds explicitly to override.
- Wrapped the highest-noise return sites: /new, /reset, /stop,
/yolo on/off, /restart success + "already in progress". Draining
notices and /help output left as plain strings — those are
informational and users want to read them.
Backward-compat: default TTL 0 → no scheduling, no behavior change
for existing users. Platforms without delete_message silently no-op.
The user-visible /compress banner and the post-compression last_prompt_tokens
writeback both counted only the raw message transcript (chars/4). With a 15KB
system prompt and 30 tool schemas (~26KB), a 4-message transcript that looks
like ~45 tokens to the transcript-only estimator is really ~10.5K tokens of
request pressure — a 234x gap.
Two user-facing consequences:
- Banner shows 'Compressing … (~45 tokens)…' while compression is actually
firing on 10K+ tokens of real pressure, confusing users about why
compression triggered (reported by @codecovenant on X; #6217).
- Post-compression last_prompt_tokens writeback omits tool schemas, so the
next should_compress() check compares real usage against a stale
underestimate — compression triggers late, potentially past the model's
context limit on small-context models (#14695).
Swap estimate_messages_tokens_rough() for estimate_request_tokens_rough()
at every user-visible banner and at the post-compression writeback.
estimate_request_tokens_rough() already existed for exactly this purpose
and includes system prompt + tool schemas.
Touched call sites:
- run_agent.py: post-compression last_prompt_tokens writeback, post-tool
call should_compress() fallback when provider usage is missing
- cli.py: /compress banner + summary
- gateway/run.py: gateway /compress banner + summary
- tui_gateway/server.py: TUI /compress status + summary
- acp_adapter/server.py: ACP /compact before/after
Left intentionally alone:
- Session-hygiene fallback and the 'no agent' /status path in gateway/run.py
— no agent instance is in scope to query for system prompt/tools, and the
existing 30-50% overestimate wobble on hygiene is safety-accepted.
- Verbose-mode 'Request size' logging — informational only, already counts
system prompt via api_messages[0].
Also relabels the feedback line from 'Rough transcript estimate' to
'Approx request size' so the metric label matches what it actually measures.
Credits: diagnoses from @devilardis (#14695) and @Jackten (#6217);
user report @codecovenant on X (2026-04-30).
Closes#14695Closes#6217
_process_message_background snapshotted callback_generation from the
interrupt event at the TOP of the task — before the handler ran.
_hermes_run_generation is only set on the event by
GatewayRunner._bind_adapter_run_generation during
_handle_message_with_agent, which runs DURING the handler await. The
early snapshot always captured None, which then flowed into
pop_post_delivery_callback(..., generation=None) in the finally block.
In pop_post_delivery_callback, generation=None with a tuple-registered
entry (generation, callback) bypasses the ownership check — it pops and
fires the callback regardless of which run owns it. Result: a stale run
could fire a fresher run's post-delivery callback (e.g. a
background-review notification attributed to the wrong turn).
Fix: move the snapshot into the finally block, after the handler has
run and _hermes_run_generation has been bound to the current run.
Regression test added: simulates a stale handler at generation=1 and a
fresher callback registered at generation=2. Pre-fix: snapshot=None →
pop fires the generation=2 callback under generation=1's ownership
("newer" fires). Post-fix: snapshot=1 → pop skips the mismatched
entry, callback stays in the dict for the correct run to claim.
Verified: test FAILS on current main (captures "newer" in fired list),
PASSES with this fix.
Salvaged from PR #12565 (the callback-ownership portion only; the
/status totals portion was already fixed on main in 7abc9ce4d via #17158).
Co-authored-by: Oxidane-bot <1317078257maroon@gmail.com>
Widens #16528 to two sibling sites that had the same quoted-boolean
bug: a YAML string "false" (or "0", "no", "off") silently evaluated
truthy under bool() / if-check.
- gateway/run.py _load_show_reasoning: is_truthy_value wrap
- tools/skill_manager_tool.py _guard_agent_created_enabled: is_truthy_value wrap
- regression tests for both
Add two operator-facing toggles for inbound Feishu admission, enabling
bot-to-bot scenarios such as A2A orchestration and inter-bot
notifications:
FEISHU_ALLOW_BOTS=none|mentions|all (default: none)
Accept messages from other bots. `mentions` requires the peer
bot to @-mention Hermes; `all` admits every peer-bot message.
FEISHU_REQUIRE_MENTION=true|false (default: true)
Whether group messages must @-mention the bot. Override per-chat
via `group_rules.<chat_id>.require_mention` in config.yaml.
Defaults preserve prior behavior. Self-echo protection is always on:
when the bot's identity is unresolved (auto-detection failed and
FEISHU_BOT_OPEN_ID unset), peer-bot messages are rejected fail-closed
to avoid feedback loops.
Admitted peer bots bypass the human-user allowlist
(FEISHU_ALLOWED_USERS) to match existing Discord behavior; humans
still need an explicit allowlist entry. yaml feishu.allow_bots is
bridged to the env var so the adapter and gateway auth layer share
one source of truth.
Resolving peer-bot display names requires the
application:bot.basic_info:read scope; without it, peers still route
but appear as their open_id.
Test: tests/gateway/test_feishu_bot_admission.py covers the admission
pipeline, group-policy bot-bypass, hydration, and event-dispatch
plumbing as a parametrized matrix.
Change-Id: I363cccb578c2a5c8b8bf0f0a890c01c89909e256
reset_session() creates a fresh SessionEntry with created_at == updated_at,
but get_or_create_session() bumps updated_at on the next inbound message,
causing _is_new_session in _handle_message_with_agent to evaluate False.
The topic/channel skill auto-load gate (group_topics, channel_skill_bindings)
silently skips the first message after a manual reset.
Add an is_fresh_reset flag on SessionEntry, set by reset_session() and
consumed once by the message handler. Kept distinct from was_auto_reset
because that flag also drives a 'session expired due to inactivity'
user-facing notice and a context-note prepend — both wrong for an
explicit /new or /reset.
Persisted through to_dict/from_dict so the flag survives gateway
restart between /reset and the next message.
Fixes#6508
Co-authored-by: warabe1122 <45554392+warabe1122@users.noreply.github.com>
Co-authored-by: willy-scr <187001140+willy-scr@users.noreply.github.com>
/status was reading session_entry.total_tokens from the in-memory
SessionStore (gateway/session.py), which the agent never writes to —
so the token count was always 0.
The agent already persists token deltas to the SQLite SessionDB
(run_agent.py:11497) for every platform with a session_id. Route
/status through that single source of truth instead of duplicating
token writes into a second store.
Fix:
- gateway/run.py: _handle_status_command now calls
self._session_db.get_session(session_id) and sums the five token
component columns (input/output/cache_read/cache_write/reasoning).
Falls back to 0 when no SessionDB is configured or no row exists.
- Two new regression tests covering the populated-row and
missing-row paths.
Co-authored-by: Hermes <127238744+teknium1@users.noreply.github.com>
sqlite3 can only bind str/bytes/int/float/None to query parameters.
Multimodal message content is a list of parts (text + image_url), which
raised 'Error binding parameter 3: type list is not supported' in
append_message and replace_messages.
In the CLI/TUI this surfaced as a visible crash when users pasted
screenshots. In the gateway it was silently swallowed by a bare except
in append_to_transcript, causing multimodal turns to be lost from the
session transcript.
Fix at the DB layer: _encode_content wraps lists/dicts as
'\\x00json:' + json.dumps(...) on write, _decode_content unwraps on
read. Plain strings are untouched, so existing FTS search, previews,
and JSONL compat are unaffected. Paired decode in get_messages,
get_messages_as_conversation, and search_messages context previews.
Regression test covers: list content round-trip, dict content
round-trip, string content stored unchanged, replace_messages with
multimodal content.
Also included: aligned fix#17522 for TUI image attachment with
paths containing spaces (see previous commit).
Signal-cli sends dataMessage wrappers for profile key updates and other
metadata events that have no actual text content. These were reaching the
gateway as msg='' and triggering full agent turns for nothing.
Add early return in _handle_envelope() when both message field is empty/
missing/whitespace AND there are no attachments. Messages with media
attachments but no text still flow through.
- 12 lines added to gateway/platforms/signal.py
- 5 new tests in TestSignalContentlessEnvelope class
Belt-and-suspenders on top of @briandevans' #17758 fix. The in-band
drain hand-off (await->create_task + session-guard preservation)
changed cleanup semantics in three places that the original PR
reasoned about but didn't test directly. Pin each invariant so a
future refactor can't silently regress them:
1. Normal single-message path still releases _active_sessions[sk] and
_session_tasks[sk] through end-of-finally. The #17758 follow-up
moved _release_session_guard under
if current_task is self._session_tasks.get(session_key)
For the 99%-common case current_task IS the stored task, so the
guard must still fire. Test would fail if the conditional were
ever tightened in a way that dropped the normal path.
2. Drain-task cancellation releases the session. If the drain task
spawned by the in-band hand-off is cancelled mid-handler (e.g.
/stop fired while draining a follow-up), its own finally must
fire _release_session_guard. Without this a cancel would leave
the session permanently pinned busy.
3. Late-arrival drain still spawns when no in-band drain preceded
it. Pre-existing path, but the #17758 follow-up added a
re-queue branch that only fires when ownership was already
handed off. When no handoff happened the else branch must still
spawn a fresh drain task — otherwise a message arriving during
stop_typing gets silently dropped.
All three tests pass against current main. Zero production code
changes.
The #1630 fix introduced a blanket ``agent_failed_early`` transcript skip
to prevent context-overflow sessions from looping. That guard also
triggers for unrelated transient failures (429 rate limits, read
timeouts, connection resets, provider 5xx) which have nothing to do with
session size — and it silently drops the user's message, so the agent
has no memory of the last turn on retry.
Split the failure classification in ``GatewayRunner._run_agent``:
* Context-overflow (``compression_exhausted`` flag, explicit
context-length phrases, or generic 400 with a long history) → keep
the existing skip, preserving the #1630/#9893 fix.
* Anything else that failed → persist just the user message so the
conversation survives a retry.
Use specific multi-word phrases (``context length``, ``token limit``,
``prompt is too long``, etc.) to match ``run_agent.py``'s own
classifier; bare ``exceed`` false-positively flagged "rate limit
exceeded" as context overflow.
Covered by new tests in ``tests/gateway/test_7100_transient_failure_transcript.py``
and the existing #1630 suite still passes.
The busy-session handler (_handle_active_session_busy_message) bypassed the
authorization gate that the cold path enforces via _is_user_authorized(). In
shared-thread contexts (Slack threads, Telegram forum topics, Discord threads)
where thread_sessions_per_user=False (the default), all participants share one
session_key. An unauthorized user posting in the same thread as an authorized
user would hit the active-session branch, skip the auth check, and have their
text merged into _pending_messages or injected via agent.interrupt().
This commit adds the same _is_user_authorized() check at the top of the busy
handler, before any message queuing, steering, or interrupt logic. Unauthorized
messages are silently dropped (return True) with a warning log — matching the
cold-path behavior.
Affected platforms: Slack, Telegram, Discord, any adapter with shared-session
thread contexts.
Closes#17775
Ports PR #17888's send_multiple_images ABC to every gateway platform that
has a native multi-attachment API, so images arrive as a single bundled
message instead of N separate ones.
Native overrides:
- Telegram: send_media_group (10 photos per album, chunks over); animated
GIFs peeled off and routed through send_animation (albums don't support
animations)
- Discord: channel.send(files=[...]) (10 attachments per message, chunks
over); URL images downloaded into BytesIO so they render inline; forum
channels use create_thread with files=[...]
- Slack: files_upload_v2(file_uploads=[...]) (10 per call, chunks over);
respects thread_ts; records thread participation
- Mattermost: single post with file_ids list (5 per post — Mattermost cap,
chunks over)
- Email: single SMTP message with multiple MIME attachments (no chunk cap,
SMTP size governs); remote URLs remain linked in body (parity with
existing send_image)
All platforms fall back to the base per-image loop on any failure, so a
single bad image in a batch never loses the rest.
Matrix, WhatsApp, and single-attachment platforms (BlueBubbles, Feishu,
WeCom, WeChat, DingTalk) continue to use the base default loop — their
server APIs only accept one attachment per message anyway.
Tests: adds tests/gateway/test_send_multiple_images.py with 19 targeted
tests covering base default loop, chunking, animation peel-off, fallback
paths, and empty-batch no-ops across all five new overrides.
Co-authored-by: Maxence Groine <maxence@groine.fr>
Adds a new `send_multiple_images` method to the ``BasePlatformAdapter``
that implements the default "One image per message" loop and allows for
platform-specific overriding.
Implements such an override for the Signal adapter, batching images
and trying (best-effort) to work around rate-limits for voluminous
batches using a specific scheduler.
Also implements batching + rate-limit handling in the `send_message`
tool.
New tests added for the Signal adapter, its rate-limit scheduler and the
`send_message` tool
When the in-band pending-message drain spawns a fresh task and
transfers ownership via _session_tasks[session_key] = drain_task,
the original task still unwinds through the finally block. The
drain task picks up the same interrupt_event in its own
_process_message_background entry, so an unconditional
_release_session_guard(session_key, guard=interrupt_event) at the
end of the finally matches and deletes _active_sessions[session_key]
while the drain task is still pending its first await.
A concurrent inbound message arriving in that handoff window passes
the Level-1 guard (no entry exists) and spawns a second
_process_message_background for the same session — two agents on
one session_key, duplicate responses, duplicate tool calls.
Fix: only call _release_session_guard when the current task still
owns _session_tasks[session_key]. When ownership has been
transferred to a drain task, leave _active_sessions populated; the
drain task's own lifecycle releases it. This mirrors the
late-arrival drain path in the same finally block, which already
leaves both entries alone after handing off.
Also reorder stdlib imports in the new regression test file to
match the gateway test convention (stdlib before third-party).
Regression test: capture _active_sessions[sk] identity at every
handler entry across a 2-step in-band drain chain and assert the
guard Event identity stays the same. Pre-fix, the original task's
finally deletes the entry, the drain task falls through to the
`or asyncio.Event()` branch, and a fresh Event is installed —
identity diverges. Post-fix, the entry is preserved and the drain
task reuses the original Event.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`_process_message_background` finished a turn, found a queued
follow-up, and drained it via `await
self._process_message_background(pending_event, session_key)`. Each
chained follow-up added a frame to the call stack instead of starting
fresh. Under sustained pending-queue activity (e.g. a user sending
follow-ups faster than the agent finishes turns) the C stack would
exhaust at ~2000 nested frames and SIGSEGV the process.
Mirror the late-arrival drain pattern that already exists in the same
function: spawn a new `asyncio.create_task(...)` for the pending event
and return so the current frame can unwind. The new task takes
ownership via `_session_tasks[session_key]`.
The late-arrival drain in `finally` could now race with the in-band
drain across the `await typing_task` / `await stop_typing` window, so
add a guard: if `_session_tasks[session_key]` is no longer the current
task, an in-band drain already spawned a follow-up task — re-queue the
late-arrival event so that task picks it up after its current event,
instead of spawning a second concurrent task for the same session_key.
Regression test (`test_pending_drain_no_recursion.py`) chains 12
follow-ups and asserts the recorded
`_process_message_background` stack depth stays bounded at handler
entry. Pre-fix: depths grow linearly `[1,2,3,…,12]`. Post-fix: all
depths are `1`.
`test_duplicate_reply_suppression::test_stale_response_suppressed_when_interrupted`
called `_process_message_background` directly and implicitly relied on
the old recursive `await` semantic — updated to wait for the spawned
drain task before checking the sent list.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extracted from PR #17211 (@versun) so it can land independently of the
local_command TTS provider redesign.
- Add should_send_media_as_audio(platform, ext, is_voice) in
gateway/platforms/base.py; single source of truth for audio routing.
- Add .flac to recognized audio extensions (MEDIA regex, weixin audio
set, send_message audio set).
- Telegram send_voice() now falls back to send_document for formats
Telegram's Bot API can't play natively (.wav, .flac, ...) instead of
raising; MP3/M4A still go to sendAudio, Opus/OGG still go to sendVoice.
- Route _send_telegram() in send_message_tool through a narrower
_TELEGRAM_SEND_AUDIO_EXTS = {.mp3, .m4a} set.
- cron.scheduler._send_media_via_adapter now delegates the audio
decision to should_send_media_as_audio so it matches the gateway.
- Update the cron live-adapter ogg test to flag [[audio_as_voice]] so
it still routes to sendVoice under the new Telegram-specific policy.
- Tests: unit coverage for should_send_media_as_audio across platforms,
end-to-end MEDIA routing via _process_message_background and
GatewayRunner._deliver_media_from_response, TelegramAdapter.send_voice
fallback for FLAC/WAV.
Co-authored-by: Versun <me+github7604@versun.org>
Fixes the xdist collision that broke CI on PR #17764, and structurally
prevents future plugin-adapter tests from reintroducing it.
Problem
-------
tests/gateway/test_teams.py (new in this PR) and tests/gateway/test_irc_adapter.py
(already on main) both followed the same anti-pattern:
sys.path.insert(0, str(_REPO_ROOT / 'plugins' / 'platforms' / '<name>'))
from adapter import <Adapter>
Every platform plugin ships its own adapter.py, so the bare
'from adapter import ...' races for sys.modules['adapter']. Whichever test
collected first in a given xdist worker won; the other crashed at
collection with ImportError, and the polluted sys.path cascaded into 19
unrelated test failures across tools/, hermes_cli/, and run_agent/ in the
same worker.
Fix
---
1. tests/gateway/_plugin_adapter_loader.py (new): shared helper
load_plugin_adapter('<name>') that imports plugins/platforms/<name>/adapter.py
via importlib.util under the unique module name plugin_adapter_<name>.
Zero sys.path mutation, no possibility of collision.
2. tests/gateway/test_irc_adapter.py and tests/gateway/test_teams.py:
migrated to the helper. All 'from adapter import ...' statements
(including the ones inside test methods) are replaced with module-level
attribute access on the loaded module.
3. tests/gateway/conftest.py: new pytest_configure guard that AST-scans
every test_*.py under tests/gateway/ at session start and fails the
run with a pointer to the helper if any test uses sys.path.insert into
plugins/platforms/ OR a bare 'import adapter' / 'from adapter import'.
Runs on the xdist controller only (skipped in workers). The next plugin
adapter test that tries to reintroduce this pattern gets rejected at
collection time with a clear remediation message.
4. scripts/release.py: add aamirjawaid@microsoft.com -> heyitsaamir to
AUTHOR_MAP so the check-attribution workflow passes.
Validation
----------
scripts/run_tests.sh tests/gateway/ 4194 passed
scripts/run_tests.sh tests/gateway/test_{teams,irc}* 72 passed (both orderings)
scripts/run_tests.sh <11 prev-failing test files> 398 passed
Guard triggers correctly on both Path-operator and string-literal forms
of the anti-pattern.
Hello! I am the maintainer of the microsoft-teams-apps Python SDK and
I built this Teams adapter to integrate Microsoft Teams into Hermes.
Adds a `plugins/platforms/teams` platform plugin using the new
PlatformRegistry system from #17751. The adapter self-registers via
`register(ctx)` — no hardcoding in run.py, toolsets.py, or any
other core file.
Key features:
- Supports personal DMs, group chats, and channel posts
- Adaptive Card approval prompts with in-place button replacement
(Allow Once / Allow Session / Always Allow / Deny)
- aiohttp webhook server bridged from the Teams SDK to avoid
the fastapi/uvicorn dependency
- ConversationReference caching for correct proactive sends in
non-DM chats
- `interactive_setup()` for `hermes gateway setup` integration
- `platform_hint` for LLM context (Teams markdown subset)
- 34 tests covering adapter init, send, message handling, and
plugin registration
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
feat(gateway): refine Platform._missing_ and platform-connected dispatch
Restricts plugin-name acceptance to bundled plugin scan + registry
(no arbitrary string -> enum-pollution), pulls per-platform connectivity
checks into a _PLATFORM_CONNECTED_CHECKERS lambda map with a clean
_is_platform_connected method, and adds tests covering the checker map,
plugin platform interface, and IRC setup wizard.
Extends the platform plugin interface from Phase 1 to cover every
touchpoint where built-in platforms have hardcoded behavior.
- allowed_users_env / allow_all_env: per-platform auth env vars
- max_message_length: smart-chunking for send_message tool
- pii_safe: session PII redaction flag
- emoji: CLI/gateway display
- allow_update_command: /update access control
send_message tool (tools/send_message_tool.py):
- Replaced hardcoded platform_map dict with Platform() call
- Added _send_via_adapter() for plugin platforms — routes through
live gateway adapter when available
- Registry-aware max message length for smart chunking
Cron delivery (cron/scheduler.py):
- Replaced hardcoded 15-entry platform_map with Platform() call
- Plugin platforms now work as cron delivery targets
User authorization (gateway/run.py _is_user_authorized):
- Registry fallback: checks PlatformEntry.allowed_users_env and
allow_all_env when platform not in hardcoded maps
- Plugin platforms get per-platform auth support
_UPDATE_ALLOWED_PLATFORMS: checks registry allow_update_command flag
Channel directory: includes plugin platforms in session enumeration
Orphaned config warning: descriptive message when plugin platform is
in config but no plugin registered it
Gateway weakref: _gateway_runner_ref for cross-module adapter access
hermes status: shows plugin platforms with (plugin) tag
hermes gateway setup: plugin platforms appear in menu with setup hints
hermes_cli/platforms.py: get_all_platforms() merges with registry,
platform_label() falls back to registry for plugin names
- 8 new tests (extended fields, cron resolution, platforms merge)
- Updated 3 tests for new Platform() based resolution
- 2829 passed, 24 pre-existing failures, zero new failures
Adds a platform adapter plugin interface so anyone can create new gateway
platforms (IRC, Viber, Line, etc.) as drop-in plugins without modifying
core gateway code.
- PlatformEntry dataclass: name, label, adapter_factory, check_fn,
validate_config, required_env, install_hint, source
- PlatformRegistry singleton with register/unregister/create_adapter
- _create_adapter() in gateway/run.py checks registry first, falls
through to existing if/elif chain for built-in platforms
- Platform._missing_() accepts unknown string values, creating cached
pseudo-members so Platform('irc') is Platform('irc') holds true
- GatewayConfig.from_dict() now parses plugin platform names from
config.yaml without rejecting them
- get_connected_platforms() delegates to registry for unknown platforms
- PluginContext.register_platform() for plugin authors
- Mirrors the existing register_tool() / register_hook() pattern
- Full async IRC adapter using stdlib asyncio (zero external deps)
- Connects via TLS, handles PING/PONG, nick collision, NickServ auth
- Channel messages require addressing (nick: msg), DMs always dispatch
- Markdown stripping for IRC-clean output, message splitting for
512-byte line limit
- Config via config.yaml extra dict or IRC_* env vars
- Platform enum dynamic members (identity stability, case normalization)
- PlatformRegistry (register, unregister, create, validation, factory)
- GatewayConfig integration (from_dict parsing, get_connected_platforms)
- IRC adapter (init, send, protocol parsing, markdown, requirements)
No existing platform adapters were migrated — the if/elif chain is
untouched. This is Phase 1: prove the interface with a real plugin.
PR #15027 (5 days ago) shipped TELEGRAM_GROUP_ALLOWED_USERS as a chat-ID
allowlist. #17686 correctly renames that to sender user IDs and moves
chat IDs to TELEGRAM_GROUP_ALLOWED_CHATS. Without a shim, any user on
PR #15027's guidance would silently start rejecting group traffic on
upgrade.
- gateway/run.py: in _is_user_authorized, if TELEGRAM_GROUP_ALLOWED_USERS
contains values starting with '-' (chat-ID-shaped), honor them as chat
IDs and log a one-shot deprecation warning pointing users at the new
TELEGRAM_GROUP_ALLOWED_CHATS var.
- tests/gateway/test_unauthorized_dm_behavior.py: three new tests cover
legacy chat-ID values authorizing the listed chat, not crossing to
other chats, and mixed sender/chat values in the same var.
- website/docs/user-guide/messaging/telegram.md: rewrite the Group
Allowlisting section to document the new user/chat split + migration
note. Remove stale '/thread_id' suffix claim (code never parsed it).
- website/docs/reference/environment-variables.md: document all three
Telegram allowlist env vars.
Salvage-follow-up to @shannonsands's /reload-skills PR. Trims the feature to
match the design: user-initiated rescan, no prompt-cache reset, no new
schema surface, no phantom user turn, and the next-turn note carries each
added/removed skill's 60-char description (not just its name).
Changes vs the original PR:
* Drop the in-process skills prompt-cache clear in reload_skills(). Skills
are invoked at runtime via /skill-name, skills_list, or skill_view —
they don't need to live in the system prompt for the model to use them.
Keeping the cache intact preserves prefix caching across the reload so
/reload-skills pays no cache-reset cost. (MCP has to break the cache
because tool schemas must be known at conversation start; skills do not.)
* Drop the skills_reload agent tool and SKILLS_RELOAD_SCHEMA from
tools/skills_tool.py, plus the four skills_reload enumerations in
toolsets.py. No new schema surface — agents can already see a freshly-
installed skill via skill_view / skills_list the moment it's on disk.
* Replace the phantom 'role: user' turn injection with a one-shot queued
note. CLI uses self._pending_skills_reload_note (same pattern as
_pending_model_switch_note, prepended to the next API call and cleared).
Gateway uses self._pending_skills_reload_notes[session_key]. The note
is prepended to the NEXT real user message in this session, so message
alternation stays intact and nothing out-of-band is persisted to the
transcript.
* reload_skills() now returns added/removed as
[{'name': str, 'description': str}, ...] (description truncated to 60
chars — matches the curator / gateway adapter budget). The injected
next-turn note formats each entry as 'name — description' so the model
can actually reason about which new skills to call without running
skills_list first.
* Only emit the note when the diff is non-empty. On empty diff, print
'No new skills detected' and do nothing else.
* Tests rewritten to cover the queue semantics, the description payload,
and a regression guard that the prompt-cache snapshot is preserved.
Adds a public reload path for the in-process skill caches so newly
installed (or removed) skills become visible mid-session without a
gateway restart. Mirrors the shape of /reload-mcp.
Three surfaces:
* /reload-skills slash command — CLI (cli.py) and gateway (gateway/run.py),
with /reload_skills alias for Telegram autocomplete and an explicit
Discord registration.
* skills_reload agent tool (tools/skills_tool.py) — lets agents/subagents
pick up freshly-installed skills via tool call.
* agent.skill_commands.reload_skills() — shared helper that clears
_skill_commands, _SKILLS_PROMPT_CACHE (in-process LRU), and the
on-disk .skills_prompt_snapshot.json, then returns an added/removed
diff plus the new total count.
Tested:
* tests/agent/test_skill_commands_reload.py (9 cases)
* tests/cli/test_cli_reload_skills.py (3 cases)
* tests/gateway/test_reload_skills_command.py (4 cases)
Use case: NemoClaw / OpenShell-style sandboxed orchestrators that drop
skills into ~/.hermes/skills mid-session, plus agentic flows where the
agent itself installs a skill via the shell tool and needs it bound
without a gateway restart. The Python helper
clear_skills_system_prompt_cache(clear_snapshot=True) already exists
internally — this PR just exposes it via slash command and tool.
CI Tests workflow has been red on main for 40+ consecutive runs. This
commit recovers every failure visible in run 25130722163 (most recent
completed run prior to this PR).
Root causes, by group:
Test-mock drift after product landed (fix: update mocks)
- test_mcp_structured_content / test_mcp_dynamic_discovery (6 tests):
product added _rpc_lock (#02ae15222) and _schedule_tools_refresh
(#1350d12b0) without updating sibling test files. Install a real
asyncio.Lock inside the fake run-loop and patch at _schedule_tools_refresh.
- test_session.py: renamed normalize_whatsapp_identifier → canonical_
whatsapp_identifier upstream; keep a local alias so the legacy tests
keep working.
- test_run_progress_topics Slack DM test: PR #8006 made Slack default
tool_progress=off; explicitly set it to 'all' in the test fixture so
the progress-callback path still runs. Also read tool_progress_callback
at call time rather than freezing it in FakeAgent.__init__ — production
assigns it AFTER construction.
- test_tui_gateway_server session-create/close race: session.create now
defers _start_agent_build behind a 50ms timer — wait for the build
thread to enter _make_agent before closing, otherwise the orphan-
cleanup path never runs.
- test_protocol session.resume: product get_messages_as_conversation now
takes include_ancestors kwarg; accept **_kwargs in the test stub.
- test_copilot_acp_client redaction: redactor is OFF by default (snapshots
HERMES_REDACT_SECRETS at import); patch agent.redact._REDACT_ENABLED=True
for the duration of the test.
- test_minimax_provider: after #17171, dots in non-Anthropic model names
stay dots even with preserve_dots=False. Assert the new invariant
rather than the old 'broken for MiniMax' behavior.
- test_update_autostash: updater now scans `ps -A` for dashboard PIDs;
the test's catch-all subprocess.run stub needed stdout/stderr fields.
- test_accretion_caps: read_timestamps dict is populated lazily when
os.path.getmtime succeeds. Use .get("read_timestamps", {}) to tolerate
CI filesystems where the stat races file creation.
Change-detector tests (fix: rewrite as structural invariants)
- test_credential_sources_registry_has_expected_steps: was a frozen set
comparison that broke when minimax-oauth was added. Rewrite as an
invariant check (every step has description, no dupes, core steps
present) per AGENTS.md 'don't write change-detector tests'.
xdist ordering / test pollution (fix: reset state, use module-local patches)
- test_setup vercel: sibling test saved VERCEL_PROJECT_ID='project' to
os.environ via save_env_value() and never cleared it. monkeypatch.delenv
the VERCEL_* vars in the link-file test.
- test_clipboard TestIsWsl: GitHub Actions is on Azure VMs whose real
/proc/version often contains 'microsoft'. Patching builtins.open with
mock_open didn't reliably intercept hermes_constants.is_wsl's call in
xdist workers that had already cached _wsl_detected=True from an
earlier test. Patch hermes_constants.open directly and add
teardown_method to reset the cache after each test.
Pytest-asyncio cancellation hangs (fix: bound product await with timeout)
- test_session_split_brain_11016 (3 params) + test_gateway_shutdown
cancel-inflight: under pytest-asyncio 1.3.0, 'await task' and
'asyncio.gather(cancelled_tasks)' can stall for 30s when the cancelled
task's finally block awaits typing-task cleanup. Bound both with
asyncio.wait_for(..., timeout=5.0) and asyncio.shield — the stragglers
are released from adapter tracking and allowed to finish unwinding in
the background. This is also a legitimate hardening: a wedged finally
shouldn't stall the caller's dispatch or a gateway shutdown.
Orphan UI config (fix: merge tiny tab into messaging category)
- test_web_server test_no_single_field_categories: the telegram.reactions
config field lived in its own 'telegram' schema category with no
siblings. Fold it under 'discord' via _CATEGORY_MERGE so the dashboard
doesn't render an orphan single-field tab.
Local verification: 38/38 originally-failing tests pass; 4044/4044
gateway tests pass; 684/684 targeted subset (all 16 touched test files)
passes.
Address Copilot review on PR #16666:
1. **Duplicate event on every tool start** — both ``tool_progress_callback``
and ``tool_start_callback`` fire side-by-side in ``run_agent.py``, so
wiring both into chat completions emitted *two* ``hermes.tool.progress``
events per real tool call. Drop the legacy ``_on_tool_progress`` emit
entirely; ``_on_tool_start`` now produces a single unified event that
carries the legacy ``tool``/``emoji``/``label`` fields plus the new
``toolCallId``/``status`` correlation fields. Label is computed inline
via ``build_tool_preview`` so callers do not need to pre-format it.
2. **Weak per-event correlation in the regression test** — the previous
assertion checked that a ``toolCallId`` appeared *somewhere* in the
aggregate, which would have passed even if ``running`` lacked the id.
Collect ``(status, toolCallId)`` per event and assert each event
carries the correct pair, plus exactly two events on the wire (no
silent duplication regression).
The two existing chat-completions tool-progress tests are updated to fire
``tool_start_callback`` instead of ``tool_progress_callback``, matching
production reality where ``run_agent`` always pairs them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds Vercel Sandbox as a supported Hermes terminal backend alongside
existing providers (Local, Docker, Modal, SSH, Daytona, Singularity).
Uses the Vercel Python SDK to create/manage cloud microVMs, supports
snapshot-based filesystem persistence keyed by task_id, and integrates
with the existing BaseEnvironment shell contract and FileSyncManager
for credential/skill syncing.
Based on #17127 by @scotttrinh, cherry-picked onto current main.
Adds two API server endpoints for external UIs and orchestrators:
- GET /v1/capabilities — machine-readable feature discovery so clients
can detect which Runs API / SSE / auth features this Hermes version
supports before depending on them.
- GET /v1/runs/{run_id} — pollable run status so dashboards can check
queued/running/completed/failed/cancelled/stopping state without
holding an SSE connection open.
Also moves request validation ahead of run allocation so invalid
payloads no longer leave orphaned entries in _run_streams waiting for
the TTL sweep.
task_id is intentionally kept as "default" for the Runs API to
preserve the shared-sandbox model used by CLI, gateway, and the
existing _run_agent_with_callbacks path. session_id is surfaced in
run status for external-UI correlation only.
Salvage of PR #17085 by @Magaav.
Regression test for the ret=-2 / errmsg='unknown error' disambiguation:
- ret=-2 or errcode=-2 with 'unknown error' → stale session (True)
- ret=-2 with 'freq limit' or other errmsg → rate limit (False)
- ret=-14 → not matched here (handled by SESSION_EXPIRED_ERRCODE path)
- Success codes and missing errmsg → False
Wrap each adapter.connect() in asyncio.wait_for() so one platform hanging
during startup or reconnect cannot block the others. Telegram's 8-retry
connect loop (~140s worst case) previously prevented Feishu from ever
starting when Telegram was network-restricted — common for users in
regions where Telegram is blocked.
Default timeout is 30s; override via HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT
(0 disables). Applied to both startup and the reconnect watcher so a
platform that hangs mid-retry also does not stall retries for others.
Fixes#17242
Fixes#6672
Memory providers now receive on_session_switch() whenever AIAgent.session_id
rotates mid-process — /resume, /branch, /reset, /new, and context
compression. Before this, providers that cached per-session state in
initialize() (Hindsight's _session_id, _document_id, accumulated
_session_turns, _turn_counter) kept writing into the old session's
record after the agent had moved on.
MemoryProvider ABC
------------------
- New optional hook on_session_switch(new_session_id, *,
parent_session_id='', reset=False, **kwargs) with no-op default for
backward compat. reset=True signals /reset or /new — providers should
flush accumulated per-session buffers. reset=False for /resume,
/branch, compression where the logical conversation continues.
MemoryManager
-------------
- on_session_switch() fans the hook out to every registered provider.
Isolated try/except per provider — one bad provider can't block others.
- Empty/None new_session_id is a no-op to avoid corrupting provider state
during shutdown paths.
run_agent.py
------------
- _sync_external_memory_for_turn now passes session_id=self.session_id
into sync_all() and queue_prefetch_all(). Providers with defensive
session_id updates in sync_turn (Hindsight already had this at
plugins/memory/hindsight/__init__.py:1199) now actually receive the
current id.
- Compression block at ~L8884 already notified the context engine of
the rollover; now also calls
_memory_manager.on_session_switch(reason='compression').
cli.py
------
- new_session() fires reset=True, reason='new_session' so providers
flush buffers.
- _handle_resume_command fires reset=False, reason='resume' with the
previous session as parent_session_id.
- _handle_branch_command fires reset=False, reason='branch' with the
parent session_id already captured for the DB parent link.
gateway/run.py
--------------
- _handle_resume_command now evicts the cached AIAgent, mirroring
/branch and /reset. The next message rebuilds a fresh agent whose
memory provider initialize() runs with the correct session_id —
matches the pattern the gateway already uses for provider state
cross-session transitions.
Hindsight reference implementation
----------------------------------
- plugins/memory/hindsight/__init__.py adds on_session_switch that:
updates _session_id, mints a fresh _document_id (prevents
vectorize-io/hindsight#1303 overwrite), and clears _session_turns /
_turn_counter / _turn_index so in-flight batches don't flush under
the new document id. parent_session_id only overwritten when provided
(avoids clobbering on a bare switch).
Tests
-----
- tests/agent/test_memory_session_switch.py: new dedicated file. ABC
default no-op, manager fan-out, failure isolation, empty-id no-op,
session_id propagation through sync_all/queue_prefetch_all, Hindsight
state transitions for every reset/non-reset case, parent preservation.
- tests/cli/test_branch_command.py: new test verifying /branch fires
the hook with correct parent_session_id + reset=False + reason.
- tests/gateway/test_resume_command.py: new test verifying /resume
evicts the cached agent.
- tests/run_agent/test_memory_sync_interrupted.py: updated existing
assertions to account for the session_id kwarg on sync_all and
queue_prefetch_all.
E2E verified (real imports, tmp HERMES_HOME):
- /resume: session_id updates, doc_id fresh, buffers cleared, parent set
- /branch: session_id forks, parent links to original
- /new: reset=True clears accumulated state
- compression: reason='compression' propagated, lineage preserved
- Empty id: no-op, state preserved
- Legacy provider without on_session_switch: no crash
Reported by @nicoloboschi (Hindsight maintainer); related scope-widening
comment by @kidonng extending coverage to compression.
Three Signal adapter improvements that depend on the no-edit-mode
plumbing from the previous commit.
1. Native formatting (markdown -> Signal bodyRanges)
Signal renders markdown as literal characters (**bold**, `code`, #
heading), which looks broken. Added _markdown_to_signal(text) that
strips markdown syntax and emits Signal-native bodyRanges as
start:length:STYLE entries. Offsets are computed in UTF-16 code
units so non-BMP emoji stay aligned. Supports BOLD, ITALIC, STRIKE,
MONO, and headings mapped to BOLD. Fenced code and inline code are
handled; link syntax is unwrapped to visible text + URL.
Includes edge-case fixes reported previously:
- Bullet lists ("* item") no longer misidentified as italics
- URLs containing underscores no longer italicized around the dot
2. Reply-quote context
Parses dataMessage.quote on inbound messages and populates
MessageEvent.raw_message with sender + timestamp_ms. This lets the
gateway's existing [Replying to: "..."] injector (gateway/run.py)
work on Signal, matching Telegram/Matrix behavior.
3. Processing reactions
Overrides on_processing_start -> hourglass and on_processing_complete
-> checkmark via the sendReaction JSON-RPC using targetAuthor and
targetTimestamp pulled from raw_message. Uses the ProcessingOutcome
enum introduced in the previous commit.
Also sets SUPPORTS_MESSAGE_EDITING = False on SignalAdapter so the
no-edit streaming path activates.
Tests: 40+ new tests in tests/gateway/test_signal_format.py covering
markdown conversion, UTF-16 offset correctness with non-BMP emoji,
bullet-list and URL false-positive regressions, reply-quote extraction,
and reaction payload shape. Regression extensions to test_signal.py.
Commit 3c42064e made config.yaml the single source of truth for
TERMINAL_CWD, but the config bridge passes cwd values verbatim to
os.environ. When a user sets terminal.cwd: ~/ in config.yaml, the
literal string '~/'' reaches subprocess.Popen, which the kernel
rejects because it does not expand shell tilde syntax.
This patch adds three defensive layers:
1. gateway/run.py — expanduser at config bridge time so TERMINAL_CWD
is always an absolute path.
2. tools/terminal_tool.py — expanduser when reading TERMINAL_CWD in
_get_env_config(), guarding against stale or manually-set env vars.
3. tools/environments/local.py — expanduser in LocalEnvironment before
passing cwd to subprocess.Popen, the final safety net.
Includes regression tests in test_config_cwd_bridge.py for nested
terminal.cwd, top-level cwd alias, and precedence ordering.
Refs: 3c42064e