The May 5 refactor in d5357f816 made _message_thread_id_for_typing()
symmetric with _message_thread_id_for_send() by mapping the General
topic (thread id "1") to None upfront for both. That's correct for
sendMessage — Telegram rejects message_thread_id=1 on sends and the
topic must be omitted — but it's wrong for sendChatAction.
Observed behavior (confirmed via before/after Telegram wire traces):
Before d5357f816: thread_id=1 → message_thread_id=1 → bubble visible in General
After d5357f816: thread_id=1 → message_thread_id=None → no visible typing
Omitting message_thread_id on sendChatAction does NOT fall back to
the General topic's view in a forum-enabled supergroup; the bubble
ends up hidden from the client's General-topic pane entirely. For
any user on a forum-group, the typing indicator stopped appearing.
Fix: drop the symmetric "1 → None" mapping from the typing resolver.
sendMessage still maps 1 → None via _message_thread_id_for_send (that
side was never broken). The asymmetry is real and required by
Telegram's API — document it in the resolver docstring.
Partial revert of d5357f816; restores the behavior from 0cf7d570e
("fix(telegram): restore typing indicator and thread routing for
forum General topic"). Does not re-introduce the retry-without-thread
fallback that 41545f7ec scoped down for DM topics — with the resolver
fixed, the first call already hits the right wire shape.
Test updated from test_send_typing_general_topic_uses_none_thread_id
(which encoded the broken contract) to
test_send_typing_preserves_general_topic_thread_id, asserting the
single correct call with message_thread_id=1. 10 other tests in the
file untouched and passing.
Makes the in-tree QQ inline keyboards actually light up when the agent
blocks on a dangerous-command approval. Matches the cross-adapter
gateway contract already implemented by Discord, Telegram, Slack,
Matrix, and Feishu.
Gateway/run.py's _approval_notify_sync checks type(adapter).send_exec_approval
and falls back to a text prompt when it's missing. Without this wiring,
QQ users stared at plain '/approve' text even though the adapter shipped
button primitives.
### send_exec_approval(chat_id, command, session_key, description, metadata)
Matches the signature the gateway calls with. Builds an ApprovalRequest
(command_preview, description, timeout) and delegates to send_approval_request.
Uses the last inbound msg_id as reply_to so QQ accepts the passive
message. The 'metadata' parameter is accepted for contract parity but
intentionally unused — QQ doesn't have thread_id/DM-targeting overrides.
### send_update_prompt(chat_id, prompt, default, session_key, metadata)
Signature updated to match the cross-adapter contract used by
'hermes update --gateway' watcher. Renders a 'Update Needs Your Input'
prompt with the optional default hint and a Yes/No keyboard. Replaces
the earlier 3-arg helper that wasn't wired anywhere.
### Default interaction dispatcher
_default_interaction_dispatch() auto-registered as the adapter's
interaction callback in __init__. Routes:
- approve:<session_key>:<decision> → tools.approval.resolve_gateway_approval
Button → choice mapping:
allow-once → 'once'
allow-always → 'always'
deny → 'deny'
(QQ's 3-button mobile layout deliberately collapses 'session' + 'always'
into one button; /approve session text fallback remains available.)
- update_prompt:<answer> → atomic write of y/n to ~/.hermes/.update_response
(the detached 'hermes update --gateway' watcher polls this file)
- anything else → logged and dropped
Resolve exceptions are caught and logged — never propagate into the WS
loop. Callers can override via set_interaction_callback() to route
clicks elsewhere or pass None to drop them entirely.
### Net effect
QQ users now get native tap-to-approve UX on dangerous-command prompts
and update-confirmation prompts, without having to type /approve or /deny
as text. The adapter hooks into tools.approval the same way every other
button-capable platform does.
### Tests
14 new tests cover:
- Default callback installed on __init__
- send_exec_approval / send_update_prompt exist as class methods (so the
gateway's type-probe detects them)
- allow-once/always/deny each map to the correct resolve choice
- update_prompt:y / update_prompt:n each write atomically to the response
file (via monkeypatched get_hermes_home)
- Unknown button_data / empty button_data / resolve exceptions are harmless
- send_exec_approval honours last_msg_id reply-to and accepts metadata
- send_update_prompt delegates with correct content + keyboard
Full qqbot suite: 144 passed (72 pre-existing + 72 from this salvage arc).
Also ran tools/test_approval.py alongside — no regressions (276 passed
combined).
Co-authored-by: WideLee <limkuan24@gmail.com>
Follow-up to the previous commit:
- Add _is_loopback_host() helper covering 127.0.0.1, localhost, ::1,
ip6-localhost, ip6-loopback (case-insensitive). Empty/None host is
treated as non-loopback since unset usually means public default bind.
- Fix mixed-indent comment in the safety rail (comment now aligned with
the if-block) and collapse the nested-if into one condition.
- Add TestInsecureNoAuthSafetyRail covering rejection on 0.0.0.0, a LAN
IP, and empty host; allowance on 127.0.0.1/localhost; plus unit-level
parametrized coverage of _is_loopback_host for spellings we can't bind
in the hermetic test env (::1, ip6-localhost, ip6-loopback).
- Pin test_connect_starts_server + test_webhook_deliver_only defaults
to 127.0.0.1 so they keep passing under the new rail.
- Document the behavior in website/docs/user-guide/messaging/webhooks.md.
Following PR #21306 which added the new generic plugin-platform hooks,
update the three platform-authoring docs so plugin authors find them:
- website/docs/developer-guide/adding-platform-adapters.md: expand the
'What the Plugin System Handles Automatically' table with env-only
auto-enable + cron delivery + hermes-config UI entries rows. Add
three new sections — 'Env-Driven Auto-Configuration', 'Cron
Delivery', 'Surfacing Env Vars in hermes config' — covering the
hook signatures, plugin.yaml rich-dict format, and the
home_channel-key special case. Update the main register() example
to pass env_enablement_fn + cron_deliver_env_var inline so readers
see them on their first pass. Upgrade the PLUGIN.yaml snippet to
show bare-string + rich-dict + optional_env.
- website/docs/guides/build-a-hermes-plugin.md: the thin platform
example in the build-a-plugin tour now includes env_enablement_fn
and cron_deliver_env_var, plus an optional_env block in the inline
plugin.yaml. Keeps pointing to the developer-guide page for the
full treatment.
- gateway/platforms/ADDING_A_PLATFORM.md: the in-repo reference
shallow-points at the docsite but now names the three new hooks
explicitly so contributors reading the source tree know what
they're for. Also adds teams + google_chat as reference
implementations alongside irc.
When a user replies while quoting another message, QQ sets
'message_type = 103' and pushes the referenced message's content +
attachments inside 'msg_elements[0]'. The old adapter ignored
msg_elements entirely, so:
- Bare quote-replies (no user text) surfaced nothing to the LLM.
- Quoted images/files/voice were never downloaded or described.
- Quoted voice messages specifically produced no transcript — the model
had no way to see what the user was referring to when saying 'about
this voice note…'.
This commit adds _process_quoted_context(d) which extracts msg_elements,
unions their attachments, and runs them through the SAME
_process_attachments pipeline as the main message body. Quoted voice
gets an STT transcript (tried via QQ's asr_refer_text first, then the
configured STT provider); quoted images get cached just like main-body
images; quoted files surface with their original filename intact (not
the CDN URL hash).
The quoted content is prepended to the user's text as a '[Quoted message]:'
block so the LLM sees the full referential context on one turn.
Images-only quotes surface a '[Quoted message]: (image)' marker so the
model knows an image was referenced even if no text came with it.
All four inbound handlers (_handle_c2c_message, _handle_group_message,
_handle_guild_message, _handle_dm_message) now call the helper uniformly
— one merge pattern, not four divergent implementations.
Filename preservation is carried by _process_attachments' existing
'[Attachment: {filename or ct}]' line; nothing else needed for that.
12 new tests under TestProcessQuotedContext and TestMergeQuoteInto cover:
- Non-quote messages short-circuit to empty
- message_type=103 with no msg_elements is harmless
- Text-only quotes render with '[Quoted message]:' prefix
- Voice attachments in the quote flow through STT
- File attachments in the quote preserve the original filename
- Image attachments surface cached paths + media types
- Images-only quote still emits a marker
- Multiple msg_elements are concatenated
- Malformed message_type values return empty
- _merge_quote_into prepends with a blank-line separator
Full qqbot suite: 130 passed (72 existing + 19 chunked + 27 keyboards
+ 12 quoted).
Co-authored-by: WideLee <limkuan24@gmail.com>
The QQ Bot v2 API supports inline keyboards on outbound messages. When a
user taps a button, the platform dispatches an INTERACTION_CREATE
gateway event; the bot ACKs it via PUT /interactions/{id} and decodes
the button's data payload to route the click.
This commit adds:
New module gateway/platforms/qqbot/keyboards.py
- Inline-keyboard dataclasses (InlineKeyboard, KeyboardRow, KeyboardButton,
KeyboardButtonAction, KeyboardButtonRenderData, KeyboardButtonPermission)
that serialize to the JSON shape the QQ API expects.
- build_approval_keyboard(session_key) — 3-button layout:
✅ 允许一次 / ⭐ 始终允许 / ❌ 拒绝, all sharing group_id='approval'
so clicking one greys out the rest.
- build_update_prompt_keyboard() — Yes/No keyboard for update confirms.
- parse_approval_button_data() / parse_update_prompt_button_data() —
decode the button_data payload from INTERACTION_CREATE.
approve:<session_key>:<decision> (decision = allow-once|allow-always|deny)
update_prompt:<answer> (answer = y|n)
- build_approval_text(ApprovalRequest) — markdown renderer for the
surrounding message body (exec-approval and plugin-approval variants,
with severity icons 🔴/🔵/🟡).
- parse_interaction_event(raw) → InteractionEvent dataclass — normalizes
the nested raw payload (id / scene / openids / button_data / etc.).
Adapter changes (gateway/platforms/qqbot/adapter.py)
- _dispatch_payload routes INTERACTION_CREATE → _on_interaction.
- _on_interaction parses the event, ACKs via PUT /interactions/{id}, then
invokes a user-registered interaction callback. Exceptions from the
callback are caught and logged (never propagate into the WS loop).
- set_interaction_callback(cb) lets gateway wiring register a routing
handler that inspects button_data and resolves the corresponding
pending approval / update prompt.
- _send_c2c_text / _send_group_text now accept an optional keyboard kwarg
and append it to the outbound body.
- send_with_keyboard(chat_id, content, keyboard, reply_to=None) — public
helper that sends a single short message with a keyboard attached.
Does NOT chunk-split (a keyboard message has one interactive surface).
Guild chats are rejected non-retryably — they don't support keyboards.
- send_approval_request(chat_id, ApprovalRequest, reply_to=None) +
send_update_prompt(chat_id, content, reply_to=None) — convenience
wrappers over send_with_keyboard.
Tests
27 new unit tests under TestApprovalButtonData, TestUpdatePromptButtonData,
TestBuildApprovalKeyboard, TestBuildUpdatePromptKeyboard, TestBuildApprovalText,
TestInteractionEventParsing, and TestAdapterInteractionDispatch. Cover:
- Button-data round-trip (build → parse returns original session/decision)
- Keyboard JSON shape + mutual-exclusion group_id
- Exec vs plugin approval text templates + severity icons
- Interaction event parsing (c2c / group / guild scene codes)
- _on_interaction end-to-end: ACK invoked, callback receives parsed event,
callback exceptions are swallowed, missing id skips ACK, no registered
callback is harmless.
Full qqbot suite: 118 passed (72 existing + 19 chunked + 27 keyboards).
Co-authored-by: WideLee <limkuan24@gmail.com>
The v2 'single POST /v2/{users|groups}/{id}/files' upload path is capped
at ~10 MB inline (base64 'file_data' or 'url'). For larger files the QQ
platform provides a three-step flow:
1. POST /upload_prepare → upload_id + pre-signed COS part URLs
2. PUT each part to its COS URL → POST /upload_part_finish
3. POST /files with {upload_id} → file_info token
This commit adds a new gateway/platforms/qqbot/chunked_upload.py module
that implements the flow, wires it into QQAdapter._send_media for local
files (URL uploads keep the existing inline path), and introduces
structured exceptions so the caller can surface actionable error text:
- UploadDailyLimitExceededError (biz_code 40093002, non-retryable)
- UploadFileTooLargeError (file exceeds the platform limit)
Both carry file_name / file_size_human / limit_human so the model can
compose user-friendly replies instead of seeing opaque HTTP codes.
The part_finish 40093001 retryable-error loop respects the server-
provided retry_timeout (capped at 10 minutes locally) with a 1 s
polling interval. COS PUTs retry transient failures up to 2 times
with exponential backoff. complete_upload retries up to 2 times.
Covers files up to the platform's ~100 MB per-file limit; before this
the adapter silently rejected anything over ~10 MB.
19 new unit tests under TestChunkedUpload* cover the happy path,
prepare-response parsing, helper functions, part retries, COS PUT
retries, group vs c2c routing, and the structured-error mapping.
Co-authored-by: WideLee <limkuan24@gmail.com>
PairingStore.approve_code() didn't consult _is_locked_out(), so after
MAX_FAILED_ATTEMPTS bad approvals the lockout flag was set but a valid
code still got accepted — any pending code (legitimately issued or
attacker-obtained) could be approved during the 1-hour lockout window,
nullifying the brute-force protection.
- gateway/pairing.py: lockout check runs in approve_code() right after
_cleanup_expired, before the pending lookup. Returns None on lockout.
- tests/gateway/test_pairing.py: test_lockout_blocks_code_approval pins
the regression — reporter's exact reproducer (generate valid code,
exhaust attempts with WRONGCODE, try to approve valid code) must
return None and leave is_approved == False. Also pins recovery: once
lockout expires, the still-pending code approves normally.
- hermes_cli/pairing.py: _cmd_approve distinguishes the two None cases.
On lockout, prints 'Platform locked out... clears in N minutes. To
reset sooner, delete the _lockout:<platform> entry from
_rate_limits.json' instead of the misleading 'Code not found or
expired' message. 29/29 pairing tests pass; E2E-verified with
reporter's exact Python reproducer.
Widen the platform-plugin surface so plugins can self-configure from env
vars and opt into cron home-channel delivery without editing core files.
Closes the scope gap that forced every new platform (Google Chat, Teams,
IRC, future) to either touch gateway/config.py, cron/scheduler.py, and
hermes_cli/config.py or live without env-only setup.
Changes:
- gateway/platform_registry.py: two new optional PlatformEntry fields.
- env_enablement_fn: () -> Optional[dict]. Called during
_apply_env_overrides BEFORE the adapter is constructed. Returned
dict fields are merged into PlatformConfig.extra; the special
'home_channel' key (if present) becomes a proper HomeChannel
dataclass on the PlatformConfig.
- cron_deliver_env_var: name of the *_HOME_CHANNEL env var. When set,
the plugin platform is a valid cron deliver= target and cron reads
the env var to resolve the default chat/room ID.
- gateway/config.py: the existing plugin-platform enable pass at the
bottom of _apply_env_overrides now calls env_enablement_fn and seeds
extras/home_channel. No effect on plugins that don't set the new
field.
- cron/scheduler.py: _is_known_delivery_platform and
_resolve_home_env_var fall through to the registry when the platform
isn't in the hardcoded built-in sets. New _iter_home_target_platforms
helper iterates built-ins + plugin platforms for the deliver=origin
fallback.
- gateway/run.py: _home_target_env_var now consults the new resolver so
plugin-defined home channels work for non-cron call sites too.
- hermes_cli/config.py: new _inject_platform_plugin_env_vars() sibling
of _inject_profile_env_vars(). Scans plugins/platforms/*/plugin.yaml
at import time and contributes entries to OPTIONAL_ENV_VARS so
'hermes config' UI discovers them. Supports bare-string and rich-dict
requires_env entries plus a new optional_env list for non-required
vars (home channels, allowlists).
All additions are strictly opt-in. Existing plugins (IRC, Teams,
image_gen, memory) see zero behavior change until they adopt the new
fields.
Mirrors the Slack `allowed_channels` feature (PR #7401) and Discord's
`allowed_channels` (PR #7044) across the remaining group-capable platforms.
All five platforms (Slack + Discord + the four added here) now follow the
same pattern: primary config via config.yaml, env-var fallback as an escape
hatch — matching the project policy that .env is for secrets only and
behavioral settings belong in config.yaml.
Also fixes a duplicate `slack` key in DEFAULT_CONFIG introduced by PR
#7401 (the later entry silently overwrote `allowed_channels`, `require_mention`,
and `free_response_channels` at dict-literal evaluation time).
Platforms added:
- Telegram: `telegram.allowed_chats` (env alias: `TELEGRAM_ALLOWED_CHATS`)
- Mattermost: `mattermost.allowed_channels` (env alias: `MATTERMOST_ALLOWED_CHANNELS`)
- Matrix: `matrix.allowed_rooms` (env alias: `MATRIX_ALLOWED_ROOMS`)
- DingTalk: `dingtalk.allowed_chats` (env alias: `DINGTALK_ALLOWED_CHATS`)
Mattermost and Matrix previously had NO config.yaml bridging for any of
their gating settings; this PR adds `load_gateway_config` bridges for them
(Mattermost gets require_mention + free_response_channels + allowed_channels;
Matrix gets allowed_rooms on top of its existing bridges for require_mention
and free_response_rooms).
Semantics identical everywhere:
- Empty = no restriction (fully backward compatible).
- Non-empty = hard whitelist: non-listed chats are silently ignored,
even when the bot is @mentioned.
- DMs bypass the check entirely.
DEFAULT_CONFIG merges the duplicate `slack` block and adds new `mattermost`
and `matrix` blocks so all gating settings surface in defaults.
Not included: Feishu (has its own per-chat `chat_rules` system that covers
this use case differently), WhatsApp (already has `group_allow_from` via
`group_policy: allowlist`), pure-DM platforms (Signal, SMS, BlueBubbles,
Yuanbao — no group concept).
Extracts the three try/write_runtime_status/except-log blocks into a
shared _write_runtime_status_safe() helper. On failure, logs the first
occurrence per (platform, context) at warning level and downgrades
subsequent failures to debug — so a persistently broken status dir
(permissions, ENOSPC) doesn't spam the log on every Telegram reconnect.
Uses getattr for the _status_write_logged set so test harnesses that
skip __init__ (object.__new__(Adapter)) don't break.
Follow-up to the salvaged #21158.
Per repo policy, ~/.hermes/.env is for secrets only. Guild IDs are
behavioral configuration, not secrets. Replacing the
DISCORD_DM_ROLE_AUTH_GUILD env var from the original fix with
discord.dm_role_auth_guild in config.yaml.
- New module-level _read_dm_role_auth_guild() helper reads
hermes_cli.config.read_raw_config()['discord']['dm_role_auth_guild'].
Fails closed on any parse error (safe default = DM role-auth off).
- DEFAULT_CONFIG['discord'] gains dm_role_auth_guild: '' with a comment
documenting the opt-in.
- Tests patch hermes_cli.config.read_raw_config directly (via the
_set_dm_role_auth_guild helper) instead of setenv/delenv. 12 tests
in test_discord_roles_dm_scope pass; no env var involvement.
- Docstring + module docstring + comments updated to reference
discord.dm_role_auth_guild.
- E2E verified with real imports across 6 scenarios: unset, int,
string, garbage, zero, and (crucially) env-var-only-no-config all
return None except the valid int/string cases. Env var has zero
effect — policy compliance confirmed.
Sibling-site fix: _evaluate_slash_authorization was the fourth
_is_allowed_user caller and didn't pass guild/is_dm through, so slash
interactions would take the DM branch regardless of whether they came
from a guild channel. Now reads interaction.guild + in_dm and forwards.
Also updates test_discord_slash_auth fixture (_make_interaction) so
the SimpleNamespace guild mock has a get_member(uid)->None method —
required by the new guild-scoped fallback path in _is_allowed_user.
Tests exercising positive role paths still work via user.roles.
Three new regression tests in test_discord_roles_dm_scope:
- Slash DM + role in mutual public guild → rejected
- Slash in guild B + role only in guild A → rejected
- Slash in guild B + role in guild B → allowed (positive control)
368 Discord tests pass. test_discord_free_channel_skips_auto_thread
also fails on clean main (pre-existing, unrelated to this fix).
The initial DISCORD_ALLOWED_ROLES implementation (#11608, merged from #9873)
scans every mutual guild when resolving a user's roles. This allows a
cross-guild DM bypass:
1. Bot is in both public server A and private server B.
2. User holds the allowed role in server A only.
3. User DMs the bot. The role check finds the role in A and authorizes the
DM, granting access as if the user were trusted in server B.
Fix:
- DMs (no guild context) disable role-based auth by default. Opt-in via
DISCORD_DM_ROLE_AUTH_GUILD=<guild_id> restricts role lookup to one
explicitly-trusted guild.
- Guild messages check roles only in the originating guild
(message.guild), never in other mutual guilds.
- Reject cached author.roles when the Member came from a different guild
than the current message.
Backwards compatibility:
- DISCORD_ALLOWED_USERS behavior is unchanged (still works in both DMs
and guild messages).
- Deployments that rely on roles in guild channels continue to work;
role checks are now strictly scoped to that guild.
- Deployments that intentionally want role-based DM auth can opt into a
single trusted guild via DISCORD_DM_ROLE_AUTH_GUILD.
Tests: 9 new regression guards in
tests/gateway/test_discord_roles_dm_scope.py covering the bypass path,
the opt-in path, cross-guild guild-message bypass, and backwards-compat
user-ID paths. 47/47 discord-auth tests pass.
Refs: #11608 (initial implementation), #7871 (feature request),
#9873 (PR author credit @0xyg3n)
- Add PID file mechanism to track bridge processes and kill stale ones on startup
- Improve _kill_port_process() with lsof fallback when fuser is not available
- Support explicit WhatsApp disable via config.yaml (whatsapp.enabled: false)
- Respect WHATSAPP_ENABLED=false env var to disable WhatsApp
Fixes#19124
When a kanban worker subprocess exits rc=0 but its task is still in
status='running', the agent almost certainly answered the task
conversationally without calling kanban_complete or kanban_block. The
dispatcher used to classify this as a generic crash and respawn, which
loops forever on small local models (gemma4-e2b q4 etc.) that keep
returning clean but unproductive output.
Dispatcher changes:
- The waitpid reap loop at the top of dispatch_once now records each
reaped child's raw exit status in a bounded module registry
(_recent_worker_exits, TTL 600s, size cap 4096).
- _classify_worker_exit distinguishes clean_exit / nonzero_exit /
signaled / unknown using os.WIFEXITED / WIFSIGNALED.
- detect_crashed_workers consults the classification when a worker
is found dead. clean_exit → protocol_violation event + immediate
circuit-breaker trip (failure_limit=1). Everything else keeps the
existing crashed-event + counter behavior.
- DispatchResult.auto_blocked now includes protocol-violation trips.
Gateway fix (Bug A in #20894):
- gateway.run._notify_active_sessions_of_shutdown snapshots
self.adapters with list(...) before iterating. adapter.send() can
hit a fatal-error path that pops the adapter from the dict, which
was raising 'RuntimeError: dictionary changed size during iteration'
during shutdown.
Regression tests:
- test_detect_crashed_workers_protocol_violation_auto_blocks verifies
rc=0 + still-running → status=blocked on first occurrence with
protocol_violation + gave_up events and NO crashed event.
- test_detect_crashed_workers_nonzero_exit_uses_default_limit verifies
non-zero exits keep the existing 2-strike behavior.
Closes#20894.
Skills that produce large/lossless images (e.g. info-graph, where a
rendered JPG is 1-2 MB) currently lose quality in Telegram delivery
because `_IMAGE_EXTS` membership routes the file through
`send_multiple_images` → `sendMediaGroup`, which Telegram's server
re-encodes to JPEG @ 1280px max edge. The original bytes only survive
when the file goes through `send_document`, which the dispatch tables
in three places (`_process_message_background`, `_deliver_media_from_response`,
and the `send_message` tool's telegram path) only reach for files
whose extension is NOT in `_IMAGE_EXTS`.
This commit adds an `[[as_document]]` directive that mirrors the
existing `[[audio_as_voice]]` shape: a skill emits the directive once
in its response, and every image-extension MEDIA: file in that response
is delivered via `send_document` instead of `send_multiple_images` /
`sendPhoto`. The directive is detected at the dispatch sites (which see
the raw response) and the directive string is stripped from the
user-visible cleaned text in `extract_media` so it never leaks.
Granularity is intentionally all-or-nothing per response, matching
[[audio_as_voice]]'s scope. Skills that need fine control can split into
two responses.
Verified the targeted use case: info-graph emits
信息图已生成(...)
[[as_document]]
MEDIA:/tmp/info-graph-x/infographic.jpg
→ Telegram receives `infographic.jpg` via sendDocument, original 1MB
JPEG bytes preserved, no recompression. Forwarding and download
filenames stay clean (`infographic.jpg`).
Tests: +3 cases in TestExtractMedia covering directive strip, isolation
from voice flag, and coexistence with [[audio_as_voice]]. All
113 pre-existing media/extract/send tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up on top of Zyproth's session-source cache: swap the unbounded
dict for an OrderedDict with a 512-entry LRU cap so long-running
gateways can't accumulate stale entries for dead sessions forever.
- self._session_sources is now an OrderedDict
- _cache_session_source() move_to_end + popitem(last=False) above cap
- _get_cached_session_source() move_to_end on hit (LRU read bump)
- restart_test_helpers.py wires OrderedDict + _session_sources_max
Flip the default for HERMES_REDACT_SECRETS from off to on so the redactor
already wired into send_message_tool, logs, and tool output actually runs
on a fresh install.
- agent/redact.py: env-var default "" → "true"
- hermes_cli/config.py: DEFAULT_CONFIG security.redact_secrets True;
two config-template comments rewritten
- gateway/run.py + cli.py: startup log / banner warning when the user
has explicitly opted out, so the downgrade is visible in agent.log
and at CLI banner time
- docs/reference/environment-variables.md: description reconciled
- tests: flipped the default-pin, restructured the force=True
regression test to explicit-false instead of unset
Users who need raw credential values (redactor development) can still
opt out via security.redact_secrets: false in config.yaml or
HERMES_REDACT_SECRETS=false in .env.
Closes#17691.
Addresses #20785 (short-term output-pipeline recommendation).
aiohttp ClientTimeout uses BaseTimerContext which calls
loop.call_later() internally. When invoked via
asyncio.run_coroutine_threadsafe() from cron jobs, this
triggers "Timeout context manager should be used inside a task"
errors, causing message delivery failures.
Replace all direct ClientTimeout usage with asyncio.wait_for():
- _upload_ciphertext: CDN upload (120s timeout)
- _download_bytes: CDN download (configurable timeout)
- _download_remote_media: remote media fetch (30s timeout)
Also set total=None on _send_session to disable aiohttp built-in
timeout, and change trust_env=True to False to bypass proxy for
WeChat CDN connections.
Follow-up on top of @kyan12's PR #20888 — same feature, cleaner shape,
wider coverage.
Changes:
- Drop the synthetic '[System note: ...]' in the internal MessageEvent.
The existing _is_resume_pending branch in _handle_message_with_agent
(run.py ~L13738) already injects a reason-aware recovery system note
on the next turn. With kyan's text in place the model saw two stacked
system notes. Now the event text is empty and the existing injection
path owns the wording.
- Drop SessionStore.list_resume_pending() as a new public method. The
filter is 8 lines inline in _schedule_resume_pending_sessions() —
one caller, no other pluggability need.
- Add 'restart_interrupted' to the auto-resume reason set. That's the
reason SessionStore.suspend_recently_active() stamps on sessions
recovered from a crash/OOM/SIGKILL (no .clean_shutdown marker).
Previously those sessions had to wait for a real user message to
auto-resume; now they continue automatically at startup like
drain-timeout interruptions do.
- Reasons live in a _AUTO_RESUME_REASONS frozenset at class scope so
future reasons (e.g. 'manual_resume_request') can be opted in with
one line.
Test coverage added:
- drain-timeout + crash-recovery both scheduled
- stale entries skipped (outside freshness window)
- suspended entries skipped (suspended > resume_pending)
- originless entries skipped (no routing target)
- disallowed reasons skipped (graceful forward-compat)
E2E verified end-to-end with a real on-disk SessionStore: 2 eligible
sessions scheduled, 2 ineligible skipped, empty-text internal events
delivered to the adapter.
Co-authored-by: Kevin Yan <kevyan1998@gmail.com>
When display.cleanup_progress (or display.platforms.<plat>.cleanup_progress)
is true, the gateway deletes tool-progress bubbles, long-running '⏳ Still
working...' notices, and status-callback messages after the final response
is delivered successfully. Currently effective on adapters that implement
delete_message (Telegram); silently no-ops elsewhere. Off by default.
Failed runs skip cleanup so bubbles stay as breadcrumbs.
Minimal plumbing: base.py's existing post_delivery_callback slot now chains
new registrations onto any existing callback (with per-callback exception
isolation) rather than clobbering. Stale-generation registrations are
rejected so they can't step on a fresher run's callbacks. This lets the
cleanup callback coexist with the background-review release hook already
registered on the same slot.
Co-authored-by: mrcharlesiv <Mrcharlesiv@gmail.com>
When terminal.backend is docker, inbound documents uploaded via messaging
platforms (Telegram, Slack, Discord, Feishu, Email, etc.) are cached at a host
path under ~/.hermes/cache/documents, but the container sandbox only sees them
at the auto-mounted /root/.hermes/cache/documents path.
This PR adds to_agent_visible_cache_path() in tools/credential_files.py (the
natural sibling to get_cache_directory_mounts()) and calls it at the
document-context-injection site in gateway/run.py so the agent always receives
a path it can open directly, matching the mount layout already established
by get_cache_directory_mounts() (#4846).
Scope: only Docker backend for now; other backends use different mount
semantics and are left unchanged until verified.
Fixes#18787
Two follow-ups on top of helix4u's slash-command sync hardening:
- Only suppress exceptions that are actually Discord 429 rate limits
(discord.RateLimited, HTTPException with status 429, or a clearly
rate-limit-named duck type). Arbitrary failures that happen to expose
a retry_after attribute now re-raise to the outer handler instead of
silently swallowing a cooldown.
- Move the sync-state JSON under $HERMES_HOME/gateway/ so the home root
stops collecting ad-hoc runtime files.
Added a test verifying unrelated exceptions don't get misclassified as
rate limits.
Extend the gateway_restart_notification flag to cover
_notify_active_sessions_of_shutdown — the message that fires just
before drain ("⚠️ Gateway restarting — Your current task will be
interrupted. Send any message after restart and I'll try to resume
where you left off.") sent to active sessions and home channels.
Same operator/end-user reasoning: on a Slack workspace shared with
end users, "Gateway restarting" reads as "the bot is broken" — the
operator should be able to suppress it consistently with the other
two lifecycle pings rather than having a partial opt-out.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an opt-out toggle on PlatformConfig that gates both restart
lifecycle pings: the "♻ Gateway restarted" message sent to the chat
that issued /restart, and the "♻️ Gateway online" home-channel
startup notification. Defaults to True so existing deployments are
unaffected.
The motivating split is operator vs. end-user surfaces: a back-channel
like Telegram should keep these pings, while a Slack workspace shared
with end users should not surface gateway lifecycle noise.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Remove dead metadata.get('reply_to') fallback in _send_raw_message;
nothing in the codebase ever sets 'reply_to' inside a metadata dict —
the key only appears as a top-level send_voice() keyword argument
- Simplify _status_thread_metadata construction in run.py to use a
single dict literal instead of create-then-mutate pattern; the
or-{} guard was dead since source.thread_id implies _progress_thread_id
is also set for Feishu
- Add yuqian@zmetasoft.com to AUTHOR_MAP for contributor attribution
Route Feishu topic progress, status, approval, stream, and fallback messages through threaded replies by preserving the originating message id as the reply target. Add regressions for tool progress topic metadata and Feishu metadata-driven reply routing.