Commit graph

733 commits

Author SHA1 Message Date
JC的AI分身
80b386a472 fix(feishu): refresh bot identity during hydration 2026-05-05 06:04:20 -07:00
Teknium
314361733f test(api_server): _run_agent result now carries session_id for #16938 2026-05-05 06:01:03 -07:00
vominh1919
7f735b4db2 fix: return effective session_id after context compression (#16938)
When context compression rotates the agent's session_id to a new
child session, the API server was still returning the stale parent
session_id in the X-Hermes-Session-Id response header.

This caused external clients to keep sending the old session_id,
loading uncompressed parent history instead of the compressed
continuation.

Fix: _run_agent() now includes the effective session_id in its
result dict, and the response header uses it instead of the
original provided session_id.
2026-05-05 06:01:03 -07:00
Teknium
fe8560fc12
feat(api-server): X-Hermes-Session-Key header for long-term memory scoping (#20199)
* feat(api-server): X-Hermes-Session-Key header for long-term memory scoping

API Server integrations (Open WebUI, custom web UIs) can now pass a stable
per-channel identifier via X-Hermes-Session-Key that scopes long-term memory
(Honcho, etc.) independently of the transcript-scoped X-Hermes-Session-Id.
This matches the native gateway's session_key / session_id split: one stable
key per assistant channel, many independent transcripts that rotate on /new.

- _create_agent and _run_agent accept gateway_session_key and pass it to
  AIAgent(gateway_session_key=...), which is already honored by the Honcho
  memory provider (plugins/memory/honcho/client.py resolve_session_name).
- New shared helper _parse_session_key_header applies the same API-key
  gate, control-character sanitization, and a 256-char length cap as the
  existing session-id header.
- All three agent endpoints honor the header: /v1/chat/completions,
  /v1/responses, /v1/runs. JSON and SSE responses echo it back.
- /v1/capabilities advertises session_key_header so clients can
  feature-detect.

Closes #20060.

Co-authored-by: Andy Stewart <lazycat.manatee@gmail.com>

* chore: AUTHOR_MAP entry for manateelazycat

---------

Co-authored-by: Andy Stewart <lazycat.manatee@gmail.com>
2026-05-05 05:34:47 -07:00
chengoak
8f4c0bf088 fix(wecom): pad base64 AES key before decode
WeCom doesn't pad base64 aeskey, causing Python strict mode decode failure
on media/image/file messages. Add automatic padding before base64 decode:
aes_key + '=' * ((4 - len(aes_key) % 4) % 4).

Salvages the AES padding fix from @chengoak's PR #17040. The SSRF whitelist
entry for a private COS bucket hostname was dropped as it belongs in user
config, not the built-in trusted-private-IP-hosts list. The debug-level
full-body info log was dropped to avoid logging potentially sensitive
message content at INFO level.
2026-05-05 05:00:41 -07:00
0xsir0000
f6b68f0f50 fix(gateway): keep DoH-confirmed Telegram IPs that match system DNS (#14520)
discover_fallback_ips() filtered out any DoH-resolved IP that also appeared
in the system resolver's answer set, on the assumption that the system IP
was unreachable. When DoH and system DNS agreed (a common case), the
function returned the hardcoded _SEED_FALLBACK_IPS list instead — and on
networks where those seed addresses are not routable, the Telegram fallback
transport had nothing usable to retry against and polling failed.

Drop the system_ips exclusion so DoH-confirmed IPs are preserved regardless
of system DNS overlap. The TelegramFallbackTransport already tries the
primary path first via system DNS, then falls through to the IP-rewrite
path on connect failure; including the same IP in both lanes lets a
transient primary failure recover via the explicit IP route instead of
escalating to seed addresses.

Update the two tests that codified the old exclusion to reflect the new,
inclusion-by-default behaviour.

Fixes #14520
2026-05-05 04:42:59 -07:00
Albert.Zhou
fd9c32c0f2 fix(email): drop non-allowlisted senders before dispatch to prevent mail loops
Add EMAIL_ALLOWED_USERS check in EmailAdapter._dispatch_message()
to silently discard emails from senders not in the allowlist.  This
prevents the adapter from creating thread context and dispatching a
MessageEvent for unauthorized senders, which could race with the
gateway authorization check and result in SMTP replies being sent
despite the handler returning None.

Test: tests/gateway/test_email.py::TestDispatchMessage::test_non_allowlisted_sender_dropped
Test: tests/gateway/test_email.py::TestDispatchMessage::test_allowlisted_sender_proceeds
Test: tests/gateway/test_email.py::TestDispatchMessage::test_empty_allowlist_allows_all
2026-05-04 12:35:22 -07:00
EmelyanenkoK
25065283b3 fix: improve telegram topic mode setup 2026-05-04 12:07:17 -07:00
briandevans
ce22301dc6 test(sms): use clear=True in test_missing_phone_number_is_non_retryable
Prevents pre-existing TWILIO_PHONE_NUMBER or SMS_WEBHOOK_URL values in
the outer test environment from leaking into the assertion context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 05:25:09 -07:00
Pratik Rai
7a8ee8b29d fix(gateway): deduplicate Weixin messages by content fingerprint 2026-05-04 05:20:13 -07:00
OpenClaw Bot
0443484115 fix(qqbot): honor proxy env vars for websocket 2026-05-04 05:06:09 -07:00
QifengKuang
69fc6d9c1e fix(telegram): fall back to document on any send_photo failure, not just dim errors
Broadens the existing fallback (previously only fired for
Photo_invalid_dimensions) to cover every send_photo exception class:
rate limits, corrupt file markers, format edge cases. The expected
dimension case still logs at INFO (document is the right path); all
other cases log at WARNING with exc_info so they're visible in logs.

If send_document itself fails, we still fall back to the base adapter's
text-only 'Image: /path' rendering as a last resort.

Salvage of #15837 — original PR author QifengKuang proposed the broader
try/except-style fallback. Adapted to keep the existing INFO-vs-WARNING
log split for dimension errors (the expected case).

Co-authored-by: QifengKuang <k2767567815@gmail.com>
2026-05-04 04:54:54 -07:00
Kathy
a79b0ec461 fix: keep Feishu topic replies from falling back to new threads (local patch)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-04 04:53:28 -07:00
ygd58
297eaa3533 fix(api_server): emit run.failed when run_conversation returns failed=True
When run_conversation encounters a non-retryable client error (401, 400,
etc.), it returns a dict with failed=True instead of raising. The gateway's
_run_and_close only branched on exceptions, so it always emitted run.completed
even for failed runs — clients could not distinguish success from failure.

Inspect the result dict before emitting: if failed=True, emit run.failed
with the error message; otherwise emit run.completed as before. The existing
except Exception path is unchanged for genuine programming errors.

Fixes #15561
2026-05-04 04:47:36 -07:00
DaniuXie
a45bd28598 fix(wecom): set SUPPORTS_MESSAGE_EDITING=False to prevent broken streaming 2026-05-04 03:10:36 -07:00
ee-blog
f6aa1965d7 fix(telegram): fallback to document when photo dimensions exceed limits
Telegram's send_photo has dimension limits (sum of width+height <= 10000px).
When sending large screenshots or tall images, the API returns
'Photo_invalid_dimensions' error.

Fix: Catch this specific error in send_image_file() and automatically
fallback to send_document() which has no dimension limits (only 50MB size).

This is similar to the existing 5MB URL fallback (commit 542faf22) but
handles local files with dimension issues instead of URL size issues.
2026-05-04 02:33:09 -07:00
barteq
ad4542bf6d fix(gateway): allow free_response_channels to override DISCORD_IGNORE_NO_MENTION
When DISCORD_IGNORE_NO_MENTION is true (default), the bot ignores
messages without @mention. However, this check ran before evaluating
free_response_channels, so messages in free-response channels were
wrongly dropped unless they contained a mention.

This change adds a carve-out: if the message lands in a channel that
is configured as a free response channel (or its parent category is),
the ignore-no-mention rule is skipped.

Also removes the unconditional skip_thread for free response channels
so that auto_thread still creates threads there unless explicitly
disabled via DISCORD_NO_THREAD_CHANNELS.
2026-05-04 02:32:39 -07:00
Asunfly
8a364df2c8 fix: inherit reasoning config in API server runs 2026-05-04 01:44:16 -07:00
Aleksandr Pasevin
8a4fe80f8d fix(signal): skip reactions for unauthorized senders
The on_processing_start hook fired a reaction emoji (👀) on every
inbound Signal message before run.py's _is_user_authorized check.
This meant contacts not in SIGNAL_ALLOWED_USERS would see the bot
react to their messages even though Hermes silently dropped them —
leaking the presence of the bot and causing confusing UX.

Two changes to gateway/platforms/signal.py:

1. Read SIGNAL_ALLOWED_USERS into self.dm_allow_from in __init__
   (mirrors the group_allow_from pattern already in place).

2. Add _reactions_enabled(event) — two-gate check:
   - SIGNAL_REACTIONS=false/0/no disables reactions globally
   - If SIGNAL_ALLOWED_USERS is set, only react to senders in
     the allowlist (skips unauthorized contacts)

Both on_processing_start and on_processing_complete now call this
guard before sending any reaction.

Telegram already has an equivalent _reactions_enabled() guard
(controlled by TELEGRAM_REACTIONS). This brings Signal to parity.
2026-05-04 01:38:21 -07:00
Zyproth
a5cae16496 fix(api_server): fall back to default port on malformed API_SERVER_PORT 2026-05-03 15:27:03 -07:00
charliekerfoot
1148c46241 fix(gateway): correct ws scheme conversion for https urls 2026-05-03 03:54:03 -07:00
Hermes Agent
934103476f fix(gateway): send /new response before cancel_session_processing to avoid race (#18912)
When /new is issued while an agent is actively processing, the confirmation response was never sent to the user because cancel_session_processing() was called before _send_with_retry(). Task cancellation side effects could silently drop the response.

Fix: reorder to send the response BEFORE cancelling the old task. Add logging at the send point (matching the pattern at line 2800 in _process_message_background) so future failures are visible.

Closes: #18912
2026-05-03 03:54:03 -07:00
nftpoetrist
6c1322b997 fix(slack): close previous handler in connect() to prevent zombie Socket Mode connections
SlackAdapter.connect() overwrote self._handler, self._app, and
self._socket_mode_task without closing the prior AsyncSocketModeHandler
first. If connect() was called a second time on the same adapter (e.g.
during a gateway restart or in-process reconnect attempt), the old Socket
Mode websocket stayed alive. Both the old and new connections received
every Slack event and dispatched it twice — producing double responses
with different wording, the same bug that affected DiscordAdapter (#18187,
fixed in #18758).

Fix: add a close-before-reassign guard at the start of the connection
setup path, mirroring the guard DiscordAdapter.connect() already has.
When self._handler is None (fresh adapter, first connect()) the block is
a harmless no-op. Scoped to the handler/app fields only — no behavior
change for any path that does not call connect() twice.

Fixes #18980
2026-05-03 03:47:49 -07:00
0xyg3n
19ba9e43b6 fix(gateway/discord): require allowlist auth on slash commands
Slash commands (_run_simple_slash, _handle_thread_create_slash) bypassed
every DISCORD_ALLOWED_* gate enforced by on_message. Any guild member
could invoke /background (RCE via terminal), /restart, /model, /skill,
etc. CVSS 9.8 Critical.

- _evaluate_slash_authorization mirrors on_message gates (user, role,
  channel, ignored channel) with fail-closed semantics
- _check_slash_authorization sends ephemeral reject + logs + admin alert
- Auth gate runs before defer() so rejections are ephemeral
- /skill autocomplete returns [] for unauthorized users (no catalog leak)
- Component views (ExecApproval, SlashConfirm, UpdatePrompt, ModelPicker)
  now honor role allowlists via shared _component_check_auth helper
- Optional DISCORD_HIDE_SLASH_COMMANDS defense-in-depth
- Cross-platform admin alert (Telegram/Slack fallback) on unauthorized attempts

Based on PR #18125 by @0xyg3n.
2026-05-03 03:44:55 -07:00
MottledShadow
a22465e07a fix(weixin): send_weixin_direct cross-loop session check
When send_message tool is called from inside a running gateway, the
_run_async bridge spawns a worker thread with a separate event loop.
send_weixin_direct then reuses the live adapter's aiohttp session
which was created on the gateway's main loop.  aiohttp's TimerContext
checks asyncio.current_task(loop=session._loop) and sees None because
we're executing on the worker thread's loop → raises 'Timeout context
manager should be used inside a task'.

Fix: skip the live-adapter shortcut when the session belongs to a
different event loop, falling through to the fresh-session path.
2026-05-03 01:51:33 -07:00
teknium1
762eb79f1e fix(gateway): tighten httpx keepalive and close whatsapp typing-response leak (#18451)
Two mitigations for the CLOSE_WAIT accumulation reported against QQ Bot
+ Feishu on macOS behind Cloudflare Warp.

1. Shared httpx.Limits helper (gateway/platforms/_http_client_limits.py).
   Every long-lived platform adapter now constructs httpx.AsyncClient
   with max_keepalive_connections=10 and keepalive_expiry=2.0, vs httpx's
   default of unbounded keepalive pool and 5.0s expiry. On macOS/Warp the
   default 5s window let idle keepalive sockets sit in CLOSE_WAIT long
   enough for seven persistent adapters (QQ Bot, WeCom, DingTalk, Signal,
   BlueBubbles, WeCom-callback, plus the transient Feishu helper) to
   compound to the 256-fd ulimit. Tunable via
   HERMES_GATEWAY_HTTPX_KEEPALIVE_EXPIRY and
   HERMES_GATEWAY_HTTPX_MAX_KEEPALIVE env vars.

2. whatsapp.send_typing aiohttp leak. The call was
   'await self._http_session.post(...)' with no 'async with' and no
   variable capture — the ClientResponse went out of scope unclosed,
   holding its TCP socket in CLOSE_WAIT until GC. Fixed by wrapping in
   'async with'. This was the only bare-await aiohttp leak in the
   gateway/tools/plugins tree per audit; all other aiohttp sites use
   the context-manager pattern correctly.

The underlying reporter also saw Feishu SDK (lark-oapi) connections in
CLOSE_WAIT — those are inside the SDK and out of our direct control, but
tightening httpx keepalive across adapters reduces the aggregate pool
pressure regardless of which individual adapter leaks.
2026-05-02 02:23:37 -07:00
beibi9966
38dd057e91 fix(feishu): finalize remote document downloads inside httpx.AsyncClient context (#18502)
Snapshot Content-Type and body while the client context is still
active so pooled connections fully release on exit. Previously the
read happened after `async with httpx.AsyncClient(...)` returned —
which works today only because httpx eagerly buffers non-streaming
responses; a future refactor to `.stream()` would silently read-
after-close.

Part of the #18451 connection-hygiene audit. Salvage of #18502.
2026-05-02 02:23:37 -07:00
Teknium
1dce908930
fix(gateway): shutdown + restart hygiene (drain timeout, false-fatal, success log) (#18761)
* fix(gateway): config.yaml wins over .env for agent/display/timezone settings

Regression from the silent config→env bridge. The bridge at module import
time is correct for max_turns (unconditional overwrite), but every other
agent.*, display.*, timezone, and security bridge key was guarded by
'if X not in os.environ' — so a stale .env entry from an old 'hermes setup'
run would shadow the user's current config.yaml indefinitely.

Symptom: agent.max_turns: 500 in config.yaml, HERMES_MAX_ITERATIONS=60
in .env from an old setup, and the gateway silently capped at 60
iterations per turn. Gateway logs confirmed api_calls never exceeded 60.

Three changes:

1. gateway/run.py: drop the 'not in os.environ' guards for all agent.*,
   display.*, timezone, and security.* bridge keys. config.yaml is now
   authoritative for these settings — same semantics already in place
   for max_turns, terminal.*, and auxiliary.*. Also surface the bridge
   failure (previously 'except Exception: pass') to stderr so operators
   see bridge errors instead of silently falling back to .env.

2. gateway/run.py: INFO-log the resolved max_iterations at gateway
   start so operators can verify the config→env bridge did the right
   thing instead of chasing a phantom budget ceiling.

3. hermes_cli/setup.py: stop writing HERMES_MAX_ITERATIONS to .env in
   the setup wizard. config.yaml is the single source of truth. Also
   clean up any stale .env entry left behind by pre-fix setups.

Regression tests in tests/gateway/test_config_env_bridge_authority.py
guard each config→env key against the 'stale .env shadows config' bug.

* fix(gateway): shutdown + restart hygiene (drain timeout, false-fatal, success log)

Three issues observed in production gateway.log during a rapid restart
chain on 2026-05-02, all fixed here.

1. _send_restart_notification logged unconditional success
   adapter.send() catches provider errors (e.g. Telegram 'Chat not found')
   and returns SendResult(success=False); it never raises. The caller
   ignored the return value and always logged 'Sent restart notification
   to <chat>' at INFO, producing a misleading success line directly
   below the 'Failed to send Telegram message' traceback on every boot.
   Now inspects result.success and logs WARNING with the error otherwise.

2. WhatsApp bridge SIGTERM on shutdown classified as fatal error
   _check_managed_bridge_exit() saw the bridge's returncode -15 (our own
   SIGTERM from disconnect()) and fired the full fatal-error path,
   producing 'ERROR ... WhatsApp bridge process exited unexpectedly' plus
   'Fatal whatsapp adapter error (whatsapp_bridge_exited)' on every
   planned shutdown, immediately before the normal '✓ whatsapp
   disconnected'. Adds a _shutting_down flag that disconnect() sets
   before the terminate, and _check_managed_bridge_exit() returns None
   for returncode in {0, -2, -15} while shutting down. OOM-kill (137)
   and other non-signal exits still hit the fatal path.

3. restart_drain_timeout default 60s → 180s
   On 2026-05-02 01:43:27 a user /restart fired while three agents were
   mid-API-call (82s, 112s, 154s into their turns). The 60s drain budget
   expired and all three were force-interrupted. 180s covers realistic
   in-flight agent turns; users on very-long-reasoning models can still
   raise it further via agent.restart_drain_timeout in config.yaml.
   Existing explicit user values are preserved by deep-merge.

Tests
- tests/gateway/test_restart_notification.py: two new tests assert INFO
  is only logged on SendResult(success=True) and WARNING with the error
  string is logged on SendResult(success=False).
- tests/gateway/test_whatsapp_connect.py: parametrized test for
  returncode in {0, -2, -15} proves shutdown-time exits are suppressed;
  separate test proves returncode 137 (SIGKILL/OOM) still surfaces as
  fatal even when _shutting_down is set.
- _check_managed_bridge_exit() reads _shutting_down via getattr-with-
  default so existing _make_adapter() test helpers that bypass __init__
  (pitfall #17 in AGENTS.md) keep working unmodified.
2026-05-02 02:08:06 -07:00
luyao618
292d2fb42f fix(discord): close old client before reconnect to prevent zombie websockets (#18187)
When DiscordAdapter.connect() is called during reconnect, it creates a new
commands.Bot client without closing the previous one. The old client's
websocket remains connected to Discord's gateway, causing both to fire
on_message for every incoming event — resulting in double responses.

Fix: before creating a new Bot instance, check if a previous client exists
and close it. This ensures only one websocket connection is active at any
time.

Closes #18187
2026-05-02 02:04:14 -07:00
Teknium
10297fa23c
fix(discord): /reload-skills now refreshes the /skill autocomplete live (#18754)
`_register_skill_group` captured the skill catalog in closure variables
(`entries` and `skill_lookup`) so the single `tree.add_command` call at
startup owned the only live copy. The closure is never re-entered after
startup, so `/reload-skills` — which rescans the on-disk skills dir and
refreshes the in-process `_skill_commands` registry — had no way to
propagate results into the `/skill` autocomplete on Discord. New skills
stayed invisible in the dropdown, and deleted skills returned
"Unknown skill" when the stale autocomplete entry was clicked.

The fix is purely a dataflow change: promote `entries` and `skill_lookup`
to instance attributes (`_skill_entries`, `_skill_lookup`), split the
collector-driven rebuild into a helper (`_refresh_skill_catalog_state`),
and add a public `refresh_skill_group()` method that re-runs the helper
and is safe to call at any point after the initial registration.

The gateway's `_handle_reload_skills_command` then iterates
`self.adapters` and calls `refresh_skill_group()` on any adapter that
exposes it (currently only Discord). Both sync and async implementations
are supported; adapters that don't override the method (Telegram's
BotCommand menu, Slack subcommand map, etc.) are silently skipped — the
in-process `reload_skills()` call covers them.

No `tree.sync()` is required because Discord fetches autocomplete
options dynamically on every keystroke — mutating the instance state the
callbacks already read from is sufficient. That sidesteps the per-app
command-bucket rate limit (~5 writes / 20 s) that made the previous
bulk-sync-on-reload approach unusable (#16713 context).

Tests: tests/gateway/test_reload_skills_discord_resync.py — five cases
covering (1) refresh replaces entries, (2) entries stay sorted after
refresh, (3) collector exception leaves cached state intact, (4)
`_refresh_skill_catalog_state` populates the instance attrs, (5)
orchestrator calls `refresh_skill_group()` on sync + async adapters and
skips adapters that don't expose it.
2026-05-02 02:00:11 -07:00
Jacob Lizarraga
2470434d60 fix(telegram): probe polling liveness after reconnect to detect wedged Updater
After a transient Telegram 502, _handle_polling_network_error's
stop()+start_polling() cycle can leave PTB's Updater with `running=True`
but a wedged consumer task that never makes progress. No error_callback
fires in that state, so the reconnect ladder never advances past attempt
1, the MAX_NETWORK_RETRIES fatal-error path is never reached, and the
gateway sits silent indefinitely.

Schedule a heartbeat probe (60s after a successful reconnect) that
verifies Updater.running is still True and bot.get_me() responds within
a tight asyncio.wait_for timeout. Either failure feeds back into the
reconnect ladder so the existing escalation path fires.

No PTB-internal coupling, no Application rebuild — minimal additive
defense inside the existing reconnect abstraction.

Tests cover healthy / Updater non-running / probe timeout / probe
network error / already-fatal cases, plus an integration check that the
probe is actually scheduled after a successful start_polling().

Closes the silent-wedge case observed in the wild after a transient
Telegram 502; existing reconnect tests updated to mock bot.get_me() now
that the success path schedules a heartbeat probe.
2026-05-02 01:55:04 -07:00
Amr Essam
d05a87e686 fix(gateway): clear slack assistant thread status 2026-05-01 14:01:26 -07:00
hinotoi-agent
a147164d3c fix(slack): preserve per-user slash-command session isolation 2026-05-01 14:01:26 -07:00
YAMAGUCHI Seiji
2b3923ff13 fix(gateway): coerce scalar free_response_channels to str before split
YAML loads a bare numeric value such as
    discord:
      free_response_channels: 1491973769726791812
as an int.  _discord_free_response_channels() / _slack_free_response_channels()
checked `isinstance(raw, list)` and `isinstance(raw, str)` in that order and
then fell through to `return set()`, so a single-channel config that happened
to be unquoted was silently dropped with no log line — the bot kept demanding
@mentions even though the channel was configured to free-response.

A multi-channel value like `1234567890,9876543210` does not trip this because
the comma forces YAML to parse it as a string.  Single-channel configs are
the only case that breaks, which is exactly the footgun that's hardest to
diagnose (the config "looks right" and the feature just doesn't activate).

Note that the old-schema env-var bridge at gateway/config.py:614+ already
runs `str(frc)` when forwarding to SLACK_/DISCORD_FREE_RESPONSE_CHANNELS,
so the env-var fallback worked.  The bug only surfaces on the
`config.extra["free_response_channels"]` path populated by the `platforms:`
bridge at gateway/config.py:576, which passes the raw YAML value through
unchanged.

Fix at the reader: treat any non-list value as a scalar, coerce with str(),
then apply the same CSV split semantics.  This keeps the public contract
stable (list or str-like continues to work identically) while accepting
the ints that the YAML loader is free to hand us.

Added tests for both Discord and Slack covering:
  - bare int value in config.extra
  - list of ints in config.extra
2026-05-01 14:01:26 -07:00
kshitijk4poor
8fcc160f6b fix(gateway/slack): review fixes — scope ephemeral to commands, user isolation
Self-review fixes for the slash ephemeral ack:

- Only stash response_url when text starts with '/' (gateway command).
  Free-form questions via '/hermes <question>' must produce public agent
  replies visible to the whole channel, not ephemeral.
- Use a ContextVar (_slash_user_id) to thread the invoking user's ID
  from _handle_slash_command through to send().  _pop_slash_context now
  matches the exact (channel_id, user_id) key when the ContextVar is
  set, preventing concurrent users on the same channel from stealing
  each other's ephemeral context.  ContextVars propagate to child
  asyncio.Tasks, so the value survives through handle_message →
  _process_message_background → _send_with_retry → send().
- Add truncate_message() in _send_slash_ephemeral to prevent silent
  failures on long responses (response_url has the same ~40k limit).
- Log send_private_notice failures at debug level instead of bare
  except/pass — aids diagnostics without spamming.
- Document app_mention dedup dependency on shared event ts.
- Add tests: free-form question must NOT stash context, concurrent
  users on the same channel get isolated contexts, non-slash send()
  path fallback behavior.
2026-05-01 13:33:06 -07:00
probepark
0ab2d752ff feat(gateway): private notice delivery and Slack format_message fixes
Adds platform-level private notice delivery abstraction so operational
messages (e.g. sethome prompt) can be sent ephemerally on Slack when
configured with `slack.notice_delivery: private`.

Changes:
- gateway/config.py: _normalize_notice_delivery() + GatewayConfig.get_notice_delivery()
  with per-platform config bridging
- gateway/platforms/base.py: send_private_notice() default implementation
  (falls through to send())
- gateway/platforms/slack.py: send_private_notice() via chat_postEphemeral
- gateway/run.py: _deliver_platform_notice() helper replaces direct
  adapter.send() for the sethome notice, with private→public fallback
- gateway/platforms/slack.py: app_mention handler now forwards to
  _handle_slack_message (safe due to ts-based dedup) instead of no-op pass,
  fixing edge-case Slack configs where mentions arrive only as app_mention
- gateway/platforms/slack.py format_message: negative lookbehind prevents
  markdown images (![]()) from becoming broken Slack links; italic regex
  now requires non-whitespace boundaries so 'a * b * c' stays literal

Based on PR #9340 by @probepark.
2026-05-01 13:33:06 -07:00
kshitijk4poor
7cda0e5224 fix(gateway/slack): ephemeral ack and routing for slash commands
Slack slash commands (/q, /btw, /stop, /model, etc.) previously showed
no user-visible acknowledgement and posted command replies as public
channel messages.  This diverged from Discord, which uses ephemeral
deferred responses for slash commands.

Changes:
- handle_hermes_command now passes response_type='ephemeral' and a
  'Running /cmd…' text to ack(), giving the user immediate 'Only visible
  to you' feedback when they invoke any native slash command.
- _handle_slash_command stashes the Slack response_url from the command
  payload in a per-channel context dict before dispatching to
  handle_message.
- send() checks for a pending slash context and, when found, POSTs to
  the response_url with replace_original=true to swap the initial ack
  with the real command reply (e.g. 'Queued for the next turn.'),
  keeping it ephemeral.
- Stale slash contexts are garbage-collected on lookup (120s TTL).
- The response_url POST is non-fatal: if it fails, the user already saw
  the initial ack, and send() returns success=True.

Fixes #18182
2026-05-01 13:33:06 -07:00
Siddharth Balyan
75e1339d4c
fix(telegram): send seed message after creating DM topics (#18334)
Telegram's client does not display empty forum topics in the chat's
topic list. After createForumTopic succeeds, send a short pin message
into the new topic so it becomes immediately visible to the user.

Only fires for newly created topics (no thread_id in config yet).
Failure to send the seed is non-fatal (debug-logged, topic still works).
2026-05-01 15:21:56 +05:30
UgwujaGeorge
b7ad3f478f fix(yuanbao): enforce owner identity check on group slash commands
The bot-owner identity check inside OwnerCommandMiddleware was commented
out and replaced with a hardcoded `is_owner = True`, so any group member
could trigger allowlisted privileged commands (/approve, /deny, /stop,
/reset, /retry, /undo, /new, /background, /bg, /btw, /queue, /q) by
sending the slash command without @-mentioning the bot. The most severe
case is /approve: a non-owner could approve a dangerous tool call the
bot was waiting on the owner to confirm.

Re-enable the documented identity check (push.from_account ==
push.bot_owner_id) so only the configured owner can issue these
commands.
2026-04-30 23:57:55 -07:00
Teknium
4caad285a6
feat(gateway): auto-delete slash-command system notices after TTL (#18266)
Adds opt-in auto-deletion for slash-command reply messages like
"New session started!", "Restarting gateway…", "Stopped.", and
YOLO toggles.  After the TTL elapses the gateway calls the adapter's
delete_message; on platforms without a delete API (everything except
Telegram today) the TTL is silently ignored and the message stays.

Requested on Twitter by @charlesmcdowell — tool-call bubbles are useful
real-time, but system notices clutter the thread once the agent finishes.

Implementation:

- EphemeralReply(str) sentinel in gateway/platforms/base.py.  Subclasses
  str so existing 'X' in response / response.startswith(...) checks in
  tests and call sites keep working unchanged; isinstance() still
  distinguishes it for the send path.
- _process_message_background and both busy-session bypass paths
  (in base.py) call _unwrap_ephemeral() on the handler return, send
  the unwrapped text, and schedule a detached delete task when the
  TTL > 0 AND the adapter class overrides delete_message.
- display.ephemeral_system_ttl (default 0 = disabled) in DEFAULT_CONFIG.
  Handler can pass ttl_seconds explicitly to override.
- Wrapped the highest-noise return sites: /new, /reset, /stop,
  /yolo on/off, /restart success + "already in progress".  Draining
  notices and /help output left as plain strings — those are
  informational and users want to read them.

Backward-compat: default TTL 0 → no scheduling, no behavior change
for existing users.  Platforms without delete_message silently no-op.
2026-04-30 23:05:48 -07:00
Oxidane-bot
8d7500d80d fix(gateway): snapshot callback generation after agent binds it, not before
_process_message_background snapshotted callback_generation from the
interrupt event at the TOP of the task — before the handler ran.
_hermes_run_generation is only set on the event by
GatewayRunner._bind_adapter_run_generation during
_handle_message_with_agent, which runs DURING the handler await. The
early snapshot always captured None, which then flowed into
pop_post_delivery_callback(..., generation=None) in the finally block.

In pop_post_delivery_callback, generation=None with a tuple-registered
entry (generation, callback) bypasses the ownership check — it pops and
fires the callback regardless of which run owns it. Result: a stale run
could fire a fresher run's post-delivery callback (e.g. a
background-review notification attributed to the wrong turn).

Fix: move the snapshot into the finally block, after the handler has
run and _hermes_run_generation has been bound to the current run.

Regression test added: simulates a stale handler at generation=1 and a
fresher callback registered at generation=2. Pre-fix: snapshot=None →
pop fires the generation=2 callback under generation=1's ownership
("newer" fires). Post-fix: snapshot=1 → pop skips the mismatched
entry, callback stays in the dict for the correct run to claim.

Verified: test FAILS on current main (captures "newer" in fired list),
PASSES with this fix.

Salvaged from PR #12565 (the callback-ownership portion only; the
/status totals portion was already fixed on main in 7abc9ce4d via #17158).

Co-authored-by: Oxidane-bot <1317078257maroon@gmail.com>
2026-04-30 20:41:18 -07:00
hharry11
158eb32686 fix(gateway): preserve document type when merging queued events 2026-04-30 20:37:27 -07:00
Roy-oss1
b94cb8e2c4 feat(feishu): operator-configurable bot admission and mention policy
Add two operator-facing toggles for inbound Feishu admission, enabling
bot-to-bot scenarios such as A2A orchestration and inter-bot
notifications:

  FEISHU_ALLOW_BOTS=none|mentions|all   (default: none)
    Accept messages from other bots. `mentions` requires the peer
    bot to @-mention Hermes; `all` admits every peer-bot message.

  FEISHU_REQUIRE_MENTION=true|false     (default: true)
    Whether group messages must @-mention the bot. Override per-chat
    via `group_rules.<chat_id>.require_mention` in config.yaml.

Defaults preserve prior behavior. Self-echo protection is always on:
when the bot's identity is unresolved (auto-detection failed and
FEISHU_BOT_OPEN_ID unset), peer-bot messages are rejected fail-closed
to avoid feedback loops.

Admitted peer bots bypass the human-user allowlist
(FEISHU_ALLOWED_USERS) to match existing Discord behavior; humans
still need an explicit allowlist entry. yaml feishu.allow_bots is
bridged to the env var so the adapter and gateway auth layer share
one source of truth.

Resolving peer-bot display names requires the
application:bot.basic_info:read scope; without it, peers still route
but appear as their open_id.

Test: tests/gateway/test_feishu_bot_admission.py covers the admission
pipeline, group-policy bot-bypass, hydration, and event-dispatch
plumbing as a parametrized matrix.

Change-Id: I363cccb578c2a5c8b8bf0f0a890c01c89909e256
2026-04-30 20:30:31 -07:00
Yukipukii1
25cbe3e1d6 fix(gateway): preserve thread routing for /update progress and prompts 2026-04-30 20:19:23 -07:00
hharry11
2997ef9446 fix(api-server): use session-scoped task IDs for tool isolation 2026-04-30 19:59:38 -07:00
johnncenae
a83d579d5b fix(telegram): enforce gateway auth for inline approval callbacks 2026-04-30 19:59:31 -07:00
Teknium
f43b126677 fix(gateway): atomic writes for sibling recovery/dedup state files
Widen PR #17842's atomic-write fix to two sibling sites that exhibit the
same 'partial JSON on interrupted write' class of bug:

- gateway/platforms/feishu.py: dedup state (_dedup_state_path)
- gateway/platforms/helpers.py: ParticipatedThreadTracker save

Both are small recovery/coordination files that get rewritten frequently and
break cross-restart dedup if left partial.
2026-04-30 19:58:16 -07:00
Chris Danis
f61695ee73 fix(signal): skip contentless envelopes (profile key updates, empty messages)
Signal-cli sends dataMessage wrappers for profile key updates and other
metadata events that have no actual text content. These were reaching the
gateway as msg='' and triggering full agent turns for nothing.

Add early return in _handle_envelope() when both message field is empty/
missing/whitespace AND there are no attachments. Messages with media
attachments but no text still flow through.

- 12 lines added to gateway/platforms/signal.py
- 5 new tests in TestSignalContentlessEnvelope class
2026-04-30 19:42:59 -07:00
Teknium
3de8e21683 feat(gateway): native send_multiple_images for Telegram, Discord, Slack, Mattermost, Email
Ports PR #17888's send_multiple_images ABC to every gateway platform that
has a native multi-attachment API, so images arrive as a single bundled
message instead of N separate ones.

Native overrides:
- Telegram: send_media_group (10 photos per album, chunks over); animated
  GIFs peeled off and routed through send_animation (albums don't support
  animations)
- Discord: channel.send(files=[...]) (10 attachments per message, chunks
  over); URL images downloaded into BytesIO so they render inline; forum
  channels use create_thread with files=[...]
- Slack: files_upload_v2(file_uploads=[...]) (10 per call, chunks over);
  respects thread_ts; records thread participation
- Mattermost: single post with file_ids list (5 per post — Mattermost cap,
  chunks over)
- Email: single SMTP message with multiple MIME attachments (no chunk cap,
  SMTP size governs); remote URLs remain linked in body (parity with
  existing send_image)

All platforms fall back to the base per-image loop on any failure, so a
single bad image in a batch never loses the rest.

Matrix, WhatsApp, and single-attachment platforms (BlueBubbles, Feishu,
WeCom, WeChat, DingTalk) continue to use the base default loop — their
server APIs only accept one attachment per message anyway.

Tests: adds tests/gateway/test_send_multiple_images.py with 19 targeted
tests covering base default loop, chunking, animation peel-off, fallback
paths, and empty-batch no-ops across all five new overrides.

Co-authored-by: Maxence Groine <maxence@groine.fr>
2026-04-30 04:28:08 -07:00
Maxence Groine
04ea895ffb feat(gateway/signal): add support for multiple images sending
Adds a new `send_multiple_images` method to the ``BasePlatformAdapter``
that implements the default "One image per message" loop and allows for
platform-specific overriding.

Implements such an override for the Signal adapter, batching images
and trying (best-effort) to work around rate-limits for voluminous
batches using a specific scheduler.

Also implements batching + rate-limit handling in the `send_message`
tool.

New tests added for the Signal adapter, its rate-limit scheduler and the
`send_message` tool
2026-04-30 04:28:08 -07:00