Commit graph

419 commits

Author SHA1 Message Date
Teknium
276d20e62c
fix(gateway): /restart uses service restart under systemd instead of detached subprocess
The detached bash subprocess spawned by /restart gets killed by
systemd's KillMode=mixed cgroup cleanup, leaving the gateway dead.

Under systemd (detected via INVOCATION_ID env var), /restart now uses
via_service=True which exits with code 75 — RestartForceExitStatus=75
in the unit file makes systemd auto-restart the service. The detached
subprocess approach is preserved as fallback for non-systemd
environments (Docker, tmux, foreground mode).
2026-04-12 22:32:19 -07:00
Teknium
8a64f3e368 feat(gateway): notify /restart requester when gateway comes back online
When a user sends /restart, the gateway now persists their routing info
(platform, chat_id, thread_id) to .restart_notify.json. After the new
gateway process starts and adapters connect, it reads the file, sends a
'Gateway restarted successfully' message to that specific chat, and
cleans up the file.

This follows the same pattern as _send_update_notification (used by
/update). Thread IDs are preserved so the notification lands in the
correct Telegram topic or Discord thread.

Previously, after /restart the user had no feedback that the gateway was
back — they had to send a message to find out. Now they get a proactive
notification and know their session continues.
2026-04-12 22:23:48 -07:00
Teknium
a0cd2c5338
fix(gateway): verbose tool progress no longer truncates args when tool_preview_length is 0 (#8735)
When tool_preview_length is 0 (default for platforms without a tier
default, like Session), verbose mode was truncating args JSON to 200
characters.  Since the user explicitly opted into verbose mode, they
expect full tool call detail — the 200-char cap defeated the purpose.

Now: tool_preview_length=0 means no truncation in verbose mode.
Positive values still cap as before.  Platform message-length limits
handle overflow naturally.
2026-04-12 20:05:12 -07:00
Teknium
15b1a3aa69
fix: improve WhatsApp UX — chunking, formatting, streaming (#8723)
Three changes that address the poor WhatsApp experience reported by users:

1. Reclassify WhatsApp from TIER_LOW to TIER_MEDIUM in display_config.py
   — enables streaming and tool progress via the existing Baileys /edit
   bridge endpoint. Users now see progressive responses instead of
   minutes of silence followed by a wall of text.

2. Lower MAX_MESSAGE_LENGTH from 65536 to 4096 and add proper chunking
   — send() now calls format_message() and truncate_message() before
   sending, then loops through chunks with a small delay between them.
   The base class truncate_message() already handles code block boundary
   detection (closes/reopens fences at chunk boundaries). reply_to is
   only set on the first chunk.

3. Override format_message() with WhatsApp-specific markdown conversion
   — converts **bold** to *bold*, ~~strike~~ to ~strike~, headers to
   bold text, and [links](url) to text (url). Code blocks and inline
   code are protected from conversion via placeholder substitution.

Together these fix the two user complaints:
- 'sends the whole code all the time' → now chunked at 4K with proper
  formatting
- 'terminal gets interrupted and gets cooked' → streaming + tool progress
  give visual feedback so users don't accidentally interrupt with
  follow-up messages
2026-04-12 19:20:13 -07:00
Teknium
9e992df8ae
fix(telegram): use UTF-16 code units for message length splitting (#8725)
Port from nearai/ironclaw#2304: Telegram's 4096 character limit is
measured in UTF-16 code units, not Unicode codepoints. Characters
outside the Basic Multilingual Plane (emoji like 😀, CJK Extension B,
musical symbols) are surrogate pairs: 1 Python char but 2 UTF-16 units.

Previously, truncate_message() used Python's len() which counts
codepoints. This could produce chunks exceeding Telegram's actual limit
when messages contain many astral-plane characters.

Changes:
- Add utf16_len() helper and _prefix_within_utf16_limit() for
  UTF-16-aware string measurement and truncation
- Add _custom_unit_to_cp() binary-search helper that maps a custom-unit
  budget to the largest safe codepoint slice position
- Update truncate_message() to accept optional len_fn parameter
- Telegram adapter now passes len_fn=utf16_len when splitting messages
- Fix fallback truncation in Telegram error handler to use
  _prefix_within_utf16_limit instead of codepoint slicing
- Update send_message_tool.py to use utf16_len for Telegram platform
- Add comprehensive tests: utf16_len, _prefix_within_utf16_limit,
  truncate_message with len_fn (emoji splitting, content preservation,
  code block handling)
- Update mock lambdas in reply_mode tests to accept **kw for len_fn
2026-04-12 19:06:20 -07:00
Teknium
f724079d3b fix(gateway): reject known-weak placeholder credentials at startup
Port from openclaw/openclaw#64586: users who copy .env.example without
changing placeholder values now get a clear error at startup instead of
a confusing auth failure from the platform API. Also rejects placeholder
API_SERVER_KEY when binding to a network-accessible address.

Cherry-picked from PR #8677.
2026-04-12 18:05:41 -07:00
Teknium
c7d8d109ff fix(matrix): trust m.mentions.user_ids as authoritative mention signal
Port from openclaw/openclaw#64796: Per MSC3952 / Matrix v1.7, the
m.mentions.user_ids field is the authoritative mention signal. Clients
that populate m.mentions but don't duplicate @bot in the body text
were being silently dropped when MATRIX_REQUIRE_MENTION=true.

Cherry-picked from PR #8673.
2026-04-12 18:05:41 -07:00
Teknium
bcad679799 fix(api_server): normalize array-based content parts in chat completions
Some OpenAI-compatible clients (Open WebUI, LobeChat, etc.) send
message content as an array of typed parts instead of a plain string:

    [{"type": "text", "text": "hello"}]

The agent pipeline expects strings, so these array payloads caused
silent failures or empty messages.

Add _normalize_chat_content() with defensive limits (recursion depth,
list size, output length) and apply it to both the Chat Completions
and Responses API endpoints. The Responses path had inline
normalization that only handled input_text/output_text — the shared
function also handles the standard 'text' type.

Salvaged from PR #7980 (ikelvingo) — only the content normalization;
the SSE and Weixin changes in that PR were regressions and are not
included.

Co-authored-by: ikelvingo <ikelvingo@users.noreply.github.com>
2026-04-12 18:03:16 -07:00
Teknium
a266238e1e
fix(weixin): streaming cursor, media uploads, markdown links, blank messages (#8665)
Four fixes for the Weixin/WeChat adapter, synthesized from the best
aspects of community PRs #8407, #8521, #8360, #7695, #8308, #8525,
#7531, #8144, #8251.

1. Streaming cursor (▉) stuck permanently — WeChat doesn't support
   message editing, so the cursor appended during streaming can never
   be removed.  Add SUPPORTS_MESSAGE_EDITING = False to WeixinAdapter
   and check it in gateway/run.py to use an empty cursor for non-edit
   platforms.  (Fixes #8307, #8326)

2. Media upload failures — two bugs in _send_file():
   a) upload_full_url path used PUT (404 on WeChat CDN); now uses POST.
   b) aes_key was base64(raw_bytes) but the iLink API expects
      base64(hex_string); images showed as grey boxes.  (Fixes #8352, #7529)
   Also: unified both upload paths into _upload_ciphertext(), preferring
   upload_full_url.  Added send_video/send_voice methods and voice_item
   media builder for audio/.silk files.  Added video_md5 field.

3. Markdown links stripped — WeChat can't render [text](url), so
   format_message() now converts them to 'text (url)' plaintext.
   Code blocks are preserved.  (Fixes #7617)

4. Blank message prevention — three guards:
   a) _split_text_for_weixin_delivery('') returns [] not ['']
   b) send() filters empty/whitespace chunks before _send_text_chunk
   c) _send_message() raises ValueError for empty text as safety net

Community credit: joei4cm (#8407), lyonDan (#8521), SKFDJKLDG (#8360),
tomqiaozc (#7695), joshleeeeee (#8308), luoxiao6645(#8525),
longsizhuo (#7531), Astral-Yang (#8144), QingWei-Li (#8251).
2026-04-12 16:43:25 -07:00
Teknium
1179918746 fix: salvage follow-ups for Feishu QR onboarding (#7706)
- Remove duplicate _setup_feishu() definition (old 3-line version left
  behind by cherry-pick — Python picked the new one but dead code
  remained)
- Remove misleading 'Disable direct messages' DM option — the Feishu
  adapter has no DM policy mechanism, so 'disable' produced identical
  env vars to 'pairing'. Users who chose 'disable' would still see
  pairing prompts. Reduced to 3 options: pairing, allow-all, allowlist.
- Fix test_probe_returns_bot_info_on_success and
  test_probe_returns_none_on_failure: patch FEISHU_AVAILABLE=True so
  probe_bot() takes the SDK path when lark_oapi is not installed
2026-04-12 13:05:56 -07:00
Shuo
d7785f4d5b feat(feishu): add scan-to-create onboarding for Feishu / Lark
Add a QR-based onboarding flow to `hermes gateway setup` for Feishu / Lark.
Users scan a QR code with their phone and the platform creates a fully
configured bot application automatically — matching the existing WeChat
QR login experience.

Setup flow:
- Choose between QR scan-to-create (new app) or manual credential input (existing app)
- Connection mode selection (WebSocket / Webhook)
- DM security policy (pairing / open / allowlist / disabled)
- Group chat policy (open with @mention / disabled)

Implementation:
- Onboard functions (init/begin/poll/QR/probe) in gateway/platforms/feishu.py
- _setup_feishu() in hermes_cli/gateway.py with manual fallback
- probe_bot uses lark_oapi SDK when available, raw HTTP fallback otherwise
- qr_register() catches expected errors (network/protocol), propagates bugs
- Poll handles HTTP 4xx JSON responses and feishu/lark domain auto-detection

Tests:
- 25 tests for onboard module (registration, QR, probe, contract, negative paths)
- 16 tests for setup flow (credentials, connection mode, DM policy, group policy,
  adapter integration verifying env vars produce valid FeishuAdapterSettings)

Change-Id: I720591ee84755f32dda95fbac4b26dc82cbcf823
2026-04-12 13:05:56 -07:00
Teknium
4eecaf06e4
fix: prevent duplicate update prompt spam in gateway watcher (#8343)
The _watch_update_progress() poll loop never deleted .update_prompt.json
after forwarding the prompt to the user, causing the same prompt to be
re-sent every poll cycle (2s). Two fixes:

1. Delete .update_prompt.json after forwarding — the update process only
   polls for .update_response, it doesn't need the prompt file to persist.
2. Guard re-sends with _update_prompt_pending check — belt-and-suspenders
   to prevent duplicates even under race conditions.

Add regression test asserting the prompt is sent exactly once.
2026-04-12 04:52:59 -07:00
Teknium
b6b6b02f0f
fix: prevent unwanted session auto-reset after graceful gateway restarts (#8299)
When the gateway shuts down gracefully (hermes update, gateway restart,
/restart), it now writes a .clean_shutdown marker file. On the next
startup, if this marker exists, suspend_recently_active() is skipped
and the marker is cleaned up.

Previously, suspend_recently_active() fired on EVERY startup —
including planned restarts from hermes update or hermes gateway restart.
This caused users to lose their conversation history unexpectedly: the
session would be marked as suspended, and the next message would
trigger an auto-reset with a notification the user never asked for.

The original purpose of suspend_recently_active() is crash recovery —
preventing stuck sessions that were mid-processing when the gateway
died unexpectedly. Graceful shutdowns already drain active agents via
_drain_active_agents(), so there is no stuck-session risk. After a
crash (no marker written), suspension still fires as before.

Fixes the scenario where a user asks the agent to run hermes update,
the gateway restarts, and the user's next message gets an unwanted
'Session automatically reset' notification with their history cleared.
2026-04-12 03:03:07 -07:00
bravohenry
81ac62c0e9 fix(weixin): split chatty short replies into separate bubbles, keep structured content together
Add content-aware splitting to compact mode: short chat-like exchanges
(2-6 short lines without headings/lists/quotes) get separate message
bubbles for a natural chat feel, while structured content (tables,
headings with body, numbered lists) stays in a single message.

Cherry-picked from PR #7587 by bravohenry, adapted to the compact/legacy
split_per_line architecture from #7903.
2026-04-12 00:38:07 -07:00
Teknium
f53a5a7fe1
fix: suppress duplicate completion notifications when agent already consumed output via wait/poll/log (#8228)
When the agent calls process(action='wait') or process(action='poll')
and gets the exited status, the completion_queue notification is
redundant — the agent already has the output from the tool return.
Previously, the drain loops in CLI and gateway would still inject
the [SYSTEM: Background process completed] message, causing the
agent to receive the same information twice.

Fix: track session IDs in _completion_consumed set when wait/poll/log
returns an exited process. Drain loops in cli.py and gateway watcher
skip completion events for consumed sessions. Watch pattern events
are never suppressed (they have independent semantics).

Adds 4 tests covering wait/poll/log marking and running-process
negative case.
2026-04-12 00:36:22 -07:00
Tom Qiao
8a48c58bd3 fix(gateway): add missing RedactingFormatter import
The gateway startup path references RedactingFormatter without
importing it, causing a NameError crash when launched with a
verbosity flag (e.g. via launchd --replace).

Fixes #8044

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 19:38:05 -07:00
Teknium
a0a02c1bc0
feat: /compress <focus> — guided compression with focus topic (#8017)
Adds an optional focus topic to /compress: `/compress database schema`
guides the summariser to preserve information related to the focus topic
(60-70% of summary budget) while compressing everything else more aggressively.
Inspired by Claude Code's /compact <focus>.

Changes:
- context_compressor.py: focus_topic parameter on _generate_summary() and
  compress(); appends FOCUS TOPIC guidance block to the LLM prompt
- run_agent.py: focus_topic parameter on _compress_context(), passed through
  to the compressor
- cli.py: _manual_compress() extracts focus topic from command string,
  preserves existing manual_compression_feedback integration (no regression)
- gateway/run.py: _handle_compress_command() extracts focus from event args
  and passes through — full gateway parity
- commands.py: args_hint="[focus topic]" on /compress CommandDef

Salvaged from PR #7459 (CLI /compress focus only — /context command deferred).
15 new tests across CLI, compressor, and gateway.
2026-04-11 19:23:29 -07:00
helix4u
cfbfc4c3f1 fix(discord): decouple readiness from slash sync 2026-04-11 19:22:14 -07:00
Siddharth Balyan
50d86b3c71
fix(matrix): replace pickle crypto store with SQLite, fix E2EE decryption (#7981)
Fixes #7952 — Matrix E2EE completely broken after mautrix migration.

- Replace MemoryCryptoStore + pickle/HMAC persistence with mautrix's
  PgCryptoStore backed by SQLite via aiosqlite. Crypto state now
  persists reliably across restarts without fragile serialization.

- Add handle_sync() call on initial sync response so to-device events
  (queued Megolm key shares) are dispatched to OlmMachine instead of
  being silently dropped.

- Add _verify_device_keys_on_server() after loading crypto state.
  Detects missing keys (re-uploads), stale keys from migration
  (attempts re-upload), and corrupted state (refuses E2EE).

- Add _CryptoStateStore adapter wrapping MemoryStateStore to satisfy
  mautrix crypto's StateStore interface (is_encrypted,
  get_encryption_info, find_shared_rooms).

- Remove redundant share_keys() call from sync loop — OlmMachine
  already handles this via DEVICE_OTK_COUNT event handler.

- Fix datetime vs float TypeError in session.py suspend_recently_active()
  that crashed gateway startup.

- Add aiosqlite and asyncpg to [matrix] extra in pyproject.toml.

- Update test mocks for PgCryptoStore/Database and add query_keys mock
  for key verification. 174 tests pass.

- Add E2EE upgrade/migration docs to Matrix user guide.
2026-04-12 07:24:46 +05:30
Teknium
723b5bec85
feat: per-platform display verbosity configuration (#8006)
Add display.platforms section to config.yaml for per-platform overrides of
display settings (tool_progress, show_reasoning, streaming, tool_preview_length).

Each platform gets sensible built-in defaults based on capability tier:
- High (telegram, discord): tool_progress=all, streaming follows global
- Medium (slack, mattermost, matrix, feishu): tool_progress=new
- Low (signal, whatsapp, bluebubbles, wecom, etc.): tool_progress=off, streaming=false
- Minimal (email, sms, webhook, homeassistant): tool_progress=off, streaming=false

Example config:
  display:
    platforms:
      telegram:
        tool_progress: all
        show_reasoning: true
      slack:
        tool_progress: off

Resolution order: platform override > global setting > built-in platform default.

Changes:
- New gateway/display_config.py: resolver module with tier-based platform defaults
- gateway/run.py: tool_progress, tool_preview_length, streaming, show_reasoning
  all resolve per-platform via the new resolver
- /verbose command: now cycles tool_progress per-platform (saves to
  display.platforms.<platform>.tool_progress instead of global)
- /reasoning show|hide: now saves show_reasoning per-platform
- Config version 15 -> 16: migrates tool_progress_overrides into display.platforms
- Backward compat: legacy tool_progress_overrides still read as fallback
- 27 new tests for resolver, normalization, migration, backward compat
- Updated verbose command tests for per-platform behavior

Addresses community request for per-channel verbosity control (Guillaume Meyer,
Nathan Danielsen) — high verbosity on backchannel Telegram, low on customer-facing
Slack, none on email.
2026-04-11 17:20:34 -07:00
asheriif
97b0cd51ee feat(gateway): surface natural mid-turn assistant messages in chat platforms
Add display.interim_assistant_messages config (enabled by default) that
forwards completed assistant commentary between tool calls to the user
as separate chat messages. Models already emit useful status text like
'I'll inspect the repo first.' — this surfaces it on Telegram, Discord,
and other messaging platforms instead of swallowing it.

Independent from tool_progress and gateway streaming. Disabled for
webhooks. Uses GatewayStreamConsumer when available, falls back to
direct adapter send. Tracks response_previewed to prevent double-delivery
when interim message matches the final response.

Also fixes: cursor not stripped from fallback prefix in stream consumer
(affected continuation calculation on no-edit platforms like Signal).

Cherry-picked from PR #7885 by asheriif, default changed to enabled.
Fixes #5016
2026-04-11 16:21:39 -07:00
0xbyt4
32519066dc fix(gateway): add HERMES_SESSION_KEY to session_context contextvars
Complete the contextvars migration by adding HERMES_SESSION_KEY to the
unified _VAR_MAP in session_context.py. Without this, concurrent gateway
handlers race on os.environ["HERMES_SESSION_KEY"].

- Add _SESSION_KEY ContextVar to _VAR_MAP, set_session_vars(), clear_session_vars()
- Wire session_key through _set_session_env() from SessionContext
- Replace os.getenv fallback in tools/approval.py with get_session_env()
  (function-level import to avoid cross-layer coupling)
- Keep os.environ set as CLI/cron fallback

Cherry-picked from PR #7878 by 0xbyt4.
2026-04-11 15:35:04 -07:00
chqchshj
5f0caf54d6 feat(gateway): add WeCom callback-mode adapter for self-built apps
Add a second WeCom integration mode for regular enterprise self-built
applications.  Unlike the existing bot/websocket adapter (wecom.py),
this handles WeCom's standard callback flow: WeCom POSTs encrypted XML
to an HTTP endpoint, the adapter decrypts, queues for the agent, and
immediately acknowledges.  The agent's reply is delivered proactively
via the message/send API.

Key design choice: always acknowledge immediately and use proactive
send — agent sessions take 3-30 minutes, so the 5-second inline reply
window is never useful.  The original PR's Future/pending-reply
machinery was removed in favour of this simpler architecture.

Features:
- AES-CBC encrypt/decrypt (BizMsgCrypt-compatible)
- Multi-app routing scoped by corp_id:user_id
- Legacy bare user_id fallback for backward compat
- Access-token management with auto-refresh
- WECOM_CALLBACK_* env var overrides
- Port-in-use pre-check before binding
- Health endpoint at /health

Salvaged from PR #7774 by @chqchshj.  Simplified by removing the
inline reply Future system and fixing: secrets.choice for nonce
generation, immediate plain-text acknowledgment (not encrypted XML
containing 'success'), and initial token refresh error handling.
2026-04-11 15:22:49 -07:00
etcircle
72b345e068 fix(gateway): preserve queued voice events for STT 2026-04-11 14:43:53 -07:00
Teknium
b0892375cd fix: mock aiohttp server in startup guard tests to avoid port binding
The startup guard tests called connect() which bound a real aiohttp
server on port 8080 — flaky in any environment where the port is
in use. Mock AppRunner, TCPSite, and ClientSession instead.
2026-04-11 14:05:38 -07:00
Mariano Nicolini
0a922bf218 add new test covering edge case where both insecure_no_sig and _webhook_url are set 2026-04-11 14:05:38 -07:00
Mariano Nicolini
8ce6aaac23 change Twilio signature verification from opt-in to opt-out 2026-04-11 14:05:38 -07:00
Mariano Nicolini
ad1e8804a6 handle port variants in Twilio signatures 2026-04-11 14:05:38 -07:00
Mariano Nicolini
c22bffc92e add basic twilio signature checking and tests 2026-04-11 14:05:38 -07:00
Markus Corazzione
885123d44b fix(weixin): add per-chunk retry with backoff for text delivery
When sending multi-chunk responses, individual chunks can fail due to
transient iLink API errors. Previously a single failure would abort the
entire message. Now each chunk is retried with linear backoff before
giving up, and the same client_id is reused across retries for
server-side deduplication.

Configurable via config.yaml (platforms.weixin.extra) or env vars:
- send_chunk_delay_seconds (default 0.35s) — pacing between chunks
- send_chunk_retries (default 2) — max retry attempts per chunk
- send_chunk_retry_delay_seconds (default 1.0s) — base retry delay

Replaces the hardcoded 0.3s inter-chunk delay from #7903.

Salvaged from PR #7899 by @corazzione. Fixes #7836.
2026-04-11 14:02:33 -07:00
Teknium
04c1c5d53f
refactor: extract shared helpers to deduplicate repeated code patterns (#7917)
* refactor: add shared helper modules for code deduplication

New modules:
- gateway/platforms/helpers.py: MessageDeduplicator, TextBatchAggregator,
  strip_markdown, ThreadParticipationTracker, redact_phone
- hermes_cli/cli_output.py: print_info/success/warning/error, prompt helpers
- tools/path_security.py: validate_within_dir, has_traversal_component
- utils.py additions: safe_json_loads, read_json_file, read_jsonl,
  append_jsonl, env_str/lower/int/bool helpers
- hermes_constants.py additions: get_config_path, get_skills_dir,
  get_logs_dir, get_env_path

* refactor: migrate gateway adapters to shared helpers

- MessageDeduplicator: discord, slack, dingtalk, wecom, weixin, mattermost
- strip_markdown: bluebubbles, feishu, sms
- redact_phone: sms, signal
- ThreadParticipationTracker: discord, matrix
- _acquire/_release_platform_lock: telegram, discord, slack, whatsapp,
  signal, weixin

Net -316 lines across 19 files.

* refactor: migrate CLI modules to shared helpers

- tools_config.py: use cli_output print/prompt + curses_radiolist (-117 lines)
- setup.py: use cli_output print helpers + curses_radiolist (-101 lines)
- mcp_config.py: use cli_output prompt (-15 lines)
- memory_setup.py: use curses_radiolist (-86 lines)

Net -263 lines across 5 files.

* refactor: migrate to shared utility helpers

- safe_json_loads: agent/display.py (4 sites)
- get_config_path: skill_utils.py, hermes_logging.py, hermes_time.py
- get_skills_dir: skill_utils.py, prompt_builder.py
- Token estimation dedup: skills_tool.py imports from model_metadata
- Path security: skills_tool, cronjob_tools, skill_manager_tool, credential_files
- Non-atomic YAML writes: doctor.py, config.py now use atomic_yaml_write
- Platform dict: new platforms.py, skills_config + tools_config derive from it
- Anthropic key: new get_anthropic_key() in auth.py, used by doctor/status/config/main

* test: update tests for shared helper migrations

- test_dingtalk: use _dedup.is_duplicate() instead of _is_duplicate()
- test_mattermost: use _dedup instead of _seen_posts/_prune_seen
- test_signal: import redact_phone from helpers instead of signal
- test_discord_connect: _platform_lock_identity instead of _token_lock_identity
- test_telegram_conflict: updated lock error message format
- test_skill_manager_tool: 'escapes' instead of 'boundary' in error msgs
2026-04-11 13:59:52 -07:00
WAXLYY
f4f4078ad9 fix(gateway/weixin): ensure atomic persistence for critical session state 2026-04-11 13:48:25 -07:00
helix4u
39da23a129 fix(api-server): keep chat-completions SSE alive 2026-04-11 13:47:25 -07:00
Teknium
cac6178104 fix(gateway): propagate user identity through process watcher pipeline
Background process watchers (notify_on_complete, check_interval) created
synthetic SessionSource objects without user_id/user_name. While the
internal=True bypass (1d8d4f28) prevented false pairing for agent-
generated notifications, the missing identity caused:

- Garbage entries in pairing rate limiters (discord:None, telegram:None)
- 'User None' in approval messages and logs
- No user identity available for future code paths that need it

Additionally, platform messages arriving without from_user (Telegram
service messages, channel forwards, anonymous admin actions) could still
trigger false pairing because they are not internal events.

Fix:
1. Propagate user_id/user_name through the full watcher chain:
   session_context.py → gateway/run.py → terminal_tool.py →
   process_registry.py (including checkpoint persistence/recovery)

2. Add None user_id guard in _handle_message() — silently drop
   non-internal messages with no user identity instead of triggering
   the pairing flow.

Salvaged from PRs #7664 (kagura-agent, ContextVar approach),
#6540 (MestreY0d4-Uninter, tests), and #7709 (guang384, None guard).

Closes #6341, #6485, #7643
Relates to #6516, #7392
2026-04-11 13:46:16 -07:00
Teknium
da9f96bf51
fix(weixin): keep multi-line messages in single bubble by default (#7903)
The Weixin adapter was splitting responses at every top-level newline,
causing notification spam (up to 70 API calls for a single long markdown
response). This salvages the best aspects of six contributor PRs:

Compact mode (new default):
- Messages under the 4000-char limit stay as a single bubble even with
  multiple lines, paragraphs, and code blocks
- Only oversized messages get split at logical markdown boundaries
- Inter-chunk delay (0.3s) between chunks prevents WeChat rate-limit drops

Legacy mode (opt-in):
- Set split_multiline_messages: true in platforms.weixin.extra config
- Or set WEIXIN_SPLIT_MULTILINE_MESSAGES=true env var
- Restores the old per-line splitting behavior

Salvaged from PRs #7797 (guantoubaozi), #7792 (luoxiao6645),
#7838 (qyx596), #7825 (weedge), #7784 (sherunlock03), #7773 (JnyRoad).
Core fix unanimous across all six; config toggle from #7838; inter-chunk
delay from #7825.
2026-04-11 12:00:05 -07:00
Teknium
06e1d9cdd4
fix: resolve three high-impact community bugs (#5819, #6893, #3388) (#7881)
Matrix gateway: fix sync loop never dispatching events (#5819)
- _sync_loop() called client.sync() but never called handle_sync()
  to dispatch events to registered callbacks — _on_room_message was
  registered but never fired for new messages
- Store next_batch token from initial sync and pass as since= to
  subsequent incremental syncs (was doing full initial sync every time)
- 17 comments, confirmed by multiple users on matrix.org

Feishu docs: add interactive card configuration for approvals (#6893)
- Error 200340 is a Feishu Developer Console configuration issue,
  not a code bug — users need to enable Interactive Card capability
  and configure Card Request URL
- Added required 3-step setup instructions to feishu.md
- Added troubleshooting entry for error 200340
- 17 comments from Feishu users

Copilot provider drift: detect GPT-5.x Responses API requirement (#3388)
- GPT-5.x models are rejected on /v1/chat/completions by both OpenAI
  and OpenRouter (unsupported_api_for_model error)
- Added _model_requires_responses_api() to detect models needing
  Responses API regardless of provider
- Applied in __init__ (covers OpenRouter primary users) and in
  _try_activate_fallback() (covers Copilot->OpenRouter drift)
- Fixed stale comment claiming gateway creates fresh agents per message
  (it caches them via _agent_cache since the caching was added)
- 7 comments, reported on Copilot+Telegram gateway
2026-04-11 11:12:20 -07:00
Siddharth Balyan
69f3aaa1d6
fix(matrix): pass required args to MemoryCryptoStore for mautrix ≥0.21 (#7848)
* fix(matrix): pass required args to MemoryCryptoStore for mautrix ≥0.21

MemoryCryptoStore.__init__() now requires account_id and pickle_key
positional arguments as of mautrix 0.21. The migration from matrix-nio
(commit 1850747) didn't account for this, causing E2EE initialization
to fail with:

  MemoryCryptoStore.__init__() missing 2 required positional arguments:
  'account_id' and 'pickle_key'

Pass self._user_id as account_id and derive pickle_key from the same
user_id:device_id pair already used for the on-disk HMAC signature.

Update the test stub to accept the new parameters.

Fixes #7803

* fix: use consistent fallback for pickle_key derivation

Address review: _pickle_key now uses _acct_id (which has the 'hermes'
fallback) instead of raw self._user_id, so both values stay consistent
when user_id is empty.

---------

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
2026-04-11 10:43:49 -07:00
Hygaard
a2f9f04c06 fix: honor session-scoped gateway model overrides 2026-04-11 03:11:34 -07:00
Kenny Xie
eb8071bbc1 test(gateway): isolate blocking approval env 2026-04-11 02:03:20 -07:00
Kenny Xie
ecfae98152 fix(gateway): address restart review feedback 2026-04-10 21:18:34 -07:00
Kenny Xie
3163731289 fix(gateway): drain in-flight work before restart 2026-04-10 21:18:34 -07:00
Teknium
241032455c
fix: don't evict cached agent on failed runs — prevents MCP restart loop (#7539)
* fix: circuit breaker stops CPU-burning restart loops on persistent errors

When a gateway session hits a non-retryable error (e.g. invalid model
ID → HTTP 400), the agent fails and returns. But if the session keeps
receiving messages (or something periodically recreates agents), each
attempt spawns a new AIAgent — reinitializing MCP server connections,
burning CPU — only to hit the same 400 error again. On a 4-core server,
this pegs an entire core per stuck session and accumulates 300+ minutes
of CPU time over hours.

Fix: add a per-session consecutive failure counter in the gateway runner.

- Track consecutive non-retryable failures per session key
- After 3 consecutive failures (_MAX_CONSECUTIVE_FAILURES), block
  further agent creation for that session and notify the user:
  '⚠️ This session has failed N times in a row with a non-retryable
  error. Use /reset to start a new session.'
- Evict the cached agent when the circuit breaker engages to prevent
  stale state from accumulating
- Reset the counter on successful agent runs
- Clear the counter on /reset and /new so users can recover
- Uses getattr() pattern so bare GatewayRunner instances (common in
  tests using object.__new__) don't crash

Tests:
- 8 new tests in test_circuit_breaker.py covering counter behavior,
  threshold, reset, session isolation, and bare-runner safety

Addresses #7130.

* Revert "fix: circuit breaker stops CPU-burning restart loops on persistent errors"

This reverts commit d848ea7109.

* fix: don't evict cached agent on failed runs — prevents MCP restart loop

When a run fails (e.g. invalid model ID → 400) and fallback activated,
the gateway was evicting the cached agent to 'retry primary next time.'
But evicting a failed agent forces a full AIAgent recreation on the next
message — reinitializing MCP server connections, spawning stdio
processes — only to hit the same 400 again. This created a CPU-burning
loop (91%+ for hours, #7130).

The fix: add `and not _run_failed` to the fallback-eviction check.
Failed runs keep the cached agent. The next message reuses it (no MCP
reinit), hits the same error, returns it to the user quickly. The user
can /reset or /model to fix their config.

Successful fallback runs still evict as before so the next message
retries the primary model.

Addresses #7130.
2026-04-10 21:16:56 -07:00
Kenny Xie
1ffd92cc94 fix(gateway): make manual compression feedback truthful 2026-04-10 21:16:53 -07:00
Kenny Xie
d6c2ad7e41 fix(gateway): make compress responses truthful 2026-04-10 21:16:53 -07:00
Teknium
be9198f1e1 fix: guard mautrix imports for gateway-safe fallback + fix test isolation
Follow-up fixes for the matrix-nio → mautrix migration:

1. Module-level mautrix.types import now wrapped in try/except with
   proper stub classes. Without this, importing gateway.platforms.matrix
   crashes the entire gateway when mautrix isn't installed — even for
   users who don't use Matrix. The stubs mirror mautrix's real attribute
   names so tests that exercise adapter methods (send, reactions, etc.)
   work without the real SDK.

2. Removed _ensure_mautrix_mock() from test_matrix_mention.py — it
   permanently installed MagicMock modules in sys.modules via setdefault(),
   polluting later tests in the suite. No longer needed since the module
   imports cleanly without mautrix.

3. Fixed thread persistence tests to use direct class reference in
   monkeypatch.setattr() instead of string-based paths, which broke
   when the module was reimported by other tests.

4. Moved the module-importability test to a subprocess to prevent it
   from polluting sys.modules (reimporting creates a second module object
   with different __dict__, breaking patch.object in subsequent tests).
2026-04-10 21:15:59 -07:00
alt-glitch
417e28f941 test(matrix): update all test mocks for mautrix-python API
Rewrite mock infrastructure across three test files:
- test_matrix.py: replace fake nio module with fake mautrix module tree,
  update all client method mocks to new API names and return types
- test_matrix_voice.py: update event construction, download/upload mocks,
  handler invocation (single event arg, no room object)
- test_matrix_mention.py: update mock module, event construction, DM
  detection via _dm_rooms cache instead of room.member_count

157 tests passing.
2026-04-10 21:15:59 -07:00
Bartok Moltbot
992422910c fix(api): send tool progress as custom SSE event to prevent model corruption (#6972)
Tool progress markers (e.g. ` list`) were injected directly into
SSE delta.content chunks. OpenAI-compatible frontends (Open WebUI,
LobeChat, etc.) store delta.content verbatim as the assistant message
and send it back on subsequent requests. After enough turns, the model
learns to emit these markers as plain text instead of issuing real tool
calls — silently hallucinating tool results without ever running them.

Fix: Send tool progress as a custom `event: hermes.tool.progress` SSE
event instead of mixing it into delta.content. Per the SSE spec, clients
that don't understand a custom event type silently ignore it, so this is
backward-compatible. Frontends that want to render progress indicators
can listen for the custom event without persisting it to conversation
history.

The /v1/runs endpoint already uses structured events — this aligns the
/v1/chat/completions streaming path with the same principle.

Closes #6972
2026-04-10 18:55:26 -07:00
0xFrank-eth
e8034e2f6a fix(gateway): replace os.environ session state with contextvars for concurrency safety
When two gateway messages arrived concurrently, _set_session_env wrote
HERMES_SESSION_PLATFORM/CHAT_ID/CHAT_NAME/THREAD_ID into the process-global
os.environ. Because asyncio tasks share the same process, Message B would
overwrite Message A's values mid-flight, causing background-task notifications
and tool calls to route to the wrong thread/chat.

Replace os.environ with Python's contextvars.ContextVar. Each asyncio task
(and any run_in_executor thread it spawns) gets its own copy, so concurrent
messages never interfere.

Changes:
- New gateway/session_context.py with ContextVar definitions, set/clear/get
  helpers, and os.environ fallback for CLI/cron/test backward compatibility
- gateway/run.py: _set_session_env returns reset tokens, _clear_session_env
  accepts them for proper cleanup in finally blocks
- All tool consumers updated: cronjob_tools, send_message_tool, skills_tool,
  terminal_tool (both notify_on_complete AND check_interval blocks), tts_tool,
  agent/skill_utils, agent/prompt_builder
- Tests updated for new contextvar-based API

Fixes #7358

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-04-10 17:04:38 -07:00
entropidelic
989b950fbc fix(security): enforce API_SERVER_KEY for non-loopback binding
Add is_network_accessible() helper using Python's ipaddress module to
robustly classify bind addresses (IPv4/IPv6 loopback, wildcards,
mapped addresses, hostname resolution with DNS-failure-fails-closed).

The API server connect() now refuses to start when the bind address is
network-accessible and no API_SERVER_KEY is set, preventing RCE from
other machines on the network.

Co-authored-by: entropidelic <entropidelic@users.noreply.github.com>
2026-04-10 16:51:44 -07:00
Fran Fitzpatrick
3e24ba1656 feat(matrix): add MATRIX_DM_MENTION_THREADS env var
When enabled, @mentioning the bot in a DM creates a thread (default:
false). Supports both env var and YAML config (matrix.dm_mention_threads).
6 new tests, docs updated.

From #6957
2026-04-10 15:46:20 -07:00