Commit graph

816 commits

Author SHA1 Message Date
Teknium
c1eb2dcda7
feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback (#24220)
* feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback

Three coordinated mitigations for the Mini Shai-Hulud worm hitting
mistralai 2.4.6 on PyPI (2026-05-12) and for the next single-package
compromise that follows.

# What this PR makes true

1. Users with the poisoned mistralai 2.4.6 in their venv get a loud
   detection banner with copy-pasteable remediation steps the moment
   they run hermes (and on every gateway startup).
2. One quarantined / yanked PyPI package can no longer silently demote
   a fresh install to 'core only' — the installer keeps every other
   extra and tells the user which tier landed.
3. Future opt-in backends (Mistral, ElevenLabs, Honcho, etc.) can
   lazy-install on first use under a strict allowlist, instead of
   eagerly pulling everything at install time.

# Detection: hermes_cli/security_advisories.py

- ADVISORIES catalog (one entry currently: shai-hulud-2026-05 for
  mistralai==2.4.6). Adding the next one is a single dataclass.
- detect_compromised() uses importlib.metadata.version() — no pip
  dependency, works in uv venvs that lack pip.
- Banner cache (~/.hermes/cache/advisory_banner_seen) rate-limits
  the startup banner to once per 24h per advisory.
- Acks persisted to security.acked_advisories in config.yaml; never
  re-banner after ack.
- Wired into:
  * hermes doctor — runs first, prints full remediation block
  * hermes doctor --ack <id> — dismisses an advisory
  * cli.py interactive run() and single-query branches — short
    stderr banner pointing at hermes doctor
  * gateway/run.py startup — operator-visible warning in gateway.log

# Lazy-install framework: tools/lazy_deps.py

- LAZY_DEPS allowlist maps namespaced feature keys (tts.elevenlabs,
  memory.honcho, provider.bedrock, etc.) to pip specs.
- ensure(feature) installs missing deps in the active venv via the
  uv → pip → ensurepip ladder (matches tools_config._pip_install).
- Strict spec safety regex rejects URLs, file paths, shell metas,
  pip flag injection, control chars — only PyPI-by-name accepted.
- Gated on security.allow_lazy_installs (default true) plus the
  HERMES_DISABLE_LAZY_INSTALLS env var for restricted/audited envs.
- Migrated three backends as proof of pattern:
  * tools/tts_tool.py — _import_elevenlabs() calls ensure first
  * plugins/memory/honcho/client.py — get_honcho_client lazy-installs
  * tts.mistral / stt.mistral entries pre-registered for when PyPI
    restores mistralai

# Installer fallback tiers

scripts/install.sh, scripts/install.ps1, setup-hermes.sh:

- Centralised _BROKEN_EXTRAS list (currently: mistral). Edit one
  array when a transitive breaks; users keep every other extra.
- New 'all minus known-broken' tier between [all] and the existing
  PyPI-only-extras tier. Only kicks in when [all] fails resolve.
- All three tiers explicit: every fallback announces which tier
  landed and prints a re-run hint when not on Tier 1.
- install.ps1 and install.sh both regenerate their tier specs from
  the same _BROKEN_EXTRAS array so updates stay in sync.

Side effect: install.ps1 Tier 2 spec previously hardcoded 'mistral'
in its extra list — bug fixed by the refactor (mistral is filtered
out).

# Config

hermes_cli/config.py — DEFAULT_CONFIG.security gains:
- acked_advisories: []  (advisory IDs the user has dismissed)
- allow_lazy_installs: True  (security gate for ensure())

No config version bump needed — both keys nest under existing
security: block, and load_config's deep-merge picks up DEFAULT_CONFIG
defaults for users with older configs.

# Tests

tests/hermes_cli/test_security_advisories.py — 23 tests covering:
- detect_compromised matches/non-matches, wildcard frozenset
- ack persistence, idempotence, blank rejection, config-failure path
- banner cache rate limiting + 24h re-banner + ack-stops-banner
- short_banner_lines / full_remediation_text / render_doctor_section /
  gateway_log_message
- shipped catalog well-formedness invariant

tests/tools/test_lazy_deps.py — 40 tests covering:
- spec safety: 11 safe parametrized + 18 unsafe parametrized
- allowlist: unknown-feature rejection, namespace.name shape,
  every shipped spec passes the safety regex
- security gating: config flag, env var, default, fail-open
- ensure() happy/sad paths: already-satisfied, install success,
  pip stderr surfaced on failure, install-succeeds-but-still-missing
- is_available, feature_install_command

Combined: 63 new tests, all passing under scripts/run_tests.sh.

# Validation

- scripts/run_tests.sh tests/hermes_cli/test_security_advisories.py
  tests/tools/test_lazy_deps.py → 63/63 passing
- scripts/run_tests.sh tests/hermes_cli/test_doctor.py
  tests/hermes_cli/test_doctor_command_install.py
  tests/tools/test_tts_mistral.py tests/tools/test_transcription_tools.py
  tests/tools/test_transcription_dotenv_fallback.py → 165/165 passing
- scripts/run_tests.sh tests/hermes_cli/ tests/tools/ →
  9191 passed, 8 pre-existing failures (verified on origin/main
  before this change)
- bash -n on install.sh and setup-hermes.sh → OK
- py_compile on all modified .py files → OK
- End-to-end smoke test of detect_compromised + render_doctor_section
  + gateway_log_message with mocked installed version → produces
  copy-pasteable remediation output

# Community

Full advisory + remediation steps:
website/docs/community/security-advisories/shai-hulud-mistralai-2026-05.md

Short-form post drafts (Discord, GitHub pinned issue, README banner):
scripts/community-announcement-shai-hulud.md

Refs: PR #24205 (mistral disabled), Socket Security advisory
<https://socket.dev/blog/mini-shai-hulud-worm-pypi>

* build(deps): pin every direct dep to ==X.Y.Z (no ranges)

Companion to the supply-chain advisory work: replace every >=/</~= range
in pyproject.toml's [project.dependencies] and [project.optional-dependencies]
with an exact ==X.Y.Z pin sourced from uv.lock.

Why: ranges allow PyPI to ship a fresh version of any direct dep at any
time without a code review on our side. With ranges, the malicious
mistralai 2.4.6 release would have been pulled by every fresh
'pip install -e .[all]' for the hours between upload and PyPI's
quarantine — exactly the install window we got hit on. Exact pins close
that window: the only way a new package version reaches a user is via
an intentional update on our end.

What the user-facing change is: nothing, behavior-wise. Every package
resolves to the same version it was already resolving to via uv.lock —
the pins just remove the resolver's freedom to pick a different one.

Cost: any user installing Hermes alongside another package that requires
a newer pin gets a resolver conflict. Acceptable for our isolated-venv
install path; documented in the new comment block.

Build-system requires line (setuptools>=61.0) is intentionally left
as a range — pinning the build backend would block fresh pip from
bootstrapping the build on architectures where that exact wheel isn't
available.

mistral extra (mistralai==2.3.0) is pinned but stays out of [all]
(per PR #24205). 'uv lock' regeneration will fail until PyPI restores
mistralai; lockfile regeneration is gated behind that, NOT on every PR.

LAZY_DEPS in tools/lazy_deps.py also moved to exact pins so the lazy-
install pathway can never resolve a different version than the one
declared in pyproject.toml.

Validation:

- Cross-checked all 77 pinned direct deps in pyproject.toml against
  uv.lock — every pin matches the resolved version exactly.
- Cross-checked all LAZY_DEPS specs against uv.lock — same.
- 'uv pip install -e .[all] --dry-run' resolves 205 packages cleanly.
- tests/tools/test_lazy_deps.py + tests/hermes_cli/test_security_advisories.py
  → 63/63 passing (every shipped spec passes the safety regex).
- Doctor + TTS + transcription targeted suite → 146/146 passing.

* build(deps): hash-verify transitives via uv.lock; remove unresolvable [mistral] extra

You asked: 'what about the dependencies the dependencies rely on?' —
correctly noting that exact-pinning direct deps in pyproject.toml does
NOT cover the transitive graph. `pip install` and `uv pip install` both
re-resolve transitives fresh from PyPI at install time, so a compromised
transitive (e.g. `httpcore` if it got worm-poisoned tomorrow) would
still hit our users even with every direct dep exact-pinned.

# What this commit fixes

1. **Both real installer scripts now prefer `uv sync --locked` as Tier 0.**
   uv.lock records SHA256 hashes for every transitive — a compromised
   package with a different hash gets REJECTED. Falls through to the
   existing `uv pip install` cascade if the lockfile is missing or
   stale, with a loud warning that the fallback path does NOT
   hash-verify transitives. Previously only `setup-hermes.sh` (the dev
   path) used the lockfile; `scripts/install.sh` and `scripts/install.ps1`
   (the paths fresh users actually run) skipped it.

2. **Removed the `[mistral]` extra entirely.** The `mistralai` PyPI
   project is fully quarantined right now — every version returns 404,
   so any pin we wrote was unresolvable, which broke `uv lock --check`
   in CI. Restoration is documented in pyproject.toml as a 5-step
   checklist (verify, re-add extra, re-enable in 4 modules, regenerate
   lock, optionally re-add to [all]).

3. **Regenerated uv.lock.** 262 packages, mistralai/eval-type-backport/
   jsonpath-python pruned. `uv lock --check` now passes.

# Defense-in-depth view

| Layer                      | Where             | Protects against                          |
|----------------------------|-------------------|-------------------------------------------|
| Exact pins in pyproject    | direct deps       | new mistralai 2.4.6-style direct compromise |
| uv.lock + `--locked` install | transitive graph  | transitive worm injection                  |
| Tier-0 hash-verified path  | install.sh / .ps1 | actually USE the lockfile in fresh installs |
| `uv lock --check` CI gate  | every PR          | drift between pyproject and lockfile      |
| `hermes_cli/security_advisories.py` | runtime  | cleanup for users who already got hit      |

The exact pinning + hash verification together close the supply-chain
gap. Without the lockfile path, exact pins alone are theater.

# Validation

- `uv lock --check` → passes (262 packages resolved, no drift).
- `bash -n` on install.sh + setup-hermes.sh → OK.
- 209/209 tests passing across new + adjacent test files
  (test_lazy_deps.py, test_security_advisories.py, test_doctor.py,
  test_tts_mistral.py, test_transcription_tools.py).
- TOML parse OK.

* chore: remove community announcement drafts (PR body covers it)

* build(deps): lazy-install every opt-in backend (anthropic, search, terminal, platforms, dashboard)

Extends the lazy-install framework to cover everything that's not used by
every hermes session. Base install drops from ~60 packages to 45.

Moved out of core dependencies = []:
- anthropic   (only when provider=anthropic native, not via aggregators)
- exa-py, firecrawl-py, parallel-web (search backends; only when picked)
- fal-client  (image gen; only when picked)
- edge-tts    (default TTS but still optional)

New extras in pyproject.toml: [anthropic] [exa] [firecrawl] [parallel-web]
[fal] [edge-tts]. All added to [all].

New LAZY_DEPS entries: provider.anthropic, search.{exa,firecrawl,parallel},
tts.edge, image.fal, memory.hindsight, platform.{telegram,discord,matrix},
terminal.{modal,daytona,vercel}, tool.dashboard.

Each import site now calls ensure() before importing the SDK. Where the
module had a top-level try/except (telegram, discord, fastapi), the
graceful-fallback pattern was extended to lazy-install on first
check_*_requirements() call and re-bind module globals.

Updated test_windows_native_support.py tzdata check from snapshot
(>=2023.3 literal) to invariant (any version + win32 marker).

Validation:
- Base install: 45 packages (was ~60); 6 newly-extracted packages absent
- uv lock --check: passes (262 packages, no drift)
- 209/209 lazy_deps + advisory + doctor + tts/transcription tests passing
- py_compile clean on all 12 modified modules
2026-05-12 01:02:25 -07:00
kshitij
ce0f529cde
chore: ruff auto-fix C401, C416, C408, PLR1722 (#23940)
C401:   set(x for x in y) -> {x for x in y}      (set comprehension)
C416:   [(k,v) for k,v in d] -> list(d.items())  (unnecessary listcomp)
C408:   tuple()/dict() -> ()/{}                   (unnecessary collection call)
PLR1722: exit() -> sys.exit()                     (adds import sys where needed)

21 instances fixed, 0 remaining. 19 files, +40/-36.
2026-05-11 11:20:58 -07:00
kshitij
2ec8d2b42f
chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937)
Replace  with  for all literal-tuple
membership tests. Set lookup is O(1) vs O(n) for tuple — consistent
micro-optimization across the codebase.

608 instances fixed via `ruff --fix --unsafe-fixes`, 0 remaining.
133 files, +626/-626 (net zero).
2026-05-11 11:13:25 -07:00
kshitij
657874460f
chore: ruff auto-fixes — collapsible-else-if, if-stmt-min-max, dict.fromkeys (#23926)
PLR5501 (collapsible-else-if): 28 instances — else: if: → elif:
PLR1730 (if-stmt-min-max):   15 instances — if x<y: x=y → x=max(x,y)
C420   (dict.fromkeys):       2 instances — dictcomp → dict.fromkeys
PLR1704 (redefined-argument): 1 instance — reason → err_msg (shadow fix)
C414   (unnecessary-list):    1 instance — sorted(list(x)) → sorted(x)

28 files, -44 net lines. All mechanical, zero logic changes.
17,211 tests pass, zero regressions.
2026-05-11 11:03:29 -07:00
0xbyt4
ace1c4ea8c fix(discord): typing indicator task not cleaned up after API error
When the Discord typing API call fails (rate limit, network error, 403),
_typing_loop returns early but the stale task remains in _typing_tasks.
Subsequent send_typing calls see the stale entry and skip, leaving no
typing indicator for the rest of the agent invocation.

Add finally block to _typing_loop to always remove the task from
_typing_tasks on exit, whether from cancellation, error, or normal
completion. This allows send_typing to create a fresh task.

3 new tests in test_discord_send.py:
- Task removed after API error
- Typing restartable after failure
- stop_typing cleans up
2026-05-10 22:41:26 -07:00
wilsen0
ac95b8cdbe perf(gateway): tune Telegram cadence + adaptive fast-path for short replies
Re-authored against current main from PR #10388 by @wilsen0.  The
original branch is 3800+ commits stale and could not be cherry-picked
without reverting unrelated work; this change carries only the perf
intent forward.

Tuning summary
==============

Text-batch ingress (gateway/platforms/telegram.py):
  - HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS default 0.6 -> 0.3
  - HERMES_TELEGRAM_TEXT_BATCH_SPLIT_DELAY_SECONDS default 2.0 -> 1.0
  - Adaptive fast-path tiers in _flush_text_batch:
      total <= 320 cp -> min(cap, 0.18)
      total <= 1024 cp -> min(cap, 0.24)
      else            -> cap
    A single short reply now reaches the agent in ~180ms instead of
    600ms.  Tier constants compose with the configured cap via min()
    so an operator who tightens HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS
    below 0.18 still wins on every tier.
  - _env_float_clamped helper replaces bare float(os.getenv()).
    Rejects NaN / Inf, applies optional min/max bounds.  Used for
    text-batch + media-batch knobs.  Prevents asyncio.sleep(NaN)
    crashes when an operator typos an env var.

Stream cadence (gateway/config.py + stream_consumer.py):
  - StreamingConfig.edit_interval default 1.0s -> 0.8s
  - StreamingConfig.buffer_threshold default 40 -> 24 chars
  - DEFAULT_STREAMING_EDIT_INTERVAL / BUFFER_THRESHOLD / CURSOR are now
    a single source of truth.  StreamConsumerConfig imports them
    instead of duplicating the literals; the prior dual-source drift
    is fixed.

Tool progress (gateway/display_config.py):
  - Telegram default tool_progress 'all' -> 'new'.  Inside
    Telegram's ~1 edit/s flood envelope the 'all' default would
    accumulate edit pressure on busy chats; 'new' shows only the
    leading bubble per tool batch and feels less spammy.
  - Slack tier_low override (tool_progress='off') is preserved.

Composition with native draft streaming (#23512)
================================================

The mid-stream cadence (edit_interval, buffer_threshold) gates BOTH
the draft path (send_draft) and the edit path (edit_message), so the
tighter cadence helps native draft as much as edit-based.  The
text-batch fast-path applies before the consumer starts, so it speeds
up the first-token latency on every transport.  No conflict.

Stale-base avoidance
====================

Re-authored from scratch rather than cherry-picked.  Dropped from the
original branch:
  - Unrelated d2f043f9c 'fix(anthropic): preserve third-party thinking
    continuity' commit
  - boot_md.py builtin gateway hook (unrelated)
  - Reverted Slack tool_progress='off' (#14663) restoration
  - Reverted Platform plugin discovery, MSGRAPH_WEBHOOK, YUANBAO
    members deletion
  - 2300+ lines of run.py base-skew noise

Tests
=====

New tests/gateway/test_telegram_text_batch_perf.py:
  - 7 tests for _env_float_clamped (NaN, Inf, garbage, bounds).
  - 4 tests for the adaptive-tier composition rules.

Updated tests/gateway/test_display_config.py:
  - test_platform_default_when_no_user_config: 'all' -> 'new' for
    Telegram, with comment.
  - test_high_tier_platforms: split into Telegram-overrides-to-new
    and Discord-stays-all assertions.

Closes #10388.

Co-authored-by: wilsen0 <132184373+wilsen0@users.noreply.github.com>
2026-05-10 22:22:25 -07:00
kjames2001
bf1f40996f fix(telegram): split-and-deliver oversized edits instead of silent truncation
When edit_message_text exceeded Telegram's 4096 UTF-16 codepoint limit,
the adapter caught the BadRequest, best-effort truncated the content
with '…', and returned SendResult(success=True). The stream consumer
believed the full edit was delivered and never recovered, silently
dropping everything past the truncation boundary on long replies.

Returning failure isn't safe either — the consumer's existing fallback
path can race against the next streaming tick, producing duplicate
sends or gaps. Instead, the adapter now SPLITS the oversized payload
across the existing message + new continuation messages, so the user
always gets the full reply in correct order.

How it works:

1. Pre-flight: if utf16_len(content) already exceeds MAX_MESSAGE_LENGTH,
   call the new _edit_overflow_split helper directly — saves a doomed
   round-trip + a Telegram error.

2. Reactive: if Telegram still returns 'message_too_long' after the
   pre-flight (e.g. parse_mode formatting inflated the payload past
   the limit via MarkdownV2 escapes), the same helper handles it.

3. _edit_overflow_split:
   - Splits via truncate_message(len_fn=utf16_len) — same chunking the
     non-streaming send() path uses; chunks get '(1/N)' suffixes.
   - Edits the original message_id with chunk 1 (with parse_mode +
     plain-fallback when finalize=True, mirroring the main edit path).
   - Sends each remaining chunk via self._bot.send_message threaded as
     a reply to the previous chunk so the user sees them as a
     contiguous block. MarkdownV2-with-plain-fallback per chunk on
     finalize.
   - Returns SendResult(success=True, message_id=<last_chunk_id>,
     continuation_message_ids=(<chunk2_id>, <chunk3_id>, ...)) so the
     stream consumer can keep editing the most recent visible message
     and the gateway has full visibility into every message id.

SendResult contract extension:

  Added optional continuation_message_ids: tuple = () field. When
  empty (the common case), behavior is unchanged. When populated, the
  caller knows the adapter delivered across multiple platform messages.

Stream consumer integration:

  GatewayStreamConsumer._send_or_edit advances _message_id to the
  last-continuation id when it sees continuation_message_ids on a
  successful edit result, resets _last_sent_text (the new visible
  message holds only the final chunk's text), and fires
  on_new_message so tool-progress bubbles linearize below the new
  continuation rather than the original. Mirrors the openclaw #32535
  inter-tool-leak guard.

Composes with what just landed:

  - PR #23455 (UTF-16 length-aware splitting in stream consumer)
    prevents most overflows upstream by measuring text in UTF-16
    codeunits before deciding to split. This PR is the safety net at
    the adapter boundary.
  - PR #23512 (native draft streaming, default for DM Telegram) routes
    DM streaming through send_draft, which has its own contract
    unaffected by this change. So this fix narrows in scope to the
    edit-based path: groups, supergroups, forum topics, every
    non-Telegram platform, and the per-response fallback after a
    draft failure.

Salvage notes:

  - Cherry-picked from PR #19537 by @kjames2001. Original PR returned
    failure on overflow; this evolves to split-and-deliver so users
    never lose content and the consumer state stays consistent.
  - Dropped an unrelated model-picker hunk (line 2114-2117) that
    silently killed the 'X more available — type /model <name>
    directly' hint by hardcoding total=len(models). Not in scope.
  - Restored the timeout-aware retryable=not is_timeout signal in
    send()'s fallthrough catch block.

Closes #19537.
2026-05-10 22:02:56 -07:00
NivOO5
4ed293b38e feat(telegram): native draft streaming via sendMessageDraft (Bot API 9.5+)
Adds Telegram's native streaming-draft API as a streaming transport so DM
replies render with smooth animated previews as tokens arrive, dropping
the per-edit jitter of the legacy editMessageText polling path.

Adapter contract (gateway/platforms/base.py):
  - supports_draft_streaming(chat_type, metadata) -> bool. Default False.
    Telegram returns True only for DMs and only when the bound python-
    telegram-bot version exposes Bot.send_message_draft (PTB 22.6+).
  - send_draft(chat_id, draft_id, content, metadata) -> SendResult.
    Default raises NotImplementedError. Telegram delegates to PTB's
    send_message_draft. Drafts have no message_id (Bot API contract);
    SendResult.message_id is None on success.

Telegram adapter (gateway/platforms/telegram.py):
  - supports_draft_streaming gates on chat_type='dm' AND PTB capability.
  - send_draft trims to MAX_MESSAGE_LENGTH using utf16_len, threads
    message_thread_id through metadata, and routes failures back as
    SendResult(success=False, error=...) so the consumer can fall back.

Stream consumer (gateway/stream_consumer.py):
  - StreamConsumerConfig gains transport ('auto'|'draft'|'edit'|'off')
    and chat_type fields.
  - run() resolves _use_draft_streaming once via a probe at the top of
    the run, allocating a fresh class-wide draft_id_counter so each
    response animates as its own preview (no animation collision across
    consecutive responses to the same chat).
  - _send_or_edit gains a pre-edit branch: when drafts are active AND
    not finalizing AND no edit-path message_id is established, the
    frame routes through _send_draft_frame instead of edit_message.
    Drafts intentionally do NOT set _already_sent so the gateway's
    final sendMessage path still fires — drafts have no message_id and
    the user needs a real message in their chat history.
  - _reset_segment_state bumps the draft_id when the consumer is in
    draft mode so each text block after a tool boundary animates as a
    fresh preview below the tool-progress bubble (avoids the inter-
    tool-call leak openclaw documented in their #32535).
  - Per-response fallback: any send_draft failure (transient network,
    server reject, capability gap) flips _use_draft_streaming to False
    for the rest of the run, gracefully returning to the edit path.

Gateway config (gateway/config.py):
  - StreamingConfig.transport default flips edit -> auto. The auto path
    is identical to edit on every chat type that doesn't currently
    support drafts (groups, supergroups, forum topics, every non-
    Telegram platform), so the default is backwards-compatible for
    non-DM users.

Lifecycle model (Telegram Bot API 9.5):
  1. sendMessageDraft(chat_id, draft_id, text='') opens the bubble.
  2. Repeated sendMessageDraft calls with the SAME draft_id animate
     the preview as text grows.
  3. Drafts have no message_id and cannot be edited or deleted.
  4. When the response finishes the gateway's normal sendMessage path
     delivers the final answer; the draft preview clears naturally on
     the client and the user sees a real message in their history.

Inspired by PR #3412 by @NivOO5. Re-authored against current main
(stream_consumer.py is now ~4x larger than at #3412's branch base, with
new _NEW_SEGMENT/_COMMENTARY/finalize/_on_new_message machinery the
original PR didn't account for) but the design call (DM-only, edit-
fallback, transport=auto|draft|edit|off) is faithful to the original
proposal, with two improvements baked in:

  1. Per-response draft_id (monotonic counter, not a time hash) — no
     collision risk across consecutive responses on the same chat.
  2. Tool-boundary draft_id bump — prevents the inter-tool-call leak
     openclaw hit during their rollout (their #32535).

Closes #21439 (duplicate feature request).
2026-05-10 20:02:50 -07:00
rahimsais
737314fe91 fix(telegram): normalize dm threads and retry control sends
Cherry-picked from PR #10371. Two-layer defense for the spurious-thread_id
issue (#3206):

1. _build_message_event filters DM thread_ids: only preserve thread_id
   for real topic messages (is_topic_message=True). Telegram puts
   message_thread_id on every DM that is a reply, but reply-chain ids
   route to nonexistent threads on send.

2. _send_message_with_thread_fallback helper: control sends
   (send_update_prompt, send_exec_approval / send_slash_confirm,
   send_model_picker) retry once without message_thread_id when
   Telegram returns BadRequest 'Message thread not found'. Mirrors
   the pattern PR #3390 added for the streaming send path.

Salvage notes:
- Conflict 1 (line ~4099): merged the contributor's DM is_topic_message
  filter with the existing forum General-topic default from #22423,
  preserving both behaviors.
- Conflict 2 (line ~1664 / 1690): kept main's delete_message (PR #23416)
  alongside the new helper. Tightened the helper's exception catch
  from bare 'Exception' to use the existing _is_bad_request_error +
  _is_thread_not_found_error helpers (line 484-496) for consistency
  with the streaming send path.
- Widened the fix to send_update_prompt (was bare self._bot.send_message,
  same bug class).

Authored by rahimsais via PR #10371 (re-attributed from donrhmexe@
local commit author).
2026-05-10 18:09:31 -07:00
Aubrey Freeman III
c0da5d09a6 fix: use UTF-16 length for Telegram stream consumer message splitting
The stream consumer measured message length using Python's len() (Unicode
code points), but Telegram's actual limit is in UTF-16 code units. This
caused messages with supplementary characters (emoji, CJK, etc.) to exceed
Telegram's 4096-character limit, resulting in truncated messages with
formatting artifacts.

Changes:
- Add message_len_fn property to BasePlatformAdapter (defaults to len)
- Override in TelegramAdapter to return utf16_len
- Stream consumer uses adapter.message_len_fn for:
  - safe_limit calculation
  - overflow detection
  - truncate_message calls
  - split point calculation (via _custom_unit_to_cp)
  - fallback final send chunking

Fixes truncated messages with black square artifacts on Telegram when
the model generates responses containing multi-byte Unicode characters.
2026-05-10 16:21:07 -07:00
hrygo
ff14666cdc fix(gateway): stream consumer first message drops thread context
Cherry-picked from PR #13077 commits:
- 5500c7d8 fix(gateway): stream consumer first message drops thread context
- e84403b9 test(gateway): add regression tests for stream consumer thread routing

Fixes: Streaming first message drops thread/topic context in Feishu group
topics, Slack threads, Telegram forum topics. Adds initial_reply_to_id
ctor arg to GatewayStreamConsumer, threaded through _send_or_edit and
_send_new_chunk. Also fixes Feishu _send_raw_message fallback path
(reply -> create) to use receive_id_type='thread_id' so the new message
lands in the correct topic instead of the main channel.

Authored by hrygo via PR #13077 (re-attributed from the bot-authored
salvage commit on the original branch).
2026-05-10 15:20:40 -07:00
teknium1
00ce5f04d9 feat(session): make /handoff actually transfer the session live
Builds on @kshitijk4poor's CLI handoff stub. The original PR's flow
deferred everything to whenever a real user happened to message the
target platform; this rewrites it so the gateway picks up handoffs
immediately and the destination chat just starts working.

State machine on sessions table replaces the boolean flag:
  None -> 'pending' -> 'running' -> ('completed' | 'failed')
plus handoff_error for failure reasons. CLI request_handoff /
get_handoff_state / list_pending_handoffs / claim_handoff /
complete_handoff / fail_handoff helpers wrap the transitions.

CLI side (cli.py): /handoff <platform> validates the platform's home
channel via load_gateway_config, refuses if the agent is mid-turn,
flips the row to 'pending', and poll-blocks (60s) on terminal state.
On 'completed' it prints the /resume hint and exits the CLI like
/quit. On 'failed' or timeout it surfaces the reason and the CLI
session stays intact.

Gateway side (gateway/run.py): new _handoff_watcher background task
scans state.db every 2s, atomically claims pending rows, and runs
_process_handoff for each. _process_handoff:

  1. Resolves the platform's home channel.
  2. Asks the adapter for a fresh thread via the new
     create_handoff_thread(parent_chat_id, name) capability so the
     handed-off conversation gets its own scrollback. Adapters that
     don't support threads (or fail) return None and the watcher
     falls back to the home channel directly.
  3. Constructs a SessionSource keyed as 'thread' when a thread was
     created, 'dm' otherwise, then session_store.switch_session
     re-binds the destination key to the CLI session_id. The full
     role-aware transcript replays via load_transcript on the next
     turn (no flat-text injection into context_prompt).
  4. Forges a synthetic MessageEvent(internal=True) with the handoff
     notice and dispatches through _handle_message; the agent runs
     against the loaded transcript and adapter.send delivers the
     reply.
  5. Marks the row 'completed' on success, 'failed' (+error) on any
     exception.

Adapter capability (gateway/platforms/base.py): create_handoff_thread
default returns None. Three overrides:

  - Telegram (gateway/platforms/telegram.py): wraps _create_dm_topic
    so DM topics (Bot API 9.4+) and forum supergroups both work.
  - Discord (gateway/platforms/discord.py): parent.create_thread on
    text channels with a seed-message + message.create_thread
    fallback for permission edge cases. Skips DMs and other
    non-thread-capable parents.
  - Slack (gateway/platforms/slack.py): posts a seed message and
    returns its ts as the thread anchor — Slack threads are
    message-anchored.

In thread mode, build_session_key keys the destination without
user_id (thread_sessions_per_user defaults to False) so the synthetic
turn and any later real-user message in the thread share the same
session_key — seamless takeover without race.

CommandDef stays cli_only=True (handoff is initiated from the CLI;
gateway exposes /resume for the reverse direction).

Removed the original PR's _handle_message_with_agent handoff hook
(transcript-as-text injection into context_prompt) and the
send_message_tool notification — both replaced by the watcher path.

Tests rewritten around the new state machine: 13/13 pass.
E2E-validated thread + no-thread paths and the failure path against
real worktree imports with mocked adapters.
2026-05-10 13:06:25 -07:00
Teknium
50f9fee988
feat(gateway): add LINE Messaging API platform plugin (#23197)
* feat(gateway): add LINE Messaging API platform plugin

Adds LINE as a bundled platform plugin under `plugins/platforms/line/`,
synthesized from the strongest pieces of seven open community PRs. The
adapter requires zero core edits — `Platform("line")` is auto-discovered
via the bundled-plugin scan in `gateway/config.py`, and all hooks
(setup, env-enablement, cron delivery, standalone send) are wired
through `register_platform()` kwargs the way IRC and Teams do it.

Highlights merged into one plugin:

- **Reply token preferred, Push fallback.** Try the free reply token
  first (single-use, ~60s TTL); fall back to metered Push when the
  token is absent, expired, or rejected. (PR #21023)
- **Slow-LLM Template Buttons postback.** When the LLM is still running
  past `LINE_SLOW_RESPONSE_THRESHOLD` (default 45s), the adapter burns
  the original reply token to send a "Get answer" button bubble. The
  user taps it to fetch the cached answer via a fresh reply token —
  also free. State machine: PENDING → READY → DELIVERED, ERROR for
  cancelled runs (orphan resolves to `LINE_INTERRUPTED_TEXT` after
  /stop). Set threshold to 0 to disable. (PR #18153)
- **Three-allowlist gating** — separate user / group / room allowlists
  with `LINE_ALLOW_ALL_USERS=true` dev-only escape hatch. (PR #18153)
- **Markdown URL preservation.** Strip bold/italic/code-fence/heading
  markers (LINE renders them literally) but keep `[label](url)` →
  `label (url)` so URLs stay tappable. (PR #18153)
- **System-message bypass** for ` Interrupting`, ` Queued`, etc. —
  busy-acks reach the user as visible bubbles instead of being
  swallowed into the postback cache. (PR #18153)
- **Media via public HTTPS URLs.** LINE doesn't accept binary uploads;
  images/audio/video must be HTTPS-reachable. The adapter serves
  registered tempfiles under `/line/media/<token>/<filename>` from the
  same aiohttp app. Allowed-roots traversal guard covers
  `tempfile.gettempdir()`, `/tmp` (→ `/private/tmp` on macOS), and
  `HERMES_HOME`. `LINE_PUBLIC_URL` overrides URL construction for
  setups behind tunnels/proxies. (PR #8398)
- **5-message-per-call batching.** LINE rejects >5 messages per
  Reply/Push; smart-chunker caps text at 4500 chars per bubble.
- **Inbound dedup** via `webhookEventId` LRU. (PR #21023)
- **Self-message filter** via `/v2/bot/info` userId lookup. (PR #21023)
- **Loading-animation indicator** wired to LINE's `chat/loading/start`
  endpoint, DM-only (LINE rejects it for groups/rooms). (PR #21023)
- **Out-of-process cron delivery** via `_standalone_send`, so
  `deliver: line` cron jobs work even when cron runs detached from
  the gateway.
- **Webhook hardening** — 1 MiB body cap, constant-time HMAC-SHA256
  signature verification, dedup, scoped lock so two profiles can't
  bind the same channel.

Validation
----------

- `scripts/run_tests.sh tests/gateway/test_line_plugin.py` →
  73 passed in 1.05s
- `scripts/run_tests.sh tests/gateway/test_line_plugin.py
  tests/gateway/test_irc_adapter.py
  tests/gateway/test_plugin_platform_interface.py
  tests/gateway/test_platform_registry.py
  tests/gateway/test_config.py` → 193 passed, 7 skipped
- E2E import + register + signature roundtrip + `Platform("line")`
  bundled-plugin discovery verified against current `origin/main`.

Closes the seven open LINE PRs (#18153, #16832, #6676, #21023, #14942,
#14988, #8398) by superseding them with a single plugin-form
implementation that takes the best idea from each.

Co-authored-by: pwlee <32443648+leepoweii@users.noreply.github.com>
Co-authored-by: Jetha Chan <jetha@google.com>
Co-authored-by: Cattia <openclaw@liyangchen.me>
Co-authored-by: perng <charles@perng.com>
Co-authored-by: Soichiro Yoshimura <soichiro0111.dev@gmail.com>
Co-authored-by: David Zhou <77736378+David-0x221Eight@users.noreply.github.com>
Co-authored-by: Yu-ga <74749461+yuga-hashimoto@users.noreply.github.com>

* docs(platforms): document platform-specific slow-LLM UX pattern

Add a 'Platform-Specific Slow-LLM UX' section to the platform-adapter
developer guide covering the _keep_typing override pattern that LINE
uses for its Template Buttons postback flow.

Three subsections:
- Pattern: subclass _keep_typing to layer mid-flight UX (with code)
- Pattern: subclass send to route through a cache instead of sending
- When this pattern is appropriate (vs. always-Push fallback)

Plus a short pointer in gateway/platforms/ADDING_A_PLATFORM.md so
tree-readers find the prose walkthrough on the docsite.

Filed because the LINE plugin (PR #23197) was the first bundled
adapter to need this pattern — every prior plugin (irc, teams,
google_chat) handles slow responses with the default typing-loop and
a regular send_text. Documenting now while the rationale is fresh.

---------

Co-authored-by: pwlee <32443648+leepoweii@users.noreply.github.com>
Co-authored-by: Jetha Chan <jetha@google.com>
Co-authored-by: Cattia <openclaw@liyangchen.me>
Co-authored-by: perng <charles@perng.com>
Co-authored-by: Soichiro Yoshimura <soichiro0111.dev@gmail.com>
Co-authored-by: David Zhou <77736378+David-0x221Eight@users.noreply.github.com>
Co-authored-by: Yu-ga <74749461+yuga-hashimoto@users.noreply.github.com>
2026-05-10 06:40:46 -07:00
Teknium
448c11f16d fix(telegram): default notifications to 'important' (silence intermediate)
Per-tool-call push notifications on Telegram are noisy enough that
'all' is the wrong default — long agent runs spam the user's notification
shade with status messages they didn't ask to be pinged about. Final
responses, approval prompts, and slash confirmations still notify;
intermediate progress, streaming, and tool-progress messages now
deliver silently via disable_notification.

Users who want the legacy behavior can opt back in with:
  display:
    platforms:
      telegram:
        notifications: all
or HERMES_TELEGRAM_NOTIFICATIONS=all.
2026-05-09 13:38:25 -07:00
Denis
236f3b0521 feat(gateway): add Telegram notification mode to suppress intermediate push notifications
Add a configurable notifications mode for the Telegram platform adapter
that controls which messages trigger push notifications.

- display.platforms.telegram.notifications: "all" (default) | "important"
- HERMES_TELEGRAM_NOTIFICATIONS env var override
- In "important" mode, all sends use disable_notification=True except:
  - Approvals (send_exec_approval) and slash confirmations
  - Final response messages (metadata["notify"]=True)
- Zero overhead in default "all" mode
- Zero impact on non-Telegram platforms

Closes #22771
2026-05-09 13:38:25 -07:00
Wesley Simplicio
a671d8a27a fix(email): use real hermes version in IMAP ID command 2026-05-09 13:35:50 -07:00
Wesley Simplicio
3fd4ccbd8b fix(email): send IMAP ID extension to support 163/NetEase mailbox
163/NetEase IMAP servers reject every UID SEARCH/FETCH with `BYE Unsafe
Login` unless the client first identifies itself via the RFC 2971 ID
command after LOGIN.  Without this, the email gateway logs in OK but
then fails on the very first poll and the connection is torn down.

Send the ID payload best-effort after both `imap.login()` sites
(`EmailAdapter.connect` and `_fetch_new_messages`).  Failures are
swallowed at debug level so non-supporting IMAP servers (Gmail,
Outlook, Fastmail, Yahoo, etc.) keep working unchanged.

Closes #22271
2026-05-09 13:35:50 -07:00
Teknium
ea2d66ddc0
perf(gateway): defer QQAdapter and YuanbaoAdapter imports via PEP 562 (#22790)
`gateway/platforms/__init__.py` eagerly imported `QQAdapter` and
`YuanbaoAdapter` at package-init time, which transitively pulled in
qqbot's chunked-upload + keyboards + onboard machinery and yuanbao's
websocket stack. About 84 ms wall and 23 MB RSS on every fresh process
that touched anything under `gateway.platforms` — including `hermes
chat` (via run_agent → cli's plugin discovery transitive import).

Nothing in the codebase actually consumes these symbols from the
package root; every real call site uses the long-form path
(`from gateway.platforms.qqbot import QQAdapter`,
`from gateway.platforms.yuanbao import YuanbaoAdapter` in gateway/run.py).
The eager re-export was only there for convenience.

Replace with a PEP 562 module-level `__getattr__` that lazily imports
on first attribute access. Public API stays identical:
`from gateway.platforms import QQAdapter` keeps working but only
pays the import cost when the symbol is actually touched. `__dir__`
preserves help() / autocomplete behavior.

Measured impact (7-run medians, 9950X3D):
  import gateway.platforms        127 →  43 ms  (-66%)
                                   50 →  27 MB  (-46%)
  import gateway.platforms.base   127 →  44 ms  (-65%)
                                   50 →  27 MB  (-46%)
  import cli (full chat path)     745 → 710 ms  ( -5%)
                                   96 →  90 MB  ( -6%)
  hermes chat -q (cold)                  -5 MB

The per-import win is biggest because qqbot/yuanbao deps don't overlap
with anything on the gateway-platforms path — full `import cli`
already loads aiohttp/websockets transitively from other places, so
the marginal CLI win is smaller than the isolated import benchmark.
The `gateway.platforms.base` win is what matters most for long-lived
gateway processes: every gateway boot saves 23 MB resident.

All 144 qqbot tests pass; broader gateway suite (5132 tests) passes
modulo 4 pre-existing flakes also failing on main without this change.
2026-05-09 13:17:48 -07:00
Teknium
2124ad72a2
fix(api-server): emit length/error finish_reason for truncation/failure (#22775)
Non-streaming /v1/chat/completions wrapped any AIAgent result \u2014 including
partial/failed runs \u2014 as a successful 200 with finish_reason='stop' and
the internal failure string substituted into message.content. API
clients had no way to distinguish 'agent answered: X' from
'agent crashed and the X you see is its error message'.

After the fix:
  - completed: True             \u2192 200 finish_reason='stop' (unchanged)
  - partial + truncated text    \u2192 200 finish_reason='length' + hermes extras
  - partial + no text / failed  \u2192 502 OpenAI error envelope (SDKs raise)
  - other failures              \u2192 200 finish_reason='error' + hermes extras

Adds X-Hermes-Completed / X-Hermes-Partial / X-Hermes-Error headers
plus a 'hermes' extras object on partial responses for clients that
want the full picture.

Closes #22496.
2026-05-09 12:48:08 -07:00
kshitijk4poor
dae94fa652 fix: follow-up for salvaged PR #22263
- Restore allowed_chats gate before thread_id check so ignored_threads
  applies universally (even to guest mentions).
- Compute _message_mentions_bot once in _should_process_message to
  eliminate redundant second entity scan when guest_mode=true and the
  message does not mention the bot.
- Remove redundant _is_group_chat from _is_guest_mention (caller already
  verified the message is a group chat).
- Update _telegram_allowed_chats docstring to note guest_mode exception.
- Add test coverage: bot_command entity, text_mention entity,
  caption_entities, and ignored_threads + guest_mode interaction.
- Add nik1t7n to AUTHOR_MAP.
2026-05-09 11:54:04 -07:00
Nikita Nosov
55f518e521 feat(gateway): add Telegram guest mention mode 2026-05-09 11:54:04 -07:00
Teknium
684fd14db0 fix(dingtalk): align override signatures with base + guard Optional[error] in tests 2026-05-09 11:11:10 -07:00
qWaitCrypto
c705c7ac9b fix(dingtalk): clarify webhook media behavior 2026-05-09 11:11:10 -07:00
briandevans
854c2ce309 fix(telegram): honor message.quote for partial-quote reply context
When a Telegram user replies using the native quote feature to select
only part of a prior message, _build_message_event was injecting the
ENTIRE replied-to message into reply_to_text via
message.reply_to_message.text/caption. python-telegram-bot exposes
the user-selected substring as message.quote (TextQuote.text); we now
prefer that and fall back to the full replied-to text only when no
native quote is present.

The agent-visible "[Replying to: \"...\"]" prefix can otherwise expand
the user's narrow quote into the full prior message, causing the agent
to act on unrelated actionable-looking text the user did not select
(e.g. multi-item briefings where the user quotes one bullet but the
prefix injects every bullet). Falls back cleanly when message.quote
is absent (PTB <21 or replies that don't quote a substring).

Fixes #22619

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 11:10:36 -07:00
uzunkuyruk
8fdaf4d3d6 fix(telegram): exclude row-label column from bullet items in table rendering
When a GFM table has a row-label column (first column with no header),
_render_table_block_for_telegram incorrectly included the row-label cell
in the bullet zip alongside the data cells, producing a spurious bullet
like '• 維度: 核心賣點' before the real data rows.

Detect the row-label column by comparing the first data row cell count
against the header count (has_row_label_col = len(first_data_row) ==
len(headers) + 1). When present, use cells[0] as the heading and
zip headers against cells[1:] only, correctly excluding the row-label
from the bullet list.

Fixes #22604
2026-05-09 17:39:16 +03:00
Nikita Nosov
1ac8deb3ca feat(gateway): stream Telegram edits safely 2026-05-09 04:34:55 -07:00
GodsBoy
93e25ceb13 feat(plugins): add standalone_sender_fn for out-of-process cron delivery
Plugin platforms (IRC, Teams, Google Chat) currently fail with
`No live adapter for platform '<name>'` when a `deliver=<plugin>` cron
job runs in a separate process from the gateway, even though the
platforms are eligible cron targets via `cron_deliver_env_var` (added
in #21306). Built-in platforms (Telegram, Discord, Slack, etc.) use
direct REST helpers in `tools/send_message_tool.py` so cron can deliver
without holding the gateway in the same process; plugin platforms
historically depended on `_gateway_runner_ref()` which returns `None`
out of process.

This change adds an optional `standalone_sender_fn` field to
`PlatformEntry` so plugins can register an ephemeral send path that
opens its own connection, sends, and closes without needing the live
adapter. The dispatch site in `_send_via_adapter` falls through to the
hook when the gateway runner is unavailable, with a descriptive error
when neither path applies. The hook is optional, so existing plugins
are unaffected.

Reference migrations land in the same change for IRC, Teams, and
Google Chat, exercising the hook across stdlib (asyncio + IRC protocol),
Bot Framework OAuth client_credentials, and Google service-account
flows respectively.

Security hardening on the new code paths:
* IRC: control-character stripping on chat_id and message body to
  block CRLF command injection; bounded nick-collision retries; JOIN
  before PRIVMSG so channels with the default `+n` mode accept the
  delivery.
* Teams: TEAMS_SERVICE_URL validated against an allowlist of known
  Bot Framework hosts (`smba.trafficmanager.net`,
  `smba.infra.gov.teams.microsoft.us`) to block SSRF; chat_id and
  tenant_id constrained to the documented Bot Framework character set;
  per-request timeouts so a slow STS endpoint cannot starve the
  activity POST.
* Google Chat: chat_id and thread_id validated against strict
  resource-name regexes; service-account refresh wrapped in
  `asyncio.wait_for` so a hung token endpoint cannot stall the
  scheduler.

Test coverage: 20 new tests covering happy path, missing-config errors,
network failure modes, and each defensive validation. Existing tests
unchanged. `bash scripts/run_tests.sh tests/tools/test_send_message_tool.py
tests/gateway/test_irc_adapter.py tests/gateway/test_teams.py
tests/gateway/test_google_chat.py` reports 341 passed, 0 regressions.

Documentation: new "Out-of-process cron delivery" section in
website/docs/developer-guide/adding-platform-adapters.md and an entry
in gateway/platforms/ADDING_A_PLATFORM.md naming the hook.
2026-05-09 02:56:29 -07:00
heathley
7e578f02c8 feat(feishu): add native update prompt cards 2026-05-09 02:32:55 -07:00
kshitij
2a7047c2ed
fix(sqlite): fall back to journal_mode=DELETE on NFS/SMB/FUSE (#22043)
SQLite's WAL mode requires shared-memory (mmap) coordination and fcntl
byte-range locks that don't reliably work on network filesystems. Upstream
documents this explicitly:
  https://www.sqlite.org/wal.html#sometimes_queries_return_sqlite_busy_in_wal_mode

On NFS / SMB / some FUSE mounts / WSL1, 'PRAGMA journal_mode=WAL' raises
'sqlite3.OperationalError: locking protocol' (SQLITE_PROTOCOL). Before
this change, every feature backed by state.db or kanban.db broke silently:
  - /resume, /title, /history, /branch returned 'Session database not
    available.' with no cause
  - gateway logged the init failure at DEBUG (invisible in errors.log)
  - kanban dispatcher crashed every 60s, driving the known migration race
    (duplicate column name: consecutive_failures, #21708 / #21374)

Changes:
  - hermes_state.apply_wal_with_fallback(): shared helper that tries WAL
    and falls back to DELETE on SQLITE_PROTOCOL-style errors with one
    WARNING explaining why
  - hermes_state.get_last_init_error() + format_session_db_unavailable():
    capture the init failure cause and surface it in user-facing strings
    (with an NFS/SMB pointer for 'locking protocol')
  - hermes_cli/kanban_db.connect(): use the shared helper
  - gateway/run.py: bump SessionDB init failure log DEBUG -> WARNING
    (matches cli.py's existing correct behavior)
  - cli.py (4 sites) + gateway/run.py (5 sites): replace bare
    'Session database not available.' with format_session_db_unavailable()

Tests: 12 new tests in tests/test_hermes_state_wal_fallback.py + 1 new
test in tests/hermes_cli/test_kanban_db.py. Existing suites (state,
kanban, gateway, cli) remain green for all tests unrelated to pre-existing
failures on main.

Evidence: real-world user on NFSv3 mount (172.26.224.200:d2dfac12/home,
local_lock=none) reporting 'Session database not available.' on /resume;
'locking protocol' appears in 4 distinct log entries across backup,
kanban, TUI, and CLI paths in the same session.

closes #22032
2026-05-09 02:09:35 -07:00
kshitijk4poor
aef297a45e fix(telegram): skip send_chat_action for DM topic reply-fallback lanes
The send path uses Hermes' reply-anchor fallback for DM topic lanes
(message_thread_id + reply_to_message_id), but send_chat_action only
accepts message_thread_id — Telegram's Bot API 10.0 rejects it for
these lanes. Without this short-circuit, every typing tick (~every 2s
during agent runs) makes a doomed API call that gets logged as a
'thread not found' debug warning. Skip the call entirely when the
metadata indicates a DM topic reply-fallback lane; the user-visible
behavior is unchanged (no typing indicator either way for these
lanes), but the logs stay clean.

Identified during salvage review of #22053.
2026-05-09 01:39:37 -07:00
Jhin Lee
b3239572f0 fix(telegram): preserve DM topic routing via reply fallback 2026-05-09 01:39:37 -07:00
Teknium
cc38282b04 feat(cross-platform): psutil for PID/process management + Windows footgun checker
## Why

Hermes supports Linux, macOS, and native Windows, but the codebase grew up
POSIX-first and has accumulated patterns that silently break (or worse,
silently kill!) on Windows:

- `os.kill(pid, 0)` as a liveness probe — on Windows this maps to
  CTRL_C_EVENT and broadcasts Ctrl+C to the target's entire console
  process group (bpo-14484, open since 2012).
- `os.killpg` — doesn't exist on Windows at all (AttributeError).
- `os.setsid` / `os.getuid` / `os.geteuid` — same.
- `signal.SIGKILL` / `signal.SIGHUP` / `signal.SIGUSR1` — module-attr
  errors at runtime on Windows.
- `open(path)` / `open(path, "r")` without explicit encoding= — inherits
  the platform default, which is cp1252/mbcs on Windows (UTF-8 on POSIX),
  causing mojibake round-tripping between hosts.
- `wmic` — removed from Windows 10 21H1+.

This commit does three things:

1. Makes `psutil` a core dependency and migrates critical callsites to it.
2. Adds a grep-based CI gate (`scripts/check-windows-footguns.py`) that
   blocks new instances of any of the above patterns.
3. Fixes every existing instance in the codebase so the baseline is clean.

## What changed

### 1. psutil as a core dependency (pyproject.toml)

Added `psutil>=5.9.0,<8` to core deps. psutil is the canonical
cross-platform answer for "is this PID alive" and "kill this process
tree" — its `pid_exists()` uses `OpenProcess + GetExitCodeProcess` on
Windows (NOT a signal call), and its `Process.children(recursive=True)`
+ `.kill()` combo replaces `os.killpg()` portably.

### 2. `gateway/status.py::_pid_exists`

Rewrote to call `psutil.pid_exists()` first, falling back to the
hand-rolled ctypes `OpenProcess + WaitForSingleObject` dance on Windows
(and `os.kill(pid, 0)` on POSIX) only if psutil is somehow missing —
e.g. during the scaffold phase of a fresh install before pip finishes.

### 3. `os.killpg` migration to psutil (7 callsites, 5 files)

- `tools/code_execution_tool.py`
- `tools/process_registry.py`
- `tools/tts_tool.py`
- `tools/environments/local.py` (3 sites kept as-is, suppressed with
  `# windows-footgun: ok` — the pgid semantics psutil can't replicate,
  and the calls are already Windows-guarded at the outer branch)
- `gateway/platforms/whatsapp.py`

### 4. `scripts/check-windows-footguns.py` (NEW, 500 lines)

Grep-based checker with 11 rules covering every Windows cross-platform
footgun we've hit so far:

1. `os.kill(pid, 0)` — the silent killer
2. `os.setsid` without guard
3. `os.killpg` (recommends psutil)
4. `os.getuid` / `os.geteuid` / `os.getgid`
5. `os.fork`
6. `signal.SIGKILL`
7. `signal.SIGHUP/SIGUSR1/SIGUSR2/SIGALRM/SIGCHLD/SIGPIPE/SIGQUIT`
8. `subprocess` shebang script invocation
9. `wmic` without `shutil.which` guard
10. Hardcoded `~/Desktop` (OneDrive trap)
11. `asyncio.add_signal_handler` without try/except
12. `open()` without `encoding=` on text mode

Features:
- Triple-quoted-docstring aware (won't flag prose inside docstrings)
- Trailing-comment aware (won't flag mentions in `# os.kill(pid, 0)` comments)
- Guard-hint aware (skips lines with `hasattr(os, ...)`,
  `shutil.which(...)`, `if platform.system() != 'Windows'`, etc.)
- Inline suppression with `# windows-footgun: ok — <reason>`
- `--list` to print all rules with fixes
- `--all` / `--diff <ref>` / staged-files (default) modes
- Scans 380 files in under 2 seconds

### 5. CI integration

A GitHub Actions workflow that runs the checker on every PR and push is
staged at `/tmp/hermes-stash/windows-footguns.yml` — not included in this
commit because the GH token on the push machine lacks `workflow` scope.
A maintainer with `workflow` permissions should add it as
`.github/workflows/windows-footguns.yml` in a follow-up. Content:

```yaml
name: Windows footgun check
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: {python-version: "3.11"}
      - run: python scripts/check-windows-footguns.py --all
```

### 6. CONTRIBUTING.md — "Cross-Platform Compatibility" expansion

Expanded from 5 to 16 rules, each with message, example, and fix.
Recommends psutil as the preferred API for PID / process-tree operations.

### 7. Baseline cleanup (91 → 0 findings)

- 14 `open()` sites → added `encoding='utf-8'` (internal logs/caches) or
  `encoding='utf-8-sig'` (user-editable files that Notepad may BOM)
- 23 POSIX-only callsites in systemd helpers, pty_bridge, and plugin
  tool subprocess management → annotated with
  `# windows-footgun: ok — <reason>`
- 7 `os.killpg` sites → migrated to psutil (see §3 above)

## Verification

```
$ python scripts/check-windows-footguns.py --all
✓ No Windows footguns found (380 file(s) scanned).

$ python -c "from gateway.status import _pid_exists; import os
> print('self:', _pid_exists(os.getpid())); print('bogus:', _pid_exists(999999))"
self: True
bogus: False
```

Proof-of-repro that `os.kill(pid, 0)` was actually killing processes
before this fix — see commit `1cbe39914` and bpo-14484. This commit
removes the last hand-rolled ctypes path from the hot liveness-check
path and defers to the best-maintained cross-platform answer.
2026-05-08 14:27:40 -07:00
Teknium
324567c936 fix(windows): os.kill(pid, 0) is NOT a no-op on Windows — route through new _pid_exists helper
On Windows, Python's ``os.kill(pid, 0)`` is NOT a no-op. CPython's
implementation (``Modules/posixmodule.c::os_kill_impl``) treats sig=0
as ``CTRL_C_EVENT`` because the two integer values collide at the C
layer, and routes it through ``GenerateConsoleCtrlEvent(0, pid)`` —
which sends a Ctrl+C to the ENTIRE console process group containing
the target PID, not just the PID itself. Any caller that wanted to
check "is PID X alive" via the classic POSIX ``os.kill(pid, 0)``
idiom was silently killing that process (and often unrelated
processes in the same console group) on Windows. Long-standing
Python Windows quirk; see bpo-14484 (open since 2012).

This manifested in Hermes as: every ``hermes gateway status``
invocation would read the gateway's PID from the PID file, call
``os.kill(pid, 0)`` via ``gateway.status.get_running_pid()`` as a
"liveness check", and instantly terminate the gateway it was trying
to report on. No shutdown log, no traceback, no atexit hook fire,
no exit-diag entry — just silent termination of the detached pythonw
process. "Bot answered one message then stopped typing" was the
characteristic end-user symptom because `os.kill(pid, 0)` fires
mid-response-send and kills the gateway between logs.

Reproduction (verified in this branch before the fix):

  $ hermes gateway start       # gateway alive, PID 37520
  $ hermes gateway status      # reports "No gateway process detected"
  $ tasklist /FI "PID eq 37520"  # INFO: No tasks are running
                                 # — gateway terminated silently

Root-cause fix is a new ``gateway.status._pid_exists(pid)`` helper:

- On Windows: Win32 ``OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION |
  SYNCHRONIZE, False, pid)`` + ``WaitForSingleObject(handle, 0)``
  via ctypes. Zero signal delivery, zero console-group side effects.
  Pins ctypes return types to avoid DWORD-vs-signed-int parse bugs
  on WAIT_TIMEOUT (0x102). Distinguishes ERROR_INVALID_PARAMETER
  (PID gone) from ERROR_ACCESS_DENIED (alive but another user).
- On POSIX: the canonical ``os.kill(pid, 0)`` idiom that actually is
  a no-op there.

Then patch every ``os.kill(pid, 0)`` liveness-check callsite to
route through ``_pid_exists`` instead. Total 14 callsites across
11 files; every single one was a latent silent-kill on Windows:

  gateway/run.py:2810      — /restart watcher (inline subprocess)
  gateway/run.py:15195     — --replace wait loop
  gateway/status.py:572    — acquire_gateway_runtime_lock stale check
  gateway/status.py:828    — get_running_pid (THE killer for status)
  gateway/platforms/whatsapp.py:111
  hermes_cli/gateway.py:228, 522, 1012  — gateway-related drain loops
  hermes_cli/kanban_db.py:2826         — _pid_alive was claiming to
                                         be cross-platform but used
                                         os.kill(pid, 0) on Windows
  hermes_cli/main.py:5792        — CLI process-kill polling
  hermes_cli/profiles.py:782     — profile stop wait loop
  plugins/google_meet/process_manager.py:74
  tools/browser_tool.py:1215, 1255  — browser daemon ownership probes
  tools/mcp_tool.py:1255, 3374     — MCP stdio orphan tracking

The watcher source in gateway/run.py:2810 is a multi-line string
that gets spawned as an inline ``python -c "..."`` subprocess, so
it can't import gateway.status. The fix for that callsite inlines
the same ctypes probe directly into the watcher source.

Tested on Windows 10 with the hermes gateway + Telegram bot:
- gateway start → alive
- 5 consecutive ``hermes gateway status`` invocations → gateway
  alive after every one, same PID reported each time (37520, 21952)
- gateway.log shows uninterrupted operation; no spurious shutdown
  entries; cron ticker and kanban dispatcher still running on
  their 60-second cadence
- bot continues answering Telegram messages throughout

Ships alongside an exit-path diagnostic wrapper in
``hermes_cli/gateway.py::run_gateway()`` that captures every way
``asyncio.run(start_gateway(...))`` can return (success, SystemExit,
KeyboardInterrupt, BaseException, atexit) with full traceback to
``logs/gateway-exit-diag.log``. This was used to prove the gateway
was being hard-killed externally (no exit event fired) and should
be kept for future Windows debugging.

Refs: https://bugs.python.org/issue14484
See also: references/windows-subprocess-sigint-storm.md in
the hermes-agent skill.
2026-05-08 14:27:40 -07:00
Teknium
cbce5e93fc codebase: add encoding='utf-8' to all bare open() calls (PLW1514)
Closes the last Python-on-Windows UTF-8 exposure by making every
text-mode open() call explicit about its encoding.

Before: on Windows, bare open(path, 'r') defaults to the system
locale encoding (cp1252 on US-locale installs).  That means reading
any config/yaml/markdown/json file with non-ASCII content either
crashes with UnicodeDecodeError or silently mis-decodes bytes.

After: all 89 affected call sites in production code now pass
encoding='utf-8' explicitly.  Works identically on every platform
and every locale, no surprise behavior.

Mechanical sweep via:
  ruff check --preview --extend-select PLW1514 --unsafe-fixes --fix     --exclude 'tests,venv,.venv,node_modules,website,optional-skills,               skills,tinker-atropos,plugins' .

All 89 fixes have the same shape: open(x) or open(x, mode) became
open(x, encoding='utf-8') or open(x, mode, encoding='utf-8').  Nothing
else changed.  Every modified file still parses and the Windows/sandbox
test suite is still green (85 passed, 14 skipped, 0 failed across
tests/tools/test_code_execution_windows_env.py +
tests/tools/test_code_execution_modes.py + tests/tools/test_env_passthrough.py +
tests/test_hermes_bootstrap.py).

Scope notes:
  - tests/ excluded: test fixtures can use locale encoding intentionally
    (exercising edge cases).  If we want to tighten tests later that's
    a separate PR.
  - plugins/ excluded: plugin-specific conventions may differ; plugin
    authors own their code.
  - optional-skills/ and skills/ excluded: skill scripts are user-authored
    and we don't want to mass-edit them.
  - website/ and tinker-atropos/ excluded: vendored / generated content.

46 files touched, 89 +/- lines (symmetric replacement).  No behavior
change on POSIX or on Windows when the file is ASCII; bug fix on
Windows when the file contains non-ASCII.
2026-05-08 14:27:40 -07:00
Teknium
e93bfc6c93 feat(windows): close remaining POSIX-only landmines — TUI crash, kanban waitpid, AF_UNIX sandbox, /bin/bash, npm .cmd shims, cwd tracking, detach flags
Second pass on native Windows support, driven by a systematic audit across
five areas: POSIX-only primitives (signal.SIGKILL/SIGHUP/SIGPIPE, os.WNOHANG,
os.setsid), path translation bugs (/c/Users → C:\Users), subprocess patterns
(npm.cmd batch shims, start_new_session no-op on Windows), subsystem health
(cron, gateway daemon, update flow), and module-level import guards.

Every change is platform-gated — POSIX (Linux/macOS) behaviour is preserved
bit-identical. Explicit "do no harm" test: test_posix_path_preserved_on_linux,
test_posix_noop, test_windows_detach_popen_kwargs_is_posix_equivalent_on_posix.

## New module

- hermes_cli/_subprocess_compat.py — shared helpers (resolve_node_command,
  windows_detach_flags, windows_hide_flags, windows_detach_popen_kwargs).
  All no-ops on non-Windows.

## CRITICAL fixes (would crash or silently break on Windows)

- tui_gateway/entry.py: SIGPIPE/SIGHUP referenced at module top level would
  AttributeError on import on Windows, breaking `hermes --tui` entirely (it
  spawns this module as a subprocess).  Guard each signal.signal() call with
  hasattr() and add SIGBREAK as Windows' SIGHUP equivalent.

- hermes_cli/kanban_db.py: os.waitpid(-1, os.WNOHANG) in dispatcher tick was
  unguarded.  os.WNOHANG doesn't exist on Windows.  Gate the whole reap loop
  behind `os.name != "nt"` — Windows has no zombies anyway.

- tools/code_execution_tool.py: AF_UNIX socket for execute_code RPC fails on
  most Windows builds.  Fall back to loopback TCP (AF_INET on 127.0.0.1:0
  ephemeral port) when _IS_WINDOWS.  HERMES_RPC_SOCKET env var now accepts
  either a filesystem path (POSIX) or `tcp://127.0.0.1:<port>` (Windows).
  Generated sandbox client parses both.

- cron/scheduler.py: `argv = ["/bin/bash", str(path)]` hardcoded.  Use
  shutil.which("bash") so Windows (Git Bash via MinGit) works, with a
  readable error when bash is genuinely absent.

- 6 bare npm/npx spawn sites: tools_config.py x2, doctor.py, whatsapp.py
  (npm install + node version probe), browser_tool.py x2.  On Windows npm
  is npm.cmd / npx is npx.cmd (batch shims); subprocess.Popen(["npm", ...])
  fails with WinError 193.  shutil.which(...) returns the absolute .cmd
  path which CreateProcessW accepts because the extension routes through
  cmd.exe /c.  POSIX behaviour unchanged (shutil.which still returns the
  same path subprocess would resolve itself).

## HIGH fixes (silent misbehaviour on Windows)

- tools/environments/local.py get_temp_dir: hardcoded /tmp returned on
  Windows meant `_cwd_file = "/tmp/hermes-cwd-*.txt"`, which bash wrote
  via MSYS2's virtual /tmp but native Python couldn't open.  Result: cwd
  tracking silently broken — `cd` in terminal tool did nothing.  Windows
  branch now returns `%HERMES_HOME%/cache/terminal` with forward slashes
  (works in both bash and Python, guaranteed no spaces).

- tools/environments/local.py _make_run_env PATH injection: `/usr/bin not
  in split(":")` heuristic mangles Windows PATH (";" separator).  Gate
  the injection behind `not _IS_WINDOWS`.

- hermes_cli/gateway.py launch_detached_profile_gateway_restart: outer
  Popen + watcher-script Popen both used start_new_session=True, which
  Windows silently ignores.  Watcher stayed attached to CLI's console,
  died when user closed terminal after `hermes update`, left gateway
  stale.  Now branches through windows_detach_popen_kwargs() helper
  (CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS | CREATE_NO_WINDOW on
  Windows, start_new_session=True on POSIX — identical to main).

## MEDIUM fixes

- gateway/run.py /restart and /update handlers: hardcoded bash/setsid
  chain crashes on Windows when user triggers /update in-gateway.  Now
  has sys.platform=="win32" branch using sys.executable + a tiny
  Python watcher with proper detach flags.  POSIX path is unchanged.

- cli.py _git_repo_root: Git on Windows sometimes returns /c/Users/...
  style paths that break subprocess.Popen(cwd=...) and Path().resolve().
  Added _normalize_git_bash_path() helper that translates /c/Users,
  /cygdrive/c, /mnt/c variants to native C:\Users form.  POSIX no-op.
  _git_repo_root() now routes every result through it.

- cli.py worktree .worktreeinclude: os.symlink on directories failed
  hard on Windows (requires admin or Developer Mode).  Falls back to
  shutil.copytree with a warning log.

## Tests

- 29 new tests in tests/tools/test_windows_native_support.py covering:
  subprocess_compat helpers, TUI entry signal guards, kanban waitpid
  guard, code_execution TCP fallback source-level invariants, cron bash
  resolution, npm/npx bare-spawn lint per-file, local env Windows temp
  dir, PATH injection gating, git bash path normalization, symlink
  fallback, gateway detached watcher flags.

- One existing test assertion adjusted in test_browser_homebrew_paths:
  it compared captured Popen argv to the BARE `"npx"` literal; after the
  shutil.which() change argv[0] is the absolute path.  New assertion
  checks the shape (two items, second is `agent-browser`) rather than
  the exact first-item string.  Behaviour unchanged; test was too strict.

All 56 tests pass on Linux (30 from previous commits + 26 new).
267 tests from the affected files/dirs (browser, code_exec, local_env,
process_registry, kanban_db, windows_compat) all pass — zero regressions.
tests/hermes_cli/ (3909 pass) and tests/gateway/ (5021 pass) unchanged;
all pre-existing test failures confirmed unrelated via `git stash` re-run.

## What's still deferred (LOW priority)

- Visible cmd-window flashes on short-lived console apps (~14 sites) —
  cosmetic, needs a follow-up pass once we have user reports.
- agent/file_safety.py POSIX-only security deny patterns — separate
  hardening task.
- tools/process_registry.py returning "/tmp" as fallback — theoretical;
  reachable only when all env-var candidates fail.
2026-05-08 14:27:40 -07:00
Teknium
b8d7e0e6d3 fix(msgraph_webhook): harden auth surface + IP allowlisting + response hygiene
Defense-in-depth polish on top of the webhook listener before it becomes
a real attack surface once the pipeline starts creating subscriptions
and Graph starts POSTing to the configured public URL.

- Timing-safe clientState comparison. Previously used `==` on strings;
  switches to hmac.compare_digest so a mismatch does not leak how many
  leading characters matched. client_state is documented as a strong
  shared secret (openssl rand -hex 32 in the setup docs), so a
  timing-safe primitive is the right call.

- Split GET and POST handlers. Graph validates a subscription by sending
  GET with validationToken in the query; anything else on GET is now a
  400 so the endpoint cannot be probed or mistakenly used for data
  exfil. Previously a bare GET fell through to the POST path and blew
  up on request.json() with a confusing 400.

- Empty response bodies on success. 202 is returned with no body so
  internal counters (accepted / duplicates / scheduled) do not leak to
  any caller that can reach the endpoint; counters remain observable
  via /health for operators. 403 on every-item-bad-clientState batches
  (so forged POSTs stop retrying), 400 on malformed / unknown-resource
  batches (sender configuration issue).

- Optional source-IP allowlist. New `allowed_source_cidrs` extra field
  (list or comma-separated string) and `MSGRAPH_WEBHOOK_ALLOWED_SOURCE_CIDRS`
  env var let operators restrict the webhook to Microsoft Graph's
  published webhook source ranges in production. Empty = allow all,
  preserving dev-tunnel / localhost workflows. Invalid CIDRs are
  logged and ignored rather than crashing. Also gates the handshake
  endpoint so disallowed IPs cannot probe it.

- Tests updated for the new response contract (empty-body 202,
  auth-only 403, config-error 400) and extended to cover: bare GET
  rejection, POST-with-validationToken handshake tolerance,
  timing-safe compare actually invoked via hmac.compare_digest spy,
  malformed body / missing value array, IP allowlist accept/reject
  paths, handshake IP allowlist, invalid CIDR entries, comma-string
  CIDR list parsing. 52/52 passed (was 40).

Full gateway suite: 5049 passed / 1 pre-existing failure in
test_discord_free_response (unrelated, reproduces on clean origin/main).
2026-05-08 10:29:58 -07:00
Dilee
26a59e4f6c fix(msgraph): normalize webhook dedupe and resource matching 2026-05-08 10:29:58 -07:00
Dilee
2a215de9af fix(msgraph): bound webhook receipt dedupe cache 2026-05-08 10:29:58 -07:00
Dilee
46a6f39024 feat(msgraph): add webhook listener platform 2026-05-08 10:29:58 -07:00
Zhicheng Han
526c0e018a feat(api-server): expose run approval events 2026-05-08 07:30:14 -07:00
JC
03ddff8897 fix(gateway): defer goal status notices until after response delivery
Route goal status notices through the platform adapter send API and register post-delivery callbacks so completed-goal notices appear after the final assistant response. Also cancel queued synthetic goal continuations on /goal pause and /goal clear while preserving normal queued user messages.
2026-05-07 17:33:09 -07:00
Teknium
2564132a1f
fix(telegram): preserve thread_id=1 for forum General typing indicator (#21390)
The May 5 refactor in d5357f816 made _message_thread_id_for_typing()
symmetric with _message_thread_id_for_send() by mapping the General
topic (thread id "1") to None upfront for both. That's correct for
sendMessage — Telegram rejects message_thread_id=1 on sends and the
topic must be omitted — but it's wrong for sendChatAction.

Observed behavior (confirmed via before/after Telegram wire traces):
  Before d5357f816: thread_id=1 → message_thread_id=1 → bubble visible in General
  After  d5357f816: thread_id=1 → message_thread_id=None → no visible typing

Omitting message_thread_id on sendChatAction does NOT fall back to
the General topic's view in a forum-enabled supergroup; the bubble
ends up hidden from the client's General-topic pane entirely. For
any user on a forum-group, the typing indicator stopped appearing.

Fix: drop the symmetric "1 → None" mapping from the typing resolver.
sendMessage still maps 1 → None via _message_thread_id_for_send (that
side was never broken). The asymmetry is real and required by
Telegram's API — document it in the resolver docstring.

Partial revert of d5357f816; restores the behavior from 0cf7d570e
("fix(telegram): restore typing indicator and thread routing for
forum General topic"). Does not re-introduce the retry-without-thread
fallback that 41545f7ec scoped down for DM topics — with the resolver
fixed, the first call already hits the right wire shape.

Test updated from test_send_typing_general_topic_uses_none_thread_id
(which encoded the broken contract) to
test_send_typing_preserves_general_topic_thread_id, asserting the
single correct call with message_thread_id=1. 10 other tests in the
file untouched and passing.
2026-05-07 08:39:21 -07:00
WideLee
4de3ef38b1 feat(qqbot): wire native tool-approval UX via inline keyboards
Makes the in-tree QQ inline keyboards actually light up when the agent
blocks on a dangerous-command approval. Matches the cross-adapter
gateway contract already implemented by Discord, Telegram, Slack,
Matrix, and Feishu.

Gateway/run.py's _approval_notify_sync checks type(adapter).send_exec_approval
and falls back to a text prompt when it's missing. Without this wiring,
QQ users stared at plain '/approve' text even though the adapter shipped
button primitives.

### send_exec_approval(chat_id, command, session_key, description, metadata)

Matches the signature the gateway calls with. Builds an ApprovalRequest
(command_preview, description, timeout) and delegates to send_approval_request.
Uses the last inbound msg_id as reply_to so QQ accepts the passive
message. The 'metadata' parameter is accepted for contract parity but
intentionally unused — QQ doesn't have thread_id/DM-targeting overrides.

### send_update_prompt(chat_id, prompt, default, session_key, metadata)

Signature updated to match the cross-adapter contract used by
'hermes update --gateway' watcher. Renders a 'Update Needs Your Input'
prompt with the optional default hint and a Yes/No keyboard. Replaces
the earlier 3-arg helper that wasn't wired anywhere.

### Default interaction dispatcher

_default_interaction_dispatch() auto-registered as the adapter's
interaction callback in __init__. Routes:

- approve:<session_key>:<decision> → tools.approval.resolve_gateway_approval
  Button → choice mapping:
    allow-once  → 'once'
    allow-always → 'always'
    deny        → 'deny'
  (QQ's 3-button mobile layout deliberately collapses 'session' + 'always'
  into one button; /approve session text fallback remains available.)
- update_prompt:<answer> → atomic write of y/n to ~/.hermes/.update_response
  (the detached 'hermes update --gateway' watcher polls this file)
- anything else → logged and dropped

Resolve exceptions are caught and logged — never propagate into the WS
loop. Callers can override via set_interaction_callback() to route
clicks elsewhere or pass None to drop them entirely.

### Net effect

QQ users now get native tap-to-approve UX on dangerous-command prompts
and update-confirmation prompts, without having to type /approve or /deny
as text. The adapter hooks into tools.approval the same way every other
button-capable platform does.

### Tests

14 new tests cover:
- Default callback installed on __init__
- send_exec_approval / send_update_prompt exist as class methods (so the
  gateway's type-probe detects them)
- allow-once/always/deny each map to the correct resolve choice
- update_prompt:y / update_prompt:n each write atomically to the response
  file (via monkeypatched get_hermes_home)
- Unknown button_data / empty button_data / resolve exceptions are harmless
- send_exec_approval honours last_msg_id reply-to and accepts metadata
- send_update_prompt delegates with correct content + keyboard

Full qqbot suite: 144 passed (72 pre-existing + 72 from this salvage arc).
Also ran tools/test_approval.py alongside — no regressions (276 passed
combined).

Co-authored-by: WideLee <limkuan24@gmail.com>
2026-05-07 07:48:15 -07:00
teknium1
898b6d7d55 fix(webhook): widen INSECURE_NO_AUTH loopback check + tests + docs
Follow-up to the previous commit:
- Add _is_loopback_host() helper covering 127.0.0.1, localhost, ::1,
  ip6-localhost, ip6-loopback (case-insensitive). Empty/None host is
  treated as non-loopback since unset usually means public default bind.
- Fix mixed-indent comment in the safety rail (comment now aligned with
  the if-block) and collapse the nested-if into one condition.
- Add TestInsecureNoAuthSafetyRail covering rejection on 0.0.0.0, a LAN
  IP, and empty host; allowance on 127.0.0.1/localhost; plus unit-level
  parametrized coverage of _is_loopback_host for spellings we can't bind
  in the hermetic test env (::1, ip6-localhost, ip6-loopback).
- Pin test_connect_starts_server + test_webhook_deliver_only defaults
  to 127.0.0.1 so they keep passing under the new rail.
- Document the behavior in website/docs/user-guide/messaging/webhooks.md.
2026-05-07 07:38:43 -07:00
0z!
fb4f953569 fix: block INSECURE_NO_AUTH on non-localhost webhook bindings 2026-05-07 07:38:43 -07:00
Teknium
5c08b851df
docs(platforms): document env_enablement_fn + cron_deliver_env_var hooks (#21331)
Following PR #21306 which added the new generic plugin-platform hooks,
update the three platform-authoring docs so plugin authors find them:

- website/docs/developer-guide/adding-platform-adapters.md: expand the
  'What the Plugin System Handles Automatically' table with env-only
  auto-enable + cron delivery + hermes-config UI entries rows.  Add
  three new sections — 'Env-Driven Auto-Configuration', 'Cron
  Delivery', 'Surfacing Env Vars in hermes config' — covering the
  hook signatures, plugin.yaml rich-dict format, and the
  home_channel-key special case.  Update the main register() example
  to pass env_enablement_fn + cron_deliver_env_var inline so readers
  see them on their first pass.  Upgrade the PLUGIN.yaml snippet to
  show bare-string + rich-dict + optional_env.

- website/docs/guides/build-a-hermes-plugin.md: the thin platform
  example in the build-a-plugin tour now includes env_enablement_fn
  and cron_deliver_env_var, plus an optional_env block in the inline
  plugin.yaml.  Keeps pointing to the developer-guide page for the
  full treatment.

- gateway/platforms/ADDING_A_PLATFORM.md: the in-repo reference
  shallow-points at the docsite but now names the three new hooks
  explicitly so contributors reading the source tree know what
  they're for.  Also adds teams + google_chat as reference
  implementations alongside irc.
2026-05-07 07:36:42 -07:00
WideLee
5b121c6e35 feat(qqbot): process attachments in quoted (reply) messages
When a user replies while quoting another message, QQ sets
'message_type = 103' and pushes the referenced message's content +
attachments inside 'msg_elements[0]'. The old adapter ignored
msg_elements entirely, so:

- Bare quote-replies (no user text) surfaced nothing to the LLM.
- Quoted images/files/voice were never downloaded or described.
- Quoted voice messages specifically produced no transcript — the model
  had no way to see what the user was referring to when saying 'about
  this voice note…'.

This commit adds _process_quoted_context(d) which extracts msg_elements,
unions their attachments, and runs them through the SAME
_process_attachments pipeline as the main message body. Quoted voice
gets an STT transcript (tried via QQ's asr_refer_text first, then the
configured STT provider); quoted images get cached just like main-body
images; quoted files surface with their original filename intact (not
the CDN URL hash).

The quoted content is prepended to the user's text as a '[Quoted message]:'
block so the LLM sees the full referential context on one turn.
Images-only quotes surface a '[Quoted message]: (image)' marker so the
model knows an image was referenced even if no text came with it.

All four inbound handlers (_handle_c2c_message, _handle_group_message,
_handle_guild_message, _handle_dm_message) now call the helper uniformly
— one merge pattern, not four divergent implementations.

Filename preservation is carried by _process_attachments' existing
'[Attachment: {filename or ct}]' line; nothing else needed for that.

12 new tests under TestProcessQuotedContext and TestMergeQuoteInto cover:

- Non-quote messages short-circuit to empty
- message_type=103 with no msg_elements is harmless
- Text-only quotes render with '[Quoted message]:' prefix
- Voice attachments in the quote flow through STT
- File attachments in the quote preserve the original filename
- Image attachments surface cached paths + media types
- Images-only quote still emits a marker
- Multiple msg_elements are concatenated
- Malformed message_type values return empty
- _merge_quote_into prepends with a blank-line separator

Full qqbot suite: 130 passed (72 existing + 19 chunked + 27 keyboards
+ 12 quoted).

Co-authored-by: WideLee <limkuan24@gmail.com>
2026-05-07 07:36:30 -07:00
WideLee
de584cd1dd feat(qqbot): add inline-keyboard approvals and update prompts
The QQ Bot v2 API supports inline keyboards on outbound messages. When a
user taps a button, the platform dispatches an INTERACTION_CREATE
gateway event; the bot ACKs it via PUT /interactions/{id} and decodes
the button's data payload to route the click.

This commit adds:

New module gateway/platforms/qqbot/keyboards.py

- Inline-keyboard dataclasses (InlineKeyboard, KeyboardRow, KeyboardButton,
  KeyboardButtonAction, KeyboardButtonRenderData, KeyboardButtonPermission)
  that serialize to the JSON shape the QQ API expects.
- build_approval_keyboard(session_key) — 3-button layout:
   允许一次 /  始终允许 /  拒绝, all sharing group_id='approval'
  so clicking one greys out the rest.
- build_update_prompt_keyboard() — Yes/No keyboard for update confirms.
- parse_approval_button_data() / parse_update_prompt_button_data() —
  decode the button_data payload from INTERACTION_CREATE.
  approve:<session_key>:<decision>  (decision = allow-once|allow-always|deny)
  update_prompt:<answer>            (answer = y|n)
- build_approval_text(ApprovalRequest) — markdown renderer for the
  surrounding message body (exec-approval and plugin-approval variants,
  with severity icons 🔴/🔵/🟡).
- parse_interaction_event(raw) → InteractionEvent dataclass — normalizes
  the nested raw payload (id / scene / openids / button_data / etc.).

Adapter changes (gateway/platforms/qqbot/adapter.py)

- _dispatch_payload routes INTERACTION_CREATE → _on_interaction.
- _on_interaction parses the event, ACKs via PUT /interactions/{id}, then
  invokes a user-registered interaction callback. Exceptions from the
  callback are caught and logged (never propagate into the WS loop).
- set_interaction_callback(cb) lets gateway wiring register a routing
  handler that inspects button_data and resolves the corresponding
  pending approval / update prompt.
- _send_c2c_text / _send_group_text now accept an optional keyboard kwarg
  and append it to the outbound body.
- send_with_keyboard(chat_id, content, keyboard, reply_to=None) — public
  helper that sends a single short message with a keyboard attached.
  Does NOT chunk-split (a keyboard message has one interactive surface).
  Guild chats are rejected non-retryably — they don't support keyboards.
- send_approval_request(chat_id, ApprovalRequest, reply_to=None) +
  send_update_prompt(chat_id, content, reply_to=None) — convenience
  wrappers over send_with_keyboard.

Tests

27 new unit tests under TestApprovalButtonData, TestUpdatePromptButtonData,
TestBuildApprovalKeyboard, TestBuildUpdatePromptKeyboard, TestBuildApprovalText,
TestInteractionEventParsing, and TestAdapterInteractionDispatch. Cover:

- Button-data round-trip (build → parse returns original session/decision)
- Keyboard JSON shape + mutual-exclusion group_id
- Exec vs plugin approval text templates + severity icons
- Interaction event parsing (c2c / group / guild scene codes)
- _on_interaction end-to-end: ACK invoked, callback receives parsed event,
  callback exceptions are swallowed, missing id skips ACK, no registered
  callback is harmless.

Full qqbot suite: 118 passed (72 existing + 19 chunked + 27 keyboards).

Co-authored-by: WideLee <limkuan24@gmail.com>
2026-05-07 07:36:30 -07:00
WideLee
9feaeb632b feat(qqbot): add chunked upload with structured error types
The v2 'single POST /v2/{users|groups}/{id}/files' upload path is capped
at ~10 MB inline (base64 'file_data' or 'url'). For larger files the QQ
platform provides a three-step flow:

  1. POST /upload_prepare           → upload_id + pre-signed COS part URLs
  2. PUT each part to its COS URL → POST /upload_part_finish
  3. POST /files with {upload_id}   → file_info token

This commit adds a new gateway/platforms/qqbot/chunked_upload.py module
that implements the flow, wires it into QQAdapter._send_media for local
files (URL uploads keep the existing inline path), and introduces
structured exceptions so the caller can surface actionable error text:

- UploadDailyLimitExceededError  (biz_code 40093002, non-retryable)
- UploadFileTooLargeError        (file exceeds the platform limit)

Both carry file_name / file_size_human / limit_human so the model can
compose user-friendly replies instead of seeing opaque HTTP codes.

The part_finish 40093001 retryable-error loop respects the server-
provided retry_timeout (capped at 10 minutes locally) with a 1 s
polling interval. COS PUTs retry transient failures up to 2 times
with exponential backoff. complete_upload retries up to 2 times.

Covers files up to the platform's ~100 MB per-file limit; before this
the adapter silently rejected anything over ~10 MB.

19 new unit tests under TestChunkedUpload* cover the happy path,
prepare-response parsing, helper functions, part retries, COS PUT
retries, group vs c2c routing, and the structured-error mapping.

Co-authored-by: WideLee <limkuan24@gmail.com>
2026-05-07 07:36:30 -07:00
Teknium
69d025e4a7 feat(gateway): add allowed_{chats,channels,rooms} whitelist to Telegram, Mattermost, Matrix, DingTalk
Mirrors the Slack `allowed_channels` feature (PR #7401) and Discord's
`allowed_channels` (PR #7044) across the remaining group-capable platforms.
All five platforms (Slack + Discord + the four added here) now follow the
same pattern: primary config via config.yaml, env-var fallback as an escape
hatch — matching the project policy that .env is for secrets only and
behavioral settings belong in config.yaml.

Also fixes a duplicate `slack` key in DEFAULT_CONFIG introduced by PR
#7401 (the later entry silently overwrote `allowed_channels`, `require_mention`,
and `free_response_channels` at dict-literal evaluation time).

Platforms added:
- Telegram: `telegram.allowed_chats` (env alias: `TELEGRAM_ALLOWED_CHATS`)
- Mattermost: `mattermost.allowed_channels` (env alias: `MATTERMOST_ALLOWED_CHANNELS`)
- Matrix: `matrix.allowed_rooms` (env alias: `MATRIX_ALLOWED_ROOMS`)
- DingTalk: `dingtalk.allowed_chats` (env alias: `DINGTALK_ALLOWED_CHATS`)

Mattermost and Matrix previously had NO config.yaml bridging for any of
their gating settings; this PR adds `load_gateway_config` bridges for them
(Mattermost gets require_mention + free_response_channels + allowed_channels;
Matrix gets allowed_rooms on top of its existing bridges for require_mention
and free_response_rooms).

Semantics identical everywhere:
- Empty = no restriction (fully backward compatible).
- Non-empty = hard whitelist: non-listed chats are silently ignored,
  even when the bot is @mentioned.
- DMs bypass the check entirely.

DEFAULT_CONFIG merges the duplicate `slack` block and adds new `mattermost`
and `matrix` blocks so all gating settings surface in defaults.

Not included: Feishu (has its own per-chat `chat_rules` system that covers
this use case differently), WhatsApp (already has `group_allow_from` via
`group_policy: allowlist`), pure-DM platforms (Signal, SMS, BlueBubbles,
Yuanbao — no group concept).
2026-05-07 06:54:29 -07:00