Salvage of PR #41284 onto current main. Relocates the last 9 inline messaging
adapters (+ satellites: telegram_network, feishu_comment/_rules/meeting_invite,
wecom_crypto, wecom_callback) from gateway/platforms/ into self-contained
bundled plugins under plugins/platforms/<x>/, discovered via the platform
registry. Strips the per-platform core touchpoints from gateway/run.py,
gateway/config.py, hermes_cli/gateway.py, hermes_cli/setup.py, and
tools/send_message_tool.py.
Carries forward the migration fixes (explicit enabled:false honored,
get_connected_platforms forces discovery, plugin is_connected via
gateway.get_env_value, logs --component gateway matches plugins.platforms.*,
matrix hidden on Windows).
Additionally ports config keys main added since the PR base: the matrix
plugin's _apply_yaml_config now also covers allowed_users,
ignore_user_patterns, process_notices, and session_scope (the inline
gateway/config.py matrix block gained these in the 1340 commits the PR sat
open; they would otherwise have been silently dropped on deletion).
`_sent_message_timestamps` (the reply-to-own-message quote cache) used a
`set` evicted with `set.pop()`, which removes an ARBITRARY element. Once more
than the cap (500) of outbound timestamps was tracked, a still-recent
timestamp could be dropped while older ones survived, so a genuine
reply-to-own-message was missed. Convert it to an OrderedDict with FIFO (oldest-first)
eviction, mirroring the recently-hardened echo ring (#31250). This closes the
same bug class on the sibling cache.
Adds a regression test asserting oldest-first eviction + MRU promotion.
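The eviction behavior described above can be sketched as follows. This is a minimal illustration, not the adapter's actual code: the class name `SentTimestampCache` and its methods are hypothetical, and it assumes "MRU promotion" means that re-adding a timestamp moves it to the most-recent end.

```python
from collections import OrderedDict


class SentTimestampCache:
    """Bounded cache of outbound timestamps with oldest-first eviction.

    Hypothetical stand-in for the cache described above; names are assumed.
    """

    def __init__(self, cap: int = 500) -> None:
        self._cap = cap
        self._entries: "OrderedDict[int, None]" = OrderedDict()

    def add(self, timestamp: int) -> None:
        # Insert (or re-insert) and promote to the most-recent end.
        self._entries[timestamp] = None
        self._entries.move_to_end(timestamp)
        # Evict from the front (oldest) until the cache is back under the cap.
        while len(self._entries) > self._cap:
            self._entries.popitem(last=False)

    def __contains__(self, timestamp: int) -> bool:
        return timestamp in self._entries
```

Unlike `set.pop()`, `popitem(last=False)` always removes the entry that has gone longest without being added or promoted, so a recent timestamp cannot be evicted ahead of an older one.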
Review follow-up on the salvaged self-mention strip (#31217): the original
only stripped the bot's rendered @<number>/@<uuid> self-mention inside the
`require_mention=true` branch, so groups with require_mention=false still
leaked it into the agent text. Hoist the strip to run for every group message
(fixing the whole bug class), and collapse the doubled space a mid-sentence
removal leaves while preserving intentional newlines.
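A rough sketch of the hoisted strip, under stated assumptions: the function name `strip_self_mention` and its `bot_identifiers` parameter are hypothetical, and the exact rendering of the mention (`@` followed by the bot's number or UUID) is taken from the description above rather than from the adapter code.

```python
import re


def strip_self_mention(text: str, bot_identifiers: list[str]) -> str:
    """Remove '@<identifier>' self-mentions and tidy the leftover spacing."""
    for ident in bot_identifiers:
        if ident:
            # The lookahead keeps '@+1555' from matching inside '@+15551234'.
            text = re.sub(r"@" + re.escape(ident) + r"(?![\w-])", "", text)
    # Collapse runs of spaces/tabs only; newlines are left untouched.
    text = re.sub(r"[ \t]{2,}", " ", text)
    # Drop spaces left dangling just before a newline.
    text = re.sub(r"[ \t]+(?=\n)", "", text)
    return text.strip(" \t")
```

Because the whitespace patterns match only spaces and tabs, a blank line the user typed on purpose survives the cleanup.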
Widen the env_float() guard from #48735 across the whole bug class: a
non-numeric value (e.g. a stale .env "HERMES_API_TIMEOUT=abc" or a typo'd
port) raised an unhandled ValueError and crashed adapter/agent init.
Converts 22 genuinely-unguarded first-party int/float(os.getenv()) sites to
the canonical utils.env_int / utils.env_float helpers (the established house
pattern), instead of duplicating per-module helpers or inline try/except:
- gateway/config.py: WECOM_CALLBACK_PORT, BLUEBUBBLES_WEBHOOK_PORT
- gateway/platforms/email.py: EMAIL_IMAP/SMTP_PORT, EMAIL_POLL_INTERVAL
- gateway/platforms/feishu.py: dedup cache + text/media batch settings
- gateway/platforms/wecom.py, discord/adapter.py: text batch delays
- gateway/platforms/telegram.py: media batch delay, TELEGRAM_WEBHOOK_PORT
- gateway/platforms/whatsapp.py: WHATSAPP_NPM_INSTALL_TIMEOUT
- hermes_cli/auth.py: CODEX/XAI refresh timeouts
- agent/chat_completion_helpers.py: API/stream read/stale timeouts
- run_agent.py, agent/auxiliary_client.py: API + nous timeouts
Sites already guarded by try/except or local helpers are left untouched.
The HERMES_MAX_ITERATIONS sites are already guarded on main via
_current_max_iterations(), so they are not included.
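For illustration, a guarded environment reader could look like the sketch below. The real `utils.env_int` / `utils.env_float` signatures are not shown in this message, so the `(name, default)` shape here is an assumption.

```python
import os


def env_float(name: str, default: float) -> float:
    """Read a float from the environment, falling back on bad input."""
    raw = os.getenv(name)
    if raw is None or not raw.strip():
        return default
    try:
        return float(raw)
    except ValueError:
        # e.g. a stale .env line such as HERMES_API_TIMEOUT=abc
        return default


def env_int(name: str, default: int) -> int:
    """Read an int from the environment, falling back on bad input."""
    raw = os.getenv(name)
    if raw is None or not raw.strip():
        return default
    try:
        return int(raw)
    except ValueError:
        return default
```

With a helper of this shape, a call such as `env_float("HERMES_API_TIMEOUT", 30.0)` (default value assumed) returns the default instead of raising during adapter or agent init.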
Review follow-up on the salvaged AAC + markdown changes:
- Fix an inaccurate comment claiming the STT layer has a sniff-and-remux
fallback (verified: no such fallback exists; the ffmpeg-absent path caches
raw ADTS and STT may reject it).
- Type the _markdown_to_signal wrapper as tuple[str, list[str]] to match the
shared helper instead of a bare tuple.
- Replace the hardcoded /home/pi/... test fixture with a runtime-generated
ADTS AAC sample so the remux round-trip actually runs in CI (skips only
when ffmpeg is absent) instead of always-skipping.
Android Signal delivers voice notes as raw ADTS AAC frames, which
share the `0xFF 0xFx` sync word with MPEG-1/2 Layer 3 (MP3). The
`_guess_extension` byte-signature test in gateway/platforms/signal.py
was matching both, so ADTS AAC was being misclassified as MP3 — saved
to disk with the wrong extension and rejected by every major STT API
(Groq, OpenAI) because their server-side format sniffers inspect the
actual codec, not the file extension.
Two changes:
1. Tighten the MP3 vs ADTS disambiguator. ADTS packs `ID`,
`layer`, and `protection_absent` into bits 3-0 of byte 1, where
`ID=0` and `layer=00` for AAC. Real MP3 has `ID=1` and
`layer` in {01, 10, 11}. The mask `0xF6` against target `0xF0`
cleanly separates them.
2. Remux raw ADTS AAC to MP4 container at the cache step via
`ffmpeg -c:a copy`. Single demux/remux, no re-encode, no quality
loss, sub-100ms on a Pi 5. The cached file is a normal `.m4a`
that all major STT providers accept. ffmpeg is a transitive
dependency of many other Hermes features (TTS, video skills) so
this isn't a new install requirement; the remux degrades
gracefully to a no-op if ffmpeg is missing.
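The byte-level check from change 1 can be sketched as below. The function names are hypothetical and this is not the body of `_guess_extension`. Note that `0xF6` is `1111 0110` in binary, so the mask covers the high sync nibble and the two layer bits; the ID bit (`0x08`) and the protection bit (`0x01`) do not affect the result.

```python
from typing import Optional


def looks_like_adts_aac(header: bytes) -> bool:
    """True when the first two bytes carry the ADTS sync word with layer == 00."""
    return (
        len(header) >= 2
        and header[0] == 0xFF
        and (header[1] & 0xF6) == 0xF0
    )


def guess_sync_frame_extension(header: bytes) -> Optional[str]:
    """Classify a 0xFF 0xFx frame as '.aac' or '.mp3'; None if neither."""
    if len(header) < 2 or header[0] != 0xFF or (header[1] & 0xF0) != 0xF0:
        return None
    if looks_like_adts_aac(header):
        return ".aac"
    # Layer bits are non-zero here, which is what an MPEG audio frame carries.
    return ".mp3"
```

For example, `0xFF 0xF1` (a common ADTS header start) classifies as AAC, while `0xFF 0xFB` (a common MPEG-1 Layer 3 header start) classifies as MP3.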
The new helper `_remux_aac_to_m4a` is unit-tested with a real
Android voice note from the audio cache that originally triggered
the bug, plus synthetic ADTS frames for the byte-level
disambiguator and garbage-input graceful failure.
Closes the gap that broke transcription for any Android Signal user
sending voice messages to Hermes.
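A hedged sketch of the remux step with the graceful no-op: the function below is illustrative and is not `_remux_aac_to_m4a` itself. The `which` parameter, the 30-second timeout, and the `-y` / `-loglevel error` flags are assumptions added for the example; only `-c:a copy` comes from the description above.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Optional


def remux_aac_to_m4a(
    src: Path,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Path:
    """Stream-copy raw ADTS AAC into an .m4a; return src unchanged on failure."""
    ffmpeg = which("ffmpeg")
    if ffmpeg is None:
        # ffmpeg missing: degrade to a no-op and keep the raw file.
        return src
    dst = src.with_suffix(".m4a")
    cmd = [ffmpeg, "-y", "-loglevel", "error",
           "-i", str(src), "-c:a", "copy", str(dst)]
    try:
        subprocess.run(cmd, check=True, capture_output=True, timeout=30)
    except (subprocess.SubprocessError, OSError):
        # Garbage input, timeout, or a broken binary: clean up and fall back.
        dst.unlink(missing_ok=True)
        return src
    return dst
```

Injecting `which` keeps the missing-ffmpeg path testable without touching the host's PATH.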
Route Signal send paths through shared markdown formatting helpers and render markdown bullets consistently as Unicode bullets. Add coverage for Signal formatting and send_message integration.
Behavior-preserving cleanups on the managed-node resolver:
- Hoist _candidate_node_command_names() out of the inner dir loop in
find_hermes_node_executable (computed once, not per directory).
- Drop redundant os.environ.copy() at the two with_hermes_node_path(
os.environ.copy()) sites; the helper already copies os.environ when
called with no argument (verified env-equivalent).
- Add reciprocal keep-in-sync comments between iter_hermes_node_dirs()
(hermes_constants.py) and hermesManagedNodePathEntries() (electron
main.cjs), which mirror the same platform-ordering rule across the
Python/Node boundary.
Sets the Telegram bot's short description (the line under its name) to
"Online" on gateway connect and "Offline" on clean disconnect, gated
behind extra.status_indicator (off by default).
Telegram bots have no presence/online dot — that's a user-account
feature the Bot API doesn't expose for bots. The short description is
the closest available surface, so this gives users a way to tell whether
the gateway is up from the bot's profile.
- New extra.status_indicator flag (+ status_online/status_offline text
overrides), read in __init__ via config.extra — no config-schema change.
- _set_status_indicator() helper: best-effort, swallows API errors so it
never blocks connect/disconnect; truncates to Telegram's 120-char cap.
- Wired Online after _mark_connected(), Offline at top of disconnect()
while the bot HTTP client is still alive.
- 9 unit tests + Telegram docs section.
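The best-effort contract of the helper can be illustrated as below. This is a synchronous sketch with an injected callable; the real `_set_status_indicator()` talks to the Bot API and its signature is not shown here, so `set_short_description` is a hypothetical stand-in.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)

# Cap stated in the description above.
SHORT_DESCRIPTION_MAX = 120


def set_status_indicator(
    set_short_description: Callable[[str], None],
    text: str,
) -> bool:
    """Push a truncated status text; never let an API error propagate."""
    try:
        set_short_description(text[:SHORT_DESCRIPTION_MAX])
        return True
    except Exception:
        # Swallow everything so connect/disconnect is never blocked.
        logger.debug("status indicator update failed", exc_info=True)
        return False
```

Returning a bool rather than raising lets the connect and disconnect paths call the helper unconditionally.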
Requested by @ilTrumpista, cc @Teknium.
Serve webhook inbound for multiple profiles off the one shared listener via a
URL prefix, with no second port bound.
- SessionSource gains a 'profile' field (round-trips through to_dict/from_dict;
omitted when unset so existing serialization is unchanged). It carries which
profile an inbound message was routed to.
- WebhookAdapter registers /p/{profile}/webhooks/{route_name} alongside the
existing /webhooks/{route_name}. _resolve_request_profile validates the
prefix against profiles_to_serve(): None when absent or multiplexing is off
(ignored, handled as default — no spurious 404), the profile name when valid,
_PROFILE_REJECTED (→ 404) when the profile isn't served. The resolved profile
is stamped onto the SessionSource.
- session-key namespacing and the per-turn home/credential scope now prefer
source.profile: SessionStore._resolve_profile_for_key(source),
_session_key_for_source fallback, and _resolve_profile_home_for_source all
honor it (→ the agent turn resolves that profile's config/skills/credentials
via the Phase 2 _profile_runtime_scope).
Constraint: routing inbound needs no per-profile platform credential, but the
agent still needs the routed profile's provider key — delivered by Phase 2's
secret scope. api_server (OpenAI-compatible surface) profile routing is a
focused follow-on; its source-construction path differs from webhook's.
Tests: SessionSource.profile round-trip + namespace drive;
_resolve_request_profile accept/reject/ignore matrix.
Phase 4F (F.1 + F.2 + F.3, agent side). F.4 is the operator-run live smoke
(needs a NAS deployment); recorded in the PR, not code.
F.1 — on_jobs_changed wiring:
- cron/scheduler.py: _notify_provider_jobs_changed() — resolve the active
provider, call on_jobs_changed(), swallow errors. Lives in scheduler.py (not
jobs.py) so the store stays free of provider imports (no import cycle).
- Wired at the consumer surfaces AFTER a successful mutation: the cronjob model
tool (tools/cronjob_tools.py, create/update/remove/pause/resume) — which the
`hermes cron` CLI also routes through — and the REST handlers
(gateway/platforms/api_server.py, same five). Built-in's no-op default = zero
behavior change on the default path. Sleeping-agent direct jobs.json writes
(no tool/CLI/REST) are covered by reconcile-on-wake in start().
F.2 — config: cron.chronos.{portal_url,callback_url,expected_audience,
nas_jwks_url}. All non-secret; the agent holds no scheduler creds and the
outbound provision call reuses the existing Nous token (no token key). Additive
deep-merge key, no version literal.
F.3 — docs:
- docs/chronos-managed-cron-contract.md: authoritative agent↔NAS wire contract
(the three agent-cron endpoints + inbound /api/cron/fire + the 3-hop trust
model + at-most-once/re-arm semantics). This is what the NAS-side agent builds
against.
- cron-internals.md: "Managed cron (Chronos) for scale-to-zero" section.
- cli-commands.md: cron.provider accepts chronos + the cron.chronos.* keys.
- User docs name no scheduler vendor (QStash is a NAS-internal detail).
INVARIANT re-verified: zero qstash/upstash hits across plugins/cron, gateway,
hermes_cli, tools, website/docs (the one remaining repo hit is an unrelated
Context7 MCP comment in tools/mcp_tool.py).
Tests: test_jobs_changed_notify (5) — notify calls provider hook, swallows
errors, built-in harmless, tool create/remove notify. Full cron + chronos +
webhook + config + api_server_jobs suites green (504 in the cron+chronos+webhook
run).
Phase 4E (E.1 + E.2). The inbound side of Chronos: NAS POSTs the agent when a
one-shot fires; the agent verifies a NAS-minted JWT and runs the job.
E.1 — plugins/cron/chronos/verify.py:
- verify_nas_fire_token(token, expected_audience, jwks_or_key, issuer): verifies
signature against the NAS JWKS (RS/ES family; symmetric rejected), aud == this
agent, exp/nbf, iss, and purpose == "cron_fire" (so a general agent JWT can't
be replayed against the fire endpoint). Returns claims or None; never raises.
Crypto delegated to PyJWT[crypto] (already a declared dep) — no hand-rolled
JWT, no new dependency. No key configured → refuse (never unsigned-decode a
security boundary).
- get_fire_verifier(): pluggable indirection so the DQ-4 escape hatch
(direct per-job cron-key) can swap in with no handler change.
E.2 — gateway/platforms/api_server.py:
- POST /api/cron/fire (registered only when _CRON_AVAILABLE). Authenticated by
the NAS-JWT via get_fire_verifier() — NOT API_SERVER_KEY (NAS holds no API
key; this is the only inbound that triggers remote job execution, so it gets
its own purpose-scoped check). Verifier args come from cron.chronos.* config.
401 on bad/missing/forged token. 400 on missing job_id. On success: 202 +
fire_due runs in the background (so a long agent turn never trips NAS's HTTP
timeout); the store CAS claim inside fire_due de-dupes a scheduler retry.
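The status-code rules for the fire endpoint can be sketched as a pure decision function. The function name and the injected `verify` callable are hypothetical, and checking the token before the body is an assumption about ordering; the real handler also schedules `fire_due` in the background, which is omitted here.

```python
from typing import Any, Callable, Mapping, Optional

Claims = Mapping[str, Any]


def cron_fire_status(
    bearer_token: Optional[str],
    body: Mapping[str, Any],
    verify: Callable[[str], Optional[Claims]],
) -> int:
    """Return the HTTP status the fire endpoint would answer with."""
    # Missing, malformed, or forged token: the verifier yields None.
    if not bearer_token or verify(bearer_token) is None:
        return 401
    if not body.get("job_id"):
        return 400
    # Accepted; the job itself runs after the response is sent.
    return 202
```

Answering 202 before the job runs is what keeps a long agent turn from tripping the caller's HTTP timeout.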
Tests:
- test_chronos_verify (11): REAL RS256 signing — valid→claims, wrong-aud,
missing/wrong purpose, expired, wrong-iss, tampered-signature (attacker key),
no-key-refuse, empty-token, JWKS-URL key resolution, get_fire_verifier.
- test_cron_fire_webhook (5): valid→202+fire, invalid→401+no-fire, missing
token→401, missing job_id→400, and fire path does NOT require API_SERVER_KEY.
api_server regression suites (214) green.
E.3 (NAS endpoints) is a separate cross-repo PR; the wire contract lands next
(docs/chronos-managed-cron-contract.md).
Telegram does not echo a sendRichMessage's content back in
reply_to_message (.text/.caption empty, .api_kwargs None), so replies
to rich sends (briefings, the gateway's own rich finals) arrived with
no quotable text and the [Replying to: ...] injection was skipped.
Remember message_id -> text at send time in a best-effort JSON index
(gateway/rich_sent_store.py), and recover it on inbound when text and
caption are both empty. Best-effort and no-throw throughout: any
failure degrades to prior behavior and never breaks a send or message.
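A minimal sketch of such a best-effort index, assuming a single JSON file that maps message IDs to text. The function names and file layout are hypothetical and do not describe the actual contents of gateway/rich_sent_store.py.

```python
import json
from pathlib import Path
from typing import Optional


def remember_sent_text(index_path: Path, message_id: int, text: str) -> None:
    """Record message_id -> text; any failure is silently ignored."""
    try:
        try:
            index = json.loads(index_path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            index = {}  # missing or corrupt file: start fresh
        if not isinstance(index, dict):
            index = {}
        index[str(message_id)] = text
        index_path.write_text(json.dumps(index), encoding="utf-8")
    except Exception:
        pass  # never break a send over the index


def recall_sent_text(index_path: Path, message_id: int) -> Optional[str]:
    """Look up previously sent text; None on any failure or miss."""
    try:
        index = json.loads(index_path.read_text(encoding="utf-8"))
        value = index.get(str(message_id))
        return value if isinstance(value, str) else None
    except Exception:
        return None
```

On the inbound side, the lookup would only be consulted when both the reply's text and caption are empty, so the prior behavior is unchanged whenever Telegram does echo content.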
Salvaged from #47375 by @x1erra. Dropped the cross-platform run.py
reply-prefix rewrite (out of scope; bloated every reply on every
platform) and scrubbed a docstring reference to an out-of-repo script.
Kept the inbound reply_to logging enrichment used to verify the fix.
PROBLEM: Mattermost threads can become invalid or enormous, exposing two failure modes: internal scratch/reasoning/commentary displays could leak into persistent Mattermost threads via global display toggles, while rejected threaded user-visible replies could disappear unless every failed send fell back flat. A broad flat fallback would pollute channels with tool/status/progress noise.
SOLUTION: Require explicit Mattermost platform opt-in for scratch displays, keep using the existing notify=True metadata marker for user-visible final text/media/file replies, and allow the Mattermost plugin adapter to flat-fallback only notify-worthy sends whose threaded POST failure looks like a broken root/thread. Keep tool/status/progress and other non-notify sends thread-strict. Add regression tests for display opt-in, notify-only broken-thread fallback, generic API failure suppression, and stream notify metadata.
Verification: tests/gateway/test_mattermost.py tests/gateway/test_stream_consumer.py tests/gateway/test_stream_consumer_thread_routing.py tests/gateway/test_stream_consumer_fresh_final.py tests/gateway/test_stream_consumer_draft.py; tests/gateway/test_session_api.py tests/gateway/test_status_command.py tests/gateway/test_resume_command.py tests/hermes_cli/test_commands.py; py_compile touched gateway files; git diff --check.
Session: Mattermost thread 6qg8e9dd1pd9pkhi74xyaa1mry, 2026-06-01.
`BasePlatformAdapter.send_multiple_images` passes `metadata=metadata` to
`send_image` / `send_image_file` / `send_animation` on every send. The
WhatsApp and email `send_image` overrides stopped their signature at
`reply_to`, so any image delivered as a URL (the common case — image-gen
backends return URLs) raised:
    TypeError: send_image() got an unexpected keyword argument 'metadata'
and the image silently failed to send. Their sibling overrides
(`send_image_file` / `send_video` / `send_voice` / `send_document`)
already absorb it via **kwargs, which is why only plain image-URL sends
broke.
- whatsapp/email `send_image`: accept `metadata` (matches the base
signature); WhatsApp forwards it to the super() text fallback.
- Add `tests/gateway/test_media_metadata_contract.py`: asserts WhatsApp +
email accept it, plus a best-effort sweep over every adapter so the next
slip fails at test time instead of in production.
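One way such a contract check could work is by inspecting signatures rather than calling the adapters. The helper below is a sketch, not the contents of the new test file, and the three example functions are hypothetical stand-ins for adapter overrides.

```python
import inspect
from typing import Callable


def accepts_keyword(func: Callable, name: str) -> bool:
    """True if func can be called with `name=` as a keyword argument."""
    for param in inspect.signature(func).parameters.values():
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return True  # **kwargs absorbs any keyword
        if param.name == name and param.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


# Hypothetical override shapes, mirroring the bug and the two accepted fixes.
def send_image_old(chat_id, image_url, caption=None, reply_to=None): ...
def send_image_fixed(chat_id, image_url, caption=None, reply_to=None, metadata=None): ...
def send_image_kwargs(chat_id, image_url, **kwargs): ...
```

A sweep can then assert `accepts_keyword(adapter.send_image, "metadata")` for each adapter class without needing credentials or network access.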
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Streamed Telegram replies that finalize through editMessageText were
converted to MarkdownV2, which has no table syntax and rewrites pipe
tables into bullet lists — users saw a table while streaming that
collapsed to a list at the last moment.
Finalize now edits the existing preview IN PLACE via Bot API 10.1's
editMessageText rich_message parameter when the content has constructs
the legacy path degrades (tables, task lists, <details>, block math).
No fresh send + delete, so no duplicate-preview flicker — the reason
#46206 reverted the fresh-final re-send path. prefers_fresh_final_streaming
stays False; the in-place edit replaces it.
- _needs_rich_rendering(): rich reserved for table/task-list/details/math
(adapted from #45995, @YonganZhang); plain replies stay on MarkdownV2.
- _try_edit_rich(): editMessageText + rich_message via do_api_request,
mirroring _try_send_rich's fallback/latch/transient contract.
- edit_message finalize tries rich in place before the 4,096 overflow
pre-flight (rich cap is 32,768), falling back to legacy on rejection.
- rich_messages default flipped back to True (DEFAULT_CONFIG + adapter).
- docs (en + zh-Hans) + cli-config example updated to default-on.
Closes the root cause behind #45911 / #46009.
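A rough sketch of a detector for the four constructs named above. The patterns are assumptions for illustration (in particular, block math is assumed to be delimited by `$$`), and the real `_needs_rich_rendering()` may use different rules.

```python
import re

# A table is recognised by its header separator row, e.g. |---|:--:|
_TABLE_SEPARATOR = re.compile(
    r"^\s*\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)+\|?\s*$", re.MULTILINE
)
# Task-list items: "- [ ] todo" or "* [x] done"
_TASK_ITEM = re.compile(r"^\s*[-*+]\s+\[[ xX]\]\s", re.MULTILINE)
_DETAILS_TAG = re.compile(r"<details[\s>]", re.IGNORECASE)
_BLOCK_MATH = re.compile(r"\$\$")


def needs_rich_rendering(markdown: str) -> bool:
    """True when the text has a construct the legacy path would degrade."""
    return any(
        pattern.search(markdown)
        for pattern in (_TABLE_SEPARATOR, _TASK_ITEM, _DETAILS_TAG, _BLOCK_MATH)
    )
```

Requiring at least one inner pipe in the separator row keeps a plain `---` horizontal rule from being mistaken for a table.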
Port 465 expects implicit TLS (SMTP_SSL) from the first byte. The email
adapter always used SMTP() + starttls(), which is correct for port 587
but hangs/fails on port 465 providers (e.g., Swiss ISPs).
Additionally, when the SMTP host has AAAA DNS records but IPv6 is
unreachable, socket.create_connection() tries IPv6 first and hangs
until timeout. Add an IPv4 fallback via AF_INET socket.
Extract _connect_smtp() helper to consolidate the 4 duplicate SMTP
connection sites into a single method with correct protocol selection
and IPv6 fallback logic.
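The protocol selection can be sketched with the standard library alone. This is an illustration under assumptions (function names, the 30-second default timeout), not the adapter's `_connect_smtp()`, and it leaves out the IPv4 fallback.

```python
import smtplib
import ssl

IMPLICIT_TLS_PORT = 465


def uses_implicit_tls(port: int) -> bool:
    """Port 465 speaks TLS from the first byte; other ports upgrade later."""
    return port == IMPLICIT_TLS_PORT


def connect_smtp(host: str, port: int, timeout: float = 30.0) -> smtplib.SMTP:
    """Open an SMTP connection using the TLS mode the port expects."""
    context = ssl.create_default_context()
    if uses_implicit_tls(port):
        return smtplib.SMTP_SSL(host, port, timeout=timeout, context=context)
    client = smtplib.SMTP(host, port, timeout=timeout)
    try:
        # Explicit upgrade, as expected on port 587.
        client.starttls(context=context)
    except Exception:
        client.close()
        raise
    return client
```

Since `smtplib.SMTP_SSL` subclasses `smtplib.SMTP`, callers can use the returned object the same way in both cases.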
Carry forward focused follow-ups from PR #45741: treat PTB's raw Bot API 10.1 response shapes safely, recognize real missing-endpoint errors, preserve link preview settings on rich sends, and lock the rich limit to Telegram's character-based cap.
Own-policy adapters (WhatsApp, WeCom, Weixin, QQBot, Yuanbao) default dm_policy/group_policy to "open", which forwards every sender. The gateway's adapter-trust shortcut in _is_user_authorized blanket-trusted those platforms when no env allowlist was set, so an operator who enabled one with only credentials authorized the entire external network -- the fail-open SECURITY.md section 2.6 forbids ("an allowlist is required for every enabled network-exposed adapter").
Trust the adapter only when its effective policy for the chat type is an actual "allowlist" restriction (the case #34515 was protecting). "open"/"pairing"/anything else falls through to default-deny, where {PLATFORM}_ALLOW_ALL_USERS / GATEWAY_ALLOW_ALL_USERS and the pairing flow remain the explicit opt-ins.
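The narrowed trust rule can be sketched as follows. The function and parameter names are hypothetical and this is not the body of `_is_user_authorized`; only the rule that an "allowlist" policy is the sole trusted value comes from the description above.

```python
def adapter_policy_is_trusted(
    dm_policy: str,
    group_policy: str,
    is_group_chat: bool,
) -> bool:
    """Trust the adapter's own filtering only for an actual allowlist policy."""
    effective = group_policy if is_group_chat else dm_policy
    # "open", "pairing", and anything unrecognised fall through to default-deny.
    return effective == "allowlist"
```

Comparing against the single trusted value, rather than excluding known-unsafe ones, means an unexpected or misspelled policy string fails closed.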
Old Office formats (.xls, .doc, .ppt) were missing from the
SUPPORTED_DOCUMENT_TYPES dict in gateway/platforms/base.py while their
newer counterparts (.xlsx, .docx, .pptx) were included.
Sending an .xls file via Telegram triggers 'Unsupported document type'
and the file is silently dropped instead of being cached and forwarded
to the agent.
Add the three legacy MIME types so these files are handled the same way
as their modern equivalents.
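For reference, the three legacy entries would look roughly like this. The MIME types are the standard registered ones; the dict name below is hypothetical, and the extension-to-MIME layout is an assumption about how SUPPORTED_DOCUMENT_TYPES is keyed.

```python
# Hypothetical fragment; the real entries live in SUPPORTED_DOCUMENT_TYPES.
LEGACY_OFFICE_DOCUMENT_TYPES = {
    ".xls": "application/vnd.ms-excel",
    ".doc": "application/msword",
    ".ppt": "application/vnd.ms-powerpoint",
}
```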
Remove the rich_messages config toggle entirely so Telegram replies always try the Bot API 10.1 rich-message path first, with the existing MarkdownV2 fallback/latch behavior for unsupported endpoints and per-message failures.
Restore the Telegram platform hint to encourage rich Markdown tables/task lists/math now that the rich path is the default, and remove the config/docs surface for the old toggle.
- Use reply_parameters per the sendRichMessage spec instead of the
undocumented reply_to_message_id scalar (silently ignored -> reply
anchor quietly dropped).
- Latch rich sends off after an endpoint-capability failure (old PTB /
server without sendRichMessage) so every later reply doesn't pay a
doomed extra roundtrip; per-message BadRequests do NOT latch.
- Default rich_messages to OFF (opt-in) while the day-old Bot API 10.1
endpoint is validated live; revert the prompt-hint table guidance
until the default flips on.
- Tests: reply_parameters shape, send-latch behavior, BadRequest
non-latch; rich tests opt in explicitly via extra.
Introduce opportunistic support for Telegram Bot API 10.1 rich messages by sending raw agent Markdown via sendRichMessage and streaming previews via sendRichMessageDraft.
- gateway/platforms/telegram.py: add a rich fast path (RICH_MESSAGE_MAX_BYTES=32768, feature gate platforms.telegram.extra.rich_messages, bot capability checks, routing/thread handling).
- Conservative fallback rules: permanent/capability errors fall back to the legacy MarkdownV2 path; transient/network errors are surfaced without a legacy resend.
- Latch draft capability failures (_rich_draft_disabled) and preserve legacy chunking and draft behavior when needed.
- Update agent prompt hints (the Telegram hint encourages rich Markdown/tables), add a CLI config example option, and update the English and Chinese docs to describe rich messages and fallbacks.
- Add/adjust tests for rich send and draft behavior.
Email cached document attachments and placed them in media_urls, but
msg_type only flipped on image attachments — documents stayed TEXT and
run.py's document-context injection (gated on MessageType.DOCUMENT)
silently dropped them. Same bug class as Signal #12845. DOCUMENT wins
over PHOTO for mixed attachments since image handling keys off per-path
mime types while document injection gates strictly on message_type.
Widen the salvaged #12851 fix to match the established classification
pattern (WhatsApp/Slack/BlueBubbles/Mattermost): video/* -> VIDEO, and
any remaining MIME type falls through to DOCUMENT instead of TEXT, so
exotic types still trigger run.py's document-context injection.
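The classification rule can be sketched as below. The enum is a local stand-in for the gateway's MessageType, the function name is hypothetical, and ranking VIDEO above PHOTO for mixed attachments is an assumption; the description above only fixes that DOCUMENT wins over PHOTO.

```python
from enum import Enum


class MessageType(Enum):
    # Local stand-in for the gateway's enum; member values are assumed.
    TEXT = "text"
    PHOTO = "photo"
    VIDEO = "video"
    DOCUMENT = "document"


def classify_attachments(mime_types: list[str]) -> MessageType:
    """Pick one message type for a set of attachment MIME types."""
    kinds = set()
    for mime in mime_types:
        if mime.startswith("image/"):
            kinds.add(MessageType.PHOTO)
        elif mime.startswith("video/"):
            kinds.add(MessageType.VIDEO)
        else:
            # Everything else, including exotic types, counts as a document.
            kinds.add(MessageType.DOCUMENT)
    # DOCUMENT first, so mixed mail still triggers document-context injection.
    for preferred in (MessageType.DOCUMENT, MessageType.VIDEO, MessageType.PHOTO):
        if preferred in kinds:
            return preferred
    return MessageType.TEXT
```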
Register no-op Slack event handlers for inbound reaction_added and reaction_removed events so Slack Bolt does not log unhandled-request warnings for events Hermes does not consume.