Add a parser-only routing regression that proves raw WhatsApp group JIDs bypass channel-directory resolution and home-channel fallback, include channel_aliases.json in quick state snapshots, harden malformed alias handling, and map Keiron McCammon for release attribution.
send_message(target="whatsapp:<group-jid>") silently delivered to the
configured home DM instead of the requested group. Two gaps:
1. _parse_target_ref had no WhatsApp branch. Group JIDs (<id>@g.us),
user JIDs (<id>@s.whatsapp.net), linked-identity JIDs (<id>@lid), and
broadcast/newsletter JIDs matched no pattern and fell through to
`return None, None, False`, so the caller treated them as
unresolvable and used the home channel. The bridge's /send endpoint
accepts any chatId, so only the tool-side target parsing was at fault.
Add a whatsapp branch that recognizes native JIDs as explicit targets.
The pre-existing '+'-prefixed E.164 path is preserved.
2. WhatsApp groups have no human-friendly name — the channel directory
is regenerated from session data on a timer, so a group shows up as
its raw 18-digit JID and any hand-edit to channel_directory.json is
clobbered on the next rebuild. Add a user-maintained alias overlay
(~/.hermes/channel_aliases.json) re-applied on every build AND every
load, giving durable friendly names and letting a freshly-created
group be pre-named before its first message.
Tests: TestParseTargetRefWhatsAppJID (7 cases) for the parser;
TestChannelAliases (7 cases) for the overlay, plus an autouse fixture
isolating CHANNEL_ALIASES_PATH so a real alias file can't leak into the
existing directory tests.
On Linux, systemd spawns core services (cron, nginx, sshd) with
deterministic PIDs and jiffy start_times across reboots. A service can
land on the exact same PID and start_time as a previous gateway, causing
acquire_scoped_lock to mistake it for a live gateway and block startup.
The existing stale-detection paths only covered:
- start_times both non-None and different (clear mismatch)
- start_times both None (macOS/Windows fallback to cmdline check)
The boot-time collision falls through both: times are non-None and
equal, so neither branch fired.
Add a third check: when both start_times are known and match but the
live process fails _looks_like_gateway_process, read its cmdline. If
the cmdline is readable (non-None), we have positive evidence of an
impostor and mark the lock stale. Requiring a readable cmdline keeps the
check conservative — if cmdline is unreadable we do not evict.
The #45966 cross-process coherence guard snapshots a session's on-disk
message_count next to the cached agent and rebuilds the agent when the
count changes. But the snapshot is taken at agent-BUILD time — before
the turn writes its own user + assistant (+ tool) rows — and the cache
entry is never rewritten on a reuse. So this process's OWN turn grows
message_count, and the very next turn sees a mismatch and rebuilds the
agent. That happens every turn, for every conversation, silently
destroying the per-conversation prompt caching the cache exists to
protect (AGENTS.md: prompt caching is sacred).
Add _refresh_agent_cache_message_count(): after a turn completes and the
agent has flushed its rows to the SessionDB, re-baseline the stored count
to the now-current value. The guard then fires ONLY when a DIFFERENT
process changes the transcript — preserving the #45966 fix while keeping
the cache warm for normal single-process operation.
Tests drive the real SessionDB + the real guard condition: 5 consecutive
same-process turns now all REUSE the cached agent (0 before the fix); a
cross-process append still invalidates; and the re-baseline is fail-safe
(no DB, falsy session_id, raising probe, legacy 2-tuple, pending sentinel
all no-op).
Three changes to prevent infinite re-execution loops when a user sends
a new message while long-running tools are executing:
1. Filter interrupted tool results in _build_gateway_agent_history:
skip tool messages whose content contains [Command interrupted] or
exit_code 130 — they represent partial execution, not valid results.
2. Don't replay auto-continue notes as user messages: detect
gateway-injected [System note: ...] / [IMPORTANT: ...] prefixes
and skip them in _build_gateway_agent_history so the LLM doesn't
see 4+ messages from 'the user' telling it to finish old work.
3. Fix the wording: the system note now instructs the model to
address the user's NEW message FIRST, IGNORE pending results,
and NOT re-execute old tool calls.
Closes#45230
Port 465 expects implicit TLS (SMTP_SSL) from the first byte. The email
adapter always used SMTP() + starttls(), which is correct for port 587
but hangs/fails on port 465 providers (e.g., Swiss ISPs).
Additionally, when the SMTP host has AAAA DNS records but IPv6 is
unreachable, socket.create_connection() tries IPv6 first and hangs
until timeout. Add an IPv4 fallback via AF_INET socket.
Extract _connect_smtp() helper to consolidate the 4 duplicate SMTP
connection sites into a single method with correct protocol selection
and IPv6 fallback logic.
Gateway startup now queues real inbound messages until restart-interrupted auto-resume turns have completed, preventing duplicate agents for the same session after a restart.
A stale certifi CA bundle after a partial `hermes update` used to crash
the agent on the first outbound HTTPS call with a raw traceback and
trap the gateway in a retry loop.
This patch:
* Adds `agent/errors.py` with a typed `SSLConfigurationError`
* Adds `agent/ssl_guard.py` with a `verify_ca_bundle()` pre-flight
that asserts the bundle exists, is non-trivial in size, and can build
a working SSLContext. On macOS, it falls back to the system trust
store when the bundle is empty but the system store is healthy
(covers corporate proxies / MDM setups).
* Wires the guard into `run_agent.py` and `gateway/run.py` right
after the `hermes_bootstrap` import, inside a try/except so a bug
in the guard itself can never prevent startup.
* Adds a `SSL / CA Certificates` section to `hermes_cli doctor` so
users can detect the failure with one command.
* Adds unit tests covering the healthy, missing, empty, skip-env, and
macOS-fallback paths.
* Adds an RCA document describing the failure mode and the recovery
path (`pip install -e .`).
When the bundle is broken the user sees:
\u26a0\ufe0f SSL certificate bundle issue detected.
Run: pip install -e .
`HERMES_SKIP_SSL_GUARD=1` disables the check for sandboxed
environments that ship their own trust store.
Carry forward focused follow-ups from PR #45741: treat PTB's raw Bot API 10.1 response shapes safely, recognize real missing-endpoint errors, preserve link preview settings on rich sends, and lock the rich limit to Telegram's character-based cap.
The salvaged fix's two regression tests mock adapter.handle_message, so
they only assert the pre-claimed sentinel is set/cleaned around a stub —
they never drive the real dispatch chain. Add a full-path test that
exercises _schedule_resume_pending_sessions -> _guarded_handle_message ->
adapter.handle_message -> _process_message_background -> _handle_message
and asserts the resumed session's agent runs EXACTLY ONCE: not zero (the
pre-claim must not self-bounce the resume into a queued no-op) and not
twice (the duplicate-agent bug #45456 the fix targets). Also assert no
leaked sentinel and no orphaned pending event after the drain settles.
Tighten the _guarded_handle_message docstring: on current main the real
sentinel is taken over inside _handle_message (not _process_message_background),
and note the `is _AGENT_PENDING_SENTINEL` guard only releases the slot we
ourselves placed, never one a live run owns.
When the gateway restarts and auto-resumes an interrupted session, an
inbound message arriving in the window between `asyncio.create_task()`
and the task's first await could spin up a second AIAgent for the same
session. Both agents would then process messages concurrently,
producing interleaved duplicate responses (#45456).
Fix: set `_AGENT_PENDING_SENTINEL` in `_running_agents` immediately
after the "already running" check, before creating the task. This
closes the race window — any inbound message sees the slot as occupied
and queues behind the auto-resume.
A `_guarded_handle_message` wrapper ensures the pre-claimed sentinel is
always released, even if `handle_message` raises before reaching
`_process_message_background` (whose `finally` block handles normal
cleanup).
(cherry picked from commit 85150c976b)
Keep the own-policy fail-closed hardening from PR #45444, but still trust WeCom groups.<id>.allow_from because the adapter already checked that sender allowlist before dispatching to gateway auth.
Own-policy adapters (WhatsApp, WeCom, Weixin, QQBot, Yuanbao) default dm_policy/group_policy to "open", which forwards every sender. The gateway's adapter-trust shortcut in _is_user_authorized blanket-trusted those platforms when no env allowlist was set, so an operator who enabled one with only credentials authorized the entire external network -- the fail-open SECURITY.md section 2.6 forbids ("an allowlist is required for every enabled network-exposed adapter").
Trust the adapter only when its effective policy for the chat type is an actual "allowlist" restriction (the case #34515 was protecting). "open"/"pairing"/anything else falls through to default-deny, where {PLATFORM}_ALLOW_ALL_USERS / GATEWAY_ALLOW_ALL_USERS and the pairing flow remain the explicit opt-ins.
Old Office formats (.xls, .doc, .ppt) were missing from the
SUPPORTED_DOCUMENT_TYPES dict in gateway/platforms/base.py while their
newer counterparts (.xlsx, .docx, .pptx) were included.
Sending an .xls file via Telegram triggers 'Unsupported document type'
and the file is silently dropped instead of being cached and forwarded
to the agent.
Add the three legacy MIME types so these files are handled the same way
as their modern equivalents.
After compression exhaustion the auto-reset created a fresh session but
discarded reset_session()'s return value and left the Telegram topic
binding pointing at the oversized compressed child. The next inbound
message in that topic healed the binding forward and switch_session'd the
freshly-reset lane back onto the bloated transcript, re-triggering
compression exhaustion in a loop with a new session id each time.
Capture the fresh entry and re-sync the topic binding to it so the next
message starts clean. No-op on non-topic lanes.
Regression of the #9893/#10063 auto-reset fix.
Fixes#35809
Remove the rich_messages config toggle entirely so Telegram replies always try the Bot API 10.1 rich-message path first, with the existing MarkdownV2 fallback/latch behavior for unsupported endpoints and per-message failures.
Restore the Telegram platform hint to encourage rich Markdown tables/task lists/math now that the rich path is the default, and remove the config/docs surface for the old toggle.
Salvages PR #25747 by preserving gateway session rotation even when a post-compression model call fails before returning final content.
Co-authored-by: Hermes <127238744+teknium1@users.noreply.github.com>
- Use reply_parameters per the sendRichMessage spec instead of the
undocumented reply_to_message_id scalar (silently ignored -> reply
anchor quietly dropped).
- Latch rich sends off after an endpoint-capability failure (old PTB /
server without sendRichMessage) so every later reply doesn't pay a
doomed extra roundtrip; per-message BadRequests do NOT latch.
- Default rich_messages to OFF (opt-in) while the day-old Bot API 10.1
endpoint is validated live; revert the prompt-hint table guidance
until the default flips on.
- Tests: reply_parameters shape, send-latch behavior, BadRequest
non-latch; rich tests opt in explicitly via extra.
Introduce opportunistic support for Telegram Bot API 10.1 rich messages by sending raw agent Markdown via sendRichMessage and streaming previews via sendRichMessageDraft. Implements a rich-path fast‑path in gateway/platforms/telegram.py (RICH_MESSAGE_MAX_BYTES=32768, feature gate platforms.telegram.extra.rich_messages, bot capability checks, routing/thread handling, and conservative fallback rules: permanent/capability errors fall back to the legacy MarkdownV2 path, transient/network errors are surfaced without legacy-resend). Also add a latch for draft capability failures (_rich_draft_disabled) and preserve legacy chunking and draft behavior when needed. Update agent prompt hints (telegram encourages rich Markdown/tables), add CLI config example option, update English and Chinese docs to describe rich messages and fallbacks, and add/adjust tests for rich send and draft behavior.
* feat(billing): /usage → portal top-up browser handoff
Add the terminal side of the billing slice (phase 2a): start a top-up by
throwing the user to the portal billing page with the top-up modal open. The
terminal does not confirm, poll, or track payment — checkout completes in the
browser and the next /usage shows the new balance.
- nous_account.py: parse organisation.slug/name from /api/oauth/account into
NousPortalAccountInfo; add nous_portal_topup_url() building the org-pinned
{base}/orgs/{slug}/billing?topup=open with a null-slug fallback to the legacy
{base}/billing?topup=open (never /orgs/None/...).
- portal_cli.py: 'hermes portal topup' — fresh account fetch, identity line
(Topping up as <email> / org <name>), browser open with printed-URL fallback,
no-wait closing copy. No polling/confirmation (deferred to 2b).
- account_usage.py: the shared /usage credits block now links the org-pinned
top-up URL (auto-opens the modal) + points to the command.
Depends on NAS #409 (organisation.slug/name + ?topup=open). Do not merge until
that is live on the target env; until then /api/oauth/account returns
organisation: { id } only and the URL falls back to legacy.
* feat(billing): /credits command for balance + top-up handoff
Replace the standalone `hermes portal topup` subcommand with an in-session
/credits slash command — a focused money surface (balance in, top-up out) that
works in the CLI, TUI, and every messaging platform from one registry entry.
- commands.py: register /credits (Info category). Slack is at its 50-slash cap,
so /credits is routed via /hermes credits on Slack only (new
_SLACK_VIA_HERMES_ONLY set) to avoid clamping a canonical command off the
native list and breaking Telegram parity; native everywhere else.
- account_usage.py: build_credits_view() — one portal fetch → balance lines +
identity line + org-pinned top-up URL + depleted flag, consumed by all
surfaces. Reuses the same snapshot/URL builder as /usage so numbers match.
- cli.py: _show_credits() — balance block + identity line + 3-button panel
(Open top-up / Copy link / Cancel) via the existing prompt_toolkit modal.
ASK, never auto-launch; headless falls back to printing the URL.
- gateway/slash_commands.py: _handle_credits_command() — renders the block +
tappable top-up URL + no-wait copy; works on button and plain-text platforms.
- /usage credits line now points to /credits.
- Retire `hermes portal topup` (portal_cli.py back to baseline); the engine
(slug/name parse + nous_portal_topup_url) stays as the shared core.
No polling, no payment confirmation (billing phase 2a). Depends on NAS #409.
* fix(credits): /credits works in the TUI slash-worker (non-interactive)
In the TUI, /credits runs in the slash-worker subprocess where there is no
live prompt_toolkit app and stdin is the JSON-RPC pipe. _show_credits called
the 3-button modal unconditionally, which fell back to reading stdin →
exception → slash.exec rejected → the command produced no output (only the
pre-existing 'Credit access paused' banner showed).
- _show_credits: when self._app is None (TUI worker / piped / non-interactive),
render the text variant — balance block + tappable top-up URL + no-wait line,
same affordance as the messaging surfaces — and skip the modal entirely. The
3-button panel still renders in the interactive CLI.
- Depleted banner copy: 'run /usage for balance' → 'run /credits to top up'
now that /credits is the dedicated money surface (+ tests).
- Regression tests: _show_credits with self._app=None renders text and never
invokes the modal; logged-out path.
* feat(tui): credits.view RPC for the /credits tappable top-up button
Add a credits.view JSON-RPC method returning the structured CreditsView
(logged_in, balance_lines, identity_line, topup_url, depleted) so the TUI can
render a clickable <Link> top-up button instead of plain text. Account-
independent (portal fetch gated on a logged-in Nous account), fail-open to
{logged_in: false} on any hiccup. Mirrors session.usage's credits-block pattern.
Frontend (TUI-local /credits command + Ink component) lands separately.
* feat(tui): /credits command with keyboard-driven top-up confirm
TUI-local /credits: fetches the structured balance via the credits.view RPC,
prints the balance + identity + top-up URL, then arms the EXISTING confirm
overlay (Enter = open top-up in browser via openExternalUrl, Esc = cancel).
Reuses ConfirmReq — no new overlay component/state/input handler. Headless
(openExternalUrl returns false) falls back to printing the URL.
- gatewayTypes.ts: CreditsViewResponse.
- commands/credits.ts: the command (mirrors /status's rpc+guarded pattern).
- registry.ts: register creditsCommands.
- test: balance+overlay armed, headless fallback, no-url, logged-out (4 cases).
Matches the CLI /credits 'Enter to open' affordance. Phase 2a: no polling.
Email cached document attachments and placed them in media_urls, but
msg_type only flipped on image attachments — documents stayed TEXT and
run.py's document-context injection (gated on MessageType.DOCUMENT)
silently dropped them. Same bug class as Signal #12845. DOCUMENT wins
over PHOTO for mixed attachments since image handling keys off per-path
mime types while document injection gates strictly on message_type.
Widen the salvaged #12851 fix to match the established classification
pattern (WhatsApp/Slack/BlueBubbles/Mattermost): video/* -> VIDEO, and
any remaining MIME type falls through to DOCUMENT instead of TEXT, so
exotic types still trigger run.py's document-context injection.
Register no-op Slack event handlers for inbound reaction_added and reaction_removed events so Slack Bolt does not log unhandled-request warnings for events Hermes does not consume.
The previous implementation captured loop vars via default arguments::
async def _wrapped(ack, body, action, _cb=_cb, _plugin_name=_plugin_name):
slack_bolt's ``kwargs_injection`` introspects each listener's signature
via ``inspect.signature`` and passes ``None`` for any parameter name it
doesn't recognise (see ``slack_bolt/kwargs_injection/async_utils.py``
``build_async_required_kwargs``). That clobbered ``_cb`` to ``None`` at
dispatch time, so the wrapped plugin handler became ``NoneType`` —
``await _cb(...)`` then raised ``'NoneType' object is not callable`` and
no plugin action handler ever fired.
Replace the default-arg trick with a small closure factory so the
wrapper's public signature is exactly ``(ack, body, action)``. Add a
regression test that introspects the wrapped function's signature.
Found via real Slack click on a Block Kit button registered through
``ctx.register_slack_action_handler`` — gateway log showed
``[Slack] Plugin 'None' action handler raised: 'NoneType' object is
not callable`` despite the registration log line confirming the
handler was wired.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Plugins that post Block Kit messages with interactive elements (buttons,
overflow menus, datepickers, etc.) had no documented way to receive the
resulting click events. The plugin API exposed register_tool, register_hook,
register_command, register_platform, and register_context_engine, but
nothing for slack_bolt action handlers. The only workaround was to
monkey-patch SlackAdapter.connect from inside register(), which is
fragile and breaks on every Hermes update.
This change adds:
* PluginContext.register_slack_action_handler(action_id, callback) —
validates inputs and queues the handler on the PluginManager.
action_id accepts whatever slack_bolt.App.action() accepts (literal
string, compiled re.Pattern, or constraint dict).
* PluginManager.get_slack_action_handlers() — accessor used by the
Slack adapter at connect time.
* SlackAdapter.connect — after wiring its built-in approval and
slash-confirm buttons, iterates the plugin-registered handlers
and registers each via self._app.action(matcher)(callback). Each
callback is wrapped defensively so a misbehaving plugin cannot
crash slack_bolt's dispatch loop, with a best-effort ack on
exception so Slack stops retrying the click.
* Defensive fallback when the plugin layer is unhealthy: a
RuntimeError from get_plugin_manager() is logged and swallowed
rather than blocking the gateway from starting.
* Test coverage in tests/gateway/test_slack_plugin_action_handlers.py
for input validation, multi-plugin registration, the connect-time
wiring, defensive exception handling, and the plugin-loader-
failure fallback path.
* Documentation in website/docs/guides/build-a-hermes-plugin.md
describing the new API alongside the existing register_command /
dispatch_tool documentation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sibling site of the PDF/DOCX note fixed in PR #44175: the audio file
attachment context note led with "Ask the user what they'd like you to
do with it", steering the model into asking instead of transcribing.
Rewritten to instruct the agent to transcribe/process the file itself
when the request involves its content, only asking when intent is
genuinely unclear. Contract assertion added to the existing audio
attachment note test.
When a user attached a binary document (PDF, DOCX, XLSX, …) in chat, the
context note prepended to the turn said "Ask the user what they'd like you to
do with it." That steered the model into asking the user to paste the
contents rather than extracting the text it is fully capable of reading — so
attached PDFs/DOCX appeared "unreadable" to the agent.
Rewrite the binary-document note to tell the agent the file is a non-text
format saved at the given path and to extract its text itself (e.g. via the
terminal tool or the ocr-and-documents skill) before answering. Text
documents (whose content is already inlined by the platform adapter) keep
their existing note. The note construction is pulled into a small
`_build_document_context_note` helper so it is unit-testable.