Weak judge models (e.g. deepseek-v4-flash) return empty strings or prose
when asked for the strict {done, reason} JSON verdict. The old code
failed-open to continue on every such turn, burning the entire turn
budget with log lines like
judge returned empty response
judge reply was not JSON: "Let me analyze whether the goal..."
and /goal clear could not stop it mid-loop without /stop.
After N=3 consecutive *parse* failures (transport/API errors don't
count — those are transient), the loop auto-pauses and prints:
⏸ Goal paused — the judge model (3 turns) isn't returning the
required JSON verdict. Route the judge to a stricter model in
~/.hermes/config.yaml:
auxiliary:
goal_judge:
provider: openrouter
model: google/gemini-3-flash-preview
Then /goal resume to continue.
The counter resets on any usable reply (both "done"/"continue" and
API errors) and persists across GoalManager reloads so cross-session
resumes carry the correct state.
Also fixes test_goal_verdict_send.py sharing a hardcoded session_id
across tests — the shared id only worked because the previous
_post_turn_goal_continuation was a never-awaited coroutine. Now that
PR #19160 made it properly awaited, the xdist test-leakage bug
surfaced. Each test gets a unique session_id via uuid suffix.
Route goal status notices through the platform adapter send API and register post-delivery callbacks so completed-goal notices appear after the final assistant response. Also cancel queued synthetic goal continuations on /goal pause and /goal clear while preserving normal queued user messages.
channels_list was iterating directory.items() directly, yielding
("updated_at", str) and ("platforms", dict) pairs — neither passed
the isinstance(entries_list, list) check, so the inner loop never ran
and every call returned count=0 even when channel_directory.json was
populated.
The writer (gateway/channel_directory.py) wraps the payload as
{"updated_at": ..., "platforms": {...}}; every other reader in the
codebase unwraps via directory.get("platforms", {}). This aligns
channels_list with that convention.
Also tightens the existing test_channels_with_directory test, which
bypassed the bug by asserting against _load_channel_directory() directly
instead of calling channels_list. It now calls the tool end-to-end and
a new test_channels_with_directory_platform_filter covers the filter
path. Both tests fail against the pre-fix code.
Closes#21474
Co-authored-by: chrisworksai <262485129+chrisworksai@users.noreply.github.com>
* feat(kanban): add `specify` — auxiliary LLM fleshes out triage tasks
The Triage column shipped with a placeholder 'a specifier will flesh
out the spec', but the specifier itself was never built. This wires
it up as a dedicated CLI verb.
`hermes kanban specify <id>` calls the auxiliary LLM (configured under
`auxiliary.triage_specifier`) to expand a rough one-liner into a
concrete spec — tightened title plus a body with Goal / Approach /
Acceptance criteria / Out-of-scope sections — then atomically flips
`status: triage -> todo` and recomputes ready so parent-free tasks
go straight to the dispatcher on the same tick.
Surface:
hermes kanban specify <task_id> # single task
hermes kanban specify --all [--tenant T] # sweep triage column
hermes kanban specify ... --author NAME # audit-comment author
hermes kanban specify ... --json # one JSON line per task
Design choices:
- Parent gating is preserved. specify_triage_task flips to 'todo',
then recompute_ready promotes to 'ready' only when parents are
done — same rule as a normal parent-gated todo.
- No daemon, no background watcher. Every invocation is explicit —
keeps cost predictable and doesn't fight the dispatcher loop.
- Response parse is lenient: strict JSON preferred, markdown-fence
tolerated, raw-body fallback on malformed JSON so the LLM can't
strand a task in triage.
- All failure modes (no aux client, API error, task moved out of
triage mid-call) return SpecifyOutcome(ok=False, reason=...) so
--all continues past individual failures.
Changes:
hermes_cli/kanban_db.py + specify_triage_task()
hermes_cli/kanban_specify.py NEW (~220 LOC — prompt, parse, call)
hermes_cli/kanban.py + specify subcommand + _cmd_specify
hermes_cli/config.py + auxiliary.triage_specifier task slot
website/docs/user-guide/features/kanban.md specify + config notes
website/docs/reference/cli-commands.md CLI reference entry
tests/hermes_cli/test_kanban_specify_db.py NEW (10 tests)
tests/hermes_cli/test_kanban_specify.py NEW (20 tests)
Validation: 30/30 targeted tests pass. E2E: triage task -> specify ->
ends in 'ready' with events [created, specified, promoted] and the
audit comment recorded under the configured author.
* feat(kanban): wire specifier into dashboard and gateway slash
Follow-ups to the initial PR #21435 — closes the two gaps I'd left as
post-merge: dashboard button and first-class gateway surface.
Dashboard (plugins/kanban/dashboard/)
- POST /tasks/:id/specify NEW endpoint. Thin wrapper around
kanban_specify.specify_task(). Returns the CLI outcome shape
({ok, task_id, reason, new_title}); ok=false with a human reason
is a 200, not a 4xx, so the UI can render it inline without
treating 'no aux client configured' as a crash.
- Runs sync in FastAPI's threadpool because the LLM call can take
tens of seconds on reasoning models.
- Pins HERMES_KANBAN_BOARD around the specify call so the module's
argless kb.connect() lands on the right board.
- dist/index.js: doSpecify callback threaded through the drawer →
TaskDetail → StatusActions prop chain. ✨ Specify button appears
ONLY when task.status === 'triage' (elsewhere the backend would
reject anyway — hide the button to keep the action row clean).
Busy state (Specifying…) + inline success/error banner under the
button using the response.reason text.
- dist/style.css: tiny hermes-kanban-msg-ok / -err classes using
existing --color vars so themes reskin cleanly.
Gateway slash (/kanban specify)
- Already works via the existing run_slash → build_parser →
kanban_command pipeline. No code change needed — slash commands
inherit the argparse tree automatically. Added coverage:
test_run_slash_specify_end_to_end (create --triage, specify, verify
promotion + retitle) and test_run_slash_specify_help_is_reachable.
Tests
- tests/plugins/test_kanban_dashboard_plugin.py: 3 new tests for the
REST endpoint — happy path, non-triage rejection as ok=false 200,
missing aux client as ok=false 200.
- tests/hermes_cli/test_kanban_cli.py: 2 new slash-surface tests.
Docs
- website/docs/user-guide/features/kanban.md: dashboard action row
description mentions ✨ Specify + all three surfaces. REST table
gains /tasks/:id/specify. Slash examples include /kanban specify.
Validation: 340/340 targeted tests pass. E2E via TestClient: create a
triage task over REST → POST /specify with mocked aux client → task
moves to 'ready' column on /board with new title and body applied.
Both implement WebSearchProvider via tools/web_providers/ — matching the
existing SearXNG pattern (PR #5c906d702). Search-only; pair with any
extract provider via web.extract_backend.
- tools/web_providers/brave_free.py — Brave Search API (free tier, 2k
queries/mo). Uses BRAVE_SEARCH_API_KEY as X-Subscription-Token.
- tools/web_providers/ddgs.py — DuckDuckGo via the ddgs Python package.
No API key; gated on package importability.
- tools/web_tools.py: both backends added to _get_backend() config list
and auto-detect chain (trails paid providers), _is_backend_available,
web_search_tool dispatch, web_extract_tool + web_crawl_tool search-only
refusals, check_web_api_key, and the __main__ diagnostic. Introduces
_ddgs_package_importable() helper so tests can monkeypatch a single
symbol for the ddgs availability check.
- hermes_cli/tools_config.py: picker entries for both providers; ddgs
gets a post_setup handler that runs `pip install ddgs`.
- hermes_cli/config.py: BRAVE_SEARCH_API_KEY in OPTIONAL_ENV_VARS.
- scripts/release.py: AUTHOR_MAP entry for @Abd0r.
- tests: 14 new tests (brave-free) + 15 new tests (ddgs) covering
provider unit behavior, backend wiring, and search-only refusals.
Salvages the brave-free + ddgs portion of PR #19796. Not included: the
in-line helpers in web_tools.py (replaced with provider modules to match
the shipped architecture), the lynx-based extract path (these backends
should refuse extract with a clear error — users pair with a real
extract provider), and scripts/start-llama-server.sh (unrelated).
Co-authored-by: Abd0r <223003280+Abd0r@users.noreply.github.com>
Extends PR #21400's resource inlining with image-specific handling: ACP
resource_link and embedded blob resources with an image/* mime (or image
file suffix when mime is missing) now emit an OpenAI image_url part
with a base64 data URL, so vision models actually see the image
instead of a [Binary file omitted] note. Non-image resources keep the
existing text-inlining behavior.
Adds 3 tests: local PNG via resource_link, JPEG mime inferred from
suffix when client omits mimeType, and embedded blob PNG.
The May 5 refactor in d5357f816 made _message_thread_id_for_typing()
symmetric with _message_thread_id_for_send() by mapping the General
topic (thread id "1") to None upfront for both. That's correct for
sendMessage — Telegram rejects message_thread_id=1 on sends and the
topic must be omitted — but it's wrong for sendChatAction.
Observed behavior (confirmed via before/after Telegram wire traces):
Before d5357f816: thread_id=1 → message_thread_id=1 → bubble visible in General
After d5357f816: thread_id=1 → message_thread_id=None → no visible typing
Omitting message_thread_id on sendChatAction does NOT fall back to
the General topic's view in a forum-enabled supergroup; the bubble
ends up hidden from the client's General-topic pane entirely. For
any user on a forum-group, the typing indicator stopped appearing.
Fix: drop the symmetric "1 → None" mapping from the typing resolver.
sendMessage still maps 1 → None via _message_thread_id_for_send (that
side was never broken). The asymmetry is real and required by
Telegram's API — document it in the resolver docstring.
Partial revert of d5357f816; restores the behavior from 0cf7d570e
("fix(telegram): restore typing indicator and thread routing for
forum General topic"). Does not re-introduce the retry-without-thread
fallback that 41545f7ec scoped down for DM topics — with the resolver
fixed, the first call already hits the right wire shape.
Test updated from test_send_typing_general_topic_uses_none_thread_id
(which encoded the broken contract) to
test_send_typing_preserves_general_topic_thread_id, asserting the
single correct call with message_thread_id=1. 10 other tests in the
file untouched and passing.
When empty-response terminal scaffolding fires on a tool-result turn,
_drop_trailing_empty_response_scaffolding left the live history ending at
a bare 'tool' message. The next user input then landed as [...tool, user],
a protocol-invalid sequence that OpenRouter/Opus and other providers
silently fail on (returns empty content). That retriggered the empty-retry
recovery every turn, and recovery flags never hit SQLite (no column for
them), so history kept looking broken on every reload.
Two fixes:
1. Scaffolding strip rewinds the orphan assistant(tool_calls)+tool pair
after popping sentinels. Only fires when scaffolding flags were
actually present, so mid-iteration tool loops are untouched.
2. _repair_message_sequence runs right before every API call as a
defensive belt: drops stray tool messages with unknown tool_call_ids,
merges consecutive user messages so no user input is lost. Does NOT
rewind assistant(tool_calls)+tool+user — that pattern is valid when
the user redirected before the model got its continuation turn.
Repro: session 20260507_044111_fa7e65. Opus-4.7/OpenRouter returned
content-less response after a 42KB execute_code output, nudge+retry
chain exhausted (no fallback configured), terminal sentinel appended,
scaffolding stripped leaving bare tool tail, user typed 'wtf happened..'
and landed as tool→user violation. Every subsequent turn collapsed in
<50ms with the same 3-retry empty chain because the API request itself
was malformed.
Verified live via HTTP mock: pre-fix reproduced 5 api_calls/0.15s exit
'empty_response_exhausted'; post-fix 1 api_call/0.10s exit
'text_response(finish_reason=stop)'. Three-turn session flows cleanly
through the scenario. Full run_agent suite: 1242 passed (0 regressions,
2 pre-existing concurrent_interrupt failures unrelated).
cmd_update's auto-restart path could leave the gateway dead after a
transient failure in systemd's own auto-restart window. Reproduced
on Ubuntu 25.10 + systemd 257: after update, gateway drains and exits 75,
systemd's first respawn 60s later fails (status=200/CHDIR with
"No such file or directory" on a WorkingDirectory that demonstrably
exists), the unit ends up in RestartMaxDelaySec=300 backoff, and
cmd_update's fallback 'systemctl restart' never recovers it — leaving
users with a permanently silent gateway until they manually run
'systemctl reset-failed'.
The fix mirrors the recovery pattern 'hermes gateway restart'
(systemd_restart) got in PR #20949: always reset-failed before
restart, on both the initial fallback and the retry. Also rewrites
the final failure message to tell the user to reset-failed +
restart (not just restart, which is the step that already failed
twice).
cron/scheduler.py:run_job() constructed AIAgent(...) without ever calling
discover_mcp_tools(). The CLI and gateway paths do this at startup; cron
jobs inherited none of it and the user's configured mcp_servers were
invisible inside every cron run.
Insert discover_mcp_tools() right before AIAgent(), wrapped in try/except
so a broken MCP server can't kill an otherwise-working cron job. The call
is idempotent: register_mcp_servers() short-circuits on already-connected
servers, so subsequent ticks in the same scheduler process pay ~0ms.
Scoped to the LLM path only; no_agent script jobs skip it entirely.
Closes#4219.
Makes the in-tree QQ inline keyboards actually light up when the agent
blocks on a dangerous-command approval. Matches the cross-adapter
gateway contract already implemented by Discord, Telegram, Slack,
Matrix, and Feishu.
Gateway/run.py's _approval_notify_sync checks type(adapter).send_exec_approval
and falls back to a text prompt when it's missing. Without this wiring,
QQ users stared at plain '/approve' text even though the adapter shipped
button primitives.
### send_exec_approval(chat_id, command, session_key, description, metadata)
Matches the signature the gateway calls with. Builds an ApprovalRequest
(command_preview, description, timeout) and delegates to send_approval_request.
Uses the last inbound msg_id as reply_to so QQ accepts the passive
message. The 'metadata' parameter is accepted for contract parity but
intentionally unused — QQ doesn't have thread_id/DM-targeting overrides.
### send_update_prompt(chat_id, prompt, default, session_key, metadata)
Signature updated to match the cross-adapter contract used by
'hermes update --gateway' watcher. Renders a 'Update Needs Your Input'
prompt with the optional default hint and a Yes/No keyboard. Replaces
the earlier 3-arg helper that wasn't wired anywhere.
### Default interaction dispatcher
_default_interaction_dispatch() auto-registered as the adapter's
interaction callback in __init__. Routes:
- approve:<session_key>:<decision> → tools.approval.resolve_gateway_approval
Button → choice mapping:
allow-once → 'once'
allow-always → 'always'
deny → 'deny'
(QQ's 3-button mobile layout deliberately collapses 'session' + 'always'
into one button; /approve session text fallback remains available.)
- update_prompt:<answer> → atomic write of y/n to ~/.hermes/.update_response
(the detached 'hermes update --gateway' watcher polls this file)
- anything else → logged and dropped
Resolve exceptions are caught and logged — never propagate into the WS
loop. Callers can override via set_interaction_callback() to route
clicks elsewhere or pass None to drop them entirely.
### Net effect
QQ users now get native tap-to-approve UX on dangerous-command prompts
and update-confirmation prompts, without having to type /approve or /deny
as text. The adapter hooks into tools.approval the same way every other
button-capable platform does.
### Tests
14 new tests cover:
- Default callback installed on __init__
- send_exec_approval / send_update_prompt exist as class methods (so the
gateway's type-probe detects them)
- allow-once/always/deny each map to the correct resolve choice
- update_prompt:y / update_prompt:n each write atomically to the response
file (via monkeypatched get_hermes_home)
- Unknown button_data / empty button_data / resolve exceptions are harmless
- send_exec_approval honours last_msg_id reply-to and accepts metadata
- send_update_prompt delegates with correct content + keyboard
Full qqbot suite: 144 passed (72 pre-existing + 72 from this salvage arc).
Also ran tools/test_approval.py alongside — no regressions (276 passed
combined).
Co-authored-by: WideLee <limkuan24@gmail.com>
_scan_cron_prompt ran at cron create/update time on the user-supplied
prompt but skill content loaded inside _build_job_prompt at runtime
was never scanned. Combined with non-interactive auto-approval, a
malicious skill carrying an injection payload could execute with full
tool access every tick.
- cron/scheduler.py: new CronPromptInjectionBlocked exception and
_scan_assembled_cron_prompt helper. _build_job_prompt now routes
both return paths (with skills / without skills) through the helper,
raising on match. run_job catches the exception and returns a clean
(False, blocked_doc, "", error) tuple so the operator sees a BLOCKED
delivery with the scanner result and an audit hint, rather than a
scheduler crash or a silent skip.
- tests/cron/test_cron_prompt_injection_skill.py: 10 regression tests.
Unit coverage on _scan_assembled_cron_prompt (clean/injection/exfil/
invisible-unicode). End-to-end coverage via _build_job_prompt with
planted skills (injection payload, env exfil, zero-width space,
clean control, missing-skill-doesn't-crash). Fixture patches
tools.skills_tool.SKILLS_DIR / HERMES_HOME so planted skills are
visible. Importantly uses the current cron.scheduler module object
(not a top-level import) so tests don't break when other fixtures
reload cron.scheduler — CronPromptInjectionBlocked identity depends
on which module object defined it.
For every connected MCP server we register four "utility" tool schemas
(mcp_<server>_list_resources, read_resource, list_prompts, get_prompt).
The existing gate was `hasattr(server.session, method)` — but
`mcp.ClientSession` defines all four methods on the class regardless of
what the remote server supports, so the gate never filtered anything.
Tools-only servers (e.g. @upstash/context7-mcp which advertises only
`tools`) ended up with 4 dead stubs; every model call to them returned
JSON-RPC -32601 Method not found, which made the model conclude the
server was broken even when the real tools worked.
Capture the `InitializeResult` returned by `await session.initialize()`
on the `MCPServerTask`, then gate each utility schema on the
corresponding `capabilities` sub-object (resources / prompts). A
legacy `hasattr` fallback runs when `initialize_result` is missing
(older test fixtures / not-yet-captured code paths) so pre-existing
behavior is preserved.
Verified against real `mcp.types.InitializeResult` pydantic models:
- Context7 shape (tools only) → 0 utility stubs registered (was 4)
- Resources-only server → 2 stubs (list_resources, read_resource)
- Prompts-only server → 2 stubs (list_prompts, get_prompt)
- Fully capable server → all 4 stubs
Closes#18051.
Co-authored-by: nikolay-bratanov <nikolay-bratanov@users.noreply.github.com>
Follow-up to the previous commit:
- Add _is_loopback_host() helper covering 127.0.0.1, localhost, ::1,
ip6-localhost, ip6-loopback (case-insensitive). Empty/None host is
treated as non-loopback since unset usually means public default bind.
- Fix mixed-indent comment in the safety rail (comment now aligned with
the if-block) and collapse the nested-if into one condition.
- Add TestInsecureNoAuthSafetyRail covering rejection on 0.0.0.0, a LAN
IP, and empty host; allowance on 127.0.0.1/localhost; plus unit-level
parametrized coverage of _is_loopback_host for spellings we can't bind
in the hermetic test env (::1, ip6-localhost, ip6-loopback).
- Pin test_connect_starts_server + test_webhook_deliver_only defaults
to 127.0.0.1 so they keep passing under the new rail.
- Document the behavior in website/docs/user-guide/messaging/webhooks.md.
When a user replies while quoting another message, QQ sets
'message_type = 103' and pushes the referenced message's content +
attachments inside 'msg_elements[0]'. The old adapter ignored
msg_elements entirely, so:
- Bare quote-replies (no user text) surfaced nothing to the LLM.
- Quoted images/files/voice were never downloaded or described.
- Quoted voice messages specifically produced no transcript — the model
had no way to see what the user was referring to when saying 'about
this voice note…'.
This commit adds _process_quoted_context(d) which extracts msg_elements,
unions their attachments, and runs them through the SAME
_process_attachments pipeline as the main message body. Quoted voice
gets an STT transcript (tried via QQ's asr_refer_text first, then the
configured STT provider); quoted images get cached just like main-body
images; quoted files surface with their original filename intact (not
the CDN URL hash).
The quoted content is prepended to the user's text as a '[Quoted message]:'
block so the LLM sees the full referential context on one turn.
Images-only quotes surface a '[Quoted message]: (image)' marker so the
model knows an image was referenced even if no text came with it.
All four inbound handlers (_handle_c2c_message, _handle_group_message,
_handle_guild_message, _handle_dm_message) now call the helper uniformly
— one merge pattern, not four divergent implementations.
Filename preservation is carried by _process_attachments' existing
'[Attachment: {filename or ct}]' line; nothing else needed for that.
12 new tests under TestProcessQuotedContext and TestMergeQuoteInto cover:
- Non-quote messages short-circuit to empty
- message_type=103 with no msg_elements is harmless
- Text-only quotes render with '[Quoted message]:' prefix
- Voice attachments in the quote flow through STT
- File attachments in the quote preserve the original filename
- Image attachments surface cached paths + media types
- Images-only quote still emits a marker
- Multiple msg_elements are concatenated
- Malformed message_type values return empty
- _merge_quote_into prepends with a blank-line separator
Full qqbot suite: 130 passed (72 existing + 19 chunked + 27 keyboards
+ 12 quoted).
Co-authored-by: WideLee <limkuan24@gmail.com>
The QQ Bot v2 API supports inline keyboards on outbound messages. When a
user taps a button, the platform dispatches an INTERACTION_CREATE
gateway event; the bot ACKs it via PUT /interactions/{id} and decodes
the button's data payload to route the click.
This commit adds:
New module gateway/platforms/qqbot/keyboards.py
- Inline-keyboard dataclasses (InlineKeyboard, KeyboardRow, KeyboardButton,
KeyboardButtonAction, KeyboardButtonRenderData, KeyboardButtonPermission)
that serialize to the JSON shape the QQ API expects.
- build_approval_keyboard(session_key) — 3-button layout:
✅ 允许一次 / ⭐ 始终允许 / ❌ 拒绝, all sharing group_id='approval'
so clicking one greys out the rest.
- build_update_prompt_keyboard() — Yes/No keyboard for update confirms.
- parse_approval_button_data() / parse_update_prompt_button_data() —
decode the button_data payload from INTERACTION_CREATE.
approve:<session_key>:<decision> (decision = allow-once|allow-always|deny)
update_prompt:<answer> (answer = y|n)
- build_approval_text(ApprovalRequest) — markdown renderer for the
surrounding message body (exec-approval and plugin-approval variants,
with severity icons 🔴/🔵/🟡).
- parse_interaction_event(raw) → InteractionEvent dataclass — normalizes
the nested raw payload (id / scene / openids / button_data / etc.).
Adapter changes (gateway/platforms/qqbot/adapter.py)
- _dispatch_payload routes INTERACTION_CREATE → _on_interaction.
- _on_interaction parses the event, ACKs via PUT /interactions/{id}, then
invokes a user-registered interaction callback. Exceptions from the
callback are caught and logged (never propagate into the WS loop).
- set_interaction_callback(cb) lets gateway wiring register a routing
handler that inspects button_data and resolves the corresponding
pending approval / update prompt.
- _send_c2c_text / _send_group_text now accept an optional keyboard kwarg
and append it to the outbound body.
- send_with_keyboard(chat_id, content, keyboard, reply_to=None) — public
helper that sends a single short message with a keyboard attached.
Does NOT chunk-split (a keyboard message has one interactive surface).
Guild chats are rejected non-retryably — they don't support keyboards.
- send_approval_request(chat_id, ApprovalRequest, reply_to=None) +
send_update_prompt(chat_id, content, reply_to=None) — convenience
wrappers over send_with_keyboard.
Tests
27 new unit tests under TestApprovalButtonData, TestUpdatePromptButtonData,
TestBuildApprovalKeyboard, TestBuildUpdatePromptKeyboard, TestBuildApprovalText,
TestInteractionEventParsing, and TestAdapterInteractionDispatch. Cover:
- Button-data round-trip (build → parse returns original session/decision)
- Keyboard JSON shape + mutual-exclusion group_id
- Exec vs plugin approval text templates + severity icons
- Interaction event parsing (c2c / group / guild scene codes)
- _on_interaction end-to-end: ACK invoked, callback receives parsed event,
callback exceptions are swallowed, missing id skips ACK, no registered
callback is harmless.
Full qqbot suite: 118 passed (72 existing + 19 chunked + 27 keyboards).
Co-authored-by: WideLee <limkuan24@gmail.com>
The v2 'single POST /v2/{users|groups}/{id}/files' upload path is capped
at ~10 MB inline (base64 'file_data' or 'url'). For larger files the QQ
platform provides a three-step flow:
1. POST /upload_prepare → upload_id + pre-signed COS part URLs
2. PUT each part to its COS URL → POST /upload_part_finish
3. POST /files with {upload_id} → file_info token
This commit adds a new gateway/platforms/qqbot/chunked_upload.py module
that implements the flow, wires it into QQAdapter._send_media for local
files (URL uploads keep the existing inline path), and introduces
structured exceptions so the caller can surface actionable error text:
- UploadDailyLimitExceededError (biz_code 40093002, non-retryable)
- UploadFileTooLargeError (file exceeds the platform limit)
Both carry file_name / file_size_human / limit_human so the model can
compose user-friendly replies instead of seeing opaque HTTP codes.
The part_finish 40093001 retryable-error loop respects the server-
provided retry_timeout (capped at 10 minutes locally) with a 1 s
polling interval. COS PUTs retry transient failures up to 2 times
with exponential backoff. complete_upload retries up to 2 times.
Covers files up to the platform's ~100 MB per-file limit; before this
the adapter silently rejected anything over ~10 MB.
19 new unit tests under TestChunkedUpload* cover the happy path,
prepare-response parsing, helper functions, part retries, COS PUT
retries, group vs c2c routing, and the structured-error mapping.
Co-authored-by: WideLee <limkuan24@gmail.com>
Adds a per-task override for the consecutive-failure circuit breaker,
so individual tasks can opt out of the global ``kanban.failure_limit``
without dragging everyone else with them.
Resolution order (now three tiers):
1. per-task ``max_retries`` (new, this commit)
2. caller-supplied ``failure_limit`` — the gateway threads
``kanban.failure_limit`` from config here
3. ``DEFAULT_FAILURE_LIMIT`` (2)
Changes:
- ``tasks.max_retries INTEGER`` column + migration for existing DBs
(NULL = no override, matches pre-column behavior).
- ``Task.max_retries`` field + ``from_row`` plumbing.
- ``create_task(..., max_retries=N)`` kwarg.
- ``_record_task_failure`` reads the per-task value first and records
``limit_source`` + ``effective_limit`` on the ``gave_up`` event so
operators can see which tier won.
- CLI: ``hermes kanban create --max-retries N`` (rejects ``< 1``).
- CLI: ``hermes kanban show`` surfaces the effective threshold +
source (``(task)``, ``(config kanban.failure_limit)``, ``(default)``).
- CLI: ``_task_to_dict`` includes ``max_retries`` in ``--json`` output.
Key design choice vs. the earlier #20972 attempt:
- No new config key. The existing ``kanban.failure_limit`` (landed in
#21183) is the dispatcher-tier source — no silent break for users
who already tuned it.
- No ``!=`` sentinel for "is config set" (which would misfire when
config equals the default). The tier-winner is determined purely
by "is per-task override set" — the dispatcher always wins when
per-task is NULL, regardless of whether the caller passed the
default or a configured value.
E2E verified across four scenarios: default-only (trips at 2),
config-only (trips at caller's value), per-task-only beats default
(trips at task value), per-task beats larger config (trips at task
value). ``gave_up`` event metadata correctly records ``limit_source``
and ``effective_limit`` in all cases.
Tests:
- ``test_per_task_max_retries_overrides_dispatcher_limit`` — task=1
beats caller=10.
- ``test_per_task_max_retries_allows_more_than_default`` — task=5
does not trip at caller=default of 2.
- ``test_max_retries_none_falls_through_to_dispatcher_limit`` — None
honors caller's config value (4), records ``limit_source=dispatcher``.
Full kanban trio (db + core + cli + tools + dashboard-plugin): 342
passed, no regressions.
Supersedes: #20972 (@jelrod27) — credit in PR close comment.
Ref: #20263 (tangentially — the reporter asked about adapter API
drift, not retry caps, but the CLI discussion there is what
surfaced the original ask).
PairingStore.approve_code() didn't consult _is_locked_out(), so after
MAX_FAILED_ATTEMPTS bad approvals the lockout flag was set but a valid
code still got accepted — any pending code (legitimately issued or
attacker-obtained) could be approved during the 1-hour lockout window,
nullifying the brute-force protection.
- gateway/pairing.py: lockout check runs in approve_code() right after
_cleanup_expired, before the pending lookup. Returns None on lockout.
- tests/gateway/test_pairing.py: test_lockout_blocks_code_approval pins
the regression — reporter's exact reproducer (generate valid code,
exhaust attempts with WRONGCODE, try to approve valid code) must
return None and leave is_approved == False. Also pins recovery: once
lockout expires, the still-pending code approves normally.
- hermes_cli/pairing.py: _cmd_approve distinguishes the two None cases.
On lockout, prints 'Platform locked out... clears in N minutes. To
reset sooner, delete the _lockout:<platform> entry from
_rate_limits.json' instead of the misleading 'Code not found or
expired' message. 29/29 pairing tests pass; E2E-verified with
reporter's exact Python reproducer.
Adds Google Chat as a new gateway platform, shipped under
plugins/platforms/google_chat/ following the canonical bundled-plugin
pattern (Teams, IRC). Rewired from the original PR #18425 to use the
new env_enablement_fn + cron_deliver_env_var plugin interfaces landed
in the preceding commit, so the adapter touches ZERO core files.
What it does:
- Inbound DM + group messages via Cloud Pub/Sub pull subscription (no
public URL needed), with attachments (PDFs, images, audio, video)
downloaded through an SSRF-guarded Google-host allowlist.
- Outbound text replies with the 'Hermes is thinking…' patch-in-place
pattern — no tombstones.
- Native file attachment delivery via per-user OAuth. Google Chat's
media.upload endpoint rejects service-account auth, so each user
runs /setup-files once in their own DM to grant
chat.messages.create for themselves; the adapter then uploads as
them. Tokens stored per email at
~/.hermes/google_chat_user_tokens/<email>.json.
- Thread isolation: side-threads get isolated sessions, top-level DM
messages share one continuous session. Persistent thread-count
store survives gateway restart.
- Supervisor reconnect with exponential backoff.
- Multi-user out of the box.
How it plugs in (no core edits):
- env_enablement_fn seeds PlatformConfig.extra with project_id,
subscription_name, service_account_json, and the home_channel dict
(which the core hook turns into a HomeChannel dataclass). Reads
GOOGLE_CHAT_PROJECT_ID (falls back to GOOGLE_CLOUD_PROJECT),
GOOGLE_CHAT_SUBSCRIPTION_NAME (falls back to GOOGLE_CHAT_SUBSCRIPTION),
GOOGLE_CHAT_SERVICE_ACCOUNT_JSON (falls back to
GOOGLE_APPLICATION_CREDENTIALS), GOOGLE_CHAT_HOME_CHANNEL.
- cron_deliver_env_var='GOOGLE_CHAT_HOME_CHANNEL' gets cron delivery
for free — cron/scheduler.py consults the platform registry for any
name not in its hardcoded built-in sets.
- plugin.yaml's rich requires_env / optional_env blocks auto-populate
OPTIONAL_ENV_VARS via the new hermes_cli/config.py injector, so
'hermes config' UI surfaces them with description / url / prompt /
password metadata.
- Module-level Platform('google_chat') call in adapter.py triggers the
Platform._missing_() registration so Platform.GOOGLE_CHAT attribute
access works without an enum entry.
Distribution: ships inside the existing hermes-agent package. Users
opt in via 'pip install hermes-agent[google_chat]' and follow the
8-step GCP walkthrough at
website/docs/user-guide/messaging/google_chat.md.
Test coverage: 153 tests in tests/gateway/test_google_chat.py, all
passing. Spans platform registration, env config loading, Pub/Sub
envelope routing, outbound send + chunking + typing patch-in-place,
attachment send paths, SSRF guard, thread/session model,
supervisor reconnect, authorization, per-user OAuth, and the new
plugin-registry cron delivery wiring.
Credit: adapter + OAuth + tests + docs authored by @donramon77
(PR #18425). Rewire onto the new plugin hooks + salvage commit by
Teknium.
Co-Authored-By: Ramón Fernández <112875006+donramon77@users.noreply.github.com>
MCP tool results can include ImageContent blocks (screenshots from
Playwright/Blockbench/Puppeteer etc). The tool result handler only
extracted block.text, so image blocks were silently dropped and the
agent saw an empty or text-only response — losing the actual payload.
Add _cache_mcp_image_block() that base64-decodes the block, validates
the bytes via gateway.platforms.base.cache_image_from_bytes (which
sniffs for PNG/JPEG/WebP signatures and rejects non-images), writes to
the shared `~/.hermes/cache/images/` dir, and returns a MEDIA:<path>
tag. The handler appends that tag to the result parts so downstream
gateway adapters render the image inline.
Logs and drops on malformed base64 / non-image payload rather than
raising — a single bad block shouldn't kill the tool call.
Distilled from #17915 (c3115644151) and #10848 (gnanirahulnutakki), both
too stale to cherry-pick (branches diverged enough to revert dozens of
unrelated fixes). Went with #10848's approach of plumbing through
Hermes' existing MEDIA tag / cache_image_from_bytes infrastructure
rather than #17915's raw tempfile path, because it integrates with the
remote-backend mount system and messaging adapters that already handle
MEDIA tags natively.
Co-authored-by: c3115644151 <c3115644151@users.noreply.github.com>
Co-authored-by: gnanirahulnutakki <gnanirahulnutakki@users.noreply.github.com>
* fix(mcp): re-raise CancelledError explicitly in MCPServerTask.run
On Python 3.11+, `asyncio.CancelledError` inherits from `BaseException`
(not `Exception`), so the broad `except Exception as exc:` in
`MCPServerTask.run`'s transport loop did NOT catch it. Task cancellation
from gateway restart / explicit `task.cancel()` silently escaped past
the reconnect logic — the MCP server task died without going through
the shutdown/reconnect code paths that check `_shutdown_event`.
Add an explicit `except asyncio.CancelledError: raise` before the broad
catch so cancellation propagation is self-documenting rather than an
accident of exception hierarchy, and future sibling-site work (e.g.
distinguishing shutdown-cancel from transport-cancel) has an obvious
hook. Behavior on pre-3.8 Pythons where CancelledError WAS an Exception
subclass is also corrected: the old path would have caught it and
treated it as a connection failure worth retrying.
Closes#9930.
* fix(mcp): forward OAuth auth and bump sse_read_timeout on SSE transport
Two surgical correctness bugs in the SSE branch of MCPServerTask._run_http,
distilled from @amiller's PR #5981 that couldn't be cherry-picked wholesale
(branch too stale).
1. sse_read_timeout was set to the tool timeout (default 60s). That's the
wrong dimension — it governs how long sse_client will wait between
events on the SSE stream, not per-call latency. SSE servers routinely
hold the stream idle for minutes between events; a 60s read timeout
drops the connection after the first slow stretch (Router Teamwork,
Supermemory on Cloudflare Workers idle-disconnect at ~60s). Bump to
300s to match the Streamable HTTP path's httpx read timeout.
2. OAuth auth was built via get_manager().get_or_build_provider() but
never forwarded to sse_client. SSE MCP servers behind OAuth 2.1 PKCE
would silently fail with 401s on every request.
Keepalive (the other half of #5981) intentionally left for a follow-up —
it's a real improvement but a bigger change, and these two are obvious
corrections to ship now. Credits to @amiller.
Co-authored-by: Andrew Miller <socrates1024@gmail.com>
---------
Co-authored-by: Andrew Miller <socrates1024@gmail.com>
On Python 3.11+, `asyncio.CancelledError` inherits from `BaseException`
(not `Exception`), so the broad `except Exception as exc:` in
`MCPServerTask.run`'s transport loop did NOT catch it. Task cancellation
from gateway restart / explicit `task.cancel()` silently escaped past
the reconnect logic — the MCP server task died without going through
the shutdown/reconnect code paths that check `_shutdown_event`.
Add an explicit `except asyncio.CancelledError: raise` before the broad
catch so cancellation propagation is self-documenting rather than an
accident of exception hierarchy, and future sibling-site work (e.g.
distinguishing shutdown-cancel from transport-cancel) has an obvious
hook. Behavior on pre-3.8 Pythons where CancelledError WAS an Exception
subclass is also corrected: the old path would have caught it and
treated it as a connection failure worth retrying.
Closes#9930.
PR #21238 introduced top-level `allOf: [{if/then/required}]` blocks in the
built-in memory tool's parameters schema as conditional-required hints.
Two problems:
1. OpenAI's Codex backend (chatgpt.com/backend-api/codex, gpt-5.x) rejects
top-level `allOf`/`anyOf`/`oneOf`/`enum`/`not` outright with a
non-retryable 400 — affected every user on openai-codex/gpt-5.x.
2. The `if/then` hints were silently ignored by every other provider
(Chat Completions doesn't honour them on function schemas), so they
never actually enforced anything anywhere.
The runtime handler in `memory_tool()` already validates the per-action
required fields and returns actionable error messages, so removing the
block changes nothing behaviourally.
Paired with the defense-in-depth sanitizer in the previous commit, this
closes the bug both at the source (schema no longer emits the forbidden
form) and at the wire boundary (sanitizer strips it if anything else
re-introduces it).
- Rewrites `tests/tools/test_memory_tool_schema.py` to guard against
regressing the forbidden-combinator shape instead of asserting it.
- Adds AUTHOR_MAP entry for @hrkzogw (author of the sanitizer fix).
Mirrors the Slack `allowed_channels` feature (PR #7401) and Discord's
`allowed_channels` (PR #7044) across the remaining group-capable platforms.
All five platforms (Slack + Discord + the four added here) now follow the
same pattern: primary config via config.yaml, env-var fallback as an escape
hatch — matching the project policy that .env is for secrets only and
behavioral settings belong in config.yaml.
Also fixes a duplicate `slack` key in DEFAULT_CONFIG introduced by PR
#7401 (the later entry silently overwrote `allowed_channels`, `require_mention`,
and `free_response_channels` at dict-literal evaluation time).
Platforms added:
- Telegram: `telegram.allowed_chats` (env alias: `TELEGRAM_ALLOWED_CHATS`)
- Mattermost: `mattermost.allowed_channels` (env alias: `MATTERMOST_ALLOWED_CHANNELS`)
- Matrix: `matrix.allowed_rooms` (env alias: `MATRIX_ALLOWED_ROOMS`)
- DingTalk: `dingtalk.allowed_chats` (env alias: `DINGTALK_ALLOWED_CHATS`)
Mattermost and Matrix previously had NO config.yaml bridging for any of
their gating settings; this PR adds `load_gateway_config` bridges for them
(Mattermost gets require_mention + free_response_channels + allowed_channels;
Matrix gets allowed_rooms on top of its existing bridges for require_mention
and free_response_rooms).
Semantics identical everywhere:
- Empty = no restriction (fully backward compatible).
- Non-empty = hard whitelist: non-listed chats are silently ignored,
even when the bot is @mentioned.
- DMs bypass the check entirely.
DEFAULT_CONFIG merges the duplicate `slack` block and adds new `mattermost`
and `matrix` blocks so all gating settings surface in defaults.
Not included: Feishu (has its own per-chat `chat_rules` system that covers
this use case differently), WhatsApp (already has `group_allow_from` via
`group_policy: allowlist`), pure-DM platforms (Signal, SMS, BlueBubbles,
Yuanbao — no group concept).
When the parent agent uses a composite toolset like hermes-cli, calling
delegate_task with individual toolsets (e.g. web, terminal) resulted in
zero tools because the name-based intersection failed: 'web' != 'hermes-cli'.
Add _expand_parent_toolsets() which collects all tool names from parent
toolsets, then recognises any individual toolset whose tools are a subset
of the parent's available tools. This allows delegate_task(toolsets=['web'])
to work correctly when the parent has hermes-cli enabled.
Fixes#19447
Add tencent/hy3-preview (without :free suffix) as a paid model route
alongside the existing free variant. This allows seamless transition
when the model moves from free to paid on OpenRouter — both routes
coexist so neither side's timing causes breakage.
Changes:
- models.py: add ("tencent/hy3-preview", "") to OPENROUTER_MODELS
- model-catalog.json: add paid variant entry
- tests: add assertions for paid route presence
The :free entry can be removed in a follow-up PR once OpenRouter
confirms the free route is deprecated.
Co-authored-by: simonweng <simonweng@tencent.com>
Some exception classes (e.g. anyio.ClosedResourceError) are raised without
a message argument, so str(exc) returns an empty string. The existing error
format f'{type(exc).__name__}: {exc}' would produce messages like
'MCP call failed: ClosedResourceError: ' with nothing after the colon.
Add _exc_str() helper that falls back to repr(exc) when str(exc) is empty,
and apply it to all 6 MCP error formatting sites (5 tool/prompt/resource
handlers + 1 sampling handler).
Fixes#19417
Treat closed-resource, closed-transport, broken-pipe, and EOF MCP failures as stale session equivalents so the existing reconnect/retry-once path can recover. Add regression coverage for the stale-pipe marker variants.\n\nChecks:\n- python -m py_compile tools/mcp_tool.py tests/tools/test_mcp_tool_session_expired.py\n- python -m pytest tests/tools/test_mcp_tool_session_expired.py -q -o addopts=\n- selected secret scan over touched files
Track elapsed wall time in _run_on_mcp_loop, cancel the in-flight future when a timeout expires, and raise a descriptive TimeoutError that includes the elapsed and configured timeout. Add regression coverage for the new timeout diagnostics.