* fix(mcp): re-raise CancelledError explicitly in MCPServerTask.run
On Python 3.11+, `asyncio.CancelledError` inherits from `BaseException`
(not `Exception`), so the broad `except Exception as exc:` in
`MCPServerTask.run`'s transport loop did NOT catch it. Task cancellation
from gateway restart / explicit `task.cancel()` silently escaped past
the reconnect logic — the MCP server task died without going through
the shutdown/reconnect code paths that check `_shutdown_event`.
Add an explicit `except asyncio.CancelledError: raise` before the broad
catch so cancellation propagation is self-documenting rather than an
accident of exception hierarchy, and future sibling-site work (e.g.
distinguishing shutdown-cancel from transport-cancel) has an obvious
hook. Behavior on pre-3.8 Pythons where CancelledError WAS an Exception
subclass is also corrected: the old path would have caught it and
treated it as a connection failure worth retrying.
Closes#9930.
* fix(mcp): forward OAuth auth and bump sse_read_timeout on SSE transport
Two surgical correctness bugs in the SSE branch of MCPServerTask._run_http,
distilled from @amiller's PR #5981 that couldn't be cherry-picked wholesale
(branch too stale).
1. sse_read_timeout was set to the tool timeout (default 60s). That's the
wrong dimension — it governs how long sse_client will wait between
events on the SSE stream, not per-call latency. SSE servers routinely
hold the stream idle for minutes between events; a 60s read timeout
drops the connection after the first slow stretch (Router Teamwork,
Supermemory on Cloudflare Workers idle-disconnect at ~60s). Bump to
300s to match the Streamable HTTP path's httpx read timeout.
2. OAuth auth was built via get_manager().get_or_build_provider() but
never forwarded to sse_client. SSE MCP servers behind OAuth 2.1 PKCE
would silently fail with 401s on every request.
Keepalive (the other half of #5981) intentionally left for a follow-up —
it's a real improvement but a bigger change, and these two are obvious
corrections to ship now. Credits to @amiller.
Co-authored-by: Andrew Miller <socrates1024@gmail.com>
---------
Co-authored-by: Andrew Miller <socrates1024@gmail.com>
On Python 3.11+, `asyncio.CancelledError` inherits from `BaseException`
(not `Exception`), so the broad `except Exception as exc:` in
`MCPServerTask.run`'s transport loop did NOT catch it. Task cancellation
from gateway restart / explicit `task.cancel()` silently escaped past
the reconnect logic — the MCP server task died without going through
the shutdown/reconnect code paths that check `_shutdown_event`.
Add an explicit `except asyncio.CancelledError: raise` before the broad
catch so cancellation propagation is self-documenting rather than an
accident of exception hierarchy, and future sibling-site work (e.g.
distinguishing shutdown-cancel from transport-cancel) has an obvious
hook. Behavior on pre-3.8 Pythons where CancelledError WAS an Exception
subclass is also corrected: the old path would have caught it and
treated it as a connection failure worth retrying.
Closes#9930.
PR #21238 introduced top-level `allOf: [{if/then/required}]` blocks in the
built-in memory tool's parameters schema as conditional-required hints.
Two problems:
1. OpenAI's Codex backend (chatgpt.com/backend-api/codex, gpt-5.x) rejects
top-level `allOf`/`anyOf`/`oneOf`/`enum`/`not` outright with a
non-retryable 400 — affected every user on openai-codex/gpt-5.x.
2. The `if/then` hints were silently ignored by every other provider
(Chat Completions doesn't honour them on function schemas), so they
never actually enforced anything anywhere.
The runtime handler in `memory_tool()` already validates the per-action
required fields and returns actionable error messages, so removing the
block changes nothing behaviourally.
Paired with the defense-in-depth sanitizer in the previous commit, this
closes the bug both at the source (schema no longer emits the forbidden
form) and at the wire boundary (sanitizer strips it if anything else
re-introduces it).
- Rewrites `tests/tools/test_memory_tool_schema.py` to guard against
regressing the forbidden-combinator shape instead of asserting it.
- Adds AUTHOR_MAP entry for @hrkzogw (author of the sanitizer fix).
When the parent agent uses a composite toolset like hermes-cli, calling
delegate_task with individual toolsets (e.g. web, terminal) resulted in
zero tools because the name-based intersection failed: 'web' != 'hermes-cli'.
Add _expand_parent_toolsets() which collects all tool names from parent
toolsets, then recognises any individual toolset whose tools are a subset
of the parent's available tools. This allows delegate_task(toolsets=['web'])
to work correctly when the parent has hermes-cli enabled.
Fixes#19447
Some exception classes (e.g. anyio.ClosedResourceError) are raised without
a message argument, so str(exc) returns an empty string. The existing error
format f'{type(exc).__name__}: {exc}' would produce messages like
'MCP call failed: ClosedResourceError: ' with nothing after the colon.
Add _exc_str() helper that falls back to repr(exc) when str(exc) is empty,
and apply it to all 6 MCP error formatting sites (5 tool/prompt/resource
handlers + 1 sampling handler).
Fixes#19417
Treat closed-resource, closed-transport, broken-pipe, and EOF MCP failures as stale session equivalents so the existing reconnect/retry-once path can recover. Add regression coverage for the stale-pipe marker variants.\n\nChecks:\n- python -m py_compile tools/mcp_tool.py tests/tools/test_mcp_tool_session_expired.py\n- python -m pytest tests/tools/test_mcp_tool_session_expired.py -q -o addopts=\n- selected secret scan over touched files
Track elapsed wall time in _run_on_mcp_loop, cancel the in-flight future when a timeout expires, and raise a descriptive TimeoutError that includes the elapsed and configured timeout. Add regression coverage for the new timeout diagnostics.
Fixes#9930
When an agent session is interrupted (Ctrl+C or gateway timeout), the
current thread's interrupt flag is set in _interrupted_threads. asyncio
executor threads are pooled and reused across sessions, so a thread that
carried an interrupt flag from a prior session will immediately cancel
any new asyncio work dispatched to it — including MCP server discovery.
Fix: in register_mcp_servers(), temporarily clear the interrupt flag on
the current thread before running _discover_all(), then restore it
afterward in a finally block so the original interrupt state is not lost.
Image generation plugins were dispatched without a model name, leaving
the plugin to pick its default. Users on OpenRouter, ComfyUI, or custom
backends had no way to select a specific model through config — they
had to fork the plugin or patch the tool.
Add _read_configured_image_model() that reads image_gen.model from the
active profile's config.yaml and forwards it into
_dispatch_to_plugin_provider(). When model is set, the plugin call
gains a 'model' kwarg; when unset, the plugin falls back to its own
default, so single-model users see no behavior change.
Example config:
image_gen:
provider: openrouter
model: flux-pro
Tests: all 170 image tool tests pass. The new code path is opt-in via
config and no existing test exercises it, so the change is strictly
additive.
Two fixes for the local terminal backend on Windows (Git Bash):
1. `_drain()` in base.py: `select.select()` only works on sockets on
Windows, not pipe file descriptors. On Windows, use blocking
`os.read()` in the daemon thread instead. EOF arrives promptly
when bash exits, so this is safe.
2. `_run_bash()` in local.py: When `self.cwd` is updated from `pwd`
output, it contains Git Bash-style paths (`/c/Users/...`).
`subprocess.Popen(cwd=...)` needs a native Windows path
(`C:\Users\...`). Added a conversion before Popen.
Without these fixes, all terminal() calls on Windows return empty
output (exit code 126), and cwd tracking breaks.
Tested on Windows 11 with Git for Windows + Python 3.13.
Fixes#14638
Lists the skills sitting in ~/.hermes/skills/.archive/ so users have
something to pass to `hermes curator restore`. `curator status` already
shows counts; this fills the name-discovery gap.
Archive layout is flat (`archive_skill` writes to `.archive/<skill>/`),
so the directory name IS the skill name — no frontmatter parsing
needed. Timestamped collision directories (`<skill>-<ts>`) are listed
literally; user can still pass them to `restore`.
Reshape of @EvilDrag0n's #20651, simplified: drop the frontmatter
rglob + preamble/trailer output + duplicate subcommand registration.
Co-authored-by: EvilDrag0n <lxl694522264@gmail.com>
Cloud metadata endpoints (169.254.169.254 etc.) are now always blocked
by browser_navigate regardless of hybrid routing, allow_private_urls,
or backend.
Bug: commit 42c076d3 (#16136) added hybrid routing that flips
auto_local_this_nav=True for private URLs and short-circuits
_is_safe_url(). IMDS endpoints are technically private (169.254/16
link-local), so the sidecar happily routed them to a local Chromium,
and the agent could read IAM credentials via browser_snapshot. On
EC2/GCP/Azure this is a full SSRF-to-credential-theft.
Fix: new is_always_blocked_url() in url_safety.py — a narrow floor
that checks _BLOCKED_HOSTNAMES, _ALWAYS_BLOCKED_IPS,
_ALWAYS_BLOCKED_NETWORKS only. Applied as an independent gate in
browser_navigate's pre-nav and post-redirect checks, BEFORE
auto_local_this_nav gets a chance to short-circuit. Ordinary private
URLs (localhost, 192.168.x, 10.x, .local, CGNAT) still route to the
local sidecar as the #16136 feature intends.
Secondary fix (reporter's finding): _url_is_private() now explicitly
checks 172.16.0.0/12. ipaddress.is_private only covers that range on
Python ≥3.11 (bpo-40791), so on 3.10 runtimes those URLs were routed
to cloud instead of the local sidecar. No security impact — just a
correctness fix for the hybrid-routing feature.
Closes#16234.
Add support for MCP servers using the SSE transport protocol
(SseServerTransport) alongside the existing Streamable HTTP and stdio
transports. Many MCP servers use SSE (GET /sse + POST /messages/)
which was previously unsupported -- the client silently fell back to
Streamable HTTP, causing 10s connection timeouts.
Changes:
- Import mcp.client.sse.sse_client with graceful fallback
- Check config.get('transport') == 'sse' in _run_http() to select
the SSE transport path with proper timeout handling
- Read transport type from config in get_mcp_status() instead of
hardcoding 'http' for URL-based servers
- Update docstring, example config, and feature list
The MCP SDK discovers OAuth server metadata (token_endpoint, etc.) on
demand and keeps it in memory only. Without disk persistence, a restart
with valid cached refresh tokens forces the SDK to fall back to the
guessed '{server_url}/token' path — which returns 404 on most real
providers (Notion, Atlassian, GitHub remote MCP, etc.) and triggers a
full browser re-authorization even though the refresh token is fine.
Add a .meta.json file next to the existing tokens/client_info files:
HERMES_HOME/mcp-tokens/<server>.json -- tokens (existing)
HERMES_HOME/mcp-tokens/<server>.client.json -- client info (existing)
HERMES_HOME/mcp-tokens/<server>.meta.json -- oauth metadata (new)
Changes:
- HermesTokenStorage.save_oauth_metadata / load_oauth_metadata / _meta_path
— disk layer for the discovered OAuthMetadata.
- HermesTokenStorage.remove() now also clears .meta.json so
'hermes mcp remove <name>' and the manager's remove() path clean up fully.
- HermesMCPOAuthProvider._initialize cold-restores from disk before the
existing pre-flight discovery runs. If disk has metadata we skip the
discovery HTTP round-trips entirely.
- HermesMCPOAuthProvider._prefetch_oauth_metadata now persists ASM as
soon as it's discovered, so even the first pre-flight run seeds disk.
- HermesMCPOAuthProvider._persist_oauth_metadata_if_changed() is called
at the end of async_auth_flow so metadata discovered via the SDK's
lazy 401-branch (not pre-flight) is also saved for next time.
Tests cover the storage roundtrip (save/load/missing/corrupt/remove) and
the manager provider path (cold-load restore, skip-when-in-memory,
persist-on-discover, noop-when-unchanged, end-to-end async_auth_flow).
Co-authored-by: nocturnum91 <50326054+nocturnum91@users.noreply.github.com>
Skills that produce large/lossless images (e.g. info-graph, where a
rendered JPG is 1-2 MB) currently lose quality in Telegram delivery
because `_IMAGE_EXTS` membership routes the file through
`send_multiple_images` → `sendMediaGroup`, which Telegram's server
re-encodes to JPEG @ 1280px max edge. The original bytes only survive
when the file goes through `send_document`, which the dispatch tables
in three places (`_process_message_background`, `_deliver_media_from_response`,
and the `send_message` tool's telegram path) only reach for files
whose extension is NOT in `_IMAGE_EXTS`.
This commit adds an `[[as_document]]` directive that mirrors the
existing `[[audio_as_voice]]` shape: a skill emits the directive once
in its response, and every image-extension MEDIA: file in that response
is delivered via `send_document` instead of `send_multiple_images` /
`sendPhoto`. The directive is detected at the dispatch sites (which see
the raw response) and the directive string is stripped from the
user-visible cleaned text in `extract_media` so it never leaks.
Granularity is intentionally all-or-nothing per response, matching
[[audio_as_voice]]'s scope. Skills that need fine control can split into
two responses.
Verified the targeted use case: info-graph emits
信息图已生成(...)
[[as_document]]
MEDIA:/tmp/info-graph-x/infographic.jpg
→ Telegram receives `infographic.jpg` via sendDocument, original 1MB
JPEG bytes preserved, no recompression. Forwarding and download
filenames stay clean (`infographic.jpg`).
Tests: +3 cases in TestExtractMedia covering directive strip, isolation
from voice flag, and coexistence with [[audio_as_voice]]. All
113 pre-existing media/extract/send tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The delegate_task tool schema descriptions referenced 'claude --acp --stdio'
as an example, but Claude Code CLI does not support --acp or --stdio flags.
The ACP subprocess transport (agent/copilot_acp_client.py) is specifically
built for GitHub Copilot CLI ('copilot --acp --stdio').
Changes:
- Per-task acp_command example: 'claude' → 'copilot'
- Top-level acp_command description: remove 'Claude Code' reference,
clarify requirement for ACP-compatible CLI (currently Copilot only)
- acp_args description: remove misleading claude-opus-4-6 example
Fixes#19055
The kanban_heartbeat tool called heartbeat_worker but never
heartbeat_claim, so a worker that loops the tool while a single tool
call blocks the agent for >DEFAULT_CLAIM_TTL_SECONDS still got
reclaimed by release_stale_claims. The function name and
heartbeat_claim's own docstring imply otherwise:
"Workers that know they'll exceed 15 minutes should call this
every few minutes to keep ownership."
But there was no caller in the worker tool path. Workers couldn't
invoke heartbeat_claim themselves either — it isn't exposed as a tool.
Fix: _handle_heartbeat now calls heartbeat_claim first, reading
HERMES_KANBAN_CLAIM_LOCK from the worker env (the dispatcher pins
this in _default_spawn). Falls back to _claimer_id() for locally-
driven workers that didn't go through dispatcher spawn.
Test: tests/tools/test_kanban_tools.py::test_heartbeat_extends_claim_expires
rewinds claim_expires into the past, calls the tool, and asserts the
new value is at least now + DEFAULT_CLAIM_TTL_SECONDS // 2. Verified to
fail against the unfixed code (claim_expires stays at the rewound
value).
Closes the root cause underlying the symptom in #21141 (15-min
respawns of long-running workers). #21141 separately addresses
post-reclaim cleanup; this fixes the upstream "shouldn't have been
reclaimed in the first place" half.
When terminal.backend is docker, inbound documents uploaded via messaging
platforms (Telegram, Slack, Discord, Feishu, Email, etc.) are cached at a host
path under ~/.hermes/cache/documents, but the container sandbox only sees them
at the auto-mounted /root/.hermes/cache/documents path.
This PR adds to_agent_visible_cache_path() in tools/credential_files.py (the
natural sibling to get_cache_directory_mounts()) and calls it at the
document-context-injection site in gateway/run.py so the agent always receives
a path it can open directly, matching the mount layout already established
by get_cache_directory_mounts() (#4846).
Scope: only Docker backend for now; other backends use different mount
semantics and are left unchanged until verified.
Fixes#18787
_write_json (the persistence helper used by HermesTokenStorage for both
tokens and client_info) created the temp file via Path.write_text and
only chmod'd it to 0o600 afterward. Between create and chmod the file
existed on disk at the process umask (commonly 0o644 = world-readable),
briefly exposing MCP OAuth access/refresh tokens to other local users.
Use os.open with O_WRONLY|O_CREAT|O_EXCL and an explicit S_IRUSR|S_IWUSR
mode so the file is created atomically at 0o600, plus tighten the parent
dir to 0o700 so siblings can't traverse to the creds file. The temp name
also gains a per-process random suffix to avoid collisions between
concurrent writers and stale leftovers from a crashed prior write.
Mirrors the fix shipped for agent/google_oauth.py in #19673.
Adds a regression test asserting the resulting file mode is 0o600 and
the parent directory is 0o700 (skipped on Windows where POSIX mode bits
aren't enforced).
Adds SearXNG as a free, self-hosted web search provider. SearXNG is a
privacy-respecting metasearch engine that requires no API key — just a
running instance and SEARXNG_URL pointing at it.
## What this adds
- `tools/web_providers/searxng.py` — `SearXNGSearchProvider` implementing
`WebSearchProvider` (search only; no extract capability)
- `_is_backend_available("searxng")` — gates on SEARXNG_URL
- `_get_backend()` — accepts "searxng" as a configured value; adds it to
auto-detect candidates (lower priority than paid services)
- `web_search_tool` — dispatches to SearXNG when it is the active backend
- `check_web_api_key()` — includes SearXNG in availability check
- `OPTIONAL_ENV_VARS["SEARXNG_URL"]` — registered with tools=["web_search"]
- `tools_config.py` — SearXNG appears in the `hermes tools` provider picker
- `nous_subscription.py` — `direct_searxng` detection, web_active / web_available
- `setup.py` — SEARXNG_URL listed in the missing-credential hint
- 23 tests covering: is_configured, happy-path search, score sorting, limit,
HTTP/request errors, _is_backend_available, _get_backend, check_web_api_key
## Config
```yaml
# Use SearXNG for search, any paid provider for extract
web:
search_backend: "searxng"
extract_backend: "firecrawl"
# Or: SearXNG as the sole backend (web_extract will use the next available)
web:
backend: "searxng"
```
SearXNG is search-only — it does not implement WebExtractProvider. Users
who only configure SEARXNG_URL get web_search available; web_extract falls
back to the next available extract provider (or is unavailable if none).
Closes#19198 (Phase 2 Task 4 — SearXNG provider)
Ref: #11562 (original SearXNG PR)
Introduce the foundation for independently selecting web search and
extract backends — enabling future combinations like SearXNG for
search + Firecrawl for extract.
Architecture:
- tools/web_providers/base.py: WebSearchProvider and WebExtractProvider
ABCs with normalized result contracts (mirrors CloudBrowserProvider)
- tools/web_tools.py: _get_search_backend() and _get_extract_backend()
read per-capability config keys, fall through to shared web.backend
- hermes_cli/config.py: web.search_backend and web.extract_backend in
DEFAULT_CONFIG (empty = inherit from web.backend)
Behavioral change:
- web_search_tool() now dispatches via _get_search_backend()
- web_extract_tool() now dispatches via _get_extract_backend()
- When per-capability keys are empty (default), behavior is identical
to before — _get_search_backend() falls through to _get_backend()
This is purely structural — no new backends are added. SearXNG and
other search-only/extract-only providers can now be added as simple
drop-in modules in follow-up PRs.
12 new tests, 49 existing tests pass with zero regressions.
Ref: #19198
Replaces the per-directory shadow-repo design with a single shared shadow
git store at ~/.hermes/checkpoints/store/. Object DB is now deduplicated
across every working directory the agent has ever touched; a dozen
worktrees of the same project cost near-zero in additional disk.
Why
---
Pre-v2 design had three compounding problems that let ~/.hermes/checkpoints/
grow to multi-GB on active machines:
1. Each working directory got its own full shadow git repo — no object
dedup across projects or across worktrees of the same project.
2. _prune() was a documented no-op: max_snapshots only limited the
/rollback listing. Loose objects accumulated forever.
3. Defaults: enabled=True, auto_prune=False — users paid the disk cost
without ever asking for /rollback.
Field report on a single workstation: 847 MB across 47 shadow repos,
mostly redundant clones of the hermes-agent source tree.
Changes
-------
- tools/checkpoint_manager.py: full rewrite. Single bare store, per-project
refs (refs/hermes/<hash>), per-project indexes (store/indexes/<hash>),
per-project metadata (store/projects/<hash>.json with workdir +
created_at + last_touch). On first v2 init, any pre-v2 per-directory
shadow repos are auto-migrated into legacy-<timestamp>/ so the new
store starts clean. _prune() now actually rewrites the per-project ref
to the last max_snapshots commits and runs git gc --prune=now. New
_enforce_size_cap() drops oldest commits round-robin across projects
when the store exceeds max_total_size_mb. _drop_oversize_from_index()
filters any single file larger than max_file_size_mb out of the snapshot.
- hermes_cli/checkpoints.py: new 'hermes checkpoints' CLI
(status / list / prune / clear / clear-legacy) for managing the store
outside a session.
- hermes_cli/config.py: flipped defaults — enabled=False, max_snapshots=20,
auto_prune=True. Added max_total_size_mb=500, max_file_size_mb=10.
Tightened DEFAULT_EXCLUDES (added target/, *.so/*.dylib/*.dll,
*.mp4/*.mov, *.zip/*.tar.gz, .worktrees/, .mypy_cache/, etc.).
- run_agent.py / cli.py / gateway/run.py: thread the new kwargs through
AIAgent and the startup auto_prune hooks.
- Tests rewritten to match v2 storage while keeping backwards-compat
coverage for the pre-v2 prune path (per-directory shadow repos under
base/ are still swept correctly for anyone mid-migration).
- Docs updated: user-guide/checkpoints-and-rollback.md explains the
shared store, new defaults, migration, and the new CLI;
reference/cli-commands.md documents 'hermes checkpoints'.
E2E validated
-------------
- Legacy migration: pre-v2 shadow repos auto-archived into legacy-<ts>/.
- Object dedup: two projects with an identical shared.py blob resolve to
7 total objects in the store (v1 would have stored the blob twice).
- max_snapshots=3 actually enforced: after 6 commits, list shows 3.
- Orphan prune: deleting a project's workdir + 'hermes checkpoints prune
--retention-days 0' removes its ref, index, and metadata; GC reclaims
the objects.
- max_file_size_mb=1 excludes a 2 MB weights.bin while keeping the
tracked source code files.
- hermes checkpoints {status,prune,clear,clear-legacy} all work from the
CLI without an agent running.
Breaking / migration
--------------------
No in-place data migration — legacy per-directory shadow repos are moved
into legacy-<timestamp>/ on first run. Old /rollback history is still
accessible by inspecting the archive with git; run
'hermes checkpoints clear-legacy' to reclaim the space when ready. Users
relying on /rollback must now set checkpoints.enabled=true (or pass
--checkpoints) explicitly.
Add Lightpanda as an optional browser engine for local mode.
Lightpanda is a headless browser built from scratch in Zig -- faster
navigation than Chrome with significantly less memory.
One config line to enable:
browser:
engine: lightpanda
New functions in browser_tool.py:
- _get_browser_engine() -- config/env reader with validation + caching
- _should_inject_engine() -- only inject in local non-cloud mode
- _needs_lightpanda_fallback() -- detect empty/failed LP results
- _chrome_fallback_screenshot() -- temporary Chrome session for screenshots
- Engine injection in _run_browser_command (--engine flag)
- browser_vision pre-routes screenshots to Chrome when engine=lightpanda
Config:
- browser.engine in DEFAULT_CONFIG (auto/lightpanda/chrome)
- AGENT_BROWSER_ENGINE in OPTIONAL_ENV_VARS
- /browser status shows engine info in local mode
Rebased from PR #7144 onto current main. All existing code preserved --
pure additions only (+520/-2).
25 new tests + 81 total browser tests pass (0 failures).
README:24 claimed "Six terminal backends" while tools/environments/ exposes
seven top-level backend choices through TERMINAL_ENV: local, docker, ssh,
singularity, modal, daytona, vercel_sandbox. Modal additionally has direct
and Nous-managed modes selected via terminal.modal_mode (the
ManagedModalEnvironment class is a Modal sub-mode, not a separate top-level
backend).
The same drift appeared in five other doc and code-comment sites with
inconsistent counts (six, seven, or implicit) and varying lists. Updated
all sites to a consistent seven-backend list in canonical order. The
configuration guide also clarifies how Modal's two modes are selected so
operators do not search for a non-existent backend: managed_modal value.
CONTRIBUTING.md:160 lists six backend filenames in a code tree but does
not carry the "Six terminal" prose; left out of scope per cohesion sweep
guidance to bundle only identical wording.
Files updated:
- README.md (line 24, marketing copy)
- website/docs/index.md (line 49, landing page)
- website/docs/user-guide/configuration.md (line 86, config guide)
- tools/environments/__init__.py (lines 3-6, package docstring)
- tools/file_operations.py (line 6, module docstring)
- environments/README.md (line 43, RL training docs — TERMINAL_ENV list)
The comment at tools/web_tools.py:700-702 stated the runtime default for
auxiliary.web_extract.timeout is 360s. The actual runtime default is 30s
(_DEFAULT_AUX_TIMEOUT in agent/auxiliary_client.py:3140), used by
_get_task_timeout when no auxiliary.web_extract.timeout key is present in
config.yaml.
The 360s figure is the config template default written by
hermes_cli/config.py:697 into freshly-generated config.yaml files. It only
takes effect when that key exists in the user's config — not as a fallback.
Users on configs that predate commit 20b4060d (Apr 5, 2026), or who removed
the key, fall through to the 30s _DEFAULT_AUX_TIMEOUT runtime default.
The comment was introduced in 20b4060d alongside the template-default bump
from 30 to 360. The runtime default in auxiliary_client.py was not changed
in that commit and has remained 30s since 839d9d74 (Mar 28, 2026).
Workers completing a kanban task can now claim the ids of cards they
created via an optional ``created_cards`` field on ``kanban_complete``.
The kernel verifies each id exists and was created by the completing
worker's profile; any phantom id blocks the completion with a
``HallucinatedCardsError`` and records a
``completion_blocked_hallucination`` event on the task so the rejected
attempt is auditable. Successful completions also get a non-blocking
prose-scan pass over their ``summary`` + ``result`` that emits a
``suspected_hallucinated_references`` event for any ``t_<hex>``
reference that doesn't resolve.
Closes#20017.
Recovery UX (kernel + CLI + dashboard)
--------------------------------------
A structural gate alone isn't enough — operators also need to see and
act on stuck workers, especially when a profile's model is the root
cause. This PR ships the full loop:
* ``kanban_db.reclaim_task(task_id)`` — operator-driven reclaim that
releases an active worker claim immediately (unlike
``release_stale_claims`` which only acts after claim_expires has
passed). Emits a ``reclaimed`` event with ``manual: True`` payload.
* ``kanban_db.reassign_task(task_id, profile, reclaim_first=...)`` —
switch a task to a different profile, optionally reclaiming a stuck
running worker in the same call.
* ``hermes kanban reclaim <id> [--reason ...]`` and
``hermes kanban reassign <id> <profile> [--reclaim] [--reason ...]``
CLI subcommands wired through to the same helpers.
* ``POST /api/plugins/kanban/tasks/{id}/reclaim`` and
``POST /api/plugins/kanban/tasks/{id}/reassign`` endpoints on the
dashboard plugin.
Dashboard surfacing
-------------------
* ⚠ **warning badge** on cards with active hallucination events.
* **attention strip** at the top of the board listing all flagged
tasks; dismissible per session.
* **events callout** in the task drawer — hallucination events render
with a red left border, amber icon, and phantom ids as styled chips.
* **recovery section** in the task drawer with three actions: Reclaim,
Reassign (with profile picker + reclaim-first checkbox), and a
copy-to-clipboard hint for ``hermes -p <profile> model`` since
profile config lives on disk and can't be edited from the browser.
Auto-opens when the task has warnings, collapsed otherwise.
Keyed by task id so state doesn't leak between drawers.
Active-vs-stale rule: warnings clear when a clean ``completed`` or
``edited`` event supersedes the hallucination, so recovery is never
permanently stigmatising — the audit events persist for debugging but
the badge goes away once the worker succeeds.
Skill updates
-------------
* ``skills/devops/kanban-worker/SKILL.md`` documents the
``created_cards`` contract with good/bad examples.
* ``skills/devops/kanban-orchestrator/SKILL.md`` gains a "Recovering
stuck workers" section with the three actions and when to use each.
Tests
-----
* Kernel gate: verified-cards manifest, phantom rejection + audit
event, cross-worker rejection, prose scan positive + negative.
* Recovery helpers: reclaim on running task, reclaim on non-running
returns False, reassign refuses running without reclaim_first,
reassign with reclaim_first succeeds on running.
* API endpoints: warnings field present on /board and /tasks/:id,
warnings cleared after clean completion, reclaim 200 + 409 paths,
reassign 200 + 409 + reclaim_first paths.
* CLI smoke: reclaim + reassign subcommands.
Live-verified end-to-end on a dashboard with seeded scenarios:
attention strip renders, badges land on the right cards, drawer
callout shows phantom chips, Reclaim on a running task flips status to
ready + emits manual reclaimed event + refreshes the drawer,
Reassign swaps the assignee and triggers board refresh.
359/359 kanban-suite tests pass
(test_kanban_{db,cli,boards,core_functionality} + dashboard + tools).
* revert(gateway): remove stale-code self-check and auto-restart
Removes the _detect_stale_code / _trigger_stale_code_restart mechanism
introduced in #17648 and iterated in #19740. On every incoming message
the gateway compared the boot-time git HEAD SHA to the current SHA on
disk, and if they differed it would reply with
Gateway code was updated in the background --
restarting this gateway so your next message runs
on the new code. Please retry in a moment.
and then kick off a graceful restart. This is unwanted behaviour:
users who run a long-lived gateway and do their own ad-hoc git
operations on the checkout end up with their chat interrupted and
the current message dropped every time HEAD moves, with no way to
opt out.
If an operator really needs the old protection against stale
sys.modules after "hermes update", the SIGKILL-survivor sweep in
hermes update (hermes_cli/main.py, also tagged #17648) already
handles the supervisor-respawn case on its own.
Removed:
gateway/run.py:
- _STALE_CODE_SENTINELS, _GIT_SHA_CACHE_TTL_SECS
- _read_git_head_sha(), _compute_repo_mtime() module helpers
- class-level _boot_wall_time / _boot_repo_mtime / _boot_git_sha /
_stale_code_restart_triggered defaults
- __init__ boot-snapshot block (_boot_*, _cached_current_sha*,
_repo_root_for_staleness, _stale_code_notified)
- _current_git_sha_cached(), _detect_stale_code(),
_trigger_stale_code_restart() methods
- stale-code check + user-facing restart notice at the top of
_handle_message()
tests/gateway/test_stale_code_self_check.py (deleted, 412 lines)
No new logic added. Zero remaining references to any removed
symbol. Gateway test suite passes the same 4589 tests it passed
before; the 3 pre-existing unrelated failures (discord free-channel,
feishu bot admission, teams typing) are unchanged by this commit.
* feat(i18n): add display.language for static message translation (zh/ja/de/es)
Adds a thin-slice i18n layer covering the highest-impact static user-facing
messages: the CLI dangerous-command approval prompt and a handful of gateway
slash-command replies (restart-drain, goal cleared, approval expired, config
read/save errors).
Out of scope (stays English): agent responses, log lines, tool outputs,
slash-command descriptions, error tracebacks.
Infrastructure:
- agent/i18n.py: catalog loader, t() helper, language resolution
(HERMES_LANGUAGE env var > display.language config > en)
- locales/{en,zh,ja,de,es}.yaml: ~19 translated strings per language
- display.language in DEFAULT_CONFIG (hermes_cli/config.py)
Tests:
- tests/agent/test_i18n.py: 21 tests covering catalog parity, placeholder
parity across locales, fallback behavior, env-var override, alias
normalization, missing-key graceful degradation.
Docs:
- website/docs/user-guide/configuration.md: display.language entry plus a
short section explaining scope so users don't expect agent responses to
translate via this knob.
Sends a lightweight list_tools() probe every 3 minutes during idle
periods to prevent TCP connections from going stale behind LB / NAT
idle timeouts (commonly 300-600s). When the keepalive fails, the
reconnect event fires so the transport rebuilds the session cleanly.
Salvages the keepalive portion of @vominh1919's PR #17016. The
circuit-breaker half-open recovery from the same PR was independently
landed on main via #benbarclay's commit 8cc3cebca ("fix(mcp): add
half-open state to circuit breaker", Apr 21); only the keepalive is
salvaged here.
Fixes#17003.
Previously, pinning a skill blocked every skill_manage write action
(edit, patch, delete, write_file, remove_file). The 'hard fence'
design conflated two concerns:
1. Pin as deletion protection — don't let the curator archive
or the agent delete a stable skill.
2. Pin as content freeze — don't let the agent rewrite it mid-conversation.
In practice (1) is what users pin for: they want a skill to survive
curator passes. (2) created friction — agents finding a new pitfall
in a pinned skill had to ask the user to unpin, then the agent
patches, then the user re-pins. The dance discouraged skill
maintenance and pinned skills went stale.
This narrows the _pinned_guard to skill_manage(action='delete') only.
Patches, edits, and supporting-file writes go through on pinned
skills so the agent can keep improving them. The curator's own
pinned-skip behavior (agent/curator.py:271 for auto-archive,
line 349 for the LLM review prompt) is unchanged — curator still
never touches pinned skills.
Changes:
- tools/skill_manager_tool.py: remove _pinned_guard calls from
_edit_skill, _patch_skill, _write_file, _remove_file; keep on
_delete_skill. Updated _pinned_guard docstring and error message.
- tools/skill_manager_tool.py: updated skill_manage model-facing tool
description to reflect the new semantic.
- website/docs/user-guide/features/curator.md: updated pinning
section.
- tests/tools/test_skill_manager_tool.py: flipped refuses-pinned
tests for edit/patch/write_file/remove_file into allowed-when-pinned;
kept test_delete_refuses_pinned (strengthened assertion to check the
'cannot be deleted' wording).
Closes#18354
Closes the gap where write_file skipped the post-edit syntax check that
patch already ran, so silent file corruption (bad quote escaping,
truncated writes, etc.) would persist on disk until a later read.
## Changes
tools/file_operations.py:
- Add in-process linters for .py, .json, .yaml, .toml (LINTERS_INPROC).
Python uses ast.parse, JSON/YAML/TOML use stdlib/PyYAML parsers.
Zero subprocess overhead; preferred over shell linters when both apply.
- _check_lint() now accepts optional content and routes to in-process
linter first. Shell linter (py_compile, node --check, tsc, go vet,
rustfmt) remains the fallback for languages without an in-process
equivalent.
- New _check_lint_delta() implements the post-first/pre-lazy pattern
borrowed from Cline and OpenCode: lint post-write state first; only
if errors are found AND pre-content was captured does it lint the
pre-state and diff. If the pre-existing file had the SAME errors the
edit didn't introduce anything new, so the file is reported as 'still
broken, pre-existing' with success=False but a message explaining the
errors were pre-existing. If the edit introduced genuinely new errors,
those are surfaced and pre-existing ones are filtered out.
- WriteResult gains a lint field.
- write_file() captures pre-content for in-process-lintable extensions
and calls _check_lint_delta after a successful write.
- patch_replace() switches from _check_lint to _check_lint_delta,
reusing the pre-edit content it already has in scope.
tools/file_tools.py:
- Update write_file schema description to mention the post-write lint.
tests/tools/test_file_operations_edge_cases.py:
- Update existing brace-path tests to use .js (shell linter) now that
.py is in-process.
- Add TestCheckLintInproc (9 tests) covering Python/JSON/YAML/TOML
in-process linters.
- Add TestCheckLintDelta (5 tests) covering the post-first/pre-lazy
short-circuit, new-file path, and the single-error-parser caveat.
## Performance
In-process linters are microseconds per call (ast.parse, json.loads).
The hot path (clean write) runs exactly one lint — matches main's cost
for patch. Pre-state capture is skipped when the file has no applicable
linter. Measured 4.89ms/write average over 100 .py writes including lint.
## Inspiration
- Cline's DiffViewProvider.getNewDiagnosticProblems() — filters pre-write
diagnostics from post-write diagnostics (src/integrations/editor/DiffViewProvider.ts).
- OpenCode's WriteTool — runs lsp.diagnostics() after write and appends
errors to tool output (packages/opencode/src/tool/write.ts).
- Claude Code's DiagnosticTrackingService — captures baseline via
beforeFileEdited() and returns new-diagnostics-only from
getNewDiagnostics() (src/services/diagnosticTracking.ts).
## Validation
- tests/tools/test_file_operations.py + test_file_operations_edge_cases.py
+ test_file_tools.py + test_file_tools_live.py + test_file_write_safety.py
+ test_write_deny.py + test_patch_parser.py + test_file_ops_cwd_tracking.py:
228 passed locally.
- Live E2E reproduction of the tips.py corruption incident: broken
content written; lint field surfaces 'SyntaxError: invalid syntax.
Perhaps you forgot a comma? (line 6, column 5)' — the exact error
that would have self-corrected the bug on the next turn.
MCP servers commonly emit JSON Schema `pattern` (e.g. `\\d{4}-\\d{2}-\\d{2}`
for date-time params) and `format` keywords. llama.cpp's
`json-schema-to-grammar` converter rejects regex escape classes
(\\d/\\w/\\s) and most format values, returning HTTP 400
"parse: error parsing grammar: unknown escape at \\d" — the whole request
fails.
Cloud providers (OpenAI, Anthropic, OpenRouter, Gemini) accept these
keywords fine and use them as prompting hints. Stripping unconditionally
loses useful hints for every cloud user to fix a llama.cpp-only bug.
Approach: classify the llama.cpp grammar-parse 400 in the error
classifier, and on match do a one-shot in-place strip of pattern/format
from `self.tools`, then retry. Follows the existing
`thinking_signature` recovery pattern. Cloud users hit zero overhead;
llama.cpp users pay one failed request per session.
Changes
- agent/error_classifier.py: new `FailoverReason.llama_cpp_grammar_pattern`
+ narrow HTTP-400 branch matching "error parsing grammar",
"json-schema-to-grammar", or "unable to generate parser ... template".
- tools/schema_sanitizer.py: new `strip_pattern_and_format()` helper —
reactive, walks schema nodes, skips property names (search_files.pattern
survives). Returns strip count for logging.
- run_agent.py: new one-shot recovery block in the retry loop. Strips,
logs, continues. Falls through to normal retry if nothing to strip.
- tests: 4 classifier tests (3 variants + 1 non-400 negative), 7 strip
tests including the property-name preservation and idempotency checks.
Co-authored-by: Chris Danis <cdanis@gmail.com>
Follow-up to #19928 which fixed the foreground path in _run_bash.
The background process spawn in process_registry.py had the same
vulnerability: Popen(cwd=session.cwd) and PtyProcess.spawn(cwd=...)
would raise FileNotFoundError if the directory was deleted.
Apply _resolve_safe_cwd() at session creation time so both the PTY
and pipe-mode Popen paths receive a validated cwd.
Address Copilot review on PR #17569:
1. _resolve_safe_cwd never tested the filesystem root because the loop
exited when `os.path.dirname(parent) == parent`, which is true once
`parent == '/'`. Restructure so the root is checked before the
self-equal exit. Adds `test_returns_root_when_only_root_exists` —
regression-guarded by reverting the loop and watching it fail.
2. The fake `Popen.stdout` was a `MagicMock`; `BaseEnvironment._wait_for_process`
calls `proc.stdout.fileno()` then `select.select`/`os.read` against it,
which raised `TypeError: fileno() returned a non-integer` (visible as a
thread exception in test output) and could in theory read from an
unrelated real fd. Hand `fake_popen` a real `os.pipe()` with the write
end pre-closed so the drain loop sees EOF immediately. Helper records
each fd so the test cleans up after itself.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a tool call deletes its own working directory (`cd /tmp/foo &&
rm -rf /tmp/foo`), the next `subprocess.Popen(args, cwd=self.cwd)` raised
`FileNotFoundError: [Errno 2]` before bash even started — every subsequent
terminal/file-tool call hit the same wedge until the gateway restarted.
Fix in `LocalEnvironment._run_bash`: before handing `self.cwd` to Popen,
resolve a safe alternative when the path is gone (walk up to the nearest
existing ancestor, falling back to `tempfile.gettempdir()` only as a last
resort). Log a warning so the recovery is visible — not silent — and
update `self.cwd` so the next call doesn't repeat the message.
Defense in depth in `LocalEnvironment._update_cwd`: only adopt the new
cwd when it still exists as a directory. `pwd -P` from a deleted cwd can
leave a stale value in the marker file; refusing to store a missing path
keeps `self.cwd` valid by construction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MiniMax deprecated the old v1/t2a_v2 endpoint (api.minimax.io) and
moved to v1/text_to_speech (api.minimax.chat). The new API:
- Uses a flat payload: {model, text, voice_id} instead of nested
voice_setting / audio_setting objects
- Returns raw audio bytes (Content-Type: audio/mpeg) instead of
JSON with hex-encoded audio
- Uses model 'speech-01' instead of 'speech-2.8-hd'
- Updated default voice_id to 'female-shaonv' for Chinese TTS
The implementation detects Content-Type to handle both old and new
API responses, maintaining backward compatibility for any users who
manually configured the legacy base_url.
* feat(cron): add no_agent mode for script-only cron jobs (watchdog pattern)
Adds a no_agent=True option to the cronjob system. When enabled, the
scheduler runs the attached script on schedule and delivers its stdout
directly to the job's target — no LLM, no agent loop, no token spend.
This is the classic bash-watchdog pattern (memory alert every 5 min,
disk alert every 15 min, CI ping) reimplemented as a first-class Hermes
primitive instead of a systemd timer + curl + bot token triplet living
outside the system.
## What
hermes cron create "every 5m" \
--no-agent \
--script memory-watchdog.sh \
--deliver telegram \
--name memory-watchdog
Agent tool:
cronjob(action='create',
schedule='every 5m',
script='memory-watchdog.sh',
no_agent=True,
deliver='telegram')
Semantics:
- Script stdout (trimmed) → delivered verbatim as the message
- Empty stdout → silent tick (no delivery; watchdog pattern)
- wakeAgent=false gate → silent tick (same gate LLM jobs use)
- Non-zero exit/timeout → delivered as an error alert
(broken watchdogs shouldn't fail silently)
- No LLM ever invoked; no tokens spent; no provider fallback applied
## Implementation
cron/jobs.py
* create_job gains no_agent: bool = False
* prompt becomes Optional (no_agent jobs don't need one)
* Validation: no_agent=True requires a script at create time
* Field roundtrips via load_jobs / save_jobs / update_job
cron/scheduler.py
* run_job: new short-circuit branch at the top that runs the script,
wraps its output into the (success, doc, final_response, error)
tuple downstream delivery already expects, and returns before any
AIAgent import or construction
* _run_job_script: picks interpreter by extension — .sh/.bash run
under /bin/bash, anything else under sys.executable (Python).
Shell support unlocks the bash-watchdog pattern without wrapping
scripts in Python. Extension is explicit; we deliberately do NOT
trust the file's own shebang. Path-containment guard (scripts dir)
unchanged.
tools/cronjob_tools.py
* Schema: new no_agent boolean property with clear trigger guidance
* cronjob() accepts no_agent and validates mode-specific shape:
- no_agent=True requires script; prompt/skills optional
- no_agent=False keeps the existing 'prompt or skill required' rule
* update path rejects flipping no_agent=True on a job without a script
* _format_job surfaces no_agent in list output
* Handler lambda forwards no_agent from tool args
hermes_cli/main.py, hermes_cli/cron.py
* 'hermes cron create --no-agent' and edit's --no-agent / --agent
pair for toggling at CLI parity with the agent tool
* Existing --script help text updated to describe both modes
* List / create / edit output now shows 'Mode: no-agent (...)' when set
## Tests
tests/cron/test_cron_no_agent.py — 18 tests covering:
* create_job: no_agent shape, validation, field persistence
* update_job: flag roundtrip across reload
* cronjob tool: schema validation, update toggling, mode-specific
requirements, prompt-relaxation rule
* run_job short-circuit:
- success path delivers stdout verbatim
- empty stdout → SILENT_MARKER (no delivery downstream)
- wakeAgent=false gate → silent
- script failure → error alert
- run_job does NOT import AIAgent (verified via mock)
* _run_job_script:
- .sh executes via bash (no shebang required)
- .bash executes via bash
- .py still runs via sys.executable (regression)
- path-traversal still blocked (security regression)
All 18 new tests pass. 341/342 pre-existing cron tests still pass; the
one failure (test_script_empty_output_noted) was already broken on main
and is unrelated to this change.
## Docs
website/docs/guides/cron-script-only.md — new dedicated guide covering
the watchdog pattern, interpreter rules, delivery mapping, worked
examples (memory / disk alerts), and the comparison table vs hermes send,
regular LLM cron jobs, and OS-level cron.
website/docs/user-guide/features/cron.md — new 'No-agent mode' section
in the cron feature reference, cross-linked to the guide.
website/docs/guides/automate-with-cron.md — new tip box pointing users
to no-agent mode when they don't need LLM reasoning.
## Compatibility
- Existing jobs: unchanged. no_agent defaults to False, existing code
paths untouched until the flag is set.
- Schema additive only; older jobs.json without the field load fine
via .get() with False default.
- New CLI flags are opt-in and don't alter existing flag behavior.
* fix(cron): lazy-import AIAgent + SessionDB so no_agent ticks pay zero
The unconditional `from run_agent import AIAgent` + SessionDB() init at
the top of run_job() meant every no_agent tick still paid the full agent
module load cost (~300ms + transitive imports + DB open) even though it
never touched any of that machinery.
Move both to live under the default (LLM) path, after the no_agent
short-circuit has returned. Now a no_agent tick's sys.modules stays
clean — verified end-to-end:
assert 'run_agent' not in sys.modules # before
run_job(no_agent_job)
assert 'run_agent' not in sys.modules # after
The existing mock-based unit test (test_run_job_no_agent_never_invokes_aiagent)
kept passing because patch() replaces the class AFTER import; the leak
was only visible via real subprocess-style verification. End-to-end
demo confirmed: agent calls cronjob(no_agent=True) → script runs →
stdout delivered → no LLM machinery loaded.
* docs(cron): tighten no_agent tool schema — defaults, silent semantics, pick rule
Previous description buried the important bits in one long sentence.
Agents could plausibly miss three things an LLM-facing schema should
make unmissable:
1. What the default is — now first sentence + JSON Schema `default: false`
2. What 'silent run' actually means for the user — now spelled out:
'nothing is sent to the user and they won't see anything happened'
3. When to pick True vs False — now a concrete decision rule with
examples on both sides (watchdogs/metrics/pollers → True;
summarize/draft/pick/rephrase → False)
Also adds explicit 'prompt and skills are ignored when True' since the
agent could otherwise still pass them out of habit.
No behavior change — schema text only.
On VPS/Docker and some Ubuntu 23.10+ hosts, Chromium refuses to start
without --no-sandbox:
- uid=0 (root): hard requirement (VPS/Docker deployments)
- AppArmor apparmor_restrict_unprivileged_userns=1 (Ubuntu 23.10+):
non-root too, under systemd or unprivileged containers
Detect both conditions and inject AGENT_BROWSER_CHROME_FLAGS with
--no-sandbox --disable-dev-shm-usage when the user hasn't already
set the flags themselves.
Salvage of #15771 — only the browser_tool.py fix is cherry-picked.
The PR's accompanying MCP preset addition (new feature surface)
was dropped so the bug fix can land independently.
Co-authored-by: ygd58 <buraysandro9@gmail.com>