hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-08 03:01:47 +00:00

Author	SHA1	Message	Date
Teknium	c8e3e39185	fix(mcp): surface image tool results as MEDIA tags instead of dropping them (#21328 ) MCP tool results can include ImageContent blocks (screenshots from Playwright/Blockbench/Puppeteer etc). The tool result handler only extracted block.text, so image blocks were silently dropped and the agent saw an empty or text-only response — losing the actual payload. Add _cache_mcp_image_block() that base64-decodes the block, validates the bytes via gateway.platforms.base.cache_image_from_bytes (which sniffs for PNG/JPEG/WebP signatures and rejects non-images), writes to the shared `~/.hermes/cache/images/` dir, and returns a MEDIA:<path> tag. The handler appends that tag to the result parts so downstream gateway adapters render the image inline. Logs and drops on malformed base64 / non-image payload rather than raising — a single bad block shouldn't kill the tool call. Distilled from #17915 (c3115644151) and #10848 (gnanirahulnutakki), both too stale to cherry-pick (branches diverged enough to revert dozens of unrelated fixes). Went with #10848's approach of plumbing through Hermes' existing MEDIA tag / cache_image_from_bytes infrastructure rather than #17915's raw tempfile path, because it integrates with the remote-backend mount system and messaging adapters that already handle MEDIA tags natively. Co-authored-by: c3115644151 <c3115644151@users.noreply.github.com> Co-authored-by: gnanirahulnutakki <gnanirahulnutakki@users.noreply.github.com>	2026-05-07 07:14:16 -07:00
Teknium	dd2dc2bddf	fix(mcp): forward OAuth auth and bump sse_read_timeout on SSE transport (#21323) * fix(mcp): re-raise CancelledError explicitly in MCPServerTask.run On Python 3.8+, `asyncio.CancelledError` inherits from `BaseException` (not `Exception`), so the broad `except Exception as exc:` in `MCPServerTask.run`'s transport loop did NOT catch it. Task cancellation from gateway restart / explicit `task.cancel()` silently escaped past the reconnect logic — the MCP server task died without going through the shutdown/reconnect code paths that check `_shutdown_event`. Add an explicit `except asyncio.CancelledError: raise` before the broad catch so cancellation propagation is self-documenting rather than an accident of exception hierarchy, and future sibling-site work (e.g. distinguishing shutdown-cancel from transport-cancel) has an obvious hook. Behavior on pre-3.8 Pythons where CancelledError WAS an Exception subclass is also corrected: the old path would have caught it and treated it as a connection failure worth retrying. Closes #9930. * fix(mcp): forward OAuth auth and bump sse_read_timeout on SSE transport Two surgical correctness bugs in the SSE branch of MCPServerTask._run_http, distilled from @amiller's PR #5981 that couldn't be cherry-picked wholesale (branch too stale). 1. sse_read_timeout was set to the tool timeout (default 60s). That's the wrong dimension — it governs how long sse_client will wait between events on the SSE stream, not per-call latency. SSE servers routinely hold the stream idle for minutes between events; a 60s read timeout drops the connection after the first slow stretch (Router Teamwork, Supermemory on Cloudflare Workers idle-disconnect at ~60s). Bump to 300s to match the Streamable HTTP path's httpx read timeout. 2. OAuth auth was built via get_manager().get_or_build_provider() but never forwarded to sse_client. SSE MCP servers behind OAuth 2.1 PKCE would silently fail with 401s on every request. 
Keepalive (the other half of #5981) intentionally left for a follow-up — it's a real improvement but a bigger change, and these two are obvious corrections to ship now. Credits to @amiller. Co-authored-by: Andrew Miller <socrates1024@gmail.com> --------- Co-authored-by: Andrew Miller <socrates1024@gmail.com>	2026-05-07 07:08:04 -07:00
Teknium	e0a2b08768	fix(mcp): re-raise CancelledError explicitly in MCPServerTask.run (#21318) On Python 3.8+, `asyncio.CancelledError` inherits from `BaseException` (not `Exception`), so the broad `except Exception as exc:` in `MCPServerTask.run`'s transport loop did NOT catch it. Task cancellation from gateway restart / explicit `task.cancel()` silently escaped past the reconnect logic — the MCP server task died without going through the shutdown/reconnect code paths that check `_shutdown_event`. Add an explicit `except asyncio.CancelledError: raise` before the broad catch so cancellation propagation is self-documenting rather than an accident of exception hierarchy, and future sibling-site work (e.g. distinguishing shutdown-cancel from transport-cancel) has an obvious hook. Behavior on pre-3.8 Pythons where CancelledError WAS an Exception subclass is also corrected: the old path would have caught it and treated it as a connection failure worth retrying. Closes #9930.	2026-05-07 07:04:38 -07:00
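The pattern described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the actual `MCPServerTask.run` code: `run_with_reconnect`, `connect_once`, `max_retries`, and `backoff` are hypothetical names chosen for the example.

```python
import asyncio


async def run_with_reconnect(connect_once, max_retries=3, backoff=0.0):
    """Retry transport failures, but let task cancellation propagate.

    `connect_once` is a hypothetical coroutine function standing in for
    one pass through a transport loop.
    """
    attempts = 0
    while True:
        try:
            return await connect_once()
        except asyncio.CancelledError:
            # Explicit re-raise: cancellation is never a retryable failure.
            raise
        except Exception:
            # Ordinary failures (dropped connection, etc.) are retried.
            attempts += 1
            if attempts > max_retries:
                raise
            await asyncio.sleep(backoff)
```

The explicit clause does not change behaviour on interpreters where `CancelledError` already derives from `BaseException`; it mainly documents the intent at the call site.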
Teknium	5a3e5b23d2	fix(memory): remove dead allOf schema block at the source PR #21238 introduced top-level `allOf: [{if/then/required}]` blocks in the built-in memory tool's parameters schema as conditional-required hints. Two problems: 1. OpenAI's Codex backend (chatgpt.com/backend-api/codex, gpt-5.x) rejects top-level `allOf`/`anyOf`/`oneOf`/`enum`/`not` outright with a non-retryable 400 — affected every user on openai-codex/gpt-5.x. 2. The `if/then` hints were silently ignored by every other provider (Chat Completions doesn't honour them on function schemas), so they never actually enforced anything anywhere. The runtime handler in `memory_tool()` already validates the per-action required fields and returns actionable error messages, so removing the block changes nothing behaviourally. Paired with the defense-in-depth sanitizer in the previous commit, this closes the bug both at the source (schema no longer emits the forbidden form) and at the wire boundary (sanitizer strips it if anything else re-introduces it). - Rewrites `tests/tools/test_memory_tool_schema.py` to guard against regressing the forbidden-combinator shape instead of asserting it. - Adds AUTHOR_MAP entry for @hrkzogw (author of the sanitizer fix).	2026-05-07 07:03:21 -07:00
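As a rough illustration of the wire-boundary half of this fix, a sanitizer that drops only the top-level combinators might look like the sketch below. The function name and constant are hypothetical, and the real sanitizer may handle more cases.

```python
# Keys the commit message lists as rejected at the top level of a tool's
# parameters schema by the Codex backend.
FORBIDDEN_TOP_LEVEL = ("allOf", "anyOf", "oneOf", "enum", "not")


def strip_top_level_combinators(parameters):
    """Return a shallow copy of `parameters` without top-level combinators.

    Nested schemas (for example an `enum` on an individual property) are
    left untouched; only the outermost object is filtered.
    """
    return {k: v for k, v in parameters.items() if k not in FORBIDDEN_TOP_LEVEL}
```

Returning a copy rather than mutating in place keeps the original schema available for providers that accept it.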
Hirokazu Ogawa	3924cb408b	fix: strip Codex-hostile top-level schema combinators	2026-05-07 07:03:21 -07:00
luyao618	e795b7e3ab	fix(delegate): expand composite toolsets before intersection in delegate_task When the parent agent uses a composite toolset like hermes-cli, calling delegate_task with individual toolsets (e.g. web, terminal) resulted in zero tools because the name-based intersection failed: 'web' != 'hermes-cli'. Add _expand_parent_toolsets() which collects all tool names from parent toolsets, then recognises any individual toolset whose tools are a subset of the parent's available tools. This allows delegate_task(toolsets=['web']) to work correctly when the parent has hermes-cli enabled. Fixes #19447	2026-05-07 06:41:42 -07:00
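The subset-based expansion described above could be sketched like this. The registry contents and the helper names are assumptions made for the example; the actual `_expand_parent_toolsets()` may differ in signature and details.

```python
# Hypothetical toolset registry: toolset name -> set of tool names.
TOOLSETS = {
    "web": {"web_search", "web_extract"},
    "terminal": {"terminal"},
    "browser": {"browser_navigate"},
    "hermes-cli": {"web_search", "web_extract", "terminal"},
}


def expand_parent_toolsets(parent_toolsets, registry):
    """Return every toolset whose tools are all available to the parent."""
    available = set()
    for name in parent_toolsets:
        available |= registry.get(name, set())
    # A toolset is recognised when its tools are a subset of the parent's.
    return {name for name, tools in registry.items() if tools and tools <= available}


def allowed_child_toolsets(requested, parent_toolsets, registry):
    """Intersect the requested toolsets with the expanded parent set."""
    expanded = expand_parent_toolsets(parent_toolsets, registry)
    return [name for name in requested if name in expanded]
```

With this registry, a parent running `hermes-cli` can delegate `web`, while `browser` is still filtered out because its tool is not available to the parent.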
liuhao1024	f9b4b8af34	fix(mcp): include exception type in error messages when str(exc) is empty Some exception classes (e.g. anyio.ClosedResourceError) are raised without a message argument, so str(exc) returns an empty string. The existing error format f'{type(exc).__name__}: {exc}' would produce messages like 'MCP call failed: ClosedResourceError: ' with nothing after the colon. Add _exc_str() helper that falls back to repr(exc) when str(exc) is empty, and apply it to all 6 MCP error formatting sites (5 tool/prompt/resource handlers + 1 sampling handler). Fixes #19417	2026-05-07 06:33:57 -07:00
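A minimal sketch of the fallback described above. The helper is named `exc_str` here, and `ClosedResourceError` is a local stand-in class rather than the anyio one.

```python
def exc_str(exc):
    """Return str(exc), falling back to repr(exc) when the message is empty."""
    text = str(exc)
    return text if text else repr(exc)


class ClosedResourceError(Exception):
    """Stand-in for an exception type that is raised without a message."""
```

For an exception raised with no arguments, `repr` still carries the class name, so the formatted error is never blank after the colon.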
Alexander Monas	a1f85ef2b9	fix(mcp): retry stale pipe transport failures Treat closed-resource, closed-transport, broken-pipe, and EOF MCP failures as stale session equivalents so the existing reconnect/retry-once path can recover. Add regression coverage for the stale-pipe marker variants. Checks: - python -m py_compile tools/mcp_tool.py tests/tools/test_mcp_tool_session_expired.py - python -m pytest tests/tools/test_mcp_tool_session_expired.py -q -o addopts= - selected secret scan over touched files	2026-05-07 06:32:45 -07:00
Mason James	80548f9a4f	fix(mcp): report configured timeout in MCP call errors Track elapsed wall time in _run_on_mcp_loop, cancel the in-flight future when a timeout expires, and raise a descriptive TimeoutError that includes the elapsed and configured timeout. Add regression coverage for the new timeout diagnostics.	2026-05-07 06:28:11 -07:00
nudiltoys-cmyk	498c01406f	fix(docker): chown runtime node_modules trees to hermes user (#18800)	2026-05-07 06:17:49 -07:00
LeonSGP43	d12be46df8	fix(skills): lock usage telemetry updates	2026-05-07 06:13:37 -07:00
altmazza0-star	5b24c0fa85	fix: require memory schema fields by action	2026-05-07 05:48:17 -07:00
Teknium	0214858ef5	fix(browser): enforce cloud-metadata SSRF floor in hybrid routing (#16234 ) (#21228 ) Cloud metadata endpoints (169.254.169.254 etc.) are now always blocked by browser_navigate regardless of hybrid routing, allow_private_urls, or backend. Bug: commit `42c076d3` (#16136) added hybrid routing that flips auto_local_this_nav=True for private URLs and short-circuits _is_safe_url(). IMDS endpoints are technically private (169.254/16 link-local), so the sidecar happily routed them to a local Chromium, and the agent could read IAM credentials via browser_snapshot. On EC2/GCP/Azure this is a full SSRF-to-credential-theft. Fix: new is_always_blocked_url() in url_safety.py — a narrow floor that checks _BLOCKED_HOSTNAMES, _ALWAYS_BLOCKED_IPS, _ALWAYS_BLOCKED_NETWORKS only. Applied as an independent gate in browser_navigate's pre-nav and post-redirect checks, BEFORE auto_local_this_nav gets a chance to short-circuit. Ordinary private URLs (localhost, 192.168.x, 10.x, .local, CGNAT) still route to the local sidecar as the #16136 feature intends. Secondary fix (reporter's finding): _url_is_private() now explicitly checks 172.16.0.0/12. ipaddress.is_private only covers that range on Python ≥3.11 (bpo-40791), so on 3.10 runtimes those URLs were routed to cloud instead of the local sidecar. No security impact — just a correctness fix for the hybrid-routing feature. Closes #16234.	2026-05-07 05:38:05 -07:00
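The idea of a narrow "always blocked" floor that runs independently of the private-URL routing decision can be sketched as below. The hostname and network values are illustrative only and the function names are hypothetical; the real lists live in `url_safety.py`. This sketch checks IP literals and exact hostnames and does not perform DNS resolution.

```python
import ipaddress
from urllib.parse import urlparse

# Illustrative values only, not the project's actual block lists.
ALWAYS_BLOCKED_HOSTNAMES = {"metadata.google.internal"}
ALWAYS_BLOCKED_NETWORKS = [ipaddress.ip_network("169.254.169.254/32")]
PRIVATE_172 = ipaddress.ip_network("172.16.0.0/12")


def is_always_blocked(url):
    """Narrow floor: True only for cloud-metadata style targets."""
    host = (urlparse(url).hostname or "").lower()
    if host in ALWAYS_BLOCKED_HOSTNAMES:
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal; resolution is out of scope here
    return any(addr in net for net in ALWAYS_BLOCKED_NETWORKS)


def is_private_ip(host):
    """Private-address check with 172.16.0.0/12 tested explicitly."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False
    return addr.is_private or addr in PRIVATE_172
```

A caller would evaluate `is_always_blocked()` first and refuse the navigation outright, and only then consult `is_private_ip()` to decide between local and cloud routing.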
Teknium	c4a7992317	fix(mcp-oauth): persist OAuth server metadata across process restarts (#21226 ) The MCP SDK discovers OAuth server metadata (token_endpoint, etc.) on demand and keeps it in memory only. Without disk persistence, a restart with valid cached refresh tokens forces the SDK to fall back to the guessed '{server_url}/token' path — which returns 404 on most real providers (Notion, Atlassian, GitHub remote MCP, etc.) and triggers a full browser re-authorization even though the refresh token is fine. Add a .meta.json file next to the existing tokens/client_info files: HERMES_HOME/mcp-tokens/<server>.json -- tokens (existing) HERMES_HOME/mcp-tokens/<server>.client.json -- client info (existing) HERMES_HOME/mcp-tokens/<server>.meta.json -- oauth metadata (new) Changes: - HermesTokenStorage.save_oauth_metadata / load_oauth_metadata / _meta_path — disk layer for the discovered OAuthMetadata. - HermesTokenStorage.remove() now also clears .meta.json so 'hermes mcp remove <name>' and the manager's remove() path clean up fully. - HermesMCPOAuthProvider._initialize cold-restores from disk before the existing pre-flight discovery runs. If disk has metadata we skip the discovery HTTP round-trips entirely. - HermesMCPOAuthProvider._prefetch_oauth_metadata now persists ASM as soon as it's discovered, so even the first pre-flight run seeds disk. - HermesMCPOAuthProvider._persist_oauth_metadata_if_changed() is called at the end of async_auth_flow so metadata discovered via the SDK's lazy 401-branch (not pre-flight) is also saved for next time. Tests cover the storage roundtrip (save/load/missing/corrupt/remove) and the manager provider path (cold-load restore, skip-when-in-memory, persist-on-discover, noop-when-unchanged, end-to-end async_auth_flow). Co-authored-by: nocturnum91 <50326054+nocturnum91@users.noreply.github.com>	2026-05-07 05:35:33 -07:00
Teknium	e82f3b0c41	test: update send_message_tool mocks for force_document kwarg	2026-05-07 05:20:10 -07:00
Brian Su	8b32a9d0f1	feat: add Discord message deletion action	2026-05-07 05:11:09 -07:00
stephen0110	40b51c93a2	fix(kanban): heartbeat tool extends claim TTL, not just last_heartbeat_at The kanban_heartbeat tool called heartbeat_worker but never heartbeat_claim, so a worker that loops the tool while a single tool call blocks the agent for >DEFAULT_CLAIM_TTL_SECONDS still got reclaimed by release_stale_claims. The function name and heartbeat_claim's own docstring imply otherwise: "Workers that know they'll exceed 15 minutes should call this every few minutes to keep ownership." But there was no caller in the worker tool path. Workers couldn't invoke heartbeat_claim themselves either — it isn't exposed as a tool. Fix: _handle_heartbeat now calls heartbeat_claim first, reading HERMES_KANBAN_CLAIM_LOCK from the worker env (the dispatcher pins this in _default_spawn). Falls back to _claimer_id() for locally- driven workers that didn't go through dispatcher spawn. Test: tests/tools/test_kanban_tools.py::test_heartbeat_extends_claim_expires rewinds claim_expires into the past, calls the tool, and asserts the new value is at least now + DEFAULT_CLAIM_TTL_SECONDS // 2. Verified to fail against the unfixed code (claim_expires stays at the rewound value). Closes the root cause underlying the symptom in #21141 (15-min respawns of long-running workers). #21141 separately addresses post-reclaim cleanup; this fixes the upstream "shouldn't have been reclaimed in the first place" half.	2026-05-07 05:05:20 -07:00
Gutslabs	7d36e8346b	fix(security): close TOCTOU window when saving MCP OAuth credentials _write_json (the persistence helper used by HermesTokenStorage for both tokens and client_info) created the temp file via Path.write_text and only chmod'd it to 0o600 afterward. Between create and chmod the file existed on disk at the process umask (commonly 0o644 = world-readable), briefly exposing MCP OAuth access/refresh tokens to other local users. Use os.open with O_WRONLY|O_CREAT|O_EXCL and an explicit S_IRUSR|S_IWUSR mode so the file is created atomically at 0o600, plus tighten the parent dir to 0o700 so siblings can't traverse to the creds file. The temp name also gains a per-process random suffix to avoid collisions between concurrent writers and stale leftovers from a crashed prior write. Mirrors the fix shipped for agent/google_oauth.py in #19673. Adds a regression test asserting the resulting file mode is 0o600 and the parent directory is 0o700 (skipped on Windows where POSIX mode bits aren't enforced).	2026-05-07 04:56:13 -07:00
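A minimal sketch of the create-at-0o600 technique the commit describes, assuming a POSIX system. `write_json_private` is a hypothetical name and the temp-file naming scheme is illustrative, not the project's actual `_write_json`.

```python
import json
import os
import secrets
import stat


def write_json_private(path, payload):
    """Write JSON so the file never exists with permissions wider than 0o600."""
    parent = os.path.dirname(os.path.abspath(path))
    os.makedirs(parent, mode=0o700, exist_ok=True)
    os.chmod(parent, 0o700)  # tighten a pre-existing directory as well

    # Per-process random suffix avoids collisions with concurrent writers.
    tmp = f"{path}.{os.getpid()}.{secrets.token_hex(4)}.tmp"
    # O_EXCL fails if the name already exists; the mode applies at creation.
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL,
                 stat.S_IRUSR | stat.S_IWUSR)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh)
        os.replace(tmp, path)  # atomic rename over the destination
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

Because the mode is passed to `os.open`, there is no window between creation and a later `chmod` in which the file is readable by other users.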
Sanjay Santhanam	033e533d05	test(docker): align Dockerfile contract tests with simplified TUI flow The Dockerfile dropped the manual `@hermes/ink` materialisation gymnastics in favour of letting npm workspaces resolve the bundled package naturally. Two contract tests still asserted the older flow: `test_dockerfile_installs_tui_dependencies` required: 'ui-tui/packages/hermes-ink/package-lock.json' in dockerfile_text …but the lockfile is no longer COPIED individually — the entire `ui-tui/packages/hermes-ink/` tree is COPIED instead (the workspace reference from `ui-tui/package.json` is `file:` so npm needs the real source, not just a manifest stub). `test_dockerfile_materializes_local_tui_ink_package` required a 7-clause conjunction matching specific `rm -rf` / `npm install --omit=dev` `--prefix node_modules/@hermes/ink` / `rm -rf .../react` invocations that were stripped out when the workspace resolution was simplified. Update the assertions to pin the contract the image actually has to carry rather than the exact shell incantations the old flow used: * TUI deps install: ui-tui/package.json + ui-tui/package-lock.json + ui-tui/packages/hermes-ink/ tree are all COPIED, and an npm install/ci step runs in ui-tui. * Bundled hermes-ink: the workspace package source is COPIED (so `await import('@hermes/ink')` resolves at runtime). This keeps the spirit of #15012 / #16690 (zombie reaping + bundled workspace materialisation must continue to work) without locking the Dockerfile into one specific implementation flavour. Validation: $ pytest tests/tools/test_dockerfile_pid1_reaping.py -q 6 passed in 1.43s No production code change. Fixes the two failures observed on `main` (run 25250051126): `tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_installs_tui_dependencies` `tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_materializes_local_tui_ink_package`	2026-05-07 04:53:10 -07:00
kshitij	5c906d7026	feat(web): add SearXNG as a native search-only backend Adds SearXNG as a free, self-hosted web search provider. SearXNG is a privacy-respecting metasearch engine that requires no API key — just a running instance and SEARXNG_URL pointing at it. ## What this adds - `tools/web_providers/searxng.py` — `SearXNGSearchProvider` implementing `WebSearchProvider` (search only; no extract capability) - `_is_backend_available("searxng")` — gates on SEARXNG_URL - `_get_backend()` — accepts "searxng" as a configured value; adds it to auto-detect candidates (lower priority than paid services) - `web_search_tool` — dispatches to SearXNG when it is the active backend - `check_web_api_key()` — includes SearXNG in availability check - `OPTIONAL_ENV_VARS["SEARXNG_URL"]` — registered with tools=["web_search"] - `tools_config.py` — SearXNG appears in the `hermes tools` provider picker - `nous_subscription.py` — `direct_searxng` detection, web_active / web_available - `setup.py` — SEARXNG_URL listed in the missing-credential hint - 23 tests covering: is_configured, happy-path search, score sorting, limit, HTTP/request errors, _is_backend_available, _get_backend, check_web_api_key ## Config ```yaml # Use SearXNG for search, any paid provider for extract web: search_backend: "searxng" extract_backend: "firecrawl" # Or: SearXNG as the sole backend (web_extract will use the next available) web: backend: "searxng" ``` SearXNG is search-only — it does not implement WebExtractProvider. Users who only configure SEARXNG_URL get web_search available; web_extract falls back to the next available extract provider (or is unavailable if none). Closes #19198 (Phase 2 Task 4 — SearXNG provider) Ref: #11562 (original SearXNG PR)	2026-05-06 10:05:29 -07:00
kshitij	cd2cbc73b7	refactor(web): per-capability backend selection for search/extract split Introduce the foundation for independently selecting web search and extract backends — enabling future combinations like SearXNG for search + Firecrawl for extract. Architecture: - tools/web_providers/base.py: WebSearchProvider and WebExtractProvider ABCs with normalized result contracts (mirrors CloudBrowserProvider) - tools/web_tools.py: _get_search_backend() and _get_extract_backend() read per-capability config keys, fall through to shared web.backend - hermes_cli/config.py: web.search_backend and web.extract_backend in DEFAULT_CONFIG (empty = inherit from web.backend) Behavioral change: - web_search_tool() now dispatches via _get_search_backend() - web_extract_tool() now dispatches via _get_extract_backend() - When per-capability keys are empty (default), behavior is identical to before — _get_search_backend() falls through to _get_backend() This is purely structural — no new backends are added. SearXNG and other search-only/extract-only providers can now be added as simple drop-in modules in follow-up PRs. 12 new tests, 49 existing tests pass with zero regressions. Ref: #19198	2026-05-06 09:16:25 -07:00
Teknium	a0fedfbb1b	feat(checkpoints): v2 single-store rewrite with real pruning + disk guardrails (#20709 ) Replaces the per-directory shadow-repo design with a single shared shadow git store at ~/.hermes/checkpoints/store/. Object DB is now deduplicated across every working directory the agent has ever touched; a dozen worktrees of the same project cost near-zero in additional disk. Why --- Pre-v2 design had three compounding problems that let ~/.hermes/checkpoints/ grow to multi-GB on active machines: 1. Each working directory got its own full shadow git repo — no object dedup across projects or across worktrees of the same project. 2. _prune() was a documented no-op: max_snapshots only limited the /rollback listing. Loose objects accumulated forever. 3. Defaults: enabled=True, auto_prune=False — users paid the disk cost without ever asking for /rollback. Field report on a single workstation: 847 MB across 47 shadow repos, mostly redundant clones of the hermes-agent source tree. Changes ------- - tools/checkpoint_manager.py: full rewrite. Single bare store, per-project refs (refs/hermes/<hash>), per-project indexes (store/indexes/<hash>), per-project metadata (store/projects/<hash>.json with workdir + created_at + last_touch). On first v2 init, any pre-v2 per-directory shadow repos are auto-migrated into legacy-<timestamp>/ so the new store starts clean. _prune() now actually rewrites the per-project ref to the last max_snapshots commits and runs git gc --prune=now. New _enforce_size_cap() drops oldest commits round-robin across projects when the store exceeds max_total_size_mb. _drop_oversize_from_index() filters any single file larger than max_file_size_mb out of the snapshot. - hermes_cli/checkpoints.py: new 'hermes checkpoints' CLI (status / list / prune / clear / clear-legacy) for managing the store outside a session. - hermes_cli/config.py: flipped defaults — enabled=False, max_snapshots=20, auto_prune=True. Added max_total_size_mb=500, max_file_size_mb=10. 
Tightened DEFAULT_EXCLUDES (added target/, .so/.dylib/.dll, .mp4/.mov, .zip/*.tar.gz, .worktrees/, .mypy_cache/, etc.). - run_agent.py / cli.py / gateway/run.py: thread the new kwargs through AIAgent and the startup auto_prune hooks. - Tests rewritten to match v2 storage while keeping backwards-compat coverage for the pre-v2 prune path (per-directory shadow repos under base/ are still swept correctly for anyone mid-migration). - Docs updated: user-guide/checkpoints-and-rollback.md explains the shared store, new defaults, migration, and the new CLI; reference/cli-commands.md documents 'hermes checkpoints'. E2E validated ------------- - Legacy migration: pre-v2 shadow repos auto-archived into legacy-<ts>/. - Object dedup: two projects with an identical shared.py blob resolve to 7 total objects in the store (v1 would have stored the blob twice). - max_snapshots=3 actually enforced: after 6 commits, list shows 3. - Orphan prune: deleting a project's workdir + 'hermes checkpoints prune --retention-days 0' removes its ref, index, and metadata; GC reclaims the objects. - max_file_size_mb=1 excludes a 2 MB weights.bin while keeping the tracked source code files. - hermes checkpoints {status,prune,clear,clear-legacy} all work from the CLI without an agent running. Breaking / migration -------------------- No in-place data migration — legacy per-directory shadow repos are moved into legacy-<timestamp>/ on first run. Old /rollback history is still accessible by inspecting the archive with git; run 'hermes checkpoints clear-legacy' to reclaim the space when ready. Users relying on /rollback must now set checkpoints.enabled=true (or pass --checkpoints) explicitly.	2026-05-06 05:44:35 -07:00
Kshitij Kapoor	629d8b843d	fix(browser): tighten Lightpanda fallback edge cases	2026-05-06 03:41:21 -07:00
Kshitij Kapoor	3ebdd26449	fix(browser): surface Lightpanda Chrome fallback warnings	2026-05-06 03:23:19 -07:00
kshitijk4poor	395dbcc873	feat(browser): add Lightpanda engine support with automatic Chrome fallback Add Lightpanda as an optional browser engine for local mode. Lightpanda is a headless browser built from scratch in Zig -- faster navigation than Chrome with significantly less memory. One config line to enable: browser: engine: lightpanda New functions in browser_tool.py: - _get_browser_engine() -- config/env reader with validation + caching - _should_inject_engine() -- only inject in local non-cloud mode - _needs_lightpanda_fallback() -- detect empty/failed LP results - _chrome_fallback_screenshot() -- temporary Chrome session for screenshots - Engine injection in _run_browser_command (--engine flag) - browser_vision pre-routes screenshots to Chrome when engine=lightpanda Config: - browser.engine in DEFAULT_CONFIG (auto/lightpanda/chrome) - AGENT_BROWSER_ENGINE in OPTIONAL_ENV_VARS - /browser status shows engine info in local mode Rebased from PR #7144 onto current main. All existing code preserved -- pure additions only (+520/-2). 25 new tests + 81 total browser tests pass (0 failures).	2026-05-06 03:23:19 -07:00
misery-hl	56b4795115	guard kanban worker lifecycle by run id	2026-05-05 15:09:28 -07:00
0xVox	0b9cbc8b23	test(kanban): cover metadata handoff round-trip	2026-05-05 15:09:28 -07:00
Teknium	b10e38e392	fix(skills): pin protects against deletion only, not edits (#20220 ) Previously, pinning a skill blocked every skill_manage write action (edit, patch, delete, write_file, remove_file). The 'hard fence' design conflated two concerns: 1. Pin as deletion protection — don't let the curator archive or the agent delete a stable skill. 2. Pin as content freeze — don't let the agent rewrite it mid-conversation. In practice (1) is what users pin for: they want a skill to survive curator passes. (2) created friction — agents finding a new pitfall in a pinned skill had to ask the user to unpin, then the agent patches, then the user re-pins. The dance discouraged skill maintenance and pinned skills went stale. This narrows the _pinned_guard to skill_manage(action='delete') only. Patches, edits, and supporting-file writes go through on pinned skills so the agent can keep improving them. The curator's own pinned-skip behavior (agent/curator.py:271 for auto-archive, line 349 for the LLM review prompt) is unchanged — curator still never touches pinned skills. Changes: - tools/skill_manager_tool.py: remove _pinned_guard calls from _edit_skill, _patch_skill, _write_file, _remove_file; keep on _delete_skill. Updated _pinned_guard docstring and error message. - tools/skill_manager_tool.py: updated skill_manage model-facing tool description to reflect the new semantic. - website/docs/user-guide/features/curator.md: updated pinning section. - tests/tools/test_skill_manager_tool.py: flipped refuses-pinned tests for edit/patch/write_file/remove_file into allowed-when-pinned; kept test_delete_refuses_pinned (strengthened assertion to check the 'cannot be deleted' wording). Closes #18354	2026-05-05 05:43:10 -07:00
Teknium	4d0f59fa5a	test(skill_usage): add mark_agent_created to regression test The cherry-picked test predates #19618/#19621 which rewrote list_agent_created_skill_names() to require an explicit created_by: 'agent' provenance marker. Without mark_agent_created(), my-skill is excluded from the list and the positive assertion fails.	2026-05-05 04:55:22 -07:00
LeonSGP43	68c1a08ad1	fix(curator): protect hub skills by frontmatter name	2026-05-05 04:55:22 -07:00
Teknium	5168226d60	feat(file_tools): post-write delta lint on write_file + patch, add JSON/YAML/TOML/Python in-process linters (#20191 ) Closes the gap where write_file skipped the post-edit syntax check that patch already ran, so silent file corruption (bad quote escaping, truncated writes, etc.) would persist on disk until a later read. ## Changes tools/file_operations.py: - Add in-process linters for .py, .json, .yaml, .toml (LINTERS_INPROC). Python uses ast.parse, JSON/YAML/TOML use stdlib/PyYAML parsers. Zero subprocess overhead; preferred over shell linters when both apply. - _check_lint() now accepts optional content and routes to in-process linter first. Shell linter (py_compile, node --check, tsc, go vet, rustfmt) remains the fallback for languages without an in-process equivalent. - New _check_lint_delta() implements the post-first/pre-lazy pattern borrowed from Cline and OpenCode: lint post-write state first; only if errors are found AND pre-content was captured does it lint the pre-state and diff. If the pre-existing file had the SAME errors the edit didn't introduce anything new, so the file is reported as 'still broken, pre-existing' with success=False but a message explaining the errors were pre-existing. If the edit introduced genuinely new errors, those are surfaced and pre-existing ones are filtered out. - WriteResult gains a lint field. - write_file() captures pre-content for in-process-lintable extensions and calls _check_lint_delta after a successful write. - patch_replace() switches from _check_lint to _check_lint_delta, reusing the pre-edit content it already has in scope. tools/file_tools.py: - Update write_file schema description to mention the post-write lint. tests/tools/test_file_operations_edge_cases.py: - Update existing brace-path tests to use .js (shell linter) now that .py is in-process. - Add TestCheckLintInproc (9 tests) covering Python/JSON/YAML/TOML in-process linters. 
- Add TestCheckLintDelta (5 tests) covering the post-first/pre-lazy short-circuit, new-file path, and the single-error-parser caveat. ## Performance In-process linters are microseconds per call (ast.parse, json.loads). The hot path (clean write) runs exactly one lint — matches main's cost for patch. Pre-state capture is skipped when the file has no applicable linter. Measured 4.89ms/write average over 100 .py writes including lint. ## Inspiration - Cline's DiffViewProvider.getNewDiagnosticProblems() — filters pre-write diagnostics from post-write diagnostics (src/integrations/editor/DiffViewProvider.ts). - OpenCode's WriteTool — runs lsp.diagnostics() after write and appends errors to tool output (packages/opencode/src/tool/write.ts). - Claude Code's DiagnosticTrackingService — captures baseline via beforeFileEdited() and returns new-diagnostics-only from getNewDiagnostics() (src/services/diagnosticTracking.ts). ## Validation - tests/tools/test_file_operations.py + test_file_operations_edge_cases.py + test_file_tools.py + test_file_tools_live.py + test_file_write_safety.py + test_write_deny.py + test_patch_parser.py + test_file_ops_cwd_tracking.py: 228 passed locally. - Live E2E reproduction of the tips.py corruption incident: broken content written; lint field surfaces 'SyntaxError: invalid syntax. Perhaps you forgot a comma? (line 6, column 5)' — the exact error that would have self-corrected the bug on the next turn.	2026-05-05 04:54:17 -07:00
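The post-first, pre-lazy delta lint described above can be approximated with a simplified sketch for Python sources. The names `lint_python` and `lint_delta` are hypothetical, and unlike the real `_check_lint_delta()` this version only returns the newly introduced errors and does not model the "still broken, pre-existing" result.

```python
import ast


def lint_python(source):
    """In-process syntax check; returns a list of error strings."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return [f"{exc.msg} (line {exc.lineno})"]
    return []


def lint_delta(pre, post):
    """Lint the post-write state first; lint the pre-state only on failure."""
    post_errors = lint_python(post)
    if not post_errors or pre is None:
        # Clean write, or a new file with no baseline: one lint pass only.
        return post_errors
    pre_errors = set(lint_python(pre))
    # Report only errors that the edit introduced.
    return [err for err in post_errors if err not in pre_errors]
```

Because `ast.parse` stops at the first syntax error, a single-error parser like this can hide a second, newly introduced error behind a pre-existing one; the commit's test list mentions that caveat.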
Chris Danis	28f4d6db63	fix(tool-schemas): reactive strip of pattern/format on llama.cpp grammar 400s MCP servers commonly emit JSON Schema `pattern` (e.g. `\\d{4}-\\d{2}-\\d{2}` for date-time params) and `format` keywords. llama.cpp's `json-schema-to-grammar` converter rejects regex escape classes (\\d/\\w/\\s) and most format values, returning HTTP 400 "parse: error parsing grammar: unknown escape at \\d" — the whole request fails. Cloud providers (OpenAI, Anthropic, OpenRouter, Gemini) accept these keywords fine and use them as prompting hints. Stripping unconditionally loses useful hints for every cloud user to fix a llama.cpp-only bug. Approach: classify the llama.cpp grammar-parse 400 in the error classifier, and on match do a one-shot in-place strip of pattern/format from `self.tools`, then retry. Follows the existing `thinking_signature` recovery pattern. Cloud users hit zero overhead; llama.cpp users pay one failed request per session. Changes - agent/error_classifier.py: new `FailoverReason.llama_cpp_grammar_pattern` + narrow HTTP-400 branch matching "error parsing grammar", "json-schema-to-grammar", or "unable to generate parser ... template". - tools/schema_sanitizer.py: new `strip_pattern_and_format()` helper — reactive, walks schema nodes, skips property names (search_files.pattern survives). Returns strip count for logging. - run_agent.py: new one-shot recovery block in the retry loop. Strips, logs, continues. Falls through to normal retry if nothing to strip. - tests: 4 classifier tests (3 variants + 1 non-400 negative), 7 strip tests including the property-name preservation and idempotency checks. Co-authored-by: Chris Danis <cdanis@gmail.com>	2026-05-05 04:25:18 -07:00
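A rough sketch of a strip helper with the two properties the commit calls out: property names called `pattern` or `format` survive, and the helper returns a count. This is an illustration under assumptions, not the code in `tools/schema_sanitizer.py`; the `_in_properties` flag is a device of this example.

```python
def strip_pattern_and_format(node, _in_properties=False):
    """Remove `pattern`/`format` keywords in place; return how many were removed.

    When `node` is the value of a `properties` key, its keys are property
    names rather than schema keywords, so they are never stripped.
    """
    removed = 0
    if isinstance(node, dict):
        for key in list(node):
            is_keyword = not _in_properties and key in ("pattern", "format")
            if is_keyword and isinstance(node[key], str):
                del node[key]
                removed += 1
                continue
            child_is_props = key == "properties" and not _in_properties
            removed += strip_pattern_and_format(node[key], child_is_props)
    elif isinstance(node, list):
        for item in node:
            removed += strip_pattern_and_format(item)
    return removed
```

Running it a second time on the same schema returns 0, which is what makes a one-shot strip-then-retry recovery safe to repeat.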
briandevans	9fa3a093f2	fix(local): test root as ancestor candidate; use real pipe for fake stdout Address Copilot review on PR #17569: 1. _resolve_safe_cwd never tested the filesystem root because the loop exited when `os.path.dirname(parent) == parent`, which is true once `parent == '/'`. Restructure so the root is checked before the self-equal exit. Adds `test_returns_root_when_only_root_exists` — regression-guarded by reverting the loop and watching it fail. 2. The fake `Popen.stdout` was a `MagicMock`; `BaseEnvironment._wait_for_process` calls `proc.stdout.fileno()` then `select.select`/`os.read` against it, which raised `TypeError: fileno() returned a non-integer` (visible as a thread exception in test output) and could in theory read from an unrelated real fd. Hand `fake_popen` a real `os.pipe()` with the write end pre-closed so the drain loop sees EOF immediately. Helper records each fd so the test cleans up after itself. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 15:31:47 -07:00
briandevans	9644b8ae67	fix(local): recover when persistent_shell cwd is deleted (#17558) When a tool call deletes its own working directory (`cd /tmp/foo && rm -rf /tmp/foo`), the next `subprocess.Popen(args, cwd=self.cwd)` raised `FileNotFoundError: [Errno 2]` before bash even started — every subsequent terminal/file-tool call hit the same wedge until the gateway restarted. Fix in `LocalEnvironment._run_bash`: before handing `self.cwd` to Popen, resolve a safe alternative when the path is gone (walk up to the nearest existing ancestor, falling back to `tempfile.gettempdir()` only as a last resort). Log a warning so the recovery is visible — not silent — and update `self.cwd` so the next call doesn't repeat the message. Defense in depth in `LocalEnvironment._update_cwd`: only adopt the new cwd when it still exists as a directory. `pwd -P` from a deleted cwd can leave a stale value in the marker file; refusing to store a missing path keeps `self.cwd` valid by construction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 15:31:47 -07:00
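The ancestor walk described above could look roughly like the following. This is a sketch under assumptions, not the body of `_resolve_safe_cwd`: the function name and signature are illustrative. It tests each candidate, including the filesystem root, before the self-equal exit that the follow-up commit 9fa3a093f2 mentions.

```python
import os
import tempfile


def resolve_safe_cwd(cwd: str) -> str:
    """Return cwd if it is a directory, else its nearest existing ancestor."""
    candidate = os.path.abspath(cwd)
    while True:
        # Test the candidate first so the root itself is also checked.
        if os.path.isdir(candidate):
            return candidate
        parent = os.path.dirname(candidate)
        if parent == candidate:
            # Reached the root and even it failed the check: last resort.
            return tempfile.gettempdir()
        candidate = parent
```

A caller would pass the result to `subprocess.Popen(..., cwd=...)` and store it back so later calls do not repeat the recovery.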
Yoimex	c050ee6573	fix(file_ops): resolve search_files path/line collision for hyphenated numeric filenames	2026-05-04 12:37:47 -07:00
ClawdIA	64ad7dec0d	fix(file-ops): allow file search in hidden roots	2026-05-04 12:37:09 -07:00
lhysdl	6875471916	fix(tts): update MiniMax API endpoint to v1/text_to_speech MiniMax deprecated the old v1/t2a_v2 endpoint (api.minimax.io) and moved to v1/text_to_speech (api.minimax.chat). The new API: - Uses a flat payload: {model, text, voice_id} instead of nested voice_setting / audio_setting objects - Returns raw audio bytes (Content-Type: audio/mpeg) instead of JSON with hex-encoded audio - Uses model 'speech-01' instead of 'speech-2.8-hd' - Updated default voice_id to 'female-shaonv' for Chinese TTS The implementation detects Content-Type to handle both old and new API responses, maintaining backward compatibility for any users who manually configured the legacy base_url.	2026-05-04 12:36:09 -07:00
0668001438	83080772f2	fix(delegation): honor provider override for subagents Clear inherited provider preference filters when delegation.provider is set so delegated children do not route back to the parent provider. Add a regression test for cross-provider delegation with parent OpenRouter filters. Closes #10653	2026-05-04 05:22:35 -07:00
briandevans	0b5fd40a01	fix(delegate): correct _spawn_child → _build_child_agent in comments Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 05:18:45 -07:00
阿泥豆	0e9416036a	test: add unit tests for heartbeat stale threshold increase	2026-05-04 05:08:51 -07:00
briandevans	6b4ccb9b14	fix(session-search): report source from resolved parent, not FTS5 child session (#15909) When a delegation child session (e.g. source='telegram') contains the FTS5 hit but _resolve_to_parent() maps it to a different root session (source='api_server'), the result entry was still reporting the child's source because the loop discarded session_meta as `_` and fell back to match_info.get('source'), which carries the child session's value. Use the resolved parent's session_meta for source, model, and started_at with match_info as a fallback, so the output accurately reflects the session the user actually interacted with. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 05:07:40 -07:00
Teknium	3fb35520c6	revert: auto-subscribe gateway chat on tool-driven kanban_create (#19718) (#19721) Reverts `ff3d2773e2`. Teknium reviewed the merged PR and decided this behavior isn't wanted — tool-driven kanban_create should not mirror the slash-command path's auto-subscribe. Orchestrators that want their originating chat notified can call kanban_notify-subscribe explicitly; we're not going to make it implicit.	2026-05-04 05:04:01 -07:00
Teknium	ff3d2773e2	feat(kanban): auto-subscribe gateway chat on tool-driven kanban_create (#19718) Closes #19479. When an orchestrator agent calls kanban_create from a gateway session (e.g. a Telegram user delegating to an orchestrator profile), auto-subscribe the originating (platform, chat, thread, user) to the new task's terminal events. Mirrors the behavior of the /kanban create slash command in gateway/run.py so tool-driven creation is at parity with human-driven creation. Without this, a user who interacts with an orchestrator exclusively via the gateway never receives blocked / completed / gave_up notifications for tasks the orchestrator created on their behalf — silently breaking the gateway-first multi-agent flow the reporter describes. Reads the context-local HERMES_SESSION_* vars via get_session_env() (not os.environ — those are contextvars for asyncio concurrency safety). Falls through cleanly in CLI / cron contexts with no session active (subscribed=False in the response). Best-effort: if the gateway module isn't importable (test rigs stubbing gateway.*), the task still creates, we just skip the subscription. Response gains a 'subscribed' bool so the orchestrator knows whether terminal events will land back in the originating chat or whether it needs to poll / unblock manually. Tests: 4 new in tests/tools/test_kanban_tools.py covering CLI/no-subscribe, telegram/gateway-auto-subscribe, discord-DM/no-thread subscribe, and partial-ctx/no-chat_id no-subscribe. 40/40 kanban tool tests pass.	2026-05-04 05:02:23 -07:00
Teknium	d3b22b76d8	fix(kanban): enforce worker task-ownership on destructive tool calls (#19713) Closes #19534 (security). A worker spawned by the kanban dispatcher has HERMES_KANBAN_TASK set to its own task id. The destructive tools (kanban_complete, kanban_block, kanban_heartbeat) resolved task_id via _default_task_id() which preferred an explicit arg over the env var, with no ownership check — so a buggy or prompt-injected worker could complete / block / heartbeat any OTHER task (sibling, cross-tenant, anything) by supplying its id. Reporter's repro: worker for t_A passed task_id=t_B to kanban_complete and got {"ok": true}. Fix: add _enforce_worker_task_ownership(tid). If HERMES_KANBAN_TASK is set and tid doesn't match, return a structured tool error with guidance to use kanban_comment (for information handoff across tasks) or kanban_create (for follow-up work). Orchestrator profiles (no env var, but kanban toolset enabled per #18968) are exempt — their job is routing and sometimes includes closing out child tasks. Kept unrestricted (deliberately): - kanban_show — workers legitimately read parent/sibling handoff context - kanban_comment — cross-task comments are the handoff mechanism - kanban_create — orchestrator fan-out, worker follow-up spawning - kanban_link — parent/child linking Tests: 5 new regression tests in tests/tools/test_kanban_tools.py covering the grid (worker-attacks-foreign ×3 tools, worker-own-task preserved, orchestrator-unrestricted). 36/36 pass.	2026-05-04 04:54:02 -07:00
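The ownership check described above can be sketched as follows. Only the `HERMES_KANBAN_TASK` variable and the general behavior come from the commit message; the function name without its leading underscore, the return shape, and the error wording are assumptions for illustration.

```python
import os
from typing import Optional


def enforce_worker_task_ownership(task_id: str) -> Optional[dict]:
    """Return a structured error if a worker targets a task it does not own."""
    owned = os.environ.get("HERMES_KANBAN_TASK")
    if not owned:
        # No env var: an orchestrator profile, which is exempt from the check.
        return None
    if owned == task_id:
        return None
    # Hypothetical error payload; the real tool's fields may differ.
    return {
        "ok": False,
        "error": (
            f"worker for {owned} may not modify {task_id}; use kanban_comment "
            "for handoff or kanban_create for follow-up work"
        ),
    }
```

A destructive tool would call this first and return the error unchanged when the result is not `None`.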
Yoimex	edf9c75621	fix(env): pass -- to cd for hyphen-prefixed workdirs	2026-05-04 04:45:03 -07:00
vominh1919	135b4c8b35	fix(mcp): decouple AnyUrl import from mcp dependency AnyUrl was imported inside the same try block as mcp.client.auth, so when the mcp package was not installed, AnyUrl was undefined and _build_client_metadata raised NameError at runtime. Moved the AnyUrl import to its own try/except block so it's available whenever pydantic is installed (which is a core dependency), regardless of whether the mcp SDK is present. Also added pytest.importorskip('mcp') to the three test_build_client_metadata tests that exercise _build_client_metadata, since that function depends on OAuthClientMetadata from the mcp package.	2026-05-04 04:42:18 -07:00
vominh1919	d1d2d43387	fix(test): add skip marker for transcription tests requiring faster_whisper TestTranscribeLocalExtended patches faster_whisper.WhisperModel, which triggers an ImportError when the faster_whisper package is not installed. Added a pytest.mark.skipif marker using importlib.util.find_spec so these tests are gracefully skipped instead of failing with ModuleNotFoundError.	2026-05-04 04:41:36 -07:00
Ioodu	e50809b771	fix(file-tools): cap read_file result size to prevent context window overflow Set max_result_size_chars=100_000 on the read_file registry entry (was float('inf')), closing the Layer 2 defense-in-depth gap in tool_result_storage.py. The existing Layer 1 guard inside _handle_read_file already returns a JSON error for oversized reads; this aligns the registry cap with every other tool. Update test_read_file_never_persisted → test_read_file_result_size_cap to assert 100_000, and add test_read_file_registry_cap_is_100k as an explicit regression guard against re-introducing float('inf'). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 03:14:59 -07:00
xyiy001	e69d11d30c	fix(browser): allow CDP override to pass requirement checks Treat explicit CDP override mode as a valid browser backend even when agent-browser is absent, and add a regression test to prevent false-negative availability gating.	2026-05-04 03:12:30 -07:00
Teknium	3c070f9f9d	fix(curator): only mark agent-created for background-review sediment (#19621) Tighten the provenance semantics added in #19618: skills a user asks a foreground agent to write via skill_manage(create) now stay invisible to the curator. Only skills the background self-improvement review fork sediments through skill_manage get the created_by=agent marker. - tools/skill_provenance.py — new ContextVar module mirroring the _approval_session_key pattern: set_current_write_origin / reset / get / is_background_review. Default origin is 'foreground'; the review fork sets 'background_review'. - run_agent.py — run_conversation() binds the ContextVar from self._memory_write_origin at the top of each call. The review fork runs on its own thread (fresh context), so foreground and review contexts never cross-contaminate. - tools/skill_manager_tool.py — skill_manage(action='create') now only calls mark_agent_created() when is_background_review(). All other cases (foreground create, patch, edit, write_file, delete) continue as before. - tests: test_skill_provenance.py (6 tests covering the ContextVar surface), split test_full_create_via_dispatcher into foreground vs. review-fork variants, curator status tests now mark-first. Why: the agent routinely edits existing user skills on the user's behalf; those writes must never flip provenance. And when a user explicitly asks the foreground agent to create a skill, that skill belongs to the user. The curator should only be cleaning up after its own autonomous sediment from the review nudge loop.	2026-05-04 02:42:16 -07:00
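A rough sketch of the ContextVar surface the commit above describes, assuming a module-level variable with a 'foreground' default. The helper names follow the commit message where it gives them; `reset_current_write_origin` and `get_current_write_origin` are guessed expansions of "reset / get", and the bodies are illustrative rather than the contents of tools/skill_provenance.py.

```python
import contextvars
import threading

# Default origin is 'foreground'; a review fork would set 'background_review'.
_write_origin = contextvars.ContextVar("write_origin", default="foreground")


def set_current_write_origin(origin: str) -> contextvars.Token:
    """Bind the origin for the current context; keep the token to undo it."""
    return _write_origin.set(origin)


def reset_current_write_origin(token: contextvars.Token) -> None:
    _write_origin.reset(token)


def get_current_write_origin() -> str:
    return _write_origin.get()


def is_background_review() -> bool:
    return _write_origin.get() == "background_review"


def run_review_fork(results: dict) -> None:
    # Hypothetical stand-in for the review fork: binds the origin in its
    # own thread, so the binding is not visible to the spawning thread.
    set_current_write_origin("background_review")
    results["review"] = is_background_review()


results: dict = {}
worker = threading.Thread(target=run_review_fork, args=(results,))
worker.start()
worker.join()
```

After the thread finishes, `results["review"]` is true while `is_background_review()` in the main thread is still false, which is the isolation the commit relies on.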

740 commits