hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-29 01:31:41 +00:00

Author	SHA1	Message	Date
Teknium	e63364b8df	revert: computer-use cua-driver (PR #16919 ) (#16927 ) Reverts PR #16919 (commits `dad10a78d`, `413ee1a28`, `b4a8031b2`, `afb958829`) which was merged prematurely. Restoring the pre-merge state so #14817 and #15328 can be revisited as standing PRs. Reverted commits: - `afb958829` fix(computer-use): harden image-rejection fallback + AUTHOR_MAP - `b4a8031b2` fix(computer-use): unwrap _multimodal tool results - `413ee1a28` feat(computer-use): background focus-safe backend - `dad10a78d` feat(computer-use): cua-driver backend, universal any-model schema Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 01:57:21 -07:00
ddupont	413ee1a286	feat(computer-use): background focus-safe backend — set_value, structured windows, MIME detection Extends the cua-driver computer-use backend to drive backgrounded macOS windows without stealing keyboard or mouse focus from the foreground app. All changes target the cua-driver MCP backend and the shared dispatcher. ## cua_backend.py Window-aware capture: capture() now calls list_windows + get_window_state instead of the removed capture tool. Prefers structuredContent.windows (MCP 2024-11-05+ cua-driver) for zero-parse window enumeration; falls back to regex-parsed text for older builds. Stores the selected (pid, window_id) as sticky context so subsequent action calls do not need a redundant round-trip. Action routing: click/scroll/type_text/key all carry the sticky pid (and window_id for element-indexed clicks). type_text routes through type_text_chars (individual key events) rather than AX attribute write -- WebKit AXTextFields reject attribute writes from backgrounded processes. Key parsing: _parse_key_combo splits cmd+s-style strings into (key, [modifiers]) and routes to hotkey (modifier present) or press_key (bare key) -- cua-driver actual tool names. set_value method: new set_value(value, element) calls the cua-driver set_value MCP tool. For AXPopUpButton / HTML select in a backgrounded Safari, AXPress opens the native macOS popup which closes immediately when the app is non-frontmost; set_value AX-presses the matching child option directly (no menu required, no focus steal). focus_app: reimplemented as a pure window-selector (enumerates list_windows, sets sticky pid/window_id) without ever raising the window or stealing focus. list_apps: fixed tool name from listApps to list_apps; handles plain-text response via regex when structured data is absent. Structured-content extraction: _extract_tool_result now surfaces structuredContent from MCP results, enabling the list_windows window array without text parsing. Helpers: _parse_windows_from_text, _parse_elements_from_tree, _split_tree_text, _parse_key_combo extracted as module-level functions. ## schema.py Added set_value to the action enum with a description explaining when to prefer it over click (select/popup elements, sliders, no focus steal). Added value field for set_value payloads. ## tool.py Routed set_value action through _dispatch to backend.set_value. Added set_value to _DESTRUCTIVE_ACTIONS (approval-gated). Fixed MIME-type detection in _capture_response: cua-driver may return JPEG; detect from base64 magic bytes (/9j/ -> image/jpeg, else image/png) rather than hardcoding image/png. ## agent/display.py + run_agent.py Guard _detect_tool_failure and result-preview logic against non-string function_result values: multimodal tool results (dicts with _multimodal=True) are not string-sliceable; treat them as successes and fall back to str() for length/preview.	2026-04-28 01:46:36 -07:00
Teknium	dad10a78d0	feat(computer-use): cua-driver backend, universal any-model schema Background macOS desktop control via cua-driver MCP — does NOT steal the user's cursor or keyboard focus, works with any tool-capable model. Replaces the Anthropic-native `computer_20251124` approach from the abandoned #4562 with a generic OpenAI function-calling schema plus SOM (set-of-mark) captures so Claude, GPT, Gemini, and open models can all drive the desktop via numbered element indices. - `tools/computer_use/` package — swappable ComputerUseBackend ABC + CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary). - Universal `computer_use` tool with one schema for all providers. Actions: capture (som/vision/ax), click, double_click, right_click, middle_click, drag, scroll, type, key, wait, list_apps, focus_app. - Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style `content: [text, image_url]` parts) that flows through handle_function_call into the tool message. Anthropic adapter converts into native `tool_result` image blocks; OpenAI-compatible providers get the parts list directly. - Image eviction in convert_messages_to_anthropic: only the 3 most recent screenshots carry real image data; older ones become text placeholders to cap per-turn token cost. - Context compressor image pruning: old multimodal tool results have their image parts stripped instead of being skipped. - Image-aware token estimation: each image counts as a flat 1500 tokens instead of its base64 char length (~1MB would have registered as ~250K tokens before). - COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset is active. - Session DB persistence strips base64 from multimodal tool messages. - Trajectory saver normalises multimodal messages to text-only. - `hermes tools` post-setup installs cua-driver via the upstream script and prints permission-grant instructions. - CLI approval callback wired so destructive computer_use actions go through the same prompt_toolkit approval dialog as terminal commands. - Hard safety guards at the tool level: blocked type patterns (curl\|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash, force delete, lock screen, log out). - Skill `apple/macos-computer-use/SKILL.md` — universal (model-agnostic) workflow guide. - Docs: `user-guide/features/computer-use.md` plus reference catalog entries. 44 new tests in tests/tools/test_computer_use.py covering schema shape (universal, not Anthropic-native), dispatch routing, safety guards, multimodal envelope, Anthropic adapter conversion, screenshot eviction, context compressor pruning, image-aware token estimation, run_agent helpers, and universality guarantees. 469/469 pass across tests/tools/test_computer_use.py + the affected agent/ test suites. - `model_tools.py` provider-gating: the tool is available to every provider. Providers without multi-part tool message support will see text-only tool results (graceful degradation via `text_summary`). - Anthropic server-side `clear_tool_uses_20250919` — deferred; client-side eviction + compressor pruning cover the same cost ceiling without a beta header. - macOS only. cua-driver uses private SkyLight SPIs (SLEventPostToPid, SLPSPostEventRecordTo, _AXObserverAddNotificationAndCheckRemote) that can break on any macOS update. Pin with HERMES_CUA_DRIVER_VERSION. - Requires Accessibility + Screen Recording permissions — the post-setup prints the Settings path. Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic- native schema). Credit @0xbyt4 for the original #3816 groundwork whose context/eviction/token design is preserved here in generic form.	2026-04-28 01:46:36 -07:00
hharry11	de03a332f7	fix(security): isolate interactive sudo password cache per session	2026-04-28 01:34:16 -07:00
ygd58	6b6fc28e85	fix(delegate): clear acp_command when override_provider is set When delegation.provider is configured (e.g. minimax-cn), subagents inherited the parent's acp_command unconditionally. This caused run_agent.py to initialize CopilotACPClient, which bypassed the override credentials entirely and used its own default model (provider=copilot-acp model=qwen3.5-397b-a17b) instead of the configured delegation.provider and delegation.model. Fix: when override_provider is set but override_acp_command is not, clear effective_acp_command and effective_acp_args so the child agent uses direct API calls with the configured provider credentials. The existing override_acp_command path is unchanged — explicit ACP transport overrides still force provider=copilot-acp as before. Fixes #16816	2026-04-28 01:14:38 -07:00
Teknium	d7528d43ac	fix(web): scope dashboard config Reset button to the current tab (#16813 ) * Port from Kilo-Org/kilocode#9448: roll up subagent costs into parent session total Child subagents built by delegate_task() each track their own session_estimated_cost_usd, but the parent agent's total never folded those numbers in. On runs where the parent mostly delegates and the children do the expensive work, the footer/UI was reporting a fraction of the actual spend — sometimes $0.00 when the parent itself made no billed calls. Fix: - Capture each child's session_estimated_cost_usd into _child_cost_usd on the result entry (before child.close() drops the counter). - After the existing subagent_stop hook loop, sum the children's costs and add the total to parent.session_estimated_cost_usd. - Promote session_cost_source from 'none' -> 'subagent' when the parent had no direct spend but children did, so the UI doesn't label the total as having unknown provenance. Real sources (openrouter, anthropic, etc.) are preserved. Nested orchestrator -> worker trees roll up naturally: each layer's own delegate_task() folds its direct children in, and when the orchestrator itself returns, its parent folds the orchestrator's now-inflated total on top. Internal fields (_child_cost_usd, _child_role) are stripped from the results dict before it's serialised back to the model — same contract as _child_role already followed. Tests: TestSubagentCostRollup (5 cases) covers single-child, batch, zero-cost-children, preserved-source, and legacy-fixture paths. Source: https://github.com/Kilo-Org/kilocode/pull/9448 * fix(web): scope dashboard config Reset button to the current tab Reported by @ykmfb001 via X: clicking 'Restore Defaults' (恢复默认值) on the Auxiliary page wiped the entire config.yaml to defaults, not just the auxiliary section. The button sits next to the category tabs and users reasonably assumed 'reset this tab', not 'reset everything'. Changes: - handleReset now scopes to the fields in the current view: active category's fields (form mode) or search-matched fields (search mode). Only those keys are copied from defaults; the rest of the config is left alone. - Added a window.confirm() with the scope name before applying. - Button is hidden in YAML mode (scoping doesn't apply there). - Tooltip/aria-label now name the scope, e.g. 'Reset Auxiliary to defaults'. - i18n: new resetScopeTooltip / confirmResetScope / resetScopeToast strings in en + zh; resetDefaults key preserved for compat.	2026-04-27 21:09:14 -07:00
Teknium	30307a9802	feat(plugins): add pre_approval_request / post_approval_response hooks (#16776 ) Plugins can now observe dangerous-command approval events in real time, on both the CLI-interactive path and the async gateway path. This is the missing hook surface external tools need to build approval notifiers (macOS menu-bar allow/deny, Slack alerts, audit logs, etc.) without forking Hermes or running a parallel gateway adapter. Changes: - hermes_cli/plugins.py: add two entries to VALID_HOOKS - tools/approval.py: fire both hooks from check_all_command_guards -- around prompt_dangerous_approval (CLI surface) and around the notify_cb + blocking event.wait loop (gateway surface) - website/docs/user-guide/features/hooks.md: document both hooks with a macOS-notification example - tests/tools/test_approval_plugin_hooks.py: 5 tests covering CLI once, CLI deny, plugin-crash resilience, gateway approve, gateway timeout Hooks are observer-only: return values are ignored, so plugins cannot veto or pre-answer an approval (use pre_tool_call for that). A crashing plugin cannot break the approval flow -- invoke_hook swallows per- callback errors, and the wrapper logs and swallows dispatch-layer errors too. Surface kwarg distinguishes "cli" from "gateway"; post hook reports choice as one of once/session/always/deny/timeout.	2026-04-27 20:08:33 -07:00
Teknium	008860a23f	fix(approval): close remaining prompt_toolkit deadlock vectors (#15216 ) PR #13734 fixed the concurrent-tool-executor vector (ThreadPoolExecutor workers didn't inherit the CLI's TLS approval callback). Two vectors remained that could still land in the deadlocking input() fallback: 1. _spawn_background_review spawns a raw threading.Thread with no approval callback installed, so any dangerous-command guard the review agent trips falls back to input() -> deadlock against the parent's prompt_toolkit TUI (same class as delegate_task subagents, fixed in `023b1bff1` / #15491). Install a _bg_review_auto_deny callback at thread start, clear on finally. 2. prompt_dangerous_approval's fallback unconditionally spawned a daemon thread calling input() when approval_callback was None. That fallback can never succeed under prompt_toolkit because the user's Enter goes to pt's raw-mode stdin capture. Detect an active pt Application via get_app_or_none() and fail closed (deny + log) instead, so future threads that forget to install a callback degrade gracefully instead of hanging 60s invisibly. Regression guards: - tests/run_agent/test_background_review.py verifies the review worker thread sees a callable auto-deny callback mid-run and that the slot is cleared in the finally block. - tests/tools/test_approval.py TestFailClosedUnderPromptToolkit verifies prompt_dangerous_approval returns 'deny' fast under a mocked pt Application, and that a real callback still wins over the guard.	2026-04-27 06:42:32 -07:00
Christian Scheid	75b460bc94	fix(email): add required Date header to outbound mail	2026-04-27 06:41:11 -07:00
Teknium	ec671c4154	feat(image-input): native multimodal routing based on model vision capability (#16506 ) * feat(image-input): native multimodal routing based on model vision capability Attach user-sent images as OpenAI-style content parts on the user turn when the active model supports native vision, so vision-capable models see real pixels instead of a lossy text description from vision_analyze. Routing decision (agent/image_routing.py::decide_image_input_mode): agent.image_input_mode = auto \| native \| text (default: auto) In auto mode: - If auxiliary.vision.provider/model is explicitly configured, keep the text pipeline (user paid for a dedicated vision backend). - Else if models.dev reports supports_vision=True for the active provider/model, attach natively. - Else fall back to text (current behaviour). Call sites updated: gateway/run.py (all messaging platforms), tui_gateway (dashboard/Ink), cli.py (interactive /attach + drag-drop). run_agent.py changes: - _prepare_anthropic_messages_for_api now passes image parts through unchanged when the model supports vision — the Anthropic adapter translates them to native image blocks. Previous behaviour (vision_analyze → text) only runs for non-vision Anthropic models. - New _prepare_messages_for_non_vision_model mirrors the same contract for chat.completions and codex_responses paths, so non-vision models on any provider get text-fallback instead of failing at the provider. - New _model_supports_vision() helper reads models.dev caps. vision_analyze description rewritten: positions it as a tool for images NOT already visible in the conversation (URLs, tool output, deeper inspection). Prevents the model from redundantly calling it on images already attached natively. Config default: agent.image_input_mode = auto. Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py), all existing tests that reference _prepare_anthropic_messages_for_api still pass (198 targeted + new tests green). * feat(image-input): size-cap + resize oversized images, charge image tokens in compressor Two follow-ups that make the native image routing safer for long / heavy sessions: 1) Oversize handling in build_native_content_parts: - 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES, the most restrictive provider — Gemini inline data). - Delegates to vision_tools._resize_image_for_vision (Pillow-based, already battle-tested) to downscale to 5 MB first-try. - If Pillow is missing or resize still overshoots, the image is dropped and reported back in skipped[]; caller falls back to text enrichment for that image. 2) Image-token accounting in context_compressor: - New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant; within the realistic range for Anthropic/GPT-4o/Gemini billing). - _content_length_for_budget() helper: sums text-part lengths and charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/ input_image part. Base64 payload inside image_url is NOT counted as chars — dimensions don't matter, only image-presence. - Both tail-cut sites (_prune_old_tool_results L527 and _find_tail_cut_by_tokens L1126) now call the helper so multi-image conversations don't slip past compression budget. Tests: 9 new in test_image_routing.py (oversize triggers resize, resize-fails-returns-None, oversize-skipped-reported), 11 new in test_compressor_image_tokens.py (flat charge per image, multiple images, Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on raw base64, bounds-check on the constant, integration test that an image-heavy tail actually gets trimmed). * fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits The previous commit imposed a hardcoded 20 MB base64 ceiling on all providers, triggering auto-resize on anything larger. This was wrong in both directions: * Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400 'image exceeds 5 MB maximum' above that). * Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without complaint (empirically verified April 2026 with progressive PNG sizes). New behaviour: * _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder). * Providers NOT in the table get no ceiling — images attach at native size and we trust the provider to return its own error if it disagrees. A provider-specific 400 message is clearer than us guessing wrong and silently degrading image quality. * build_native_content_parts() gains a keyword-only provider arg; gateway/CLI/TUI pass the active provider so Anthropic users get auto-resize protection while OpenAI users don't pay it. * Resize target dropped from 5 MB to 4 MB to slide safely under Anthropic's boundary with header overhead. Empirical measurements (direct API, no Hermes in the loop): image b64 anthropic openrouter/gpt5.5 codex-oauth/gpt5.5 0.19 MB ✓ ✓ ✓ 12.37 MB ✗ 400 5MB ✓ ✓ 23.85 MB ✗ 400 5MB ✓ ✓ 49.46 MB ✗ 413 ✓ ✓ Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through, Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts routes ceiling by provider, unknown provider gets no ceiling. All 52 targeted tests pass. * refactor(image-input): attempt native, shrink-and-retry on provider reject Replace proactive per-provider size ceilings with a reactive shrink path on the provider's actual rejection. All providers now attempt native full-size attachment first; if the provider returns an image-too-large error, the agent silently shrinks and retries once. Why the previous design was wrong: hardcoding provider ceilings (anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image paid no tax, but Anthropic users lost quality on anything >5MB even though the empirical behaviour at provider-reject time is the same (shrink + retry). Baking the table into the routing layer also requires updating Hermes every time a provider's limit changes. Reactive design: - image_routing.py: _file_to_data_url encodes native size, no ceiling. build_native_content_parts drops its provider kwarg. - error_classifier.py: new FailoverReason.image_too_large + pattern match ("image exceeds", "image too large", etc.) checked BEFORE context_overflow so Anthropic's 5MB rejection lands in the right bucket. - run_agent.py: new _try_shrink_image_parts_in_messages walks api messages in-place, re-encodes oversized data: URL image parts through vision_tools._resize_image_for_vision to fit under 4MB, handles both chat.completions (dict image_url) and Responses (string image_url) shapes, ignores http URLs (provider-fetched). New image_shrink_retry_attempted flag in the retry loop fires the shrink exactly once per turn after credential-pool recovery but before auth retries. E2E verified live against Anthropic claude-sonnet-4-6: - 17.9MB PNG (23.9MB b64) attached at native size - Anthropic returns 400 "image exceeds 5 MB maximum" - Agent logs '📐 Image(s) exceeded provider size limit — shrank and retrying...' - Retry succeeds, correct response delivered in 6.8s total. Tests: 12 new (8 shrink-helper shapes + 4 classifier signals), replaces 5 proactive-ceiling tests with 3 simpler 'native attach works' tests. 181 targeted tests pass. test_enum_members_exist in test_error_classifier.py updated for the new enum value.	2026-04-27 06:27:59 -07:00
vominh1919	2e6699b319	fix: strip leaked declare-x env dump from terminal output on macOS (#15459 ) On macOS (bash 3.2 and some Homebrew bash builds) `source`ing a file that contains `declare -x` statements prints each declaration to stdout. The persistent-shell wrapper in tools/environments/base.py was only redirecting stderr when sourcing the session snapshot, so ~60 lines of env vars leaked into every terminal tool response — blowing out context and triggering HTTP 400s on context-limited providers. Fix: redirect both stdout and stderr when sourcing the snapshot. Linux bash is silent here, so the redirect is harmless there; macOS no longer leaks. Closes #15459 Co-authored-by: Sanjays2402 <51058514+Sanjays2402@users.noreply.github.com>	2026-04-27 00:19:48 -07:00
Teknium	a32d07529c	fix(file-tools): escalate to BLOCKED on repeated read_file dedup stubs (#16382 ) read_file's dedup path returned a lightweight stub on re-reads of an unchanged file, then returned early — so the consecutive-read loop guard (hard block at count>=4) at the bottom of read_file_tool never ran for stub-looped calls. Weaker tool-following models (local Qwen3.6 variants in the reported case) ignore the passive 'refer to earlier result' hint and hammer the same read_file call until iteration budget runs out. Track per-key stub returns in task_data['dedup_hits'] and, on the second stub for the same (path, offset, limit), return a hard BLOCKED error mirroring the wording the real-read path already uses. A real read, an intervening non-read tool call (notify_other_tool_call), or reset_file_dedup (on context compression) all clear the counter so the guard never stays engaged longer than the actual loop. Closes #15759	2026-04-27 00:17:26 -07:00
Teknium	517f30b043	improve(agent): guidance for plain-text URLs, subagent language/verification, hermes-config routing (#16325 ) Four small tool-description / skill-content tweaks addressing recurring model mistakes seen in @versun's docx feedback (Kimi 2.6, but the patterns apply to every model): 1. browser_navigate description: call out .md/.txt/.json/.yaml/.csv/.xml, raw.githubusercontent.com, and API endpoints as specifically preferring curl or web_extract. The generic "prefer web_search or web_extract" was too weak; models kept firing up the browser for plain-text URLs. 2. delegate_task description: two additions. (a) Pass user language / output-style preferences in 'context' when they differ from English — otherwise subagents default to English and their summaries contaminate the final reply (caused the bilingual digest bug). (b) Subagent summaries are self-reports, not verified facts. For operations with external side-effects (HTTP uploads, remote writes, file creation at shared paths), require a verifiable handle (URL, ID, path) and verify it yourself before claiming success. 3. agent/prompt_builder.py Skills-mandatory block: new explicit line "Whenever the user asks to configure / set up / modify / install / enable / disable / troubleshoot Hermes Agent itself, load the `hermes-agent` skill first." The generic "load what's relevant" didn't route Hermes-meta questions (like "how do I turn off redaction?") to the one skill that has the answer. 4. skills/autonomous-ai-agents/hermes-agent/SKILL.md: new "Security & Privacy Toggles" section covering security.redact_secrets (with the import-time-snapshot restart-required caveat), privacy.redact_pii, approvals.mode (manual/smart/off) + --yolo + HERMES_YOLO_MODE, shell hooks allowlist, and how to disable network/media tools entirely. Every command verified against the actual config keys — no invented knobs. Co-authored-by: teknium1 <teknium@noreply.github.com>	2026-04-26 20:57:19 -07:00
Teknium	9c416e20ab	feat(skills): install skills from a direct HTTP(S) URL (#16323 ) * feat(skills): install skills from a direct HTTP(S) URL Adds UrlSource adapter so `hermes skills install <url-to-SKILL.md>` and `/skills install <url>` work as first-class operations — no more improvising with curl + patch + cp. - Claims identifiers that start with http(s):// and end in .md - Skips /.well-known/skills/ URLs (WellKnownSkillSource handles those) - Skill name from YAML frontmatter, URL-slug fallback - Single-file SKILL.md only (v1 scope — multi-file skills need a manifest) - Trust level 'community'; full security scan still runs - Lock file stores the URL as identifier so `hermes skills update` re-fetches from the same URL cleanly Scope matches real user need from @versun's docx feedback where `https://sharethis.chat/SKILL.md` had no first-class install path. * feat(skills): interactive name/category for URL installs + --name override Follow-up to the UrlSource adapter. The previous commit fell back to weak heuristics when frontmatter had no ``name:`` and could produce garbage names like ``SKILL`` or ``unnamed-skill``. Now: tools/skills_hub.py - ``UrlSource._is_valid_skill_name()`` — strict identifier check (``^[a-z][a-z0-9_-]*$``), rejects sentinel values (``SKILL``, ``README``, ``INDEX``, ``unnamed-skill``, empty, non-strings). - ``_resolve_skill_name()`` returns ``Optional[str]`` — ``None`` when nothing valid is resolvable. Also ignores unsafe frontmatter names (``../evil``) and falls through to URL slug instead of returning None immediately, so a URL with a bad frontmatter but a good path still works. - ``fetch()``/``inspect()`` carry an ``awaiting_name=True`` marker in metadata/extra when resolution fails, letting ``do_install`` decide whether to prompt, apply an override, or error out. hermes_cli/skills_hub.py - ``do_install`` gains a ``name_override`` parameter. - On URL-sourced bundles with ``awaiting_name=True``: 1. If ``name_override`` is valid → use it. 2. If ``name_override`` is invalid → refuse with a clear error. 3. Else if ``skip_confirm=True`` (non-interactive: slash / TUI / gateway / scripts) → refuse with an actionable retry hint pointing at ``--name <your-name>`` on both CLI and slash forms. 4. Else (interactive TTY) → prompt for the name. - Interactive TTY also prompts for a category when none is given for a URL-sourced install, hinting existing category buckets so users can reuse ``productivity``, ``devops``, etc. Empty input → flat install. - ``_existing_categories()`` scans ``~/.hermes/skills/`` for subdirs that look like category buckets (contain nested SKILL.md files); skips top-level skills and hidden dirs. - ``_prompt_for_skill_name()`` / ``_prompt_for_category()`` helpers (EOF/Ctrl-C-safe, match the existing ``Confirm [y/N]`` prompt style). hermes_cli/main.py - ``hermes skills install`` argparse gains ``--name <name>``. hermes_cli/skills_hub.py (slash) - ``/skills install <url> --name <x>`` parsing added. Tests - tests/tools/test_skills_hub.py: updated ``UrlSource`` tests to assert the new ``awaiting_name`` metadata; added 4 new tests for ``_is_valid_skill_name`` rejection sets and the awaiting-name marker. - tests/hermes_cli/test_skills_hub.py: 8 new tests covering --name override accept/reject, non-interactive error, interactive name prompt, interactive category prompt, cancel-aborts-install, and ``_existing_categories`` scan behavior (buckets vs flat skills). - E2E verified all four paths (no-name/no-override → error; --name override → install; frontmatter name → install; invalid --name → rejection). --------- Co-authored-by: teknium1 <teknium@noreply.github.com>	2026-04-26 20:57:10 -07:00
sprmn24	b288934dff	fix(discord_tool): coerce limit parameter to int before min() call _search_members() and _fetch_messages() call min(limit, 100) assuming limit is int. Models can pass limit as a string (e.g. "10"), causing TypeError: '<' not supported between instances of 'str' and 'int'. Add try/except int() coercion with safe defaults at the top of both functions, matching the pattern used in session_search fix (#10522).	2026-04-26 20:48:38 -07:00
Teknium	478444c262	feat(checkpoints): auto-prune orphan and stale shadow repos at startup (#16303 ) Every working dir hermes ever touches gets its own shadow git repo under ~/.hermes/checkpoints/{sha256(abs_dir)[:16]}/. The per-repo _prune is a no-op (comment in CheckpointManager._prune says so), so abandoned repos from deleted/moved projects or one-off tmp dirs pile up forever. Field reports put the typical offender at 1000+ repos / ~12 GB on active contributor machines. Adds an opt-in startup sweep that mirrors the sessions.auto_prune pattern from #13861 / #16286: - tools/checkpoint_manager.py: new prune_checkpoints() and maybe_auto_prune_checkpoints() helpers. Deletes shadow repos that are orphan (HERMES_WORKDIR marker points to a path that no longer exists) or stale (newest in-repo mtime older than retention_days). Idempotent via a CHECKPOINT_BASE/.last_prune marker file so it only runs once per min_interval_hours regardless of how many hermes processes start up. - hermes_cli/config.py: new checkpoints.auto_prune / retention_days / delete_orphans / min_interval_hours knobs. Default auto_prune: false so users who rely on /rollback against long-ago sessions never lose data silently. - cli.py / gateway/run.py: startup hooks gated on checkpoints.auto_prune, called right next to the existing state.db maintenance block. - Docs updated with the new config knobs. - 11 regression tests: orphan/stale deletion, precedence, byte-freed tracking, non-shadow dir skip, interval gating, corrupt marker recovery. Refs #3015 (session-file disk growth was fixed in #16286; this covers the checkpoint side noted out-of-scope there).	2026-04-26 19:05:52 -07:00
Teknium	ced8f44cd2	fix(file-tools): broaden dedup-status write guard to cover small wrappers The write_file guard added in #16223 used strict equality against the internal dedup status message. In practice, the model sometimes prepends a short note or appends a trailing comment before calling write_file, which slipped past the strict check. Broaden the heuristic: reject writes whose stripped content equals the status message OR contains it and is <=2x its length. Short, status-dominated writes are always corruption; legitimate docs that quote the message verbatim are always much longer. Adds two tests: one for the small-wrapper corruption shape, one confirming large legitimate files that quote the status still write.	2026-04-26 19:05:36 -07:00
helix4u	977d5f56c9	fix(file-tools): keep read dedup status out of file content	2026-04-26 19:05:36 -07:00
voidborne-d	a32b325d06	fix(tools): invalidate read_file dedup cache on write_file and patch write_file_tool and patch_tool both call _update_read_timestamp to refresh the staleness tracker after writing, but they never invalidate the dedup cache entries for the written path. The dedup cache keys are (resolved_path, offset, limit) → mtime tuples populated by read_file_tool. On filesystems where a read and write land in the same mtime second (or when mtime granularity is 1s), the cached and current mtime are equal, so the dedup check incorrectly returns a 'File unchanged since last read' stub — even though the file was just overwritten. The agent then sees stale content (or a stale 'File not found' error) and enters expensive error-recovery loops, burning API calls. Fix: add _invalidate_dedup_for_path(filepath, task_id) that removes all dedup entries whose resolved path matches the written file. Called from _update_read_timestamp so both write_file_tool and patch_tool benefit automatically. Scoped to the writing task_id — other tasks' caches are not affected. 6 regression tests added covering: - read→write→read within same mtime second (core #13144 scenario) - invalidation across all offset/limit combinations - isolation: writing file A does not invalidate file B's cache - isolation: writing in task A does not invalidate task B's cache - _invalidate_dedup_for_path safety on missing task / empty dedup All 25 tests pass (19 existing + 6 new). Fixes #13144	2026-04-26 19:05:36 -07:00
Yukipukii1	dbe5015566	fix(session-search): exclude current lineage root deterministically in recent mode	2026-04-26 19:03:17 -07:00
Yoimex	f66ebe64e8	fix(cli): coerce use_gateway config flags in tool routing	2026-04-26 19:02:55 -07:00
Teknium	ab6879634e	yuanbao platform (#16298 ) Co-authored-by: loongzhao <loongzhao@tencent.com>	2026-04-26 18:50:49 -07:00
hharry11	fd474d0f00	fix(gateway): avoid cross-user mirror writes in per-user group sessions	2026-04-26 18:31:24 -07:00
Yukipukii1	7317d69f19	fix(security): treat quoted false as false in browser SSRF guards	2026-04-26 18:27:13 -07:00
Ivan Tonov	930494d687	fix(cron): reap orphaned MCP stdio subprocesses after each tick MCP stdio servers are spawned via the SDK's stdio_client, which on Linux uses start_new_session=True (setsid). When a cron job is cancelled mid-way (timeout, agent finish, exception), the subprocess often escapes the SDK's teardown and survives as a session leader. Because setsid() detaches the child from the gateway's process group / cgroup tree, systemd does not reap it on service restart either — so every cron tick that touches an MCP tool leaks a dangling server process. Fix: * tools/mcp_tool.py — _run_stdio now wraps the whole stdio+session context in try/finally. On any exit path (clean, exception, cancellation), PIDs still alive are moved from the active _stdio_pids set into a new _orphan_stdio_pids set. Orphan detection is done via os.kill(pid, 0) — a cheap liveness probe that never signals the target. * tools/mcp_tool.py — _kill_orphaned_mcp_children gains an include_active=False flag. Default behaviour now only reaps the orphan set so concurrent sessions (other parallel cron jobs or live user chats) are never disrupted. The existing shutdown path passes include_active=True to keep the previous "kill everything" semantics after the MCP loop is stopped. * cron/scheduler.py — the cleanup hook is moved from run_job()'s finally (which would race with parallel siblings after #13021) into tick() after the ThreadPoolExecutor has joined every future. At that point there are no in-flight sessions from this tick, so sweeping the orphan set is always safe. Net effect: zero regression for healthy sessions, and orphan MCP servers no longer accumulate between gateway restarts. Made-with: Cursor	2026-04-26 18:21:20 -07:00
bde3249023	75d3eaa0e4	fix(slack): exclude U/W user IDs from explicit target regex Slack's chat.postMessage API rejects user IDs (U...) and workspace IDs (W...) — they are not valid conversation IDs. Posting to them fails because the API requires a channel ID (C/G/D). To DM a user, the sender must first call conversations.open to obtain a D... ID. Tighten _SLACK_TARGET_RE from [CGDUW] to [CGD] so the send path rejects U/W values as explicit targets and instead falls through to channel- name resolution (where they'll fail with a clear 'could not resolve' error rather than silently getting stuck in a retry loop on the API). Flip the corresponding regression test to assert U/W values are not explicit. Matches the narrower regex briandevans proposed in #15939. Co-authored-by: briandevans <brian@bde.io>	2026-04-26 12:29:02 -07:00
hhuang91	802c7acb81	fix(Slack): resolve Slack channels by raw ID and enumerate joined channels send_message(target='slack:<channel_id>') failed with "Could not resolve" because _parse_target_ref had no Slack branch — Slack's uppercase alphanumeric IDs fell through to channel-name resolution, which only matched by name. As a fallback, the agent would retry with bare target='slack' and post to the home channel instead. Three fixes: - _parse_target_ref recognizes Slack IDs (C/G/D/U/W prefix) as explicit targets so the name-resolver is bypassed entirely. - resolve_channel_name tries a case-sensitive raw-ID match before the existing name match, so any platform's IDs resolve cleanly. - _build_slack now actually calls users.conversations against each workspace's AsyncWebClient (paginated), instead of only returning session-history entries. This populates the directory with public and private channels the bot has joined, so action='list' shows them and they can also be addressed by name. Errors from one workspace don't block others. build_channel_directory becomes async (Slack web calls require it). The two async-context callers in gateway/run.py are awaited; the cron ticker thread call bridges via asyncio.run_coroutine_threadsafe. Slack bot needs channels:read and groups:read scopes for full enumeration; missing scopes degrade gracefully per-workspace. addressing #15927	2026-04-26 12:29:02 -07:00
Teknium	5b2c59559a	feat(terminal): collapse subagent task_ids to shared container (#16177 ) Before: delegate_task children each allocated their own terminal sandbox keyed by child task_id. Starting extra containers (or Modal sandboxes / Daytona workspaces) is expensive, and the subagent's work is invisible to the parent — files written by the child in its container don't exist in the parent's when the subagent returns. After: a single `_resolve_container_task_id` helper maps any tool-call task_id to "default" UNLESS an env override is registered for it. The parent agent and all delegate_task children therefore share one long-lived sandbox — installed packages, cwd, /workspace files, and /tmp scratch carry over freely between them. RL and benchmark environments (TerminalBench2, HermesSweEnv, ...) opt in to isolation via `register_task_env_overrides(task_id, {...})`; those task_ids survive the collapse and get their own sandbox, preserving the per-task Docker image behavior these benchmarks rely on. file_state / active-subagents registry / TUI events still key off the original child task_id, so the 'subagent wrote a file the parent read' warning and UI per-subagent panels keep working. Tradeoff: parallel delegate_task children (tasks=[...]) now share one bash/container. Concurrent cd, env-var mutations, and writes to the same path will collide. If that bites a specific workflow, the subagent can opt back into isolation via register_task_env_overrides. Applied at four lookup sites: - tools/terminal_tool.py terminal_tool() and get_active_env() - tools/file_tools.py _get_file_ops() and _get_live_tracking_cwd() - tools/code_execution_tool.py _get_or_create_environment() Docs: website/docs/user-guide/configuration.md updated to reflect the shared-container reality and document the RL/benchmark carve-out. Tests: tests/tools/test_shared_container_task_id.py (9 cases).	2026-04-26 11:55:02 -07:00
Teknium	42c076d349	feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136 ) When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is configured, browser_navigate now transparently spawns a local Chromium sidecar for URLs whose host resolves to a private/loopback/LAN address (localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, .local, .lan, *.internal, ::1, 169.254.x.x). Public URLs continue to use the cloud provider in the same conversation. Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase pinned the whole tool to cloud for the process — localhost URLs were either SSRF-blocked (default) or sent to Browserbase (where they 404'd because the cloud can't reach your LAN). Users who wanted 'cloud for public, local for localhost' had no way to express it short of toggling providers mid-session. Implementation uses a composite session key scheme: the bare task_id serves the cloud session, and a '{task_id}::local' sidecar serves the local Chromium. _last_active_session_key[task_id] tracks which of the two served the most recent nav so snapshot/click/fill/etc. hit the correct one. cleanup_browser(bare_task_id) reaps both. Feature is on by default. Opt out via: browser: auto_local_for_private_urls: false The cloud provider never sees private URLs. Post-redirect SSRF guard is preserved: redirects from public onto private addresses still block.	2026-04-26 09:57:58 -07:00
Teknium	20cb706e03	chore: extend [SYSTEM:→[IMPORTANT: rename + AUTHOR_MAP Follow-up to #6616 covering the remaining user-injected prompt markers that the original PR did not touch (reporter's second comment on #6576 explicitly flagged these). Azure OpenAI Default/DefaultV2 content filters treat any bracketed [SYSTEM: ...] as prompt-injection and reject with HTTP 400. Remaining call sites renamed: - cli.py: background-process notifications (watch_disabled, watch_match, completion), MCP reload notice (4 live + 1 docstring) - gateway/run.py: same notification paths + auto-loaded skill banner + MCP reload notice (5 live + 1 docstring) - tools/process_registry.py: comment reference Not renamed: - environments/hermes_base_env.py '[SYSTEM]\n{content}' — RL training trajectory rendering only, never sent to Azure, part of a symmetric [USER]/[ASSISTANT]/[TOOL] scheme. AUTHOR_MAP: buraysandro9@gmail.com -> ygd58.	2026-04-26 08:44:58 -07:00
Teknium	d09ab8ff13	fix(mcp-oauth): preserve server_url path for protected-resource validation (#16031 ) Stop pre-stripping the path from the configured MCP server URL before constructing OAuthClientProvider. The MCP SDK strips the path itself via OAuthContext.get_authorization_base_url() for authorization-server discovery, but uses the full server_url through resource_url_from_server_url() + check_resource_allowed() to validate against the server's RFC 9728 Protected Resource Metadata. For servers whose PRM advertises a path-scoped resource (e.g. Notion's https://mcp.notion.com/mcp), our _parse_base_url() collapsed the URL to the origin, so check_resource_allowed() saw requested='/' vs configured='/mcp/' and refused the token. Fixes OAuth against Notion MCP (and any other path-scoped resource). Closes #16015.	2026-04-26 05:43:54 -07:00
Teknium	eb28145f36	feat(approval): hardline blocklist for unrecoverable commands (#15878 ) Adds a floor below --yolo: a tiny set of commands so catastrophic they should never run via the agent, regardless of --yolo, gateway /yolo, approvals.mode=off, or cron approve mode. Opting into yolo is trusting the agent with your files and services — not trusting it to wipe the disk or power the box off. The list is deliberately small (12 patterns), covering only unrecoverable ops: - rm -rf targeting /, /home, /etc, /usr, /var, /boot, /bin, /sbin, /lib, ~, $HOME - mkfs (any variant) - dd + redirection to raw block devices (/dev/sd, /dev/nvme, etc.) - fork bomb - kill -1 / kill -9 -1 - shutdown, reboot, halt, poweroff, init 0/6, telinit 0/6, systemctl poweroff/reboot/halt/kexec Recoverable-but-costly commands (git reset --hard, rm -rf /tmp/x, chmod -R 777, curl \| sh) stay in DANGEROUS_PATTERNS where yolo can still pass them through — that's what yolo is for. Container backends (docker/singularity/modal/daytona) continue to bypass both hardline and dangerous checks, since nothing they do can touch the host. Inspired by Mercury Agent's permission-hardened blocklist.	2026-04-25 22:07:12 -07:00
Teknium	97d54f0e4d	fix(terminal): three-layer defense against watch_patterns notification spam (#15642 ) * fix(terminal): three-layer defense against watch_patterns notification spam Background processes that stack notify_on_complete=True with watch_patterns can flood the user with duplicate, delayed notifications — matches deliver asynchronously via the completion queue and continue arriving minutes after the process has exited. The docstring warning against this (PR #12113) has proven insufficient; agents still misuse the combination. Three layered defenses, each sufficient on its own: 1. Mutual exclusion (terminal_tool.py): When both flags are set on a background process, drop watch_patterns with a warning. notify_on_complete wins because 'let me know when it's done' is the more useful signal and fires exactly once. Extracted as _resolve_notification_flag_conflict() so the rule is testable in isolation. 2. Suppress-after-exit (process_registry.py): _check_watch_patterns() now bails the moment session.exited is True. Post-exit chunks (buffered reads draining after the process is gone) no longer produce notifications. This is the fix flagged as future work in session 20260418_020302_79881c. 3. Global circuit breaker (process_registry.py): Per-session rate limits don't catch the sibling-flood case — N concurrent processes can each stay under 8/10s and still collectively spam. New WATCH_GLOBAL_MAX_PER_WINDOW=15 cap trips a 30-second cooldown across ALL sessions, emits a single watch_overflow_tripped event, silently counts dropped events, and emits a watch_overflow_released summary when the cooldown ends. Also updates the tool schema + docstring to document the new behavior. Tests: 8 new tests covering all three fixes (suppress-after-exit x2, mutual-exclusion resolver x4, global breaker trip/cooldown/release x2). All 60 tests across test_watch_patterns.py, test_notify_on_complete.py, test_terminal_tool.py pass. Real-world trigger: self-inflicted in session 20260425_051924 — three concurrent hermes-sweeper review subprocesses each set watch_patterns= ['failed validation', 'errored'] AND notify_on_complete=True, then iterated over multiple items, producing enough matches per process to defeat the per-session cap while staying under the global cap that didn't yet exist. * fix(terminal): aggressive 1-per-15s watch_patterns rate limit + strike-3 promotion Per Teknium's direction, the watch_patterns rate limit is now much more aggressive and self-healing. ## New rule — per session - HARD cap: 1 watch-match notification per 15 seconds per process. - Any match arriving inside the cooldown window is dropped and counts as ONE strike for that window (many drops in the same window still = 1 strike). - After 3 consecutive strike windows, watch_patterns is permanently disabled for the session and the session is auto-promoted to notify_on_complete semantics — exactly one notification when the process actually exits. - A cooldown window that expires with zero drops resets the consecutive strike counter — healthy cadence is forgiven. ## Schema + docstring rewritten The tool schema description now gives the model explicit guidance: - notify_on_complete is 'the right choice for almost every long-running task' - watch_patterns is for RARE one-shot signals on LONG-LIVED processes - Do NOT use watch_patterns with loops/batch jobs — error patterns fire every iteration and will hit the strike limit fast - Mutual exclusion is stated on both parameter descriptions - 1/15s cooldown and 3-strike promotion are stated in the watch_patterns description so the model sees the contract every turn ## Removed - WATCH_MAX_PER_WINDOW (8/10s) and WATCH_OVERLOAD_KILL_SECONDS (45) — the new 1/15s limit subsumes both; keeping them would double-count. - _watch_window_hits / _watch_window_start / _watch_overload_since fields on ProcessSession. Replaced by _watch_last_emit_at / _watch_cooldown_until / _watch_strike_candidate / _watch_consecutive_strikes. ## Kept - Global circuit breaker across all sessions (15/10s → 30s cooldown) as a secondary safety net for concurrent siblings. Still valuable when 20 short-lived processes each fire once — none individually violates the per-session limit. - Suppress-after-exit guard. - Mutual exclusion resolver at the tool entry point. ## Tests - 6 new tests in TestPerSessionRateLimit covering: first match delivers, second in cooldown suppressed, multi-drop = single strike, 3 strikes disables + promotes, clean window resets counter, suppressed count carried to next emit. - Global circuit breaker tests rewritten to use fresh sessions instead of hacking removed per-window fields. - 50/50 watch_patterns + notify_on_complete tests pass. - 60/60 including test_terminal_tool.py pass.	2026-04-25 06:41:58 -07:00
alt-glitch	81987f0350	feat(discord): split discord_server into discord + discord_admin tools Split the monolithic discord_server tool (14 actions) into two: - discord: core actions (fetch_messages, search_members, create_thread) that are useful for the agent's normal operation. Auto-enabled on the discord platform via the pipeline fix. - discord_admin: server management actions (list channels/roles, pins, role assignment) that require explicit opt-in via hermes tools. Added to CONFIGURABLE_TOOLSETS and _DEFAULT_OFF_TOOLSETS.	2026-04-25 04:50:14 -07:00
Teknium	0d548d1db9	fix(cron): wire context_from through the update action The tool schema promised 'On update, pass an empty array to clear' but the update branch ignored the context_from kwarg entirely — users could set the field at create time and never modify or clear it afterward. - tools/cronjob_tools.py: handle context_from in the update branch the same way script/enabled_toolsets/workdir are handled: normalize str/list to refs, validate each referenced job exists (same check the create branch does), store as list-or-None to match create_job()'s shape. Empty string or empty list clears the field. - tests/cron/test_cron_context_from.py: 6 new tests covering add/change/ clear (both shapes)/bad-ref/preserve-across-unrelated-update.	2026-04-25 04:49:28 -07:00
MorAlekss	5ac5365923	feat(cron): add context_from field for cron job output chaining	2026-04-25 04:49:28 -07:00
Teknium	023b1bff11	fix(delegate): resolve subagent approval prompts without deadlocking parent TUI (#15491 ) Subagents run inside a ThreadPoolExecutor. The CLI's interactive approval callback lives in tools/terminal_tool.py's threading.local(), which worker threads do not inherit. When a subagent hits a dangerous-command guard, prompt_dangerous_approval() falls back to input() from the worker thread, deadlocking against the parent's prompt_toolkit TUI that owns stdin. Fix: install a non-interactive callback into every subagent worker thread via ThreadPoolExecutor(initializer=set_approval_callback, initargs=(cb,)). The callback is config-gated by delegation.subagent_auto_approve: false (default) -> _subagent_auto_deny (safe; matches leaf tool blocklist) true -> _subagent_auto_approve (opt-in YOLO for cron/batch) Both emit a logger.warning audit line. Gateway sessions are unaffected because they resolve approvals via tools/approval.py's per-session queue, not through these TLS callbacks. Diagnosis credit: @MorAlekss (#14685). - hermes_cli/config.py: DEFAULT_CONFIG.delegation.subagent_auto_approve: False - cli-config.yaml.example: documented, commented (default) - tools/delegate_tool.py: _subagent_auto_deny, _subagent_auto_approve, _get_subagent_approval_callback, wired into the child timeout executor - tests/tools/test_delegate.py: 7 tests covering defaults, truthy coercion, and TLS scoping in the worker thread	2026-04-24 22:37:22 -07:00
MorAlekss	0ed37c0ca4	docs(delegate): document max_concurrent_children and max_spawn_depth + cost warning	2026-04-24 20:38:58 -07:00
Yukipukii1	fd10463069	fix(env): safely quote ~/ subpaths in wrapped cd commands	2026-04-24 15:25:12 -07:00
teknium1	2de8a7a229	fix(skills): drop raw_content to avoid doubling skill payload skill_view response went to the model verbatim; duplicating the SKILL.md body as raw_content on every tool call added token cost with no agent-facing benefit. Remove the field and update tests to assert on content only. The slash/preload caller (agent/skill_commands.py) already falls back to content when raw_content is absent, and it calls skill_view(preprocess=False) anyway, so content is already unrendered on that path.	2026-04-24 15:15:07 -07:00
helix4u	ead66f0c92	fix(skills): apply inline shell in skill_view	2026-04-24 15:15:07 -07:00
Teknium	fcc05284fc	fix(delegate): tool-activity-aware heartbeat stale detection (#13041 ) (#15183 ) A child running a legitimately long-running tool (terminal command, browser fetch, big file read) holds current_tool set and keeps api_call_count frozen while the tool runs. The previous stale check treated that as idle after 5 heartbeat cycles (~150s), stopped touching the parent, and let the gateway kill the session. Split the threshold in two: - _HEARTBEAT_STALE_CYCLES_IDLE=5 (~150s) — applied only when current_tool is None (child wedged between turns) - _HEARTBEAT_STALE_CYCLES_IN_TOOL=20 (~600s) — applied when the child is inside a tool call Stale counter also resets when current_tool changes (new tool = progress). The hard child_timeout_seconds (default 600s) is still the final cap, so genuinely stuck tools don't get to block forever.	2026-04-24 07:25:19 -07:00
Teknium	8d12fb1e6b	refactor(spotify): convert to built-in bundled plugin under plugins/spotify (#15174 ) Moves the Spotify integration from tools/ into plugins/spotify/, matching the existing pattern established by plugins/image_gen/ for third-party service integrations. Why: - tools/ should be reserved for foundational capabilities (terminal, read_file, web_search, etc.). tools/providers/ was a one-off directory created solely for spotify_client.py. - plugins/ is already the home for image_gen backends, memory providers, context engines, and standalone hook-based plugins. Spotify is a third-party service integration and belongs alongside those, not in tools/. - Future service integrations (eventually: Deezer, Apple Music, etc.) now have a pattern to copy. Changes: - tools/spotify_tool.py → plugins/spotify/tools.py (handlers + schemas) - tools/providers/spotify_client.py → plugins/spotify/client.py - tools/providers/ removed (was only used for Spotify) - New plugins/spotify/__init__.py with register(ctx) calling ctx.register_tool() × 7. The handler/check_fn wiring is unchanged. - New plugins/spotify/plugin.yaml (kind: backend, bundled, auto-load). - tests/tools/test_spotify_client.py: import paths updated. tools_config fix — _DEFAULT_OFF_TOOLSETS now wins over plugin auto-enable: - _get_platform_tools() previously auto-enabled unknown plugin toolsets for new platforms. That was fine for image_gen (which has no toolset of its own) but bad for Spotify, which explicitly requires opt-in (don't ship 7 tool schemas to users who don't use it). Added a check: if a plugin toolset is in _DEFAULT_OFF_TOOLSETS, it stays off until the user picks it in 'hermes tools'. Pre-existing test bug fix: - tests/hermes_cli/test_plugins.py::test_list_returns_sorted asserted names were sorted, but list_plugins() sorts by key (path-derived, e.g. image_gen/openai). With only image_gen plugins bundled, name and key order happened to agree. Adding plugins/spotify broke that coincidence (spotify sorts between openai-codex and xai by name but after xai by key). Updated test to assert key order, which is what the code actually documents. Validation: - scripts/run_tests.sh tests/hermes_cli/test_plugins.py \ tests/hermes_cli/test_tools_config.py \ tests/hermes_cli/test_spotify_auth.py \ tests/tools/test_spotify_client.py \ tests/tools/test_registry.py → 143 passed - E2E plugin load: 'spotify' appears in loaded plugins, all 7 tools register into the spotify toolset, check_fn gating intact.	2026-04-24 07:06:11 -07:00
Teknium	e5d41f05d4	feat(spotify): consolidate tools (9→7), add spotify skill, surface in hermes setup (#15154 ) Three quality improvements on top of #15121 / #15130 / #15135: 1. Tool consolidation (9 → 7) - spotify_saved_tracks + spotify_saved_albums → spotify_library with kind='tracks'\|'albums'. Handler code was ~90 percent identical across the two old tools; the merge is a behavioral no-op. - spotify_activity dropped. Its 'now_playing' action was a duplicate of spotify_playback.get_currently_playing (both return identical 204/empty payloads). Its 'recently_played' action moves onto spotify_playback as a new action — history belongs adjacent to live state. - Net: each API call ships 2 fewer tool schemas when the Spotify toolset is enabled, and the action surface is more discoverable (everything playback-related is on one tool). 2. Spotify skill (skills/media/spotify/SKILL.md) Teaches the agent canonical usage patterns so common requests don't balloon into 4+ tool calls: - 'play X' = one search, then play by URI (not search + scan + describe + play) - 'what's playing' = single get_currently_playing (no preflight get_state chain) - Don't retry on '403 Premium required' or '403 No active device' — both require user action - URI/URL/bare-ID format normalization - Full failure-mode reference for 204/401/403/429 3. Surfaced in 'hermes setup' tool status Adds 'Spotify (PKCE OAuth)' to the tool status list when auth.json has a Spotify access/refresh token. Matches the homeassistant pattern but reads from auth.json (OAuth-based) rather than env vars. Docs updated to reflect the new 7-tool surface, and mention the companion skill in the 'Using it' section. Tests: 54 passing (client 22, auth 15, tools_config 35 — 18 = 54 after renaming/replacing the spotify_activity tests with library + recently_played coverage). Docusaurus build clean.	2026-04-24 06:14:51 -07:00
Brian D. Evans	e87a2100f6	fix(mcp): auto-reconnect + retry once when the transport session expires (#13383 ) Streamable HTTP MCP servers may garbage-collect their server-side session state while the OAuth token remains valid — idle TTL, server restart, pod rotation, etc. Before this fix, the tool-call handler treated the resulting "Invalid or expired session" error as a plain tool failure with no recovery path, so every subsequent call on the affected server failed until the gateway was manually restarted. Reporter: #13383. The OAuth-based recovery path (``_handle_auth_error_and_retry``) already exists for 401s, but it only fires on auth errors. Session expiry slipped through because the access token is still valid — nothing 401'd, so the existing recovery branch was skipped. Fix --- Add a sibling function ``_handle_session_expired_and_retry`` that detects MCP session-expiry via ``_is_session_expired_error`` (a narrow allow-list of known-stable substrings: ``"invalid or expired session"``, ``"session expired"``, ``"session not found"``, ``"unknown session"``, etc.) and then uses the existing transport reconnect mechanism: * Sets ``MCPServerTask._reconnect_event`` — the server task's lifecycle loop already interprets this as "tear down the current ``streamablehttp_client`` + ``ClientSession`` and rebuild them, reusing the existing OAuth provider instance". * Waits up to 15 s for the new session to come back ready. * Retries the original call once. If the retry succeeds, returns its result and resets the circuit-breaker error count. If the retry raises, or if the reconnect doesn't ready in time, falls through to the caller's generic error path. Unlike the 401 path, this does not call ``handle_401`` — the access token is already valid and running an OAuth refresh would be a pointless round-trip. All 5 MCP handlers (``call_tool``, ``list_resources``, ``read_resource``, ``list_prompts``, ``get_prompt``) now consult both recovery paths before falling through: recovered = _handle_auth_error_and_retry(...) # 401 path if recovered is not None: return recovered recovered = _handle_session_expired_and_retry(...) # new if recovered is not None: return recovered # generic error response Narrow scope — explicitly not changed ------------------------------------- * Detection is string-based on a 5-entry allow-list. The MCP SDK wraps JSON-RPC errors in ``McpError`` whose exception type + attributes vary across SDK versions, so matching on message substrings is the durable path. Kept narrow to avoid false positives — a regular ``RuntimeError("Tool failed")`` will NOT trigger spurious reconnects (pinned by ``test_is_session_expired_rejects_unrelated_errors``). * No change to the existing 401 recovery flow. The new path is consulted only after the auth path declines (returns ``None``). * Retry count stays at 1. If the reconnect-then-retry also fails, we don't loop — the error surfaces normally so the model sees a failed tool call rather than a hang. * ``InterruptedError`` is explicitly excluded from session-expired detection so user-cancel signals always short-circuit the same way they did before (pinned by ``test_is_session_expired_rejects_interrupted_error``). Regression coverage ------------------- ``tests/tools/test_mcp_tool_session_expired.py`` (new, 16 cases): Unit tests for ``_is_session_expired_error``: * ``test_is_session_expired_detects_invalid_or_expired_session`` — reporter's exact wpcom-mcp text. * ``test_is_session_expired_detects_expired_session_variant`` — "Session expired" / "expired session" variants. * ``test_is_session_expired_detects_session_not_found`` — server GC variant ("session not found", "unknown session"). * ``test_is_session_expired_is_case_insensitive``. * ``test_is_session_expired_rejects_unrelated_errors`` — narrow-scope canary: random RuntimeError / ValueError / 401 don't trigger. * ``test_is_session_expired_rejects_interrupted_error`` — user cancel must never route through reconnect. * ``test_is_session_expired_rejects_empty_message``. Handler integration tests: * ``test_call_tool_handler_reconnects_on_session_expired`` — reporter's full repro: first call raises "Invalid or expired session", handler signals ``_reconnect_event``, retries once, returns the retry's success result with no ``error`` key. * ``test_call_tool_handler_non_session_expired_error_falls_through`` — preserved-behaviour canary: random tool failures do NOT trigger reconnect. * ``test_session_expired_handler_returns_none_without_loop`` — defensive: cold-start / shutdown race. * ``test_session_expired_handler_returns_none_without_server_record`` — torn-down server falls through cleanly. * ``test_session_expired_handler_returns_none_when_retry_also_fails`` — no retry loop on repeated failure. Parametrised across all 4 non-``tools/call`` handlers: * ``test_non_tool_handlers_also_reconnect_on_session_expired`` [list_resources / read_resource / list_prompts / get_prompt]. 15 of 16 fail on clean ``origin/main`` (``6fb69229``) with ``ImportError: cannot import name '_is_session_expired_error'`` — the fix's surface symbols don't exist there yet. The 1 passing test is an ordering artefact of pytest-xdist worker collection. Validation ---------- ``source venv/bin/activate && python -m pytest tests/tools/test_mcp_tool_session_expired.py -q`` → 16 passed. Broader MCP suite (5 files: ``test_mcp_tool.py``, ``test_mcp_tool_401_handling.py``, ``test_mcp_tool_session_expired.py``, ``test_mcp_reconnect_signal.py``, ``test_mcp_oauth.py``) → 230 passed, 0 regressions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-24 05:28:45 -07:00
AntAISecurityLab	8c2732a9f9	fix(security): strip MCP auth on cross-origin redirect Add event hook to httpx.AsyncClient in MCP HTTP transport that strips Authorization headers when a redirect targets a different origin, preventing credential leakage to third-party servers.	2026-04-24 05:28:45 -07:00
Alexazhu	15050fd965	fix(mcp_oauth): raise RuntimeError instead of asserting OAuth port is set ``tools/mcp_oauth.py`` relied on ``assert _oauth_port is not None`` to guard the module-level port set by ``build_oauth_auth``. Python's ``-O`` / ``-OO`` optimization flags strip ``assert`` statements entirely, so a deployment that runs ``python -O -m hermes ...`` silently loses the check: ``_oauth_port`` stays ``None`` and the failure surfaces much later as an obscure ``int()`` or ``http.server.HTTPServer((host, None))`` TypeError rather than the intended "OAuth callback port not set" signal. Replace with an explicit ``if … raise RuntimeError(...)`` so the invariant is preserved regardless of the interpreter's optimization level. Docstring updated to document the new exception. Found during a proactive audit of ``assert`` statements in non-test code paths.	2026-04-24 05:28:45 -07:00
Amanuel Tilahun Bogale	5fa2f4258a	fix: serialize Pydantic AnyUrl fields when persisting MCP OAuth state OAuth client information and token responses from the MCP SDK contain Pydantic AnyUrl fields (client_uri, redirect_uris, etc.). The previous model_dump() call returned a dict with these AnyUrl objects still as their native Python type, which then crashed json.dumps with: TypeError: Object of type AnyUrl is not JSON serializable This caused any OAuth-based MCP server (e.g. alphaxiv) to fail registration with an "OAuth flow error" traceback during startup. Adding mode="json" tells Pydantic to serialize all fields to JSON-compatible primitives (AnyUrl -> str, datetime -> ISO string, etc.) before returning the dict, so the standard json.dumps can handle it. Three call sites fixed: - HermesTokenStorage.set_tokens - HermesTokenStorage.set_client_info - build_oauth_auth pre-registration write	2026-04-24 05:28:45 -07:00
Dilee	7e9dd9ca45	Add native Spotify tools with PKCE auth	2026-04-24 05:20:38 -07:00
Teknium	852c7f3be3	feat(cron): per-job workdir for project-aware cron runs (#15110 ) Cron jobs can now specify a per-job working directory. When set, the job runs as if launched from that directory: AGENTS.md / CLAUDE.md / .cursorrules from that dir are injected into the system prompt, and the terminal / file / code-exec tools use it as their cwd (via TERMINAL_CWD). When unset, old behaviour is preserved (no project context files, tools use the scheduler's cwd). Requested by @bluthcy. ## Mechanism - cron/jobs.py: create_job / update_job accept 'workdir'; validated to be an absolute existing directory at create/update time. - cron/scheduler.py run_job: if job.workdir is set, point TERMINAL_CWD at it and flip skip_context_files to False before building the agent. Restored in finally on every exit path. - cron/scheduler.py tick: workdir jobs run sequentially (outside the thread pool) because TERMINAL_CWD is process-global. Workdir-less jobs still run in the parallel pool unchanged. - tools/cronjob_tools.py + hermes_cli/cron.py + hermes_cli/main.py: expose 'workdir' via the cronjob tool and 'hermes cron create/edit --workdir ...'. Empty string on edit clears the field. ## Validation - tests/cron/test_cron_workdir.py (21 tests): normalize, create, update, JSON round-trip via cronjob tool, tick partition (workdir jobs run on the main thread, not the pool), run_job env toggle + restore in finally. - Full targeted suite (tests/cron/, test_cronjob_tools.py, test_cron.py, test_config_cwd_bridge.py, test_worktree.py): 314/314 passed. - Live smoke: hermes cron create --workdir $(pwd) works; relative path rejected; list shows 'Workdir:'; edit --workdir '' clears.	2026-04-24 05:07:01 -07:00

1 2 3 4 5 ...

1087 commits