mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-25 00:51:20 +00:00
* docs: browser CDP supervisor design (for upcoming PR) Design doc ahead of implementation — dialog + iframe detection/interaction via a persistent CDP supervisor. Covers backend capability matrix (verified live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split, non-goals, and test plan. Supersedes #12550. No code changes in this commit. * feat(browser): add persistent CDP supervisor for dialog + frame detection Single persistent CDP WebSocket per Hermes task_id that subscribes to Page/Runtime/Target events and maintains thread-safe state for pending dialogs, frame tree, and console errors. Supervisor lives in its own daemon thread running an asyncio loop; external callers use sync API (snapshot(), respond_to_dialog()) that bridges onto the loop. Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true} and enables Page+Runtime on each so iframe-origin dialogs surface through the same supervisor. Dialog policies: must_respond (default, 300s safety timeout), auto_dismiss, auto_accept. Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot payloads bounded on ad-heavy pages. E2E verified against real Chrome via smoke test — detects + responds to main-frame alerts, iframe-contentWindow alerts, preserves frame tree, graceful no-dialog error path, clean shutdown. No agent-facing tool wiring in this commit (comes next). * feat(browser): add browser_dialog tool wired to CDP supervisor Agent-facing response-only tool. Schema: action: 'accept' | 'dismiss' (required) prompt_text: response for prompt() dialogs (optional) dialog_id: disambiguate when multiple dialogs queued (optional) Handler: SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...) check_fn shares _browser_cdp_check with browser_cdp so both surface and hide together. 
When no supervisor is attached (Camofox, default Playwright, or no browser session started yet), tool is hidden; if somehow invoked it returns a clear error pointing the agent to browser_navigate / /browser connect. Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp / hermes-api-server toolsets alongside browser_cdp. * feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot Supervisor lifecycle: * _get_session_info lazy-starts the supervisor after a session row is materialized — covers every backend code path (Browserbase, cdp_url override, /browser connect, future providers) with one hook. * cleanup_browser(task_id) stops the supervisor for that task first (before the backend tears down CDP). * cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all(). * /browser connect eagerly starts the supervisor for task 'default' so the first snapshot already shows pending_dialogs. * /browser disconnect stops the supervisor. CDP URL resolution for the supervisor: 1. BROWSER_CDP_URL / browser.cdp_url override. 2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase). browser_snapshot merges supervisor state (pending_dialogs + frame_tree) into its JSON output when a supervisor is active — the agent reads pending_dialogs from the snapshot it already requests, then calls browser_dialog to respond. No extra tool surface. Config defaults: * browser.dialog_policy: 'must_respond' (new) * browser.dialog_timeout_s: 300 (new) No version bump — new keys deep-merge into existing browser section. Deadlock fix in supervisor event dispatch: * _on_dialog_opening and _on_target_attached used to await CDP calls while the reader was still processing an event — but only the reader can set the response Future, so the call timed out. * Both now fire asyncio.create_task(...) so the reader stays pumping. * auto_dismiss/auto_accept now actually close the dialog immediately. 
Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome): * supervisor start/snapshot * main-frame alert detection + dismiss * iframe.contentWindow alert * prompt() with prompt_text reply * respond with no pending dialog -> clean error * auto_dismiss clears on event * registry idempotency * registry stop -> snapshot reports inactive * browser_dialog tool no-supervisor error * browser_dialog invalid action * browser_dialog end-to-end via tool handler xdist-safe: chrome_cdp fixture uses a per-worker port. Skipped when google-chrome/chromium isn't installed. * docs(browser): document browser_dialog tool + CDP supervisor - user-guide/features/browser.md: new browser_dialog section with workflow, availability gate, and dialog_policy table - reference/tools-reference.md: row for browser_dialog, tool count bumped 53 -> 54, browser tools count 11 -> 12 - reference/toolsets-reference.md: browser_dialog added to browser toolset row with note on pending_dialogs / frame_tree snapshot fields Full design doc lives at developer-guide/browser-supervisor.md (committed earlier). * fix(browser): reconnect loop + recent_dialogs for Browserbase visibility Found via Browserbase E2E test that revealed two production-critical issues: 1. **Supervisor WebSocket drops when other clients disconnect.** Browserbase's CDP proxy tears down our long-lived WebSocket whenever a short-lived client (e.g. agent-browser CLI's per-command CDP connection) disconnects. Fixed with a reconnecting _run loop that re-attaches with exponential backoff on drops. _page_session_id and _child_sessions are reset on each reconnect; pending_dialogs and frames are preserved across reconnects. 2. **Browserbase auto-dismisses dialogs server-side within ~10ms.** Their Playwright-based CDP proxy dismisses alert/confirm/prompt before our Page.handleJavaScriptDialog call can respond. So pending_dialogs is empty by the time the agent reads a snapshot on Browserbase. 
Added a recent_dialogs ring buffer (capacity 20) that retains a DialogRecord for every dialog that opened, with a closed_by tag: * 'agent' — agent called browser_dialog * 'auto_policy' — local auto_dismiss/auto_accept fired * 'watchdog' — must_respond timeout auto-dismissed (300s default) * 'remote' — browser/backend closed it on us (Browserbase) Agents on Browserbase now see the dialog history with closed_by='remote' so they at least know a dialog fired, even though they couldn't respond. 3. **Page.javascriptDialogClosed matching bug.** The event doesn't include a 'message' field (CDP spec has only 'result' and 'userInput') but our _on_dialog_closed was matching on message. Fixed to match by session_id + oldest-first, with a safety assumption that only one dialog is in flight per session (the JS thread is blocked while a dialog is up). Docs + tests updated: * browser.md: new availability matrix showing the three backends and which mode (pending / recent / response) each supports * developer-guide/browser-supervisor.md: three-field snapshot schema with closed_by semantics * test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12 passing against real Chrome) E2E verified both backends: * Local Chrome via /browser connect: detect + respond full workflow (smoke_supervisor.py all 7 scenarios pass) * Browserbase: detect via recent_dialogs with closed_by='remote' (smoke_supervisor_browserbase_v2.py passes) Camofox remains out of scope (REST-only, no CDP) — tracked for upstream PR 3. * feat(browser): XHR bridge for dialog response on Browserbase (FIXED) Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so Page.handleJavaScriptDialog calls lose the race. Solution: bypass native dialogs entirely. The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a JavaScript override for window.alert/confirm/prompt. Those overrides perform a synchronous XMLHttpRequest to a magic host ('hermes-dialog-bridge.invalid'). 
We intercept those XHRs via Fetch.enable with a requestStage=Request pattern. Flow when a page calls alert('hi'): 1. window.alert override intercepts, builds XHR GET to http://hermes-dialog-bridge.invalid/?kind=alert&message=hi 2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics) 3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces it as a pending dialog with bridge_request_id set 4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog 5. Supervisor calls Fetch.fulfillRequest with JSON body: {accept: true|false, prompt_text: '...', dialog_id: 'd-N'} 6. The injected script parses the body, returns the appropriate value from the override (undefined for alert, bool for confirm, string|null for prompt) This works identically on Browserbase AND local Chrome — no native dialog ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog policies (must_respond / auto_dismiss / auto_accept) all still work. Bridge is installed on every attached session (main page + OOPIF child sessions) so iframe dialogs are captured too. Native-dialog path kept as a fallback for backends that don't auto-dismiss (so a page that somehow bypasses our override — e.g. iframes that load after Fetch.enable but before the init-script runs — still gets observed via Page.javascriptDialogOpening). 
E2E VERIFIED: * Local Chrome: 13/13 pytest tests green (12 original + new test_bridge_captures_prompt_and_returns_reply_text that asserts window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds) * Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS: - alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓ - prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY' → page.prompt_ret === 'AGENT-REPLY' ✓ - confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓ - confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓ Docs updated in browser.md and developer-guide/browser-supervisor.md — availability matrix now shows Browserbase at full parity with local Chrome for both detection and response. * feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...) Adds iframe interaction to the CDP supervisor PR (was queued as PR 2). Design: browser_cdp gets an optional frame_id parameter. When set, the tool looks up the frame in the supervisor's frame_tree, grabs its child cdp_session_id (OOPIF session), and dispatches the CDP call through the supervisor's already-connected WebSocket via run_coroutine_threadsafe. Why not stateless: on Browserbase, each fresh browser_cdp WebSocket must re-negotiate against a signed connectUrl. The session info carries a specific URL that can expire while the supervisor's long-lived connection stays valid. Routing via the supervisor sidesteps this. Agent workflow: 1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true 2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>, params={'expression': 'document.title', 'returnByValue': True}) 3. 
Supervisor dispatches the call on the OOPIF's child session Supervisor state fixes needed along the way: * _on_frame_detached now skips reason='swap' (frame migrating processes) * _on_frame_detached also skips when the frame is an OOPIF with a live child session — Browserbase fires spurious remove events when a same-origin iframe gets promoted to OOPIF * _on_target_detached clears cdp_session_id but KEEPS the frame record so the agent still sees the OOPIF in frame_tree during transient session flaps E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py): browser_cdp(method='Runtime.evaluate', params={'expression': 'document.title', 'returnByValue': True}, frame_id=<OOPIF>) → {'success': True, 'result': {'value': 'Example Domain'}} The iframe is <iframe src='https://example.com/'> inside a top-level data: URL page on a real Browserbase session. The agent Runtime.evaluates INSIDE the cross-origin iframe and gets example.com's title back. Tests (tests/tools/test_browser_supervisor.py — 16 pass total): * test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF, verifies routing via supervisor, Runtime.evaluate returns 1+1=2 * test_browser_cdp_frame_id_missing_supervisor — clean error when no supervisor attached * test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad frame_id Docs (browser.md and developer-guide/browser-supervisor.md) updated with the iframe workflow, availability matrix now shows OOPIF eval as shipped for local Chrome + Browserbase. * test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process When asked 'did you test the iframe stuff' I had only done a mocked pytest (fake injected OOPIF) plus a Browserbase E2E. 
Closed the local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/ smoke_local_oopif.py: * 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906) * Chrome with --site-per-process so the cross-origin iframe becomes a real OOPIF in its own process * Navigate, find OOPIF in supervisor.frame_tree, call browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes through the supervisor's child session * Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the inner page, retrieved via OOPIF eval) PASSED on 2026-04-23. Tried to embed this as a pytest but hit an asyncio version quirk between venv (3.11) and the system python (3.13) — Page.navigate hangs in the pytest harness but works in standalone. Left a self-documenting skip test that points to the smoke script + describes the verification. chrome_cdp fixture now passes --site-per-process so future iframe tests can rely on OOPIF behavior. Result: 16 pass + 1 documented-skip = 17 tests in tests/tools/test_browser_supervisor.py. * docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count Pre-merge docs audit revealed two gaps: 1. user-guide/configuration.md browser config example was missing the two new dialog_* knobs. Added with a short table explaining must_respond / auto_dismiss / auto_accept semantics and a link to the feature page for the full workflow. 2. reference/tools-reference.md header said '54 built-in tools' — real count on main is 54, this branch adds browser_dialog so it's 55. Fixed the header. (browser count was already correctly bumped 11 -> 12 in the earlier docs commit.) No code changes.
732 lines
23 KiB
Python
#!/usr/bin/env python3
"""
Toolsets Module

This module provides a flexible system for defining and managing tool aliases/toolsets.
Toolsets allow you to group tools together for specific scenarios and can be composed
from individual tools or other toolsets.

Features:
- Define custom toolsets with specific tools
- Compose toolsets from other toolsets
- Built-in common toolsets for typical use cases
- Easy extension for new toolsets
- Support for dynamic toolset resolution

Usage:
    from toolsets import get_toolset, resolve_toolset, get_all_toolsets

    # Get the definition (description, tools, includes) of a specific toolset
    toolset = get_toolset("web")

    # Resolve a toolset to get all tool names (including from composed toolsets)
    all_tools = resolve_toolset("debugging")
"""

from typing import List, Dict, Any, Set, Optional

# Shared tool list for CLI and all messaging platform toolsets.
# Edit this once to update all platforms simultaneously.
_HERMES_CORE_TOOLS = [
    # Web
    "web_search", "web_extract",
    # Terminal + process management
    "terminal", "process",
    # File manipulation
    "read_file", "write_file", "patch", "search_files",
    # Vision + image generation
    "vision_analyze", "image_generate",
    # Skills
    "skills_list", "skill_view", "skill_manage",
    # Browser automation
    "browser_navigate", "browser_snapshot", "browser_click",
    "browser_type", "browser_scroll", "browser_back",
    "browser_press", "browser_get_images",
    "browser_vision", "browser_console", "browser_cdp", "browser_dialog",
    # Text-to-speech
    "text_to_speech",
    # Planning & memory
    "todo", "memory",
    # Session history search
    "session_search",
    # Clarifying questions
    "clarify",
    # Code execution + delegation
    "execute_code", "delegate_task",
    # Cronjob management
    "cronjob",
    # Cross-platform messaging (gated on gateway running via check_fn)
    "send_message",
    # Home Assistant smart home control (gated on HASS_TOKEN via check_fn)
    "ha_list_entities", "ha_get_state", "ha_list_services", "ha_call_service",
]

# Core toolset definitions
# These can include individual tools or reference other toolsets
TOOLSETS = {
    # Basic toolsets - individual tool categories
    "web": {
        "description": "Web research and content extraction tools",
        "tools": ["web_search", "web_extract"],
        "includes": []  # No other toolsets included
    },

    "search": {
        "description": "Web search only (no content extraction/scraping)",
        "tools": ["web_search"],
        "includes": []
    },

    "vision": {
        "description": "Image analysis and vision tools",
        "tools": ["vision_analyze"],
        "includes": []
    },

    "image_gen": {
        "description": "Creative generation tools (images)",
        "tools": ["image_generate"],
        "includes": []
    },

    "terminal": {
        "description": "Terminal/command execution and process management tools",
        "tools": ["terminal", "process"],
        "includes": []
    },

    "moa": {
        "description": "Advanced reasoning and problem-solving tools",
        "tools": ["mixture_of_agents"],
        "includes": []
    },

    "skills": {
        "description": "Access, create, edit, and manage skill documents with specialized instructions and knowledge",
        "tools": ["skills_list", "skill_view", "skill_manage"],
        "includes": []
    },

    "browser": {
        "description": "Browser automation for web interaction (navigate, click, type, scroll, iframes, hold-click) with web search for finding URLs",
        "tools": [
            "browser_navigate", "browser_snapshot", "browser_click",
            "browser_type", "browser_scroll", "browser_back",
            "browser_press", "browser_get_images",
            "browser_vision", "browser_console", "browser_cdp",
            "browser_dialog", "web_search"
        ],
        "includes": []
    },

    "cronjob": {
        "description": "Cronjob management tool - create, list, update, pause, resume, remove, and trigger scheduled tasks",
        "tools": ["cronjob"],
        "includes": []
    },

    "messaging": {
        "description": "Cross-platform messaging: send messages to Telegram, Discord, Slack, SMS, etc.",
        "tools": ["send_message"],
        "includes": []
    },

    "rl": {
        "description": "RL training tools for running reinforcement learning on Tinker-Atropos",
        "tools": [
            "rl_list_environments", "rl_select_environment",
            "rl_get_current_config", "rl_edit_config",
            "rl_start_training", "rl_check_status",
            "rl_stop_training", "rl_get_results",
            "rl_list_runs", "rl_test_inference"
        ],
        "includes": []
    },

    "file": {
        "description": "File manipulation tools: read, write, patch (with fuzzy matching), and search (content + files)",
        "tools": ["read_file", "write_file", "patch", "search_files"],
        "includes": []
    },

    "tts": {
        "description": "Text-to-speech: convert text to audio with Edge TTS (free), ElevenLabs, OpenAI, or xAI",
        "tools": ["text_to_speech"],
        "includes": []
    },

    "todo": {
        "description": "Task planning and tracking for multi-step work",
        "tools": ["todo"],
        "includes": []
    },

    "memory": {
        "description": "Persistent memory across sessions (personal notes + user profile)",
        "tools": ["memory"],
        "includes": []
    },

    "session_search": {
        "description": "Search and recall past conversations with summarization",
        "tools": ["session_search"],
        "includes": []
    },

    "clarify": {
        "description": "Ask the user clarifying questions (multiple-choice or open-ended)",
        "tools": ["clarify"],
        "includes": []
    },

    "code_execution": {
        "description": "Run Python scripts that call tools programmatically (reduces LLM round trips)",
        "tools": ["execute_code"],
        "includes": []
    },

    "delegation": {
        "description": "Spawn subagents with isolated context for complex subtasks",
        "tools": ["delegate_task"],
        "includes": []
    },

    # "honcho" toolset removed — Honcho is now a memory provider plugin.
    # Tools are injected via MemoryManager, not the toolset system.

    "homeassistant": {
        "description": "Home Assistant smart home control and monitoring",
        "tools": ["ha_list_entities", "ha_get_state", "ha_list_services", "ha_call_service"],
        "includes": []
    },

    "feishu_doc": {
        "description": "Read Feishu/Lark document content",
        "tools": ["feishu_doc_read"],
        "includes": []
    },

    "feishu_drive": {
        "description": "Feishu/Lark document comment operations (list, reply, add)",
        "tools": [
            "feishu_drive_list_comments", "feishu_drive_list_comment_replies",
            "feishu_drive_reply_comment", "feishu_drive_add_comment",
        ],
        "includes": []
    },

    # Scenario-specific toolsets

    "debugging": {
        "description": "Debugging and troubleshooting toolkit",
        "tools": ["terminal", "process"],
        "includes": ["web", "file"]  # For searching error messages and solutions, and file operations
    },

    "safe": {
        "description": "Safe toolkit without terminal access",
        "tools": [],
        "includes": ["web", "vision", "image_gen"]
    },

    # ==========================================================================
    # Full Hermes toolsets (CLI + messaging platforms)
    #
    # All platforms share the same core tools (including send_message,
    # which is gated on gateway running via its check_fn).
    # ==========================================================================

    "hermes-acp": {
        "description": "Editor integration (VS Code, Zed, JetBrains) — coding-focused tools without messaging, audio, or clarify UI",
        "tools": [
            "web_search", "web_extract",
            "terminal", "process",
            "read_file", "write_file", "patch", "search_files",
            "vision_analyze",
            "skills_list", "skill_view", "skill_manage",
            "browser_navigate", "browser_snapshot", "browser_click",
            "browser_type", "browser_scroll", "browser_back",
            "browser_press", "browser_get_images",
            "browser_vision", "browser_console", "browser_cdp", "browser_dialog",
            "todo", "memory",
            "session_search",
            "execute_code", "delegate_task",
        ],
        "includes": []
    },

    "hermes-api-server": {
        "description": "OpenAI-compatible API server — full agent tools accessible via HTTP (no interactive UI tools like clarify or send_message)",
        "tools": [
            # Web
            "web_search", "web_extract",
            # Terminal + process management
            "terminal", "process",
            # File manipulation
            "read_file", "write_file", "patch", "search_files",
            # Vision + image generation
            "vision_analyze", "image_generate",
            # Skills
            "skills_list", "skill_view", "skill_manage",
            # Browser automation
            "browser_navigate", "browser_snapshot", "browser_click",
            "browser_type", "browser_scroll", "browser_back",
            "browser_press", "browser_get_images",
            "browser_vision", "browser_console", "browser_cdp", "browser_dialog",
            # Planning & memory
            "todo", "memory",
            # Session history search
            "session_search",
            # Code execution + delegation
            "execute_code", "delegate_task",
            # Cronjob management
            "cronjob",
            # Home Assistant smart home control (gated on HASS_TOKEN via check_fn)
            "ha_list_entities", "ha_get_state", "ha_list_services", "ha_call_service",
        ],
        "includes": []
    },

    "hermes-cli": {
        "description": "Full interactive CLI toolset - all default tools plus cronjob management",
        "tools": _HERMES_CORE_TOOLS,
        "includes": []
    },

    "hermes-cron": {
        # Mirrors hermes-cli so cron's "default" toolset is the same set of
        # core tools users see interactively — then `hermes tools` filters
        # them down per the platform config. _DEFAULT_OFF_TOOLSETS (moa,
        # homeassistant, rl) are excluded by _get_platform_tools() unless
        # the user explicitly enables them.
        "description": "Default cron toolset - same core tools as hermes-cli; gated by `hermes tools`",
        "tools": _HERMES_CORE_TOOLS,
        "includes": []
    },

    "hermes-telegram": {
        "description": "Telegram bot toolset - full access for personal use (terminal has safety checks)",
        "tools": _HERMES_CORE_TOOLS,
        "includes": []
    },

    "hermes-discord": {
        "description": "Discord bot toolset - full access (terminal has safety checks via dangerous command approval)",
        "tools": _HERMES_CORE_TOOLS + [
            # Discord server introspection & management (gated on DISCORD_BOT_TOKEN via check_fn)
            "discord_server",
        ],
        "includes": []
    },

    "hermes-whatsapp": {
        "description": "WhatsApp bot toolset - similar to Telegram (personal messaging, more trusted)",
        "tools": _HERMES_CORE_TOOLS,
        "includes": []
    },

    "hermes-slack": {
        "description": "Slack bot toolset - full access for workspace use (terminal has safety checks)",
        "tools": _HERMES_CORE_TOOLS,
        "includes": []
    },

    "hermes-signal": {
        "description": "Signal bot toolset - encrypted messaging platform (full access)",
        "tools": _HERMES_CORE_TOOLS,
        "includes": []
    },

    "hermes-bluebubbles": {
        "description": "BlueBubbles iMessage bot toolset - Apple iMessage via local BlueBubbles server",
        "tools": _HERMES_CORE_TOOLS,
        "includes": []
    },

    "hermes-homeassistant": {
        "description": "Home Assistant bot toolset - smart home event monitoring and control",
        "tools": _HERMES_CORE_TOOLS,
        "includes": []
    },

    "hermes-email": {
        "description": "Email bot toolset - interact with Hermes via email (IMAP/SMTP)",
        "tools": _HERMES_CORE_TOOLS,
        "includes": []
    },

    "hermes-mattermost": {
        "description": "Mattermost bot toolset - self-hosted team messaging (full access)",
        "tools": _HERMES_CORE_TOOLS,
        "includes": []
    },

    "hermes-matrix": {
        "description": "Matrix bot toolset - decentralized encrypted messaging (full access)",
        "tools": _HERMES_CORE_TOOLS,
        "includes": []
    },

    "hermes-dingtalk": {
        "description": "DingTalk bot toolset - enterprise messaging platform (full access)",
        "tools": _HERMES_CORE_TOOLS,
        "includes": []
    },

    "hermes-feishu": {
        "description": "Feishu/Lark bot toolset - enterprise messaging via Feishu/Lark (full access)",
        "tools": _HERMES_CORE_TOOLS,
        "includes": []
    },

    "hermes-weixin": {
        "description": "Weixin bot toolset - personal WeChat messaging via iLink (full access)",
        "tools": _HERMES_CORE_TOOLS,
        "includes": []
    },

    "hermes-qqbot": {
        "description": "QQBot toolset - QQ messaging via Official Bot API v2 (full access)",
        "tools": _HERMES_CORE_TOOLS,
        "includes": []
    },

    "hermes-wecom": {
        "description": "WeCom bot toolset - enterprise WeChat messaging (full access)",
        "tools": _HERMES_CORE_TOOLS,
        "includes": []
    },

    "hermes-wecom-callback": {
        "description": "WeCom callback toolset - enterprise self-built app messaging (full access)",
        "tools": _HERMES_CORE_TOOLS,
        "includes": []
    },

    "hermes-sms": {
        "description": "SMS bot toolset - interact with Hermes via SMS (Twilio)",
        "tools": _HERMES_CORE_TOOLS,
        "includes": []
    },

    "hermes-webhook": {
        "description": "Webhook toolset - receive and process external webhook events",
        "tools": _HERMES_CORE_TOOLS,
        "includes": []
    },

    "hermes-gateway": {
        "description": "Gateway toolset - union of all messaging platform tools",
        "tools": [],
        "includes": [
            "hermes-telegram", "hermes-discord", "hermes-whatsapp",
            "hermes-slack", "hermes-signal", "hermes-bluebubbles",
            "hermes-homeassistant", "hermes-email", "hermes-sms",
            "hermes-mattermost", "hermes-matrix", "hermes-dingtalk",
            "hermes-feishu", "hermes-wecom", "hermes-wecom-callback",
            "hermes-weixin", "hermes-qqbot", "hermes-webhook",
        ]
    }
}

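# Illustration only, not part of the module. Every entry in TOOLSETS has the
# same shape: a "description" string plus "tools" and "includes" lists. The
# sketch below assumes a hypothetical helper, find_dangling_includes, and a
# small DEMO_TOOLSETS dict invented for this example. It shows one way to
# check that every name listed under "includes" refers to a defined toolset.

```python
from typing import Any, Dict, List

# Hypothetical miniature table in the same shape as TOOLSETS.
DEMO_TOOLSETS: Dict[str, Dict[str, Any]] = {
    "web": {
        "description": "Web research tools",
        "tools": ["web_search", "web_extract"],
        "includes": [],
    },
    "file": {
        "description": "File tools",
        "tools": ["read_file", "write_file"],
        "includes": [],
    },
    "research": {
        "description": "Composed toolset with one misspelled include",
        "tools": ["vision_analyze"],
        "includes": ["web", "file", "notes"],  # "notes" is not defined
    },
}


def find_dangling_includes(toolsets: Dict[str, Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each toolset name to the included names that are not defined."""
    dangling: Dict[str, List[str]] = {}
    for name, definition in toolsets.items():
        missing = [inc for inc in definition.get("includes", []) if inc not in toolsets]
        if missing:
            dangling[name] = missing
    return dangling
```

# A check like this only covers a static table. Toolsets registered by
# plugins at load time would not appear in it.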
def get_toolset(name: str) -> Optional[Dict[str, Any]]:
    """
    Get a toolset definition by name.

    Args:
        name (str): Name of the toolset

    Returns:
        Dict: Toolset definition with description, tools, and includes
        None: If toolset not found
    """
    toolset = TOOLSETS.get(name)
    if toolset:
        return toolset

    try:
        from tools.registry import registry
    except Exception:
        return None

    registry_toolset = name
    description = f"Plugin toolset: {name}"
    alias_target = registry.get_toolset_alias_target(name)

    if name not in _get_plugin_toolset_names():
        registry_toolset = alias_target
        if not registry_toolset:
            return None
        description = f"MCP server '{name}' tools"
    else:
        reverse_aliases = {
            canonical: alias
            for alias, canonical in _get_registry_toolset_aliases().items()
            if alias not in TOOLSETS
        }
        alias = reverse_aliases.get(name)
        if alias:
            description = f"MCP server '{alias}' tools"

    return {
        "description": description,
        "tools": registry.get_tool_names_for_toolset(registry_toolset),
        "includes": [],
    }

def resolve_toolset(name: str, visited: Optional[Set[str]] = None) -> List[str]:
    """
    Recursively resolve a toolset to get all tool names.

    This function handles toolset composition by recursively resolving
    included toolsets and combining all tools.

    Args:
        name (str): Name of the toolset to resolve
        visited (Optional[Set[str]]): Set of already visited toolsets (for cycle detection)

    Returns:
        List[str]: Sorted list of all tool names in the toolset
    """
    if visited is None:
        visited = set()

    # Special aliases that represent all tools across every toolset
    # This ensures future toolsets are automatically included without changes.
    if name in {"all", "*"}:
        all_tools: Set[str] = set()
        for toolset_name in get_toolset_names():
            # Use a fresh visited set per branch to avoid cross-branch contamination
            resolved = resolve_toolset(toolset_name, visited.copy())
            all_tools.update(resolved)
        return sorted(all_tools)

    # Check for cycles / already-resolved (diamond deps).
    # Silently return [] — either this is a diamond (not a bug, tools already
    # collected via another path) or a genuine cycle (safe to skip).
    if name in visited:
        return []

    visited.add(name)

    # Get toolset definition
    toolset = get_toolset(name)
    if not toolset:
        return []

    # Collect direct tools
    tools = set(toolset.get("tools", []))

    # Recursively resolve included toolsets, sharing the visited set across
    # sibling includes so diamond dependencies are only resolved once and
    # a cycle is only walked once.
    for included_name in toolset.get("includes", []):
        included_tools = resolve_toolset(included_name, visited)
        tools.update(included_tools)

    return sorted(tools)

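# Illustration only, not part of the module. A minimal standalone sketch of
# the same resolution technique (depth-first traversal with a shared visited
# set), assuming a hypothetical DEMO_TOOLSETS table and a hypothetical
# function demo_resolve so it can run without the tool registry. It shows why
# a diamond include and a genuine cycle both terminate with the expected
# tools.

```python
from typing import Dict, List, Optional, Set

# Hypothetical miniature table; names are invented for this example.
DEMO_TOOLSETS: Dict[str, Dict[str, List[str]]] = {
    "web": {"tools": ["web_search", "web_extract"], "includes": []},
    "file": {"tools": ["read_file", "write_file"], "includes": []},
    "debugging": {"tools": ["terminal"], "includes": ["web", "file"]},
    # Diamond: "web" is reachable directly and through "debugging".
    "everything": {"tools": [], "includes": ["debugging", "web"]},
    # Cycle: loop_a includes loop_b, which includes loop_a again.
    "loop_a": {"tools": ["a_tool"], "includes": ["loop_b"]},
    "loop_b": {"tools": ["b_tool"], "includes": ["loop_a"]},
}


def demo_resolve(name: str, visited: Optional[Set[str]] = None) -> List[str]:
    """Return the sorted, deduplicated tool names reachable from `name`."""
    if visited is None:
        visited = set()
    if name in visited:
        # Already handled on another path (diamond) or part of a cycle.
        return []
    visited.add(name)

    definition = DEMO_TOOLSETS.get(name)
    if not definition:
        return []

    tools = set(definition["tools"])
    for included in definition["includes"]:
        # The same visited set is shared across sibling includes.
        tools.update(demo_resolve(included, visited))
    return sorted(tools)
```

# Sharing one visited set means the second visit to "web" contributes
# nothing, which is harmless because its tools were already merged on the
# first visit.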
def resolve_multiple_toolsets(toolset_names: List[str]) -> List[str]:
    """
    Resolve multiple toolsets and combine their tools.

    Args:
        toolset_names (List[str]): List of toolset names to resolve

    Returns:
        List[str]: Combined list of all tool names (deduplicated)
    """
    all_tools = set()

    for name in toolset_names:
        tools = resolve_toolset(name)
        all_tools.update(tools)

    return sorted(all_tools)

def _get_plugin_toolset_names() -> Set[str]:
    """Return toolset names registered by plugins (from the tool registry).

    These are toolsets that exist in the registry but not in the static
    ``TOOLSETS`` dict, i.e. they were added by plugins at load time.
    """
    try:
        from tools.registry import registry
        return {
            toolset_name
            for toolset_name in registry.get_registered_toolset_names()
            if toolset_name not in TOOLSETS
        }
    except Exception:
        return set()


def _get_registry_toolset_aliases() -> Dict[str, str]:
    """Return explicit toolset aliases registered in the live registry."""
    try:
        from tools.registry import registry
        return registry.get_registered_toolset_aliases()
    except Exception:
        return {}


def get_all_toolsets() -> Dict[str, Dict[str, Any]]:
    """
    Get all available toolsets with their definitions.

    Includes both statically-defined toolsets and plugin-registered ones.

    Returns:
        Dict: All toolset definitions
    """
    result = dict(TOOLSETS)
    aliases = _get_registry_toolset_aliases()
    for ts_name in _get_plugin_toolset_names():
        display_name = ts_name
        for alias, canonical in aliases.items():
            if canonical == ts_name and alias not in TOOLSETS:
                display_name = alias
                break
        if display_name in result:
            continue
        toolset = get_toolset(display_name)
        if toolset:
            result[display_name] = toolset
    return result


def get_toolset_names() -> List[str]:
    """
    Get names of all available toolsets (excluding aliases).

    Includes plugin-registered toolset names.

    Returns:
        List[str]: List of toolset names
    """
    names = set(TOOLSETS.keys())
    aliases = _get_registry_toolset_aliases()
    for ts_name in _get_plugin_toolset_names():
        for alias, canonical in aliases.items():
            if canonical == ts_name and alias not in TOOLSETS:
                names.add(alias)
                break
        else:
            # for/else: reached only when no alias matched (loop ended without break)
            names.add(ts_name)
    return sorted(names)


def validate_toolset(name: str) -> bool:
    """
    Check if a toolset name is valid.

    Args:
        name (str): Toolset name to validate

    Returns:
        bool: True if valid, False otherwise
    """
    # Accept special alias names for convenience
    if name in {"all", "*"}:
        return True
    if name in TOOLSETS:
        return True
    if name in _get_plugin_toolset_names():
        return True
    return name in _get_registry_toolset_aliases()


def create_custom_toolset(
    name: str,
    description: str,
    tools: List[str] = None,
    includes: List[str] = None
) -> None:
    """
    Create a custom toolset at runtime.

    Args:
        name (str): Name for the new toolset
        description (str): Description of the toolset
        tools (List[str]): Direct tools to include
        includes (List[str]): Other toolsets to include
    """
    TOOLSETS[name] = {
        "description": description,
        "tools": tools or [],
        "includes": includes or []
    }


def get_toolset_info(name: str) -> Dict[str, Any]:
    """
    Get detailed information about a toolset including resolved tools.

    Args:
        name (str): Toolset name

    Returns:
        Dict: Detailed toolset information, or None if the toolset is not found
    """
    toolset = get_toolset(name)
    if not toolset:
        return None

    resolved_tools = resolve_toolset(name)

    return {
        "name": name,
        "description": toolset["description"],
        "direct_tools": toolset["tools"],
        "includes": toolset["includes"],
        "resolved_tools": resolved_tools,
        "tool_count": len(resolved_tools),
        "is_composite": bool(toolset["includes"])
    }


if __name__ == "__main__":
    print("Toolsets System Demo")
    print("=" * 60)

    print("\nAvailable Toolsets:")
    print("-" * 40)
    for name, toolset in get_all_toolsets().items():
        info = get_toolset_info(name)
        composite = "[composite]" if info["is_composite"] else "[leaf]"
        print(f" {composite} {name:20} - {toolset['description']}")
        print(f" Tools: {len(info['resolved_tools'])} total")

    print("\nToolset Resolution Examples:")
    print("-" * 40)
    for name in ["web", "terminal", "safe", "debugging"]:
        tools = resolve_toolset(name)
        print(f"\n {name}:")
        print(f" Resolved to {len(tools)} tools: {', '.join(sorted(tools))}")

    print("\nMultiple Toolset Resolution:")
    print("-" * 40)
    combined = resolve_multiple_toolsets(["web", "vision", "terminal"])
    print(" Combining ['web', 'vision', 'terminal']:")
    print(f" Result: {', '.join(sorted(combined))}")

    print("\nCustom Toolset Creation:")
    print("-" * 40)
    create_custom_toolset(
        name="my_custom",
        description="My custom toolset for specific tasks",
        tools=["web_search"],
        includes=["terminal", "vision"]
    )
    custom_info = get_toolset_info("my_custom")
    print(" Created 'my_custom' toolset:")
    print(f" Description: {custom_info['description']}")
    print(f" Resolved tools: {', '.join(custom_info['resolved_tools'])}")
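
The diamond and cycle handling in `resolve_toolset` can be tried in isolation. The following is a minimal standalone sketch, not the module's actual code: `DEMO_TOOLSETS` and `demo_resolve` are hypothetical names invented for illustration, and the table stands in for the real `TOOLSETS` dict and registry lookups. It mirrors only the shared-`visited` recursion, without the `"all"` / `"*"` aliases.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toolset table for illustration only; not the real TOOLSETS.
DEMO_TOOLSETS: Dict[str, Dict[str, List[str]]] = {
    "base": {"tools": ["read_file"], "includes": []},
    "web": {"tools": ["web_search"], "includes": ["base"]},
    "code": {"tools": ["run_python"], "includes": ["base"]},
    # "dev" reaches "base" through both "web" and "code" (a diamond).
    "dev": {"tools": [], "includes": ["web", "code"]},
    # "loop_a" and "loop_b" include each other (a genuine cycle).
    "loop_a": {"tools": ["tool_a"], "includes": ["loop_b"]},
    "loop_b": {"tools": ["tool_b"], "includes": ["loop_a"]},
}


def demo_resolve(name: str, visited: Optional[Set[str]] = None) -> List[str]:
    """Resolve a demo toolset to a sorted, deduplicated list of tool names."""
    if visited is None:
        visited = set()

    # Already seen: either a diamond (tools collected via another path)
    # or a cycle. Both are skipped silently.
    if name in visited:
        return []
    visited.add(name)

    toolset = DEMO_TOOLSETS.get(name)
    if not toolset:
        return []  # unknown toolset names resolve to nothing

    tools = set(toolset["tools"])
    # The same visited set is shared across sibling includes.
    for included in toolset["includes"]:
        tools.update(demo_resolve(included, visited))
    return sorted(tools)


print(demo_resolve("dev"))     # diamond: "base" is resolved once
print(demo_resolve("loop_a"))  # cycle: terminates instead of recursing forever
```

Because the visited set is shared, the second path into `base` returns an empty list, but its tools are already in the caller's set from the first path, so nothing is lost.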