Background macOS desktop control via cua-driver MCP. It does NOT steal the user's cursor or keyboard focus, and it works with any tool-capable model. Replaces the Anthropic-native `computer_20251124` approach from the abandoned #4562 with a generic OpenAI function-calling schema plus SOM (set-of-mark) captures so Claude, GPT, Gemini, and open models can all drive the desktop via numbered element indices.

- `tools/computer_use/` package: swappable ComputerUseBackend ABC + CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary).
- Universal `computer_use` tool with one schema for all providers. Actions: capture (som/vision/ax), click, double_click, right_click, middle_click, drag, scroll, type, key, wait, list_apps, focus_app.
- Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style `content: [text, image_url]` parts) that flows through handle_function_call into the tool message. Anthropic adapter converts into native `tool_result` image blocks; OpenAI-compatible providers get the parts list directly.
- Image eviction in convert_messages_to_anthropic: only the 3 most recent screenshots carry real image data; older ones become text placeholders to cap per-turn token cost.
- Context compressor image pruning: old multimodal tool results have their image parts stripped instead of being skipped.
- Image-aware token estimation: each image counts as a flat 1500 tokens instead of its base64 char length (~1MB would have registered as ~250K tokens before).
- COMPUTER_USE_GUIDANCE system-prompt block, injected when the toolset is active.
- Session DB persistence strips base64 from multimodal tool messages.
- Trajectory saver normalises multimodal messages to text-only.
- `hermes tools` post-setup installs cua-driver via the upstream script and prints permission-grant instructions.
- CLI approval callback wired so destructive computer_use actions go through the same prompt_toolkit approval dialog as terminal commands.
- Hard safety guards at the tool level: blocked type patterns (curl|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash, force delete, lock screen, log out).
- Skill `apple/macos-computer-use/SKILL.md`: universal (model-agnostic) workflow guide.
- Docs: `user-guide/features/computer-use.md` plus reference catalog entries.

44 new tests in tests/tools/test_computer_use.py covering schema shape (universal, not Anthropic-native), dispatch routing, safety guards, multimodal envelope, Anthropic adapter conversion, screenshot eviction, context compressor pruning, image-aware token estimation, run_agent helpers, and universality guarantees. 469/469 pass across tests/tools/test_computer_use.py + the affected agent/ test suites.

- `model_tools.py` provider-gating: the tool is available to every provider. Providers without multi-part tool message support will see text-only tool results (graceful degradation via `text_summary`).
- Anthropic server-side `clear_tool_uses_20250919`: deferred; client-side eviction + compressor pruning cover the same cost ceiling without a beta header.
- macOS only. cua-driver uses private SkyLight SPIs (SLEventPostToPid, SLPSPostEventRecordTo, _AXObserverAddNotificationAndCheckRemote) that can break on any macOS update. Pin with HERMES_CUA_DRIVER_VERSION.
- Requires Accessibility + Screen Recording permissions; the post-setup prints the Settings path.

Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic-native schema). Credit @0xbyt4 for the original #3816 groundwork whose context/eviction/token design is preserved here in generic form.
| sidebar_position | title | description |
|---|---|---|
| 4 | Toolsets Reference | Reference for Hermes core, composite, platform, and dynamic toolsets |
Toolsets Reference
Toolsets are named bundles of tools that control what the agent can do. They're the primary mechanism for configuring tool availability per platform, per session, or per task.
How Toolsets Work
Every tool belongs to exactly one toolset. When you enable a toolset, all tools in that bundle become available to the agent. Toolsets come in three kinds:
- Core: a single logical group of related tools (e.g., `file` bundles `read_file`, `write_file`, `patch`, `search_files`)
- Composite: combines multiple core toolsets for a common scenario (e.g., `debugging` bundles file, terminal, and web tools)
- Platform: a complete tool configuration for a specific deployment context (e.g., `hermes-cli` is the default for interactive CLI sessions)
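One way to picture the relationship between core and composite toolsets is a small registry in which composite entries point at other toolsets by name. The sketch below is illustrative only: the `TOOLSETS` dictionary, its `tools` / `includes` keys, and `resolve_toolset` are hypothetical names chosen for this example, not Hermes internals. The tool lists are copied from the tables on this page.

```python
# Hypothetical registry layout (an assumption for illustration, not Hermes code).
TOOLSETS = {
    "file": {"tools": ["patch", "read_file", "search_files", "write_file"], "includes": []},
    "terminal": {"tools": ["process", "terminal"], "includes": []},
    "web": {"tools": ["web_extract", "web_search"], "includes": []},
    # A composite toolset owns no tools itself; it only includes other toolsets.
    "debugging": {"tools": [], "includes": ["file", "terminal", "web"]},
}


def resolve_toolset(name, registry=TOOLSETS, _seen=None):
    """Return the sorted tool names a toolset expands to, following includes."""
    seen = set() if _seen is None else _seen
    if name in seen:  # guard against include cycles
        return []
    seen.add(name)
    entry = registry[name]
    tools = set(entry["tools"])
    for child in entry["includes"]:
        tools.update(resolve_toolset(child, registry, seen))
    return sorted(tools)


print(resolve_toolset("debugging"))
```

Under these assumptions, resolving `debugging` yields the combined tools of `file`, `terminal`, and `web`, while resolving a core toolset such as `file` returns only its own tools.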
Configuring Toolsets
Per-session (CLI)
hermes chat --toolsets web,file,terminal
hermes chat --toolsets debugging # composite — expands to file + terminal + web
hermes chat --toolsets all # everything
Per-platform (config.yaml)
toolsets:
  - hermes-cli # default for CLI
  # - hermes-telegram # override for Telegram gateway
Interactive management
hermes tools # curses UI to enable/disable per platform
Or in-session:
/tools list
/tools disable browser
/tools enable rl
Core Toolsets
| Toolset | Tools | Purpose |
|---|---|---|
| `browser` | `browser_back`, `browser_click`, `browser_console`, `browser_get_images`, `browser_navigate`, `browser_press`, `browser_scroll`, `browser_snapshot`, `browser_type`, `browser_vision`, `web_search` | Core browser automation. Includes `web_search` as a fallback for quick lookups. `browser_cdp` and `browser_dialog` live in a separate `browser-cdp` toolset and are registered only when a CDP endpoint is reachable at session start (via `/browser connect`, `browser.cdp_url` config, Browserbase, or Camofox). `browser_dialog` works together with the `pending_dialogs` and `frame_tree` fields that `browser_snapshot` adds when a CDP supervisor is attached. |
| `clarify` | `clarify` | Ask the user a question when the agent needs clarification. |
| `code_execution` | `execute_code` | Run Python scripts that call Hermes tools programmatically. |
| `cronjob` | `cronjob` | Schedule and manage recurring tasks. |
| `debugging` | composite (`file` + `terminal` + `web`) | Debug bundle: file, process/terminal, web extract/search. |
| `delegation` | `delegate_task` | Spawn isolated subagent instances for parallel work. |
| `discord` | `discord` | Core Discord text/embed/DM actions (gateway-only). Active on the `hermes-discord` toolset. |
| `discord_admin` | `discord_admin` | Discord moderation (bans, role changes, channel management). Active on the `hermes-discord` toolset; requires the bot to hold the relevant Discord permissions. |
| `feishu_doc` | `feishu_doc_read` | Read Feishu/Lark document content. Used by the Feishu document-comment intelligent-reply handler. |
| `feishu_drive` | `feishu_drive_add_comment`, `feishu_drive_list_comments`, `feishu_drive_list_comment_replies`, `feishu_drive_reply_comment` | Feishu/Lark drive comment operations. Scoped to the comment agent; not exposed on `hermes-cli` or other messaging toolsets. |
| `file` | `patch`, `read_file`, `search_files`, `write_file` | File reading, writing, searching, and editing. |
| `homeassistant` | `ha_call_service`, `ha_get_state`, `ha_list_entities`, `ha_list_services` | Smart home control via Home Assistant. Only available when `HASS_TOKEN` is set. |
| `computer_use` | `computer_use` | Background macOS desktop control via cua-driver; does not steal cursor/focus. Works with any tool-capable model. macOS only; requires `cua-driver` on `$PATH`. |
| `image_gen` | `image_generate` | Text-to-image generation via FAL.ai (with opt-in OpenAI / xAI backends). |
| `memory` | `memory` | Persistent cross-session memory management. |
| `messaging` | `send_message` | Send messages to other platforms (Telegram, Discord, etc.) from within a session. |
| `moa` | `mixture_of_agents` | Multi-model consensus via Mixture of Agents. |
| `rl` | `rl_check_status`, `rl_edit_config`, `rl_get_current_config`, `rl_get_results`, `rl_list_environments`, `rl_list_runs`, `rl_select_environment`, `rl_start_training`, `rl_stop_training`, `rl_test_inference` | RL training environment management (Atropos). |
| `safe` | `image_generate`, `vision_analyze`, `web_extract`, `web_search` (via includes) | Read-only research + media generation. No file writes, no terminal, no code execution. |
| `search` | `web_search` | Web search only (without extract). |
| `session_search` | `session_search` | Search past conversation sessions. |
| `skills` | `skill_manage`, `skill_view`, `skills_list` | Skill CRUD and browsing. |
| `spotify` | `spotify_albums`, `spotify_devices`, `spotify_library`, `spotify_playback`, `spotify_playlists`, `spotify_queue`, `spotify_search` | Native Spotify control (playback, queue, search, playlists, albums, library). Registered by the bundled spotify plugin. |
| `terminal` | `process`, `terminal` | Shell command execution and background process management. |
| `todo` | `todo` | Task list management within a session. |
| `tts` | `text_to_speech` | Text-to-speech audio generation. |
| `vision` | `vision_analyze` | Image analysis via vision-capable models. |
| `web` | `web_extract`, `web_search` | Web search and page content extraction. |
| `yuanbao` | `yb_query_group_info`, `yb_query_group_members`, `yb_search_sticker`, `yb_send_dm`, `yb_send_sticker` | Yuanbao DM/group actions and sticker search. Registered only on `hermes-yuanbao`. |
Platform Toolsets
Platform toolsets define the complete tool configuration for a deployment target. Most messaging platforms use the same set as hermes-cli:
| Toolset | Differences from hermes-cli |
|---|---|
| `hermes-cli` | Full toolset: 38 tools. The default for interactive CLI sessions. |
| `hermes-acp` | Drops `clarify`, `cronjob`, `image_generate`, `send_message`, `text_to_speech`, and all four Home Assistant tools. Focused on coding tasks in IDE context. |
| `hermes-api-server` | Drops `clarify`, `send_message`, and `text_to_speech`. Keeps everything else, which suits programmatic access where user interaction isn't possible. |
| `hermes-cron` | Same as `hermes-cli`. |
| `hermes-telegram` | Same as `hermes-cli`. |
| `hermes-discord` | Adds `discord` and `discord_admin` on top of `hermes-cli`. |
| `hermes-slack` | Same as `hermes-cli`. |
| `hermes-whatsapp` | Same as `hermes-cli`. |
| `hermes-signal` | Same as `hermes-cli`. |
| `hermes-matrix` | Same as `hermes-cli`. |
| `hermes-mattermost` | Same as `hermes-cli`. |
| `hermes-email` | Same as `hermes-cli`. |
| `hermes-sms` | Same as `hermes-cli`. |
| `hermes-bluebubbles` | Same as `hermes-cli`. |
| `hermes-dingtalk` | Same as `hermes-cli`. |
| `hermes-feishu` | Adds the five `feishu_doc_*` / `feishu_drive_*` tools (only used by the document-comment handler, not the regular chat adapter). |
| `hermes-qqbot` | Same as `hermes-cli`. |
| `hermes-wecom` | Same as `hermes-cli`. |
| `hermes-wecom-callback` | Same as `hermes-cli`. |
| `hermes-weixin` | Same as `hermes-cli`. |
| `hermes-yuanbao` | Adds the five `yb_*` tools (DM/group/sticker) on top of `hermes-cli`. |
| `hermes-homeassistant` | Same as `hermes-cli` (the Home Assistant tools are already present by default and activate when `HASS_TOKEN` is set). |
| `hermes-webhook` | Same as `hermes-cli`. |
| `hermes-gateway` | Internal gateway orchestrator toolset: the union of every `hermes-<platform>` toolset, used when the gateway needs to accept any message source. |
Dynamic Toolsets
MCP server toolsets
Each configured MCP server generates an `mcp-<server>` toolset at runtime. For example, if you configure a `github` MCP server, an `mcp-github` toolset is created containing all tools that server exposes.
# config.yaml
mcp_servers:
  github:
    command: npx
    args: ["-y", "@modelcontextprotocol/server-github"]
This creates an `mcp-github` toolset you can reference in `--toolsets` or platform configs.
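The naming rule is simple enough to sketch in a few lines. This is a minimal illustration, assuming the parsed `config.yaml` is available as a plain dictionary; `mcp_toolset_names` is a hypothetical helper written for this example and is not part of Hermes.

```python
def mcp_toolset_names(config):
    """Map each configured MCP server to its `mcp-<server>` toolset name."""
    servers = config.get("mcp_servers") or {}
    return {server: f"mcp-{server}" for server in servers}


# Mirrors the config.yaml fragment above, already parsed into a dict.
config = {
    "mcp_servers": {
        "github": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-github"],
        },
    }
}

print(mcp_toolset_names(config))  # {'github': 'mcp-github'}
```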
Plugin toolsets
Plugins can register their own toolsets via ctx.register_tool() during plugin initialization. These appear alongside built-in toolsets and can be enabled/disabled the same way.
Custom toolsets
Define custom toolsets in config.yaml to create project-specific bundles:
toolsets:
  - hermes-cli
custom_toolsets:
  data-science:
    - file
    - terminal
    - code_execution
    - web
    - vision
Wildcards
- `all` or `*`: expands to every registered toolset (built-in + dynamic + plugin)
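As a rough sketch of the wildcard behaviour, the function below expands a comma-separated selection such as the value passed to `--toolsets`. The function name, the error handling for unknown names, and the sample set of registered toolsets are assumptions made for illustration; the actual Hermes parsing logic may differ.

```python
def expand_selection(requested, registered):
    """Expand a comma-separated toolset list; `all` or `*` selects everything."""
    names = [part.strip() for part in requested.split(",") if part.strip()]
    if any(name in ("all", "*") for name in names):
        return sorted(registered)
    unknown = [name for name in names if name not in registered]
    if unknown:  # assumed behaviour: reject names that are not registered
        raise ValueError(f"unknown toolsets: {unknown}")
    return names


# Built-in, dynamic (MCP), and custom toolsets all live in one set here.
registered = {"file", "terminal", "web", "mcp-github", "data-science"}

print(expand_selection("web,file", registered))  # ['web', 'file']
print(expand_selection("*", registered))
```

With this sample set, both `all` and `*` return every registered name in sorted order, including the dynamic `mcp-github` and the custom `data-science` entries.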
Relationship to hermes tools
The hermes tools command provides a curses-based UI for toggling individual tools on or off per platform. This operates at the tool level (finer than toolsets) and persists to config.yaml. Disabled tools are filtered out even if their toolset is enabled.
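The precedence described above (a disabled tool stays hidden even when its toolset is enabled) can be sketched as a filter applied after toolset expansion. The names `available_tools`, `toolset_tools`, and `disabled_tools` are hypothetical and used only for this example.

```python
def available_tools(enabled_toolsets, toolset_tools, disabled_tools):
    """Collect tools from enabled toolsets, then drop individually disabled ones."""
    tools = []
    for name in enabled_toolsets:
        for tool in toolset_tools[name]:
            # The tool-level switch wins over the toolset-level switch.
            if tool not in disabled_tools and tool not in tools:
                tools.append(tool)
    return tools


toolset_tools = {
    "web": ["web_extract", "web_search"],
    "terminal": ["process", "terminal"],
}

# `web` is enabled, but `web_extract` was switched off at the tool level.
print(available_tools(["web", "terminal"], toolset_tools, {"web_extract"}))
# ['web_search', 'process', 'terminal']
```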
See also: Tools Reference for the complete list of individual tools and their parameters.