hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-25 00:51:20 +00:00

Teknium b07791db05 feat(computer-use): cua-driver backend, universal any-model schema

Background macOS desktop control via cua-driver MCP: does NOT steal the user's cursor or keyboard focus, works with any tool-capable model.

Replaces the Anthropic-native `computer_20251124` approach from the abandoned #4562 with a generic OpenAI function-calling schema plus SOM (set-of-mark) captures so Claude, GPT, Gemini, and open models can all drive the desktop via numbered element indices.

## What this adds

- `tools/computer_use/` package: swappable ComputerUseBackend ABC + CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary).
- Universal `computer_use` tool with one schema for all providers. Actions: capture (som/vision/ax), click, double_click, right_click, middle_click, drag, scroll, type, key, wait, list_apps, focus_app.
- Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style `content: [text, image_url]` parts) that flows through handle_function_call into the tool message. Anthropic adapter converts into native `tool_result` image blocks; OpenAI-compatible providers get the parts list directly.
- Image eviction in convert_messages_to_anthropic: only the 3 most recent screenshots carry real image data; older ones become text placeholders to cap per-turn token cost.
- Context compressor image pruning: old multimodal tool results have their image parts stripped instead of being skipped.
- Image-aware token estimation: each image counts as a flat 1500 tokens instead of its base64 char length (~1MB would have registered as ~250K tokens before).
- COMPUTER_USE_GUIDANCE system-prompt block, injected when the toolset is active.
- Session DB persistence strips base64 from multimodal tool messages.
- Trajectory saver normalises multimodal messages to text-only.
- `hermes tools` post-setup installs cua-driver via the upstream script and prints permission-grant instructions.
- CLI approval callback wired so destructive computer_use actions go through the same prompt_toolkit approval dialog as terminal commands.
- Hard safety guards at the tool level: blocked type patterns (curl|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash, force delete, lock screen, log out).
- Skill `apple/macos-computer-use/SKILL.md`: universal (model-agnostic) workflow guide.
- Docs: `user-guide/features/computer-use.md` plus reference catalog entries.

## Tests

44 new tests in tests/tools/test_computer_use.py covering schema shape (universal, not Anthropic-native), dispatch routing, safety guards, multimodal envelope, Anthropic adapter conversion, screenshot eviction, context compressor pruning, image-aware token estimation, run_agent helpers, and universality guarantees. 469/469 pass across tests/tools/test_computer_use.py + the affected agent/ test suites.

## Not in this PR

- `model_tools.py` provider-gating: the tool is available to every provider. Providers without multi-part tool message support will see text-only tool results (graceful degradation via `text_summary`).
- Anthropic server-side `clear_tool_uses_20250919`: deferred; client-side eviction + compressor pruning cover the same cost ceiling without a beta header.

## Caveats

- macOS only. cua-driver uses private SkyLight SPIs (SLEventPostToPid, SLPSPostEventRecordTo, _AXObserverAddNotificationAndCheckRemote) that can break on any macOS update. Pin with HERMES_CUA_DRIVER_VERSION.
- Requires Accessibility + Screen Recording permissions; the post-setup prints the Settings path.

Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic-native schema). Credit @0xbyt4 for the original #3816 groundwork whose context/eviction/token design is preserved here in generic form.

2026-04-23 16:44:24 -07:00
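
The screenshot eviction and image-aware token estimation described in the commit message above can be illustrated with a minimal sketch. This is not the repository's code: the function names `evict_old_images` and `estimate_tokens`, the placeholder text, and the 4-characters-per-token heuristic for text are assumptions made for illustration. The limit of 3 live screenshots and the flat 1500 tokens per image come from the commit message, and the message parts are assumed to use the OpenAI-style `{"type": "image_url", ...}` shape.

```python
from typing import Any

MAX_LIVE_SCREENSHOTS = 3   # from the commit message
TOKENS_PER_IMAGE = 1500    # from the commit message
CHARS_PER_TOKEN = 4        # assumption: rough heuristic for plain text
PLACEHOLDER = "[screenshot removed to save context]"  # hypothetical wording

Message = dict[str, Any]


def _is_image(part: Any) -> bool:
    """True for an OpenAI-style image content part."""
    return isinstance(part, dict) and part.get("type") == "image_url"


def evict_old_images(messages: list[Message],
                     keep: int = MAX_LIVE_SCREENSHOTS) -> list[Message]:
    """Return a copy in which only the `keep` newest images keep their data."""
    remaining = keep
    out: list[Message] = []
    # Walk newest-to-oldest so the most recent screenshots are preserved.
    for msg in reversed(messages):
        content = msg.get("content")
        if isinstance(content, list):
            new_parts = []
            for part in reversed(content):
                if _is_image(part) and remaining > 0:
                    remaining -= 1
                    new_parts.append(part)
                elif _is_image(part):
                    # Older image: swap the payload for a short text stub.
                    new_parts.append({"type": "text", "text": PLACEHOLDER})
                else:
                    new_parts.append(part)
            new_parts.reverse()
            msg = {**msg, "content": new_parts}  # do not mutate the input
        out.append(msg)
    out.reverse()
    return out


def estimate_tokens(messages: list[Message]) -> int:
    """Estimate tokens, charging a flat cost per image, not base64 length."""
    total = 0
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, str):
            total += len(content) // CHARS_PER_TOKEN
        elif isinstance(content, list):
            for part in content:
                if _is_image(part):
                    total += TOKENS_PER_IMAGE
                elif isinstance(part, dict):
                    total += len(str(part.get("text", ""))) // CHARS_PER_TOKEN
    return total
```

Under these assumptions, a history of five screenshot tool results passed through `evict_old_images` keeps image data only in the last three, and `estimate_tokens` charges each surviving image 1500 tokens regardless of how large its base64 payload is.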
browser_providers	feat: ungate Tool Gateway — subscription-based access with per-tool opt-in	2026-04-16 12:36:49 -07:00
computer_use	feat(computer-use): cua-driver backend, universal any-model schema	2026-04-23 16:44:24 -07:00
environments	fix(terminal): auto-source ~/.profile and ~/.bash_profile so n/nvm PATH survives (#14534)	2026-04-23 05:15:37 -07:00
neutts_samples	refactor(tts): replace NeuTTS optional skill with built-in provider + setup flow	2026-03-17 02:33:12 -07:00
__init__.py	Merge branch 'main' into rewbs/tool-use-charge-to-subscription	2026-03-31 08:48:54 +09:00
ansi_strip.py	fix: strip ANSI at the source — clean terminal output before it reaches the model	2026-03-23 07:43:12 -07:00
approval.py	test: cover absolute paths in project env/config approval regex	2026-04-23 14:05:36 -07:00
binary_extensions.py	fix(tools): address PR review — remove _extract_raw_output, BudgetConfig everywhere, read_file hardening	2026-04-08 02:24:32 -07:00
browser_camofox.py	refactor: remove remaining redundant local imports (comprehensive sweep)	2026-04-21 00:50:58 -07:00
browser_camofox_state.py	feat(browser): add persistent Camofox sessions and VNC URL discovery (salvage #4400) (#4419)	2026-04-01 04:18:50 -07:00
browser_cdp_tool.py	fix: separate browser_cdp into its own toolset	2026-04-22 17:45:17 -07:00
browser_tool.py	perf(browser): upgrade agent-browser 0.13 -> 0.26, wire daemon idle timeout	2026-04-22 16:33:36 -07:00
budget_config.py	fix: preserve existing thresholds, remove pre-read byte guard	2026-04-08 02:24:32 -07:00
checkpoint_manager.py	refactor: remove redundant local imports already available at module level	2026-04-21 00:50:58 -07:00
clarify_tool.py	refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites	2026-04-07 13:36:38 -07:00
code_execution_tool.py	fix(tools): restrict RPC socket permissions to owner-only	2026-04-22 17:27:18 -07:00
computer_use_tool.py	feat(computer-use): cua-driver backend, universal any-model schema	2026-04-23 16:44:24 -07:00
credential_files.py	refactor: extract shared helpers to deduplicate repeated code patterns (#7917)	2026-04-11 13:59:52 -07:00
cronjob_tools.py	feat(cron): expose enabled_toolsets in cronjob tool and create_job()	2026-04-23 15:16:18 -07:00
debug_helpers.py	refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821)	2026-04-07 10:25:31 -07:00
delegate_tool.py	fix(delegate): remove model-facing max_iterations override; config is authoritative (#14732)	2026-04-23 13:56:26 -07:00
discord_tool.py	feat: add Discord server introspection and management tool (#4753)	2026-04-19 11:52:19 -07:00
env_passthrough.py	fix(env_passthrough): reject Hermes provider credentials from skill passthrough (#13523)	2026-04-21 06:14:25 -07:00
feishu_doc_tool.py	fix(feishu-comment): use get_hermes_home(); drop dead asyncio wrapper; AUTHOR_MAP	2026-04-17 19:04:11 -07:00
feishu_drive_tool.py	fix(feishu-comment): use get_hermes_home(); drop dead asyncio wrapper; AUTHOR_MAP	2026-04-17 19:04:11 -07:00
file_operations.py	tools: normalize file tool pagination bounds	2026-04-22 06:11:41 -07:00
file_state.py	feat(delegate): cross-agent file state coordination for concurrent subagents (#13718)	2026-04-21 16:41:26 -07:00
file_tools.py	fix(file_tools): resolve bookkeeping paths against live terminal cwd	2026-04-23 15:11:52 -07:00
fuzzy_match.py	fix(patch): gate 'did you mean?' to no-match + extend to v4a/skill_manage	2026-04-21 02:03:46 -07:00
homeassistant_tool.py	fix: clean up description escaping, add string-data tests	2026-04-13 04:45:07 -07:00
image_generation_tool.py	fix(image-gen): force-refresh plugin providers in long-lived sessions	2026-04-23 03:01:18 -07:00
interrupt.py	fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace (#11907)	2026-04-17 20:39:25 -07:00
managed_tool_gateway.py	fix(tools): add debug logging for token refresh and tighten domain check	2026-04-02 12:40:03 +11:00
mcp_oauth.py	fix(mcp-oauth): bidirectional auth_flow bridge + absolute expires_at (salvage #12025) (#12717)	2026-04-19 16:31:07 -07:00
mcp_oauth_manager.py	fix(mcp-oauth): bidirectional auth_flow bridge + absolute expires_at (salvage #12025) (#12717)	2026-04-19 16:31:07 -07:00
mcp_tool.py	fix(mcp): rewrite definitions refs to in input schemas	2026-04-23 15:56:57 -07:00
memory_tool.py	fix: nest msvcrt import inside fcntl except block	2026-04-14 10:18:05 -07:00
mixture_of_agents_tool.py	Fix (mixture_of_agents): replace deprecated Gemini model and forward max_tokens to OpenRouter (#6621)	2026-04-23 15:14:11 -07:00
neutts_synth.py	fix(tts): document NeuTTS provider and align install guidance (#1903)	2026-03-18 02:55:30 -07:00
openrouter_client.py	refactor: route ad-hoc LLM consumers through centralized provider router	2026-03-11 20:02:36 -07:00
osv_check.py	feat: OSV malware check for MCP extension packages (#5305)	2026-04-05 12:46:07 -07:00
patch_parser.py	fix(patch): gate 'did you mean?' to no-match + extend to v4a/skill_manage	2026-04-21 02:03:46 -07:00
path_security.py	refactor: extract shared helpers to deduplicate repeated code patterns (#7917)	2026-04-11 13:59:52 -07:00
process_registry.py	refactor: remove redundant local imports already available at module level	2026-04-21 00:50:58 -07:00
registry.py	fix: tighten AST check to module-level only	2026-04-14 21:12:29 -07:00
rl_training_tool.py	refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821)	2026-04-07 10:25:31 -07:00
send_message_tool.py	refactor: remove remaining redundant local imports (comprehensive sweep)	2026-04-21 00:50:58 -07:00
session_search_tool.py	fix(aux): add session_search extra_body and concurrency controls	2026-04-20 00:47:39 -07:00
skill_manager_tool.py	feat(skills-guard): gate agent-created scanner on config.skills.guard_agent_created (default off)	2026-04-23 06:20:47 -07:00
skills_guard.py	feat(skills-guard): gate agent-created scanner on config.skills.guard_agent_created (default off)	2026-04-23 06:20:47 -07:00
skills_hub.py	feat(skills): add MiniMax-AI/cli as default skill tap	2026-04-23 02:35:13 -07:00
skills_sync.py	feat(skills_sync): surface collision with reset-hint	2026-04-23 05:09:08 -07:00
skills_tool.py	fix(skills): follow symlinked category dirs consistently	2026-04-23 14:05:47 -07:00
terminal_tool.py	fix(terminal): forward docker_forward_env and docker_env to container_config	2026-04-22 17:45:56 -07:00
tirith_security.py	fix: guard against None tirith path in security scanner	2026-04-23 03:08:53 -07:00
todo_tool.py	fix(tools): enforce ID uniqueness in TODO store during replace operations	2026-04-11 16:22:50 -07:00
tool_backend_helpers.py	fix(fal): extend whitespace-only FAL_KEY handling to all call sites	2026-04-21 02:04:21 -07:00
tool_result_storage.py	fix(tools): neutralize shell injection in _write_to_sandbox via path quoting (#7940)	2026-04-11 14:26:11 -07:00
transcription_tools.py	review(stt-xai): address cetej's nits	2026-04-23 01:57:33 -07:00
tts_tool.py	fix(tts): use per-provider input-character caps instead of global 4000 (#13743)	2026-04-21 17:49:39 -07:00
url_safety.py	feat(security): add global toggle to allow private/internal URL resolution	2026-04-22 14:38:59 -07:00
vision_tools.py	fix: vision tool respects auxiliary.vision.temperature from config (#4661)	2026-04-20 00:32:09 -07:00
voice_mode.py	fix: point optional-dep install hints at the venv's python (#11938)	2026-04-17 21:16:33 -07:00
web_tools.py	feat(web): support TAVILY_BASE_URL env var for custom proxy endpoints	2026-04-22 17:36:33 -07:00
website_policy.py	refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821)	2026-04-07 10:25:31 -07:00
xai_http.py	feat(xai): upgrade to Responses API, add TTS provider	2026-04-16 02:24:08 -07:00