hermes-agent/tools
xxxigm e02a7e5e1c fix(computer_use): route SOM/vision captures via auxiliary.vision (#24015)
When the active main model has no vision capability — or when the user
explicitly configured auxiliary.vision in config.yaml — sending the
captured screenshot back to the main model in a multimodal tool-result
envelope is the wrong move: it trips HTTP 404 / 400 at the provider
boundary (e.g. 'No endpoints found that support image input') and the
agent loop reports a hard tool failure for what should have been a
simple capture.

The reporter on #24015 hit this with:

  model:
    default: tencent/hy3-preview      # no vision support
    provider: openrouter
  auxiliary:
    vision:
      provider: openrouter
      model: google/gemini-2.5-flash  # explicitly configured

…and observed:

  computer_use(action='capture', mode='som')
  → ⚠️ API call failed (attempt1/3): NotFoundError [HTTP 404]
     🔌 Provider: openrouter  Model: tencent/hy3-preview
     📝 Error: HTTP 404: No endpoints found that support image input

Fix: in tools/computer_use/tool.py::_capture_response, after a
screenshot is captured (modes 'som' / 'vision'), consult the routing
helper introduced earlier in this branch. When it says 'route to aux',
materialise the PNG to $HERMES_HOME/cache/vision/, run vision_analyze
on it (which honours auxiliary.vision via the standard async_call_llm
task='vision' router), and return a text-only JSON tool result that
embeds the analysis alongside the existing AX/SOM index. The main
model never sees the pixels — it sees an actionable text description
plus the same set-of-mark element index it normally uses.

The two new helpers (_should_route_through_aux_vision,
_route_capture_through_aux_vision) keep the policy and the IO
separated so each can be tested in isolation. Both fail open: if the
config import fails, if the aux call raises, or if the analysis is
empty, we fall back to the existing multimodal envelope so the
behaviour is at worst the pre-fix status quo. Temp screenshot files
are cleaned up unconditionally in a finally block — even on aux call
failure — to avoid leaving residue under cache/vision/.

The end-to-end regression for #24015 is added in the next commit.
2026-05-21 17:38:19 -07:00
..
computer_use fix(computer_use): route SOM/vision captures via auxiliary.vision (#24015) 2026-05-21 17:38:19 -07:00
environments fix(windows): drop duplicate creationflags kwarg in LocalEnvironment._run_bash 2026-05-19 23:17:52 -04:00
neutts_samples refactor(tts): replace NeuTTS optional skill with built-in provider + setup flow 2026-03-17 02:33:12 -07:00
__init__.py Merge branch 'main' into rewbs/tool-use-charge-to-subscription 2026-03-31 08:48:54 +09:00
ansi_strip.py fix: strip ANSI at the source — clean terminal output before it reaches the model 2026-03-23 07:43:12 -07:00
approval.py fix(approval): surface pending-approval state with explicit marker visible to LLM 2026-05-18 19:37:16 -07:00
binary_extensions.py fix(tools): address PR review — remove _extract_raw_output, BudgetConfig everywhere, read_file hardening 2026-04-08 02:24:32 -07:00
browser_camofox.py feat: auto-launch Chromium-family browser for CDP 2026-05-19 22:34:05 -07:00
browser_camofox_state.py feat(browser): add persistent Camofox sessions and VNC URL discovery (salvage #4400) (#4419) 2026-04-01 04:18:50 -07:00
browser_cdp_tool.py feat: auto-launch Chromium-family browser for CDP 2026-05-19 22:34:05 -07:00
browser_dialog_tool.py feat: auto-launch Chromium-family browser for CDP 2026-05-19 22:34:05 -07:00
browser_supervisor.py fix(async): close unscheduled coroutines in all threadsafe bridges (#26584) 2026-05-15 14:00:01 -07:00
browser_tool.py feat(dep_ensure): complete Windows bootstrap — dep_ensure + install.ps1 + detection (#27845) 2026-05-18 16:34:24 +05:30
budget_config.py chore: remove Atropos RL environments and tinker-atropos integration (#26106) 2026-05-15 10:36:38 +05:30
checkpoint_manager.py chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937) 2026-05-11 11:13:25 -07:00
clarify_gateway.py feat(gateway): wire clarify tool with inline keyboard buttons on Telegram (#24199) 2026-05-12 16:33:33 -07:00
clarify_tool.py refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites 2026-04-07 13:36:38 -07:00
code_execution_tool.py fix(windows): suppress console window flash on subprocess spawns 2026-05-16 23:05:27 -07:00
computer_use_tool.py feat(computer-use): cua-driver backend, universal any-model schema 2026-05-08 11:07:38 -07:00
credential_files.py fix(gateway): translate inbound document host paths to container paths for Docker backend 2026-05-07 05:02:26 -07:00
cronjob_tools.py fix(cron): allow emoji ZWJ sequences in prompts 2026-05-19 00:10:43 -07:00
debug_helpers.py refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821) 2026-04-07 10:25:31 -07:00
delegate_tool.py fix(delegation): preserve configured_provider name when runtime returns 'custom' 2026-05-17 11:40:05 -07:00
discord_tool.py feat: add Discord message deletion action 2026-05-07 05:11:09 -07:00
env_passthrough.py refactor(config): add cfg_get() helper; migrate 20 nested-get call sites (#17304) 2026-04-28 23:17:39 -07:00
feishu_doc_tool.py perf(cli): cut ~19s from 'hermes' cold start (skills cache + lazy Feishu + no Nous HTTP) (#22138) 2026-05-08 16:39:32 -07:00
feishu_drive_tool.py perf(cli): cut ~19s from 'hermes' cold start (skills cache + lazy Feishu + no Nous HTTP) (#22138) 2026-05-08 16:39:32 -07:00
file_operations.py fix(lint): skip per-file shell linter when LSP will handle the file (#29054) 2026-05-20 01:46:40 -05:00
file_state.py feat(delegate): cross-agent file state coordination for concurrent subagents (#13718) 2026-04-21 16:41:26 -07:00
file_tools.py chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937) 2026-05-11 11:13:25 -07:00
fuzzy_match.py chore: ruff auto-fixes — collapsible-else-if, if-stmt-min-max, dict.fromkeys (#23926) 2026-05-11 11:03:29 -07:00
homeassistant_tool.py fix: clean up description escaping, add string-data tests 2026-04-13 04:45:07 -07:00
image_generation_tool.py feat(image-gen): actionable setup message when no FAL backend is reachable (#26222) 2026-05-15 01:33:13 -07:00
interrupt.py fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace (#11907) 2026-04-17 20:39:25 -07:00
kanban_tools.py feat(kanban): stamp originating ACP session_id on tasks 2026-05-18 21:15:21 -07:00
lazy_deps.py feat(azure-foundry): add Microsoft Entra ID auth 2026-05-18 10:14:38 -07:00
managed_tool_gateway.py fix(tools): add debug logging for token refresh and tighten domain check 2026-04-02 12:40:03 +11:00
mcp_oauth.py fix(security): guard os.chmod(parent) against / and top-level dirs 2026-05-20 22:56:55 -07:00
mcp_oauth_manager.py fix(mcp-oauth): persist OAuth server metadata across process restarts (#21226) 2026-05-07 05:35:33 -07:00
mcp_tool.py fix(mcp): use module-level time so test patches do not race background sleepers 2026-05-17 13:33:26 -07:00
memory_tool.py fix: guard yaml.safe_load, flock unlock, TOCTOU races, and atomic writes 2026-05-19 00:12:41 -07:00
microsoft_graph_auth.py feat(msgraph): add auth and client foundation 2026-05-08 09:27:26 -07:00
microsoft_graph_client.py fix(msgraph): stream download_to_file body instead of buffering 2026-05-08 09:27:26 -07:00
mixture_of_agents_tool.py chore: ruff auto-fix C401, C416, C408, PLR1722 (#23940) 2026-05-11 11:20:58 -07:00
neutts_synth.py fix(tts): document NeuTTS provider and align install guidance (#1903) 2026-03-18 02:55:30 -07:00
openrouter_client.py refactor: route ad-hoc LLM consumers through centralized provider router 2026-03-11 20:02:36 -07:00
osv_check.py chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937) 2026-05-11 11:13:25 -07:00
patch_parser.py fix(lint): skip per-file shell linter when LSP will handle the file (#29054) 2026-05-20 01:46:40 -05:00
path_security.py refactor: extract shared helpers to deduplicate repeated code patterns (#7917) 2026-04-11 13:59:52 -07:00
process_registry.py fix(windows): hide local subprocess consoles 2026-05-19 11:23:15 -07:00
registry.py security: sanitize tool error strings before injecting into model context (#26823) 2026-05-16 00:57:39 -07:00
schema_sanitizer.py fix(xai-responses): strip enum values containing '/' from tool schemas 2026-05-18 10:37:35 -07:00
send_message_tool.py fix(tests): catch up 25 stale tests after recent merges (#28626) 2026-05-19 01:28:32 -07:00
session_search_tool.py feat(session_search): single-shape tool with discovery, scroll, browse — no LLM (#27590) 2026-05-17 23:28:45 -07:00
skill_manager_tool.py fix(skills): prune dependency/venv dirs from all skill scanners (#30042) 2026-05-21 14:18:02 -07:00
skill_provenance.py fix(curator): only mark agent-created for background-review sediment (#19621) 2026-05-04 02:42:16 -07:00
skill_usage.py fix(skills): prune dependency/venv dirs from all skill scanners (#30042) 2026-05-21 14:18:02 -07:00
skills_guard.py feat(skills-hub): add huggingface/skills as trusted default tap (#2549) 2026-05-15 01:25:33 -07:00
skills_hub.py fix(skills): prune dependency/venv dirs from all skill scanners (#30042) 2026-05-21 14:18:02 -07:00
skills_sync.py fix(skills): prune dependency/venv dirs from all skill scanners (#30042) 2026-05-21 14:18:02 -07:00
skills_tool.py fix(skills): prune dependency/venv dirs from all skill scanners (#30042) 2026-05-21 14:18:02 -07:00
slash_confirm.py fix(async): close unscheduled coroutines in all threadsafe bridges (#26584) 2026-05-15 14:00:01 -07:00
terminal_tool.py fix(gateway): route background-process notifications into Telegram DM topics 2026-05-18 22:03:12 -07:00
tirith_security.py fix(tirith): suppress .app lookalike_tld false positives in warn verdicts 2026-05-18 10:20:07 -07:00
todo_tool.py chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937) 2026-05-11 11:13:25 -07:00
tool_backend_helpers.py fix(cli): coerce use_gateway config flags in tool routing 2026-04-26 19:02:55 -07:00
tool_output_limits.py feat(skills): add design-md skill for Google's DESIGN.md spec (#14876) 2026-04-23 21:51:19 -07:00
tool_result_storage.py fix(tool-result-storage): persist via stdin to bypass 128 KB exec-arg cap (#22913) 2026-05-09 18:44:58 -07:00
transcription_tools.py fix(xai-http): preserve ~/.hermes/.env fallback and XAI_STT_BASE_URL precedence 2026-05-15 12:11:32 -07:00
tts_tool.py Add opt-in xAI TTS speech tag pauses 2026-05-20 09:22:28 -07:00
url_safety.py fix(url_safety): block IPv4-mapped IPv6 addresses to prevent SSRF bypass 2026-05-18 10:51:15 -07:00
video_generation_tool.py chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355) 2026-05-17 02:29:41 -07:00
vision_tools.py chore: ruff auto-fix C401, C416, C408, PLR1722 (#23940) 2026-05-11 11:20:58 -07:00
voice_mode.py fix(voice): chunk oversized CLI recordings 2026-05-21 14:17:39 -07:00
web_tools.py feat(web): add xAI Web Search provider plugin 2026-05-19 19:27:34 -07:00
website_policy.py refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821) 2026-04-07 10:25:31 -07:00
x_search_tool.py fix(x_search): surface degraded results + validate dates 2026-05-21 02:38:45 +05:30
xai_http.py feat(web): add xAI Web Search provider plugin 2026-05-19 19:27:34 -07:00
yuanbao_tools.py chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937) 2026-05-11 11:13:25 -07:00