mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-30 06:41:51 +00:00
When the active main model has no vision capability — or when the user explicitly configured auxiliary.vision in config.yaml — sending the captured screenshot back to the main model in a multimodal tool-result envelope is the wrong move: it trips HTTP 404 / 400 at the provider boundary (e.g. 'No endpoints found that support image input') and the agent loop reports a hard tool failure for what should have been a simple capture. The reporter on #24015 hit this with: model: default: tencent/hy3-preview # no vision support provider: openrouter auxiliary: vision: provider: openrouter model: google/gemini-2.5-flash # explicitly configured …and observed: computer_use(action='capture', mode='som') → ⚠️ API call failed (attempt1/3): NotFoundError [HTTP 404] 🔌 Provider: openrouter Model: tencent/hy3-preview 📝 Error: HTTP 404: No endpoints found that support image input Fix: in tools/computer_use/tool.py::_capture_response, after a screenshot is captured (modes 'som' / 'vision'), consult the routing helper introduced earlier in this branch. When it says 'route to aux', materialise the PNG to $HERMES_HOME/cache/vision/, run vision_analyze on it (which honours auxiliary.vision via the standard async_call_llm task='vision' router), and return a text-only JSON tool result that embeds the analysis alongside the existing AX/SOM index. The main model never sees the pixels — it sees an actionable text description plus the same set-of-mark element index it normally uses. The two new helpers (_should_route_through_aux_vision, _route_capture_through_aux_vision) keep the policy and the IO separated so each can be tested in isolation. Both fail open: if the config import fails, if the aux call raises, or if the analysis is empty, we fall back to the existing multimodal envelope so the behaviour is at worst the pre-fix status quo. Temp screenshot files are cleaned up unconditionally in a finally block — even on aux call failure — to avoid leaving residue under cache/vision/. The end-to-end regression for #24015 is added in the next commit. |
||
|---|---|---|
| .. | ||
| __init__.py | ||
| backend.py | ||
| cua_backend.py | ||
| schema.py | ||
| tool.py | ||
| vision_routing.py | ||