Mirror of https://github.com/NousResearch/hermes-agent.git, synced 2026-06-07 08:02:23 +00:00
fix(computer-use): surface app=… filter no-match instead of silently using frontmost (#24170 bug 1)
`CuaDriverBackend.capture(app=X)` and `focus_app(app=X)` silently fell back
to the frontmost on-screen window when X matched no app — typically a
menu-bar utility (e.g. "Fuwari" in the bug reporter's case) rather than
the requested app. The agent then received UI elements for the wrong app
and clicked / typed into it.
The root cause is a localized macOS app name mismatch: `list_windows`
returns the localized `app_name` (e.g. "計算機" on a Japanese/Chinese
system) but callers naturally pass the English name ("Calculator"). The
substring filter doesn't match, and the code falls through to picking the
frontmost window with no signal that the filter was effectively dropped.
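The sketch below reproduces the pre-fix selection logic from the removed lines of the `capture` hunk so the failure mode can be run directly. The window dicts are hypothetical. Only their keys (`app_name`, `off_screen`, `pid`, `window_id`) come from the diff. The list is assumed to be in front-to-back z-order, so the menu-bar utility sits first.

```python
# Hypothetical list_windows-style records, assumed frontmost-first.
# Only the keys come from the diff; the values are invented.
WINDOWS = [
    {"app_name": "Fuwari", "off_screen": False, "pid": 101, "window_id": 1},
    {"app_name": "計算機", "off_screen": False, "pid": 202, "window_id": 2},
]


def pick_window_before_fix(windows, app):
    """Pre-fix capture selection, reconstructed from the removed diff lines."""
    if app:
        app_lower = app.lower()
        filtered = [w for w in windows if app_lower in w["app_name"].lower()]
        if filtered:
            # An empty match skips this branch, so the unfiltered list survives.
            windows = filtered
    # First on-screen window, falling back to the first window overall.
    return next((w for w in windows if not w["off_screen"]), windows[0])


# "calculator" is not a substring of "計算機", so the filter silently drops out.
print(pick_window_before_fix(WINDOWS, "Calculator")["app_name"])  # → Fuwari
print(pick_window_before_fix(WINDOWS, "計算機")["app_name"])  # → 計算機
```

Passing the localized name selects the right window. Passing the English name returns whatever happens to be frontmost.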
Fix:
- `capture(app=…)`: when the filter matches nothing, return a
`CaptureResult` with empty `app`/`elements` and a diagnostic
`window_title` pointing the caller at `list_apps` and noting the
localized-name convention. `_active_pid` / `_active_window_id` are left
untouched so a subsequent action doesn't inadvertently hit the wrong
process.
- `focus_app(app=…)`: when the filter matches nothing, set `target = None`
and let the existing `return ActionResult(ok=False, …, "No on-screen
window found for app …")` path fire instead of falsely reporting success
on the frontmost window.
This addresses only bug 1 from #24170. Bugs 2 & 5 are addressed in #30046; bugs 3 & 4 in #30032.
parent 4cc18877c6, commit 5aa4727f34
2 changed files with 156 additions and 3 deletions
```diff
@@ -393,11 +393,27 @@ class CuaDriverBackend(ComputerUseBackend):
                                  elements=[], app="", window_title="", png_bytes_len=0)
 
         # Filter by app name (case-insensitive substring) if requested.
+        # When the filter matches nothing, surface that explicitly instead of
+        # silently capturing the frontmost window — on macOS the `app_name`
+        # returned by list_windows is the localized name (e.g. "計算機"), so
+        # `app="Calculator"` legitimately matches no windows on a non-English
+        # system and the caller needs to retry with the localized name.
         if app:
             app_lower = app.lower()
             filtered = [w for w in windows if app_lower in w["app_name"].lower()]
-            if filtered:
-                windows = filtered
+            if not filtered:
+                return CaptureResult(
+                    mode=mode, width=0, height=0, png_b64=None,
+                    elements=[], app="",
+                    window_title=(
+                        f"<no on-screen window matched app={app!r}; "
+                        f"call list_apps to see available app names "
+                        f"(macOS reports localized names, e.g. '計算機' "
+                        f"instead of 'Calculator')>"
+                    ),
+                    png_bytes_len=0,
+                )
+            windows = filtered
 
         # Pick first on-screen window (sorted by z_index / z-order above).
         target = next((w for w in windows if not w["off_screen"]), windows[0])
```
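To read the `capture` hunk in isolation, the post-fix selection can be pulled out into a standalone function. This is a sketch, not the backend's code. `pick_capture_window` is a hypothetical name, and it returns the diagnostic string rather than building a `CaptureResult`. The message is abbreviated. The empty-list guard is an assumed stand-in for the earlier empty-result return that the hunk only partly shows.

```python
from typing import Optional


def pick_capture_window(
    windows: list, app: Optional[str]
) -> tuple[Optional[dict], str]:
    """Post-fix selection: (target, "") on success, (None, diagnostic) on a miss."""
    if not windows:
        # Assumed stand-in for the method's earlier empty-result return.
        return None, ""
    if app:
        app_lower = app.lower()
        filtered = [w for w in windows if app_lower in w["app_name"].lower()]
        if not filtered:
            # Report the miss; never fall through to the frontmost window.
            return None, f"<no on-screen window matched app={app!r}; call list_apps>"
        windows = filtered
    # Unchanged by the fix: first on-screen window, else the first window.
    target = next((w for w in windows if not w["off_screen"]), windows[0])
    return target, ""


windows = [
    {"app_name": "Fuwari", "off_screen": False},
    {"app_name": "計算機", "off_screen": False},
]
print(pick_capture_window(windows, "Calculator"))
print(pick_capture_window(windows, "計算機")[0]["app_name"])  # → 計算機
```

Returning early keeps `windows` unfiltered only when no `app` was requested. An explicit filter either narrows the list or ends the capture.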
```diff
@@ -658,7 +674,11 @@ class CuaDriverBackend(ComputerUseBackend):
 
         app_lower = app.lower()
         matched = [w for w in windows if app_lower in w["app_name"].lower()]
-        target = matched[0] if matched else (windows[0] if windows else None)
+        # Don't silently fall back to the frontmost window when the filter
+        # matches nothing — that hides the real failure (often a localized
+        # macOS app name mismatch, e.g. caller passed "Calculator" but
+        # list_windows returns "計算機").
+        target = matched[0] if matched else None
         if target:
             self._active_pid = target["pid"]
             self._active_window_id = target["window_id"]
```
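The `focus_app` change can be sketched in the same way. `FocusState` and `focus_app_sketch` are hypothetical stand-ins for the backend's `_active_pid`/`_active_window_id` bookkeeping. The sketch shows that a miss now leaves the previously active window untouched. In that case the real method takes its `ActionResult(ok=False, …)` path, which the sketch models as returning `False`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FocusState:
    """Hypothetical stand-in for the backend's _active_pid/_active_window_id."""
    active_pid: Optional[int] = None
    active_window_id: Optional[int] = None


def focus_app_sketch(state: FocusState, windows: list, app: str) -> bool:
    """Post-fix focus selection; returns False where the real method fails."""
    app_lower = app.lower()
    matched = [w for w in windows if app_lower in w["app_name"].lower()]
    # Before the fix: matched[0] if matched else (windows[0] if windows else None)
    target = matched[0] if matched else None
    if target:
        state.active_pid = target["pid"]
        state.active_window_id = target["window_id"]
    return target is not None


windows = [
    {"app_name": "Fuwari", "off_screen": False, "pid": 101, "window_id": 1},
    {"app_name": "計算機", "off_screen": False, "pid": 202, "window_id": 2},
]
state = FocusState(active_pid=202, active_window_id=2)
ok = focus_app_sketch(state, windows, "Calculator")
print(ok, state.active_pid)  # → False 202
```

With the old fallback expression, the same call would have succeeded and pointed `state` at Fuwari (pid 101).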