hermes-agent/agent
ddupont e31f3b3c56 feat(computer-use): background focus-safe backend — set_value, structured windows, MIME detection
Extends the cua-driver computer-use backend to drive backgrounded macOS
windows without stealing keyboard or mouse focus from the foreground app.
All changes target the cua-driver MCP backend and the shared dispatcher.

## cua_backend.py

**Window-aware capture**: capture() now calls list_windows + get_window_state
instead of the removed capture tool. Prefers structuredContent.windows
(MCP 2024-11-05+ cua-driver) for zero-parse window enumeration; falls back
to regex-parsed text for older builds. Stores the selected (pid, window_id)
as sticky context so subsequent action calls do not need a redundant round-trip.

**Action routing**: click/scroll/type_text/key all carry the sticky pid
(and window_id for element-indexed clicks). type_text routes through
type_text_chars (individual key events) rather than an AX attribute write,
because WebKit AXTextFields reject attribute writes from backgrounded processes.
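
The sticky-context idea can be sketched roughly as follows. This is an illustration only, not the code in cua_backend.py: the `StickyTarget` class and the argument names (`pid`, `window_id`, `element_index`) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class StickyTarget:
    """Hypothetical holder for the (pid, window_id) chosen by capture()."""
    pid: Optional[int] = None
    window_id: Optional[int] = None

    def select(self, pid: int, window_id: int) -> None:
        # Remember the target so later actions skip a list_windows round-trip.
        self.pid = pid
        self.window_id = window_id

    def action_args(self, base: Dict[str, Any],
                    element_index: Optional[int] = None) -> Dict[str, Any]:
        # Every action carries the pid; only element-indexed clicks
        # also need the window_id (argument names are assumed).
        if self.pid is None:
            raise RuntimeError("no window selected; capture first")
        args = dict(base, pid=self.pid)
        if element_index is not None:
            args["window_id"] = self.window_id
            args["element_index"] = element_index
        return args


target = StickyTarget()
target.select(pid=4242, window_id=7)
scroll_args = target.action_args({"direction": "down"})
click_args = target.action_args({}, element_index=3)
```

Keeping the target in one place means an action issued before any capture fails loudly instead of silently addressing the foreground app.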

**Key parsing**: _parse_key_combo splits cmd+s-style strings into
(key, [modifiers]) and routes to hotkey (modifier present) or
press_key (bare key), the actual cua-driver tool names.
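
A minimal sketch of this split-and-route step, assuming the last `+`-separated token is the key and everything before it is a modifier. The alias table and the payload field names (`keys`, `key`) are hypothetical; only the tool names hotkey and press_key come from the commit message.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical alias table; the spellings cua-driver accepts are not
# stated in the commit message.
_MODIFIER_ALIASES = {
    "cmd": "cmd", "command": "cmd",
    "ctrl": "ctrl", "control": "ctrl",
    "alt": "alt", "option": "alt",
    "shift": "shift",
}


def parse_key_combo(combo: str) -> Tuple[str, List[str]]:
    """Split 'cmd+shift+s' into ('s', ['cmd', 'shift'])."""
    parts = [p.strip().lower() for p in combo.split("+") if p.strip()]
    if not parts:
        raise ValueError(f"empty key combo: {combo!r}")
    *mods, key = parts
    return key, [_MODIFIER_ALIASES.get(m, m) for m in mods]


def route_key(combo: str) -> Tuple[str, Dict[str, Any]]:
    """Pick the tool name: hotkey when modifiers are present, else press_key."""
    key, modifiers = parse_key_combo(combo)
    if modifiers:
        return "hotkey", {"keys": modifiers + [key]}  # payload shape assumed
    return "press_key", {"key": key}                  # payload shape assumed
```

One limitation of this sketch: a combo whose key is a literal `+` cannot be expressed, since `+` is the separator.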

**set_value method**: new set_value(value, element) calls the cua-driver
set_value MCP tool. For an AXPopUpButton / HTML select in a backgrounded Safari,
AXPress opens the native macOS popup, which closes immediately when the app is
not frontmost; set_value instead AX-presses the matching child option directly
(no menu required, no focus steal).

**focus_app**: reimplemented as a pure window-selector (enumerates
list_windows, sets sticky pid/window_id) without ever raising the window
or stealing focus.

**list_apps**: fixed tool name from listApps to list_apps; parses the
plain-text response with a regex when structured data is absent.

**Structured-content extraction**: _extract_tool_result now surfaces
structuredContent from MCP results, enabling the list_windows window array
without text parsing.
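
The prefer-structured, fall-back-to-text pattern might look like the sketch below. The result shape follows the general MCP tool-result layout (`content` items plus an optional `structuredContent`); the text line format matched by the regex is purely an assumption for illustration, since the real plain-text output of older cua-driver builds is not shown here.

```python
import re
from typing import Any, Dict, List

# Assumed legacy text layout, e.g. "pid=123 window_id=45 title=Notes".
_WINDOW_LINE = re.compile(r"pid=(\d+)\s+window_id=(\d+)\s+title=(.*)")


def extract_windows(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the window list from a list_windows-style tool result."""
    structured = result.get("structuredContent")
    if isinstance(structured, dict) and isinstance(structured.get("windows"), list):
        return structured["windows"]  # zero-parse path

    # Fallback: scan text content items line by line.
    windows: List[Dict[str, Any]] = []
    for item in result.get("content", []):
        if item.get("type") != "text":
            continue
        for line in item.get("text", "").splitlines():
            match = _WINDOW_LINE.search(line)
            if match:
                windows.append({
                    "pid": int(match.group(1)),
                    "window_id": int(match.group(2)),
                    "title": match.group(3).strip(),
                })
    return windows
```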

**Helpers**: _parse_windows_from_text, _parse_elements_from_tree,
_split_tree_text, _parse_key_combo extracted as module-level functions.

## schema.py

Added set_value to the action enum with a description explaining when to
prefer it over click (select/popup elements, sliders, no focus steal).
Added value field for set_value payloads.

## tool.py

Routed set_value action through _dispatch to backend.set_value.
Added set_value to _DESTRUCTIVE_ACTIONS (approval-gated).
Fixed MIME-type detection in _capture_response: cua-driver may return
JPEG, so the type is now detected from the base64 magic bytes
(/9j/ -> image/jpeg, else image/png) rather than hardcoded to image/png.
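
The magic-byte check works because JPEG data starts with the bytes FF D8 FF, which always base64-encode to the prefix `/9j/`. A minimal sketch of the rule as described, with a hypothetical function name:

```python
import base64


def detect_image_mime(b64_data: str) -> str:
    """Guess the MIME type of a base64-encoded screenshot."""
    # FF D8 FF (JPEG start-of-image marker) encodes to "/9j/".
    if b64_data.lstrip().startswith("/9j/"):
        return "image/jpeg"
    # Everything else is treated as PNG, matching the rule in the commit.
    return "image/png"


jpeg_b64 = base64.b64encode(b"\xff\xd8\xff\xe0" + b"\x00" * 8).decode("ascii")
png_b64 = base64.b64encode(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8).decode("ascii")
jpeg_mime = detect_image_mime(jpeg_b64)
png_mime = detect_image_mime(png_b64)
```

Checking the encoded prefix avoids decoding the whole payload just to read three bytes.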

## agent/display.py + run_agent.py

Guard _detect_tool_failure and result-preview logic against non-string
function_result values: multimodal tool results (dicts with _multimodal=True)
are not string-sliceable; treat them as successes and fall back to str()
for length/preview.
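
A guard of this kind could be sketched as below. The function name, the preview limit, and the string failure heuristic are assumptions for illustration; only the `_multimodal` flag and the str() fallback come from the commit message.

```python
from typing import Any, Tuple


def preview_result(function_result: Any, limit: int = 80) -> Tuple[bool, str]:
    """Return (failed, preview) without assuming the result is a string."""
    if not isinstance(function_result, str):
        # Multimodal results (dicts with _multimodal=True) and any other
        # non-string value cannot be sliced as text: treat them as
        # successes and preview their str() form instead.
        return False, str(function_result)[:limit]
    # Hypothetical failure heuristic for plain-string results.
    failed = function_result.lstrip().lower().startswith("error")
    return failed, function_result[:limit]


ok = preview_result({"_multimodal": True, "content": []})
bad = preview_result("Error: window not found")
```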
2026-05-08 11:07:38 -07:00
transports feat: provider modules — ProviderProfile ABC, 33 providers, fetch_models, transport single-path 2026-05-05 13:40:01 -07:00
__init__.py
account_usage.py
anthropic_adapter.py feat(computer-use): cua-driver backend, universal any-model schema 2026-05-08 11:07:38 -07:00
auxiliary_client.py refactor(gmi): move User-Agent to profile.default_headers 2026-05-08 03:22:11 -07:00
bedrock_adapter.py fix(bedrock): preserve reasoningContent across converse normalization 2026-05-07 05:17:16 -07:00
codex_responses_adapter.py
context_compressor.py feat(computer-use): cua-driver backend, universal any-model schema 2026-05-08 11:07:38 -07:00
context_engine.py
context_references.py
copilot_acp_client.py fix(oauth,gateway): monotonic deadlines for polling/timeout loops 2026-05-07 05:09:39 -07:00
credential_pool.py fix(auth): shorten credential 401 cooldown 2026-05-07 06:15:33 -07:00
credential_sources.py
curator.py
curator_backup.py
display.py feat(computer-use): background focus-safe backend — set_value, structured windows, MIME detection 2026-05-08 11:07:38 -07:00
error_classifier.py
file_safety.py
gemini_cloudcode_adapter.py
gemini_native_adapter.py
gemini_schema.py
google_code_assist.py
google_oauth.py
i18n.py feat(i18n): add Turkish (tr) locale 2026-05-05 17:29:12 -07:00
image_gen_provider.py
image_gen_registry.py
image_routing.py fix(image-routing): sniff magic bytes for image MIME, ignore misleading suffix 2026-05-07 05:58:11 -07:00
insights.py
lmstudio_reasoning.py
manual_compression_feedback.py
memory_manager.py fix: salvage batch — compaction guidance, memory authority, cache eviction after compression 2026-05-05 22:33:45 -07:00
memory_provider.py docs(agent): remove stale BuiltinMemoryProvider references from memory module docstrings 2026-05-05 13:33:49 -07:00
model_metadata.py feat(computer-use): cua-driver backend, universal any-model schema 2026-05-08 11:07:38 -07:00
models_dev.py fix(models): prefer image modalities for vision routing 2026-05-07 05:54:12 -07:00
moonshot_schema.py
nous_rate_guard.py
onboarding.py
prompt_builder.py feat(computer-use): cua-driver backend, universal any-model schema 2026-05-08 11:07:38 -07:00
prompt_caching.py
rate_limit_tracker.py
redact.py feat(security): enable secret redaction by default (#17691, #20785) (#21193) 2026-05-07 05:10:33 -07:00
retry_utils.py
shell_hooks.py
skill_commands.py
skill_preprocessing.py
skill_utils.py
subdirectory_hints.py
think_scrubber.py fix(agent): stateful streaming scrubber for reasoning-block leaks (#17924) (#20184) 2026-05-05 04:33:38 -07:00
title_generator.py
tool_guardrails.py
trajectory.py
usage_pricing.py fix(analytics): prevent silent token loss and add Claude 4.5–4.7 pricing (#21455) 2026-05-07 13:24:31 -07:00