revert: computer-use cua-driver (PR #16919) (#16927)

Reverts PR #16919 (commits dad10a78d, 413ee1a28, b4a8031b2, afb958829)
which was merged prematurely. Restoring the pre-merge state so #14817
and #15328 can be revisited as standing PRs.

Reverted commits:
- afb958829 fix(computer-use): harden image-rejection fallback + AUTHOR_MAP
- b4a8031b2 fix(computer-use): unwrap _multimodal tool results
- 413ee1a28 feat(computer-use): background focus-safe backend
- dad10a78d feat(computer-use): cua-driver backend, universal any-model schema

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
This commit is contained in:
Teknium 2026-04-28 01:57:21 -07:00 committed by GitHub
parent cf0852f92e
commit e63364b8df
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
27 changed files with 35 additions and 3536 deletions

View file

@ -1,43 +0,0 @@
"""Computer use toolset — universal (any-model) macOS desktop control.
Architecture
------------
This toolset drives macOS apps through cua-driver's background computer-use
primitive (SkyLight private SPIs for focus-without-raise + pid-scoped event
posting). Unlike #4562's pyautogui backend, it does NOT steal the user's
cursor, keyboard focus, or Space the agent and the user can co-work on the
same machine.
Unlike #4562's Anthropic-native `computer_20251124` tool, the schema here is
a plain OpenAI function-calling schema that every tool-capable model can
drive. Vision models get SOM (set-of-mark) captures a screenshot with
numbered overlays on every interactable element plus the AX tree so they
click by element index instead of pixel coordinates. Non-vision models can
drive via the AX tree alone.
Wiring
------
* `tool.py` registers the `computer_use` tool via tools.registry.
* `backend.py` abstract `ComputerUseBackend`; swappable implementation.
* `cua_backend.py` default backend; speaks MCP over stdio to `cua-driver`.
* `schema.py` shared schema + docstring for the generic `computer_use`
tool. Model-agnostic.
* `capture.py` screenshot post-processing (PNG coercion, sizing, SOM
overlay if the backend did not).
The outer integration points (multimodal tool-result plumbing, screenshot
eviction in the Anthropic adapter, image-aware token estimation, the
COMPUTER_USE_GUIDANCE prompt block, approval hook, and the skill) live
alongside this package. See agent/anthropic_adapter.py and
agent/prompt_builder.py for the salvaged hunks from PR #4562.
"""
from __future__ import annotations
# Re-export the public surface so `from tools.computer_use import ...` works.
from tools.computer_use.tool import ( # noqa: F401
handle_computer_use,
set_approval_callback,
check_computer_use_requirements,
get_computer_use_schema,
)