---
title: Computer Use
sidebar_position: 16
---

# Computer Use

Hermes Agent can drive your desktop — clicking, typing, scrolling,
dragging — in the **background** on **macOS, Windows, and Linux**. Your
cursor doesn't move, keyboard focus doesn't change, and your virtual
desktops / Spaces don't switch on you. You and the agent co-work on the
same machine.

Unlike most computer-use integrations, this works with **any tool-capable
model** — Claude, GPT, Gemini, or an open model on a local
OpenAI-compatible endpoint. There's no Anthropic-native schema to worry
about.

## How it works

The `computer_use` toolset speaks MCP over stdio to
[`cua-driver`](https://github.com/trycua/cua), an open-source background
computer-use driver. Each platform uses the appropriate accessibility +
input stack under the hood:

| Platform | Accessibility tree | Input dispatch |
|---|---|---|
| macOS | AX (private SkyLight SPIs) | `SLPSPostEventRecordTo` — pid-scoped, no cursor warp |
| Windows | UIAutomation | `SendInput` + `PostMessage` — no focus steal |
| Linux | AT-SPI (X11 + Wayland) | XTest (X11) / virtual-keyboard (Wayland) |

The result is the same on every platform: the agent can read the
accessibility tree of any visible window AND post synthesized events
without bringing it to front, switching virtual desktops, or moving the
real OS cursor.

For the underlying contract — *why* background mode matters, the
no-foreground invariant, click-dispatch internals — see
**[cua.ai/docs/explanation/the-no-foreground-contract](https://cua.ai/docs/explanation/the-no-foreground-contract)**.

## Enabling

Pick whichever path is most convenient — both run the same upstream
installer:

**Option 1: dedicated CLI command (most direct).**

```
hermes computer-use install
```

This fetches and runs the upstream cua-driver installer — `install.sh`
on macOS/Linux, `install.ps1` on Windows. Use `hermes computer-use
status` to verify the install.

**Option 2: enable the toolset interactively.**

1. Run `hermes tools`, pick `🖱️ Computer Use (macOS/Windows/Linux)`.
2. The setup runs the upstream installer (same as Option 1).

After installing, regardless of which path you took, grant the
platform-appropriate prereqs:

| Platform | Prereqs |
|---|---|
| **macOS** | System Settings → Privacy & Security → **Accessibility** + **Screen Recording** → allow your terminal (or Hermes app). `hermes computer-use doctor` will tell you which permission is missing. |
| **Windows** | None at install time. If you're driving over SSH (not RDP / console), you need the autostart pattern — see [cua.ai/docs/how-to-guides/driver/windows-ssh](https://cua.ai/docs/how-to-guides/driver/windows-ssh) for the Session 0 ↔ Session 1+ proxy. |
| **Linux** | A reachable display server: `DISPLAY` set for X11, or `XDG_SESSION_TYPE=wayland`. Wayland sessions need an XWayland bridge for capture. AT-SPI must be on (default on GNOME/KDE/Xfce). |

Then start a session with the toolset enabled:

```
hermes -t computer_use chat
```

or add `computer_use` to your enabled toolsets in `~/.hermes/config.yaml`.

## `hermes computer-use doctor` — your first triage stop

`hermes computer-use doctor` runs cua-driver's structured
`health_report` MCP tool and prints a per-check matrix. It's the single
fastest way to find out *why* an action isn't working.

```
$ hermes computer-use doctor
⚠️ cua-driver 0.5.8 on darwin — degraded
✅ binary_version: cua-driver 0.5.8
✅ platform_supported: macOS 26.4.1 (arm64)
✅ session_active: MCP session is active.
❌ bundle_identity: Process has no CFBundleIdentifier.
→ Run the binary inside CuaDriver.app so TCC grants attribute correctly.
✅ tcc_accessibility: Accessibility is granted.
✅ tcc_screen_recording: Screen Recording is granted.
✅ ax_capability: AX is trusted and reachable.
✅ screen_capture_capability: ScreenCaptureKit reachable; 1 display(s) shareable.
```

- **Exit code 0** when overall is `ok` — everything's wired up.
- **Exit code 1** when `degraded` or `failed` — at least one check failed; the hint on each failure tells you what to fix.
- **Exit code 2** when the cua-driver binary itself isn't reachable.

Useful flags:

- `--include CHECK` — run only the listed checks (repeat for multiple)
- `--skip CHECK` — skip a check (wins over `--include`)
- `--json` — emit the raw structured payload, same shape as the
  `tools/call health_report` MCP response

The check matrix is platform-aware: `bundle_identity` / `tcc_*` are
`skip` on Windows + Linux because those concepts don't apply.
`ax_capability` checks AX on macOS, UIA on Windows, AT-SPI on Linux —
each with the right diagnostic hint when it can't reach.

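For scripting (a CI preflight, say), the exit codes listed above can be mapped to a status string. The following is a minimal Python sketch, assuming `hermes` is on `PATH`; the helper names `describe_doctor_exit` and `run_doctor` are hypothetical and not part of Hermes.

```python
import subprocess

# Exit-code meanings as documented for `hermes computer-use doctor`.
DOCTOR_EXIT_MEANINGS = {
    0: "ok",
    1: "degraded or failed",
    2: "cua-driver binary not reachable",
}


def describe_doctor_exit(code: int) -> str:
    """Translate a doctor exit code into a short human-readable status."""
    return DOCTOR_EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_doctor() -> str:
    """Run the doctor and return its status (requires `hermes` on PATH)."""
    result = subprocess.run(
        ["hermes", "computer-use", "doctor"],
        capture_output=True,
        text=True,
        check=False,  # non-zero exits are expected and carry meaning here
    )
    return describe_doctor_exit(result.returncode)
```

A wrapper script could call `run_doctor()` and refuse to start a session unless it returns `"ok"`.
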
## The agent cursor and sessions

When the agent acts, you'll see a **tinted overlay cursor** glide
across the screen to where each click / type / scroll lands. The real
OS cursor never moves — the overlay is a visual cue that says "the
agent is acting here." Each Hermes run declares its own cua-driver
**session id** (something like `hermes-3a7b9c14d2e8`); the cursor's
identity is keyed to that session, so concurrent runs / subagents each
get their own cursor without stepping on each other.

Tune the cursor with `cua-driver`'s CLI flags or the runtime
`set_agent_cursor_style` MCP tool — see
[cua.ai/docs/how-to-guides/driver/personalize-cursor](https://cua.ai/docs/how-to-guides/driver/personalize-cursor)
for the full menu (built-in `arrow` vs `teardrop` silhouette, custom
SVG / PNG / ICO via `--cursor-icon`, runtime gradient colors, bloom
halo).

## Going deeper — the cua-driver skill pack

Hermes intentionally keeps its skill (`skills/computer-use/SKILL.md`)
focused on the Hermes-side `computer_use` action vocabulary — the
single source of truth the agent loads. For the deeper material —
platform-specific deep dives, recording semantics, browser page
interaction — point your agent harness at the cua-driver skill pack
the cua-driver team ships and maintains directly:

```
cua-driver skills install
```

This symlinks the pack into your agent harness' skill directory. After
running it, an agent gets access to:

| File | Topic |
|---|---|
| `SKILL.md` | The cross-platform core (snapshot invariant, no-foreground contract, click dispatch, AX-tree mechanics) |
| `MACOS.md` | macOS specifics: no-foreground contract, AXMenuBar navigation, SkyLight click dispatch, Apple Events JS bridge |
| `WINDOWS.md` | Windows specifics: UIA tree, UWP / `ApplicationFrameHost` hosting, Session 0 isolation, autostart pattern |
| `LINUX.md` | Linux specifics: AT-SPI tree, X11 / Wayland, terminal-emulator detection |
| `RECORDING.md` | Trajectory + video recording semantics |
| `WEB_APPS.md` | Browser-page interaction tips |
| `TESTS.md` | Replay-by-trajectory workflow |

These are **platform deep dives, not duplicates of the Hermes skill** —
when an agent reports "on Windows, my click landed on the wrong
element," it reads `WINDOWS.md` for the UIA / UWP context that
explains why and what to do differently.

`cua-driver skills status` shows what's installed and which agent
harnesses it's linked into. Today the autodetect list covers Claude
Code, Codex, OpenCode, OpenClaw, and Antigravity; **Hermes
autodetection is planned as a follow-up in `trycua/cua`** — until
then, run `cua-driver skills install` once and point your harness at
the resulting `~/.cua-driver/skills/cua-driver` directory (or symlink
it into your usual skill space).

## Quick example

User prompt: *"Find my latest email from Stripe and summarise what they want me to do."*

The agent's plan (this is the same shape on macOS / Windows / Linux —
the model substitutes the platform's idiomatic shortcut and app name):

1. `computer_use(action="capture", mode="som", app="Mail")` — gets a
   screenshot of the email app with every sidebar item, toolbar button,
   and message row numbered.
2. `computer_use(action="click", element=14)` — clicks the search field.
3. `computer_use(action="type", text="from:stripe")`
4. `computer_use(action="key", keys="return", capture_after=True)` —
   submit and get the new screenshot.
5. Click the top result, read the body, summarise.

During all of this, your cursor stays wherever you left it and the email
app never comes to front.

## Provider compatibility

| Provider | Vision? | Works? | Notes |
|---|---|---|---|
| Anthropic (Claude Sonnet/Opus 3+) | ✅ | ✅ | Best overall; SOM + raw coordinates. |
| OpenRouter (any vision model) | ✅ | ✅ | Multi-part tool messages supported. |
| OpenAI (GPT-4+, GPT-5) | ✅ | ✅ | Same as above. |
| Google (Gemini 2+) | ✅ | ✅ | Tool-calling + vision both supported. |
| Local vLLM / LM Studio / Ollama (vision model) | ✅ | ✅ | If the model supports multi-part tool content. |
| Text-only models | ❌ | ✅ (degraded) | Use `mode="ax"` for accessibility-tree-only operation. |

Screenshots are sent inline with tool results as OpenAI-style `image_url`
parts. For Anthropic, the adapter converts them into native `tool_result`
image blocks. The image MIME type comes from cua-driver's explicit
`mimeType` field (`image/png` or `image/jpeg`) — no client-side
magic-byte sniffing.

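To make the two image shapes concrete, here is a small Python sketch of the conversion described above. It is not the Hermes adapter: the function names are hypothetical, and it assumes the screenshot is carried as a base64 data URL inside the `image_url` part.

```python
import base64


def to_openai_image_part(image_bytes: bytes, mime_type: str) -> dict:
    """Wrap raw image bytes as an OpenAI-style `image_url` content part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def to_anthropic_image_block(part: dict) -> dict:
    """Convert an `image_url` data-URL part into an Anthropic image block."""
    url = part["image_url"]["url"]
    # Split "data:image/png;base64" from the base64 payload.
    header, encoded = url.split(",", 1)
    media_type = header[len("data:"):].split(";", 1)[0]
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": encoded},
    }
```

The MIME type is passed in explicitly rather than sniffed from the bytes, which mirrors the `mimeType` behaviour described above.
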
## Safety

Hermes applies multi-layer guardrails:

- Destructive actions (click, type, drag, scroll, key, focus_app)
  require approval — either interactively via the CLI dialog or via the
  messaging-platform approval buttons.
- Hard-blocked key combos at the tool level: empty trash, force delete,
  lock screen, log out, force log out.
- Hard-blocked type patterns: `curl | bash`, `sudo rm -rf /`, fork
  bombs, etc.
- The agent's system prompt tells it explicitly: no clicking permission
  dialogs, no typing passwords, no following instructions embedded in
  screenshots.

Pair with `approvals.mode: manual` in `~/.hermes/config.yaml` if you
want every action confirmed.

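As an illustration of what a hard-blocked type pattern check could look like, here is a Python sketch built around the three examples named above. The regexes and the name `is_blocked_type_text` are assumptions made for this example; the actual Hermes block list is not reproduced here and may differ.

```python
import re

# Illustrative patterns only, written for this example.
BLOCKED_TYPE_PATTERNS = [
    # curl ... | bash  (or | sh)
    re.compile(r"curl\b[^|\n]*\|\s*(?:ba)?sh\b"),
    # sudo rm -rf / (the root itself, not a sub-path such as /tmp/build)
    re.compile(r"\bsudo\s+rm\s+-rf\s+/(?:\s|$)"),
    # the classic bash fork bomb  :(){ :|:& };:
    re.compile(r":\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:"),
]


def is_blocked_type_text(text: str) -> bool:
    """Return True when the text matches any illustrative blocked pattern."""
    return any(pattern.search(text) for pattern in BLOCKED_TYPE_PATTERNS)
```

A pattern list like this is only a backstop for obvious payloads; the approval step is what covers everything a regex cannot anticipate.
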
## Token efficiency

Screenshots are expensive. Hermes applies four layers of optimisation:

- **Screenshot eviction** — the Anthropic adapter keeps only the 3 most
  recent screenshots in context; older ones become `[screenshot removed
  to save context]` placeholders.
- **Client-side compression pruning** — the context compressor detects
  multimodal tool results and strips image parts from old ones.
- **Image-aware token estimation** — each image is counted as ~1500
  tokens (Anthropic's flat rate) instead of its base64 char length.
- **Server-side context editing (Anthropic only)** — when active, the
  adapter enables `clear_tool_uses_20250919` via `context_management` so
  Anthropic's API clears old tool results server-side.

A 20-action session on a 1568×900 display typically costs ~30K tokens
of screenshot context, not ~600K.

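The eviction and flat-rate estimation ideas can be sketched in a few lines of Python. This is a simplification, not the Hermes adapter: it treats the context as a flat list of content parts, and the function names are hypothetical. The constants (3 screenshots kept, ~1500 tokens per image, the placeholder text) come from the list above.

```python
PLACEHOLDER = "[screenshot removed to save context]"
KEEP_RECENT = 3          # screenshots kept in context
TOKENS_PER_IMAGE = 1500  # flat per-image estimate


def evict_old_screenshots(parts: list[dict], keep: int = KEEP_RECENT) -> list[dict]:
    """Replace all but the `keep` most recent image parts with a text placeholder."""
    image_positions = [i for i, p in enumerate(parts) if p.get("type") == "image_url"]
    to_evict = set(image_positions[:-keep]) if keep > 0 else set(image_positions)
    return [
        {"type": "text", "text": PLACEHOLDER} if i in to_evict else p
        for i, p in enumerate(parts)
    ]


def estimate_image_tokens(parts: list[dict]) -> int:
    """Count each remaining image at the flat rate, ignoring base64 length."""
    return TOKENS_PER_IMAGE * sum(1 for p in parts if p.get("type") == "image_url")
```

With these numbers, 20 screenshots at the flat rate come to 20 × 1500 = 30,000 tokens, and a context holding only the 3 most recent comes to 4,500.
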
## Limitations

- **Performance.** Background mode is slower than foreground —
  accessibility-routed events take ~5–20 ms on macOS, ~3–10 ms on
  Windows UIA, ~5–15 ms on Linux AT-SPI vs direct HID posting. Not
  noticeable for agent-speed clicking; noticeable if you try to record
  a speed-run.
- **No keyboard password entry.** `type` has hard-block patterns on
  command-shell payloads; for passwords, use the system's autofill
  (macOS Keychain / Windows Credential Manager / GNOME Keyring /
  KWallet).
- **Some apps don't expose an accessibility tree.** Modern UWP apps on
  Windows, Electron < 28 on Linux, and a few macOS apps with custom
  drawing (Logic, Final Cut, some games) have sparse or empty AX trees.
  Fall back to pixel coordinates if the tree is empty — or skip the
  task entirely.
- **Platform-specific deployment gotchas:**
  - **macOS** uses private SkyLight SPIs. Apple can change them in any
    OS update. Hermes warns when the installed cua-driver is older than
    the version it was tested against.
  - **Windows** SSH sessions run in **Session 0**, which has no
    interactive desktop. Drive Hermes from inside the RDP / console
    session, or set up cua-driver's autostart Scheduled Task —
    [windows-ssh](https://cua.ai/docs/how-to-guides/driver/windows-ssh)
    has the recipe.
  - **Linux** requires a reachable display server. Headless servers
    need Xvfb (`Xvfb :99 -screen 0 1920x1080x24`) before
    `computer_use` can capture or inject events. Pure Wayland sessions
    need an XWayland bridge for screen capture (cua-driver's Wayland
    inject path handles input independently).

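Before launching a session on Linux, a quick environment check can catch the missing-display case early. The Python sketch below only inspects the two variables this page names (`DISPLAY` and `XDG_SESSION_TYPE`); it does not prove the server is actually reachable, and the name `detect_display_server` is hypothetical rather than part of Hermes or cua-driver.

```python
import os
from typing import Mapping, Optional


def detect_display_server(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return "wayland", "x11", or None when no display server is advertised."""
    env = os.environ if env is None else env
    if env.get("XDG_SESSION_TYPE", "").lower() == "wayland":
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return None
```

On a headless box, starting Xvfb and exporting `DISPLAY=:99` would make this return `"x11"`; `hermes computer-use doctor` remains the authoritative check.
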
For cross-platform GUI automation without the desktop overhead (and
without TCC / Session 0 / X11 setup), the `browser` toolset uses a
real headless Chromium and is the right answer for web-only tasks.

## Configuration

Override the driver binary path (tests / CI / local builds):

```
HERMES_CUA_DRIVER_CMD=/path/to/your/cua-driver
```

Swap the backend entirely (for testing):

```
HERMES_COMPUTER_USE_BACKEND=noop # records calls, no side effects
```

### Telemetry

cua-driver ships with anonymous usage telemetry (PostHog) enabled by default
upstream. **Hermes disables it for you** — on every cua-driver invocation
(the MCP backend, `status`, `doctor`, and install) Hermes sets
`CUA_DRIVER_RS_TELEMETRY_ENABLED=0` in the driver's environment.

To opt back in (let cua-driver use its own default and send telemetry), set
this in `config.yaml`:

```yaml
computer_use:
  cua_telemetry: true # default: false (telemetry off)
```

When it's on, `hermes computer-use doctor` reports `telemetry: enabled`;
when off (the default), it reports `telemetry: disabled via
CUA_DRIVER_RS_TELEMETRY_ENABLED`.

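The policy described above amounts to a small transformation of the child process environment. Here is a Python sketch of that idea under stated assumptions: `build_driver_env` is a hypothetical name, and this is not the Hermes implementation, only the behaviour this section describes (force the variable to `0` by default, leave it untouched on opt-in).

```python
import os
from typing import Mapping, Optional

TELEMETRY_VAR = "CUA_DRIVER_RS_TELEMETRY_ENABLED"


def build_driver_env(
    telemetry_opt_in: bool,
    base_env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Return a child-process environment that honours the telemetry setting."""
    env = dict(os.environ if base_env is None else base_env)
    if not telemetry_opt_in:
        # Default: force telemetry off, even if the parent env enabled it.
        env[TELEMETRY_VAR] = "0"
    # On opt-in the variable is left as-is so the driver applies its own default.
    return env
```

The resulting dict could be passed as `env=` to `subprocess.Popen` or any other spawn call, so every driver invocation goes through the same policy.
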
## Testing against a local cua-driver build

When you're developing cua-driver itself — or want to test an
unreleased fix — point Hermes at a binary you built from source instead
of the published release. Hermes resolves the driver with
`shutil.which("cua-driver")` and **does not enforce
`HERMES_CUA_DRIVER_VERSION`**, so a local build (reported as
`0.0.0-local-*`) is accepted as-is. Two approaches:

### Option A — `install-local` (build + put it on PATH)

From your `trycua/cua` checkout, run the upstream local installer. It
builds the Rust backend in release mode and drops `cua-driver` into the
same install layout the production installer uses, adding its bin dir
to your PATH:

```powershell
# Windows (PowerShell), from the cua repo root
./libs/cua-driver/scripts/install-local.ps1 -NoAutoStart
```

```bash
# macOS / Linux, from the cua repo root (defaults to a debug build without --release)
./libs/cua-driver/scripts/install-local.sh --release
```

- Windows stages the build under `%USERPROFILE%\.cua-driver\packages\…`
  and junctions
  `%LOCALAPPDATA%\Programs\Cua\cua-driver\bin` (added to your User
  PATH) to it. macOS/Linux symlinks `cua-driver` into `~/.local/bin`
  (override with `--bin-dir <path>`).
- `-NoAutoStart` skips registering the `cua-driver-serve` logon daemon
  — you don't need it for Hermes testing (see notes).

Then open a fresh shell (so the PATH change is visible) and confirm:

```
cua-driver --version # local builds report 0.0.0-local-release
# Windows: (Get-Command cua-driver).Source
# macOS/Linux: which cua-driver
```

### Option B — point Hermes straight at the built binary (fastest loop)

Skip the install ceremony entirely: `cargo build` and set
`HERMES_CUA_DRIVER_CMD` to the resulting binary. Best for rapid
edit/build/test.

```bash
cargo build -p cua-driver # add --release for a release build; run from libs/cua-driver/rust
```

```
# Windows (.env)
HERMES_CUA_DRIVER_CMD=C:\path\to\cua\libs\cua-driver\rust\target\debug\cua-driver.exe
# macOS / Linux (.env)
HERMES_CUA_DRIVER_CMD=/path/to/cua/libs/cua-driver/rust/target/debug/cua-driver
```

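The two ways of pointing Hermes at a binary (the `HERMES_CUA_DRIVER_CMD` override and the `PATH` lookup) can be pictured as a simple resolution function. This Python sketch assumes the override takes precedence over `PATH`, which this page implies but does not state outright; `resolve_driver_path` is a hypothetical name, not a Hermes function.

```python
import os
import shutil
from typing import Mapping, Optional


def resolve_driver_path(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the driver path from the override, else from PATH, else None."""
    env = os.environ if env is None else env
    override = env.get("HERMES_CUA_DRIVER_CMD")
    if override:
        # Assumption: an explicit override wins over whatever is on PATH.
        return override
    return shutil.which("cua-driver", path=env.get("PATH"))
```

Calling `resolve_driver_path()` in a fresh shell is a quick way to see which binary a lookup of this kind would pick; `hermes computer-use status` is the authoritative answer.
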
### Confirm Hermes is using your build

- `hermes computer-use status` prints the resolved binary path and
  version.
- `hermes computer-use doctor` confirms the binary is reachable and
  exercises the full MCP path end-to-end.
- In a session, `computer_use(action="capture")` exercises the spawned
  `cua-driver mcp` child process.

### Notes & gotchas

- **Hermes spawns its own `cua-driver mcp` child over stdio** — it does
  *not* attach to the long-running `cua-driver serve` autostart daemon
  or its named pipe. So the scheduled task / LaunchAgent is unnecessary
  for testing (`-NoAutoStart` is fine). The autostart daemon and the
  Windows UIAccess worker (`cua-driver-uia.exe`) only matter for
  foreground-safe input on some apps (e.g. WPF); the standard tool
  surface works through the stdio child. On Windows SSH sessions, the
  autostart pattern IS needed — see the Limitations section.
- **Locked binary on Windows.** A running `cua-driver-serve` daemon can
  hold `cua-driver.exe` and block an overwrite on rebuild.
  `install-local.ps1` renames the locked binary out of the way
  automatically; if you `cargo build` manually (Option B), stop it
  first with `cua-driver autostart disable` (or `schtasks /End /TN
  cua-driver-serve`).
- **Rebuild loop.** After editing cua-driver source, re-run
  `install-local` (rebuilds, restages, flips the `current` junction)
  for Option A, or just re-`cargo build` for Option B — no Hermes
  change needed either way.
- **Local builds skip the version check.** Hermes warns when the
  installed cua-driver is older than its per-OS tested baseline, but
  exempts `0.0.0-local-*` dev builds — so your local build never
  triggers that warning.

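The version-check exemption in the last note can be expressed as a short comparison. The Python sketch below is an illustration only: the function names are hypothetical, the baseline value in the usage note is invented for the example, and it assumes the version is available as a bare dotted string such as `0.5.8`.

```python
def parse_version(version: str) -> tuple:
    """Parse "0.5.8" into (0, 5, 8); assumes plain dotted integers."""
    return tuple(int(part) for part in version.split("."))


def should_warn_outdated(installed: str, tested_baseline: str) -> bool:
    """Warn only for released builds older than the tested baseline."""
    if installed.startswith("0.0.0-local-"):
        return False  # local dev builds are exempt from the check
    return parse_version(installed) < parse_version(tested_baseline)
```

For example, with a made-up baseline of `0.6.0`, an installed `0.5.8` would warn while `0.0.0-local-release` would not.
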
## Troubleshooting

**First action when anything's off: run `hermes computer-use doctor`.**
The structured per-check matrix tells you (and any agent helping you
debug) exactly what's wrong.

Specific failure modes, including some the doctor doesn't catch:

**`computer_use backend unavailable: cua-driver is not installed`** —
Run `hermes computer-use install` to fetch the cua-driver binary, or
run `hermes tools` and enable the Computer Use toolset.

**Clicks seem to have no effect** — Capture and verify. A modal you
didn't see may be blocking input. Dismiss it with `escape` or the close
button.

**Element indices are stale** — SOM indices are only valid until the
next `capture`. Re-capture after any state-changing action. The
wrapper carries opaque `element_token`s for stale detection — you'll
see an explicit error rather than a wrong click.

**"blocked pattern in type text"** — The text you tried to `type`
matches the dangerous-shell-pattern list. Break the command up or
reconsider.

**Empty captures on Linux** — `DISPLAY` not set, or you're on pure
Wayland without an XWayland bridge. `hermes computer-use doctor` will
flag this as `ax_capability: fail` with a `Set DISPLAY (X11)…` hint.

**Empty captures on Windows over SSH** — You're in Session 0 (the
services session). Drive from RDP / console directly, or set up the
autostart pattern — see
[cua.ai/docs/how-to-guides/driver/windows-ssh](https://cua.ai/docs/how-to-guides/driver/windows-ssh).

## See also

- **Hermes-side skill** — `skills/computer-use/SKILL.md` — teaches the
  Hermes `computer_use` action vocabulary; this is what the agent loads.
- **cua-driver skill pack** — for platform-specific deep dives
  (macOS no-foreground contract, Windows UIA + Session 0, Linux AT-SPI
  + X11/Wayland, recording, browser pages), run
  `cua-driver skills install` and read `MACOS.md` / `WINDOWS.md` /
  `LINUX.md` / `RECORDING.md` / `WEB_APPS.md`. Once `cua-driver skills
  install` autodetects Hermes (planned follow-up), this happens
  automatically on install.
- **cua.ai/docs** — the cua-driver project's documentation:
  - [What is computer use?](https://cua.ai/docs/explanation/what-is-computer-use) — concept intro
  - [The no-foreground contract](https://cua.ai/docs/explanation/the-no-foreground-contract) — *why* background mode matters
  - [Install reference](https://cua.ai/docs/how-to-guides/driver/install) — cross-platform install details
  - [Personalize the agent cursor](https://cua.ai/docs/how-to-guides/driver/personalize-cursor) — built-in shapes, custom assets, runtime overrides
  - [Drive Windows over SSH](https://cua.ai/docs/how-to-guides/driver/windows-ssh) — the Session 0 → Session 1+ autostart pattern
  - [Keep cua-driver running](https://cua.ai/docs/how-to-guides/driver/keep-running) — autostart / daemon lifecycle
  - [Connect your agent](https://cua.ai/docs/how-to-guides/driver/connect-your-agent) — register cua-driver with various harnesses (Hermes among them)
- [cua-driver source (trycua/cua)](https://github.com/trycua/cua)
- [Browser automation](./browser.md) for cross-platform web tasks where you don't need to drive native apps.