Merge remote-tracking branch 'origin/main' into bb/pets-merge

# Conflicts:
#	hermes_cli/commands.py
#	tui_gateway/server.py
This commit is contained in:
Brooklyn Nicholson 2026-06-23 19:05:22 -05:00
commit e495b33bf1
251 changed files with 23395 additions and 2720 deletions

View file

@ -625,7 +625,7 @@ Advanced per-platform knobs for throttling the outbound message batcher. Most us
| `HERMES_AGENT_NOTIFY_INTERVAL` | Gateway: interval in seconds between progress notifications on long-running agent turns. |
| `HERMES_CHECKPOINT_TIMEOUT` | Timeout for filesystem checkpoint creation in seconds (default: `30`). |
| `HERMES_EXEC_ASK` | Enable execution approval prompts in gateway mode (`true`/`false`) |
| `HERMES_ENABLE_PROJECT_PLUGINS` | Enable auto-discovery of repo-local plugins from `./.hermes/plugins/` for both the agent loader and the dashboard web server. Accepts the standard truthy set: `1` / `true` / `yes` / `on` (case-insensitive). Everything else — including `0`, `false`, `no`, `off`, and the empty string — is treated as **disabled** (default). Note: as of GHSA-5qr3-c538-wm9j (#29156) the dashboard web server refuses to auto-import a project plugin's Python `api` file even when this var is enabled — project plugins may extend the UI via static JS/CSS but their backend routes are only loaded when moved under `~/.hermes/plugins/`. |
| `HERMES_ENABLE_PROJECT_PLUGINS` | Enable auto-discovery of repo-local plugins from `./.hermes/plugins/` for both the agent loader and the dashboard web server. Accepts the standard truthy set: `1` / `true` / `yes` / `on` (case-insensitive). Everything else — including `0`, `false`, `no`, `off`, and the empty string — is treated as **disabled** (default). Note: as of GHSA-5qr3-c538-wm9j (#29156) and #43719, the dashboard web server refuses to auto-import Python `api` files from project or user-installed plugins — they may extend the UI via static JS/CSS, while backend routes are reserved for bundled plugins. |
| `HERMES_PLUGINS_DEBUG` | `1`/`true` to surface verbose plugin-discovery logs on stderr — directories scanned, manifests parsed, skip reasons, and full tracebacks on parse or `register()` failure. Aimed at plugin authors. |
| `HERMES_BACKGROUND_NOTIFICATIONS` | Background process notification mode in gateway: `all` (default), `result`, `error`, `off` |
| `HERMES_EPHEMERAL_SYSTEM_PROMPT` | Ephemeral system prompt injected at API-call time (never persisted to sessions) |

View file

@ -89,6 +89,7 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in
| `/skills` | Search, install, inspect, or manage skills from online registries. Also the review surface for the skill write-approval gate: `/skills pending`, `/skills diff <id>`, `/skills approve <id>`, `/skills reject <id>`, `/skills approval on\|off`. See [Gating agent skill writes](/user-guide/features/skills#gating-agent-skill-writes-skillswrite_approval). |
| `/memory [pending\|approve\|reject\|approval]` | Review pending memory writes staged by the write-approval gate (`memory.write_approval`) and toggle the gate. See [Controlling memory writes](/user-guide/features/memory#controlling-memory-writes-write_approval). |
| `/bundles` | List configured skill bundles — `/<name>` slash aliases that preload several skills at once. Configure under `bundles:` in `~/.hermes/config.yaml`. See [Skill Bundles](/user-guide/features/skills#skill-bundles). |
| `/learn <what to learn from>` | Distill a reusable skill from anything you describe — a directory, a URL, the workflow you just walked the agent through, or pasted notes. Open-ended: the agent gathers the sources with its own tools and authors a `SKILL.md` following the house authoring standards. Works in the CLI, the messaging gateway, the TUI, and the dashboard Skills page. |
| `/cron` | Manage scheduled tasks (list, add/create, edit, pause, resume, run, remove) |
| `/suggestions [accept\|dismiss N\|catalog\|clear]` (alias: `/suggest`) | Review suggested automations. Use `/suggestions` to list pending suggestions, `/suggestions accept <id>` to create the proposed automation, `/suggestions dismiss <id>` to reject one, `/suggestions catalog` to add curated starter automations, and `/suggestions clear` to clear resolved suggestion records. Accepted jobs preserve the current surface as the delivery origin. |
| `/blueprint [name] [slot=value ...]` (alias: `/bp`) | Set up an automation from a blueprint template. Bare `/blueprint` lists the catalog; `/blueprint <name>` starts a guided slot-filling flow on the next agent turn; `/blueprint <name> slot=value ...` creates the job directly. |
@ -249,7 +250,7 @@ The messaging gateway supports the following built-in commands inside Telegram,
- `/skills` is **CLI-only for search/browse/install**; its write-approval review subcommands (`pending`, `approve`, `reject`, `diff`, `approval`) also work on messaging platforms when `skills.write_approval` is on. `/memory` works on **both** surfaces.
- `/verbose` is **CLI-only by default**, but can be enabled for messaging platforms by setting `display.tool_progress_command: true` in `config.yaml`. When enabled, it cycles the `display.tool_progress` mode and saves to config.
- `/sethome`, `/update`, `/restart`, `/approve`, `/deny`, `/topic`, `/platform`, and `/commands` are **messaging-only** commands.
- `/status`, `/version`, `/background`, `/queue`, `/steer`, `/voice`, `/reload-mcp`, `/reload-skills`, `/rollback`, `/debug`, `/fast`, `/footer`, `/curator`, `/kanban`, `/credits`, `/suggestions`, `/blueprint`, `/sessions`, and `/yolo` work in **both** the CLI and the messaging gateway.
- `/status`, `/version`, `/background`, `/queue`, `/steer`, `/voice`, `/reload-mcp`, `/reload-skills`, `/rollback`, `/debug`, `/fast`, `/footer`, `/curator`, `/kanban`, `/credits`, `/suggestions`, `/blueprint`, `/learn`, `/sessions`, and `/yolo` work in **both** the CLI and the messaging gateway.
- `/voice join`, `/voice channel`, and `/voice leave` are only meaningful on Discord.
- In the TUI, `/sessions` shows live sessions in the current TUI process. Use `/resume [name]` or `hermes --tui --resume <id-or-title>` for saved or closed transcripts.

View file

@ -3,36 +3,45 @@ title: Computer Use
sidebar_position: 16
---
# Computer Use (macOS)
# Computer Use
Hermes Agent can drive your Mac's desktop — clicking, typing, scrolling,
dragging — in the **background**. Your cursor doesn't move, keyboard focus
doesn't change, and macOS doesn't switch Spaces on you. You and the agent
co-work on the same machine.
Hermes Agent can drive your desktop — clicking, typing, scrolling,
dragging — in the **background** on **macOS, Windows, and Linux**. Your
cursor doesn't move, keyboard focus doesn't change, and your virtual
desktops / Spaces don't switch on you. You and the agent co-work on the
same machine.
Unlike most computer-use integrations, this works with **any tool-capable
model** — Claude, GPT, Gemini, or an open model on a local vLLM endpoint.
There's no Anthropic-native schema to worry about.
model** — Claude, GPT, Gemini, or an open model on a local
OpenAI-compatible endpoint. There's no Anthropic-native schema to worry
about.
## How it works
The `computer_use` toolset speaks MCP over stdio to [`cua-driver`](https://github.com/trycua/cua),
a macOS driver that uses SkyLight private SPIs (`SLEventPostToPid`,
`SLPSPostEventRecordTo`) and the `_AXObserverAddNotificationAndCheckRemote`
accessibility SPI to:
The `computer_use` toolset speaks MCP over stdio to
[`cua-driver`](https://github.com/trycua/cua), an open-source background
computer-use driver. Each platform uses the appropriate accessibility +
input stack under the hood:
- Post synthesized events directly to target processes — no HID event tap,
no cursor warp.
- Flip AppKit active-state without raising windows — no Space switching.
- Keep Chromium/Electron accessibility trees alive when windows are
occluded.
| Platform | Accessibility tree | Input dispatch |
|---|---|---|
| macOS | AX (private SkyLight SPIs) | `SLPSPostEventRecordTo` — pid-scoped, no cursor warp |
| Windows | UIAutomation | `SendInput` + `PostMessage` — no focus steal |
| Linux | AT-SPI (X11 + Wayland) | XTest (X11) / virtual-keyboard (Wayland) |
That combination is what OpenAI's Codex "background computer-use" ships.
cua-driver is the open-source equivalent.
The result is the same on every platform: the agent can read the
accessibility tree of any visible window AND post synthesized events
without bringing it to front, switching virtual desktops, or moving the
real OS cursor.
For the underlying contract — *why* background mode matters, the
no-foreground invariant, click-dispatch internals — see
**[cua.ai/docs/explanation/the-no-foreground-contract](https://cua.ai/docs/explanation/the-no-foreground-contract)**.
## Enabling
Pick whichever path is most convenient — both run the same upstream installer:
Pick whichever path is most convenient — both run the same upstream
installer:
**Option 1: dedicated CLI command (most direct).**
@ -40,63 +49,142 @@ Pick whichever path is most convenient — both run the same upstream installer:
hermes computer-use install
```
This fetches and runs the upstream cua-driver installer:
`curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.sh`.
Use `hermes computer-use status` to verify the install.
This fetches and runs the upstream cua-driver installer`install.sh`
on macOS/Linux, `install.ps1` on Windows. Use `hermes computer-use
status` to verify the install.
**Option 2: enable the toolset interactively.**
1. Run `hermes tools`, pick `🖱️ Computer Use (macOS)` → `cua-driver (background)`.
1. Run `hermes tools`, pick `🖱️ Computer Use (macOS/Windows/Linux)`.
2. The setup runs the upstream installer (same as Option 1).
After installing, regardless of which path you took:
After installing, regardless of which path you took, grant the
platform-appropriate prereqs:
3. Grant macOS permissions when prompted:
- **System Settings → Privacy & Security → Accessibility** → allow the
terminal (or Hermes app).
- **System Settings → Privacy & Security → Screen Recording** → allow
the same.
4. Start a session with the toolset enabled:
```
hermes -t computer_use chat
```
or add `computer_use` to your enabled toolsets in `~/.hermes/config.yaml`.
| Platform | Prereqs |
|---|---|
| **macOS** | System Settings → Privacy & Security → **Accessibility** + **Screen Recording** → allow your terminal (or Hermes app). `hermes computer-use doctor` will tell you which permission is missing. |
| **Windows** | None at install time. If you're driving over SSH (not RDP / console), you need the autostart pattern — see [cua.ai/docs/how-to-guides/driver/windows-ssh](https://cua.ai/docs/how-to-guides/driver/windows-ssh) for the Session 0 ↔ Session 1+ proxy. |
| **Linux** | A reachable display server: `DISPLAY` set for X11, or `XDG_SESSION_TYPE=wayland`. Wayland sessions need an XWayland bridge for capture. AT-SPI must be on (default on GNOME/KDE/Xfce). |
## Keeping cua-driver up to date
Then start a session with the toolset enabled:
The cua-driver project ships fixes regularly (e.g. v0.1.6 fixed a Safari
window-focus bug for UTM workflows). Hermes refreshes the binary in two
places so you don't get stuck on a stale release:
```
hermes -t computer_use chat
```
- **`hermes update`** — when you update Hermes itself, if `cua-driver` is
on PATH the upstream installer re-runs at the end of the update.
No-op for non-macOS users and for users without cua-driver installed.
- **`hermes computer-use install --upgrade`** — manual force-refresh.
Re-runs the upstream installer regardless of whether cua-driver is
already installed. Use this when you want the latest fix without
waiting for the next agent update.
or add `computer_use` to your enabled toolsets in `~/.hermes/config.yaml`.
`hermes computer-use status` shows the installed version next to the
binary path.
## `hermes computer-use doctor` — your first triage stop
`hermes computer-use doctor` runs cua-driver's structured
`health_report` MCP tool and prints a per-check matrix. It's the single
fastest way to find out *why* an action isn't working.
```
$ hermes computer-use doctor
⚠️ cua-driver 0.5.8 on darwin — degraded
✅ binary_version: cua-driver 0.5.8
✅ platform_supported: macOS 26.4.1 (arm64)
✅ session_active: MCP session is active.
❌ bundle_identity: Process has no CFBundleIdentifier.
→ Run the binary inside CuaDriver.app so TCC grants attribute correctly.
✅ tcc_accessibility: Accessibility is granted.
✅ tcc_screen_recording: Screen Recording is granted.
✅ ax_capability: AX is trusted and reachable.
✅ screen_capture_capability: ScreenCaptureKit reachable; 1 display(s) shareable.
```
- **Exit code 0** when overall is `ok` — everything's wired up.
- **Exit code 1** when `degraded` or `failed` — at least one check failed; the hint on each failure tells you what to fix.
- **Exit code 2** when the cua-driver binary itself isn't reachable.
Useful flags:
- `--include CHECK` — run only the listed checks (repeat for multiple)
- `--skip CHECK` — skip a check (wins over `--include`)
- `--json` — emit the raw structured payload, same shape as the
`tools/call health_report` MCP response
The check matrix is platform-aware: `bundle_identity` / `tcc_*` are
`skip` on Windows + Linux because those concepts don't apply.
`ax_capability` checks AX on macOS, UIA on Windows, AT-SPI on Linux —
each with the right diagnostic hint when it can't reach.
## The agent cursor and sessions
When the agent acts, you'll see a **tinted overlay cursor** glide
across the screen to where each click / type / scroll lands. The real
OS cursor never moves — the overlay is a visual cue that says "the
agent is acting here." Each Hermes run declares its own cua-driver
**session id** (something like `hermes-3a7b9c14d2e8`); the cursor's
identity is keyed to that session, so concurrent runs / subagents each
get their own cursor without stepping on each other.
Tune the cursor with `cua-driver`'s CLI flags or the runtime
`set_agent_cursor_style` MCP tool — see
[cua.ai/docs/how-to-guides/driver/personalize-cursor](https://cua.ai/docs/how-to-guides/driver/personalize-cursor)
for the full menu (built-in `arrow` vs `teardrop` silhouette, custom
SVG / PNG / ICO via `--cursor-icon`, runtime gradient colors, bloom
halo).
## Going deeper — the cua-driver skill pack
Hermes intentionally keeps its skill (`skills/computer-use/SKILL.md`)
focused on the Hermes-side `computer_use` action vocabulary — the
single source of truth the agent loads. For the deeper material —
platform-specific deep dives, recording semantics, browser page
interaction — point your agent harness at the cua-driver skill pack
the cua-driver team ships and maintains directly:
```
cua-driver skills install
```
This symlinks the pack into your agent harness' skill directory. After
running it, an agent gets access to:
| File | Topic |
|---|---|
| `SKILL.md` | The cross-platform core (snapshot invariant, no-foreground contract, click dispatch, AX-tree mechanics) |
| `MACOS.md` | macOS specifics: no-foreground contract, AXMenuBar navigation, SkyLight click dispatch, Apple Events JS bridge |
| `WINDOWS.md` | Windows specifics: UIA tree, UWP / `ApplicationFrameHost` hosting, Session 0 isolation, autostart pattern |
| `LINUX.md` | Linux specifics: AT-SPI tree, X11 / Wayland, terminal-emulator detection |
| `RECORDING.md` | Trajectory + video recording semantics |
| `WEB_APPS.md` | Browser-page interaction tips |
| `TESTS.md` | Replay-by-trajectory workflow |
These are **platform deep dives, not duplicates of the Hermes skill**
when an agent reports "on Windows, my click landed on the wrong
element," it reads `WINDOWS.md` for the UIA / UWP context that
explains why and what to do differently.
`cua-driver skills status` shows what's installed and which agent
harnesses it's linked into. Today the autodetect list covers Claude
Code, Codex, OpenCode, OpenClaw, and Antigravity; **Hermes
autodetection is planned as a follow-up in `trycua/cua`** — until
then, run `cua-driver skills install` once and point your harness at
the resulting `~/.cua-driver/skills/cua-driver` directory (or symlink
it into your usual skill space).
## Quick example
User prompt: *"Find my latest email from Stripe and summarise what they want me to do."*
The agent's plan:
The agent's plan (this is the same shape on macOS / Windows / Linux —
the model substitutes the platform's idiomatic shortcut and app name):
1. `computer_use(action="capture", mode="som", app="Mail")` — gets a
screenshot of Mail with every sidebar item, toolbar button, and message
row numbered.
2. `computer_use(action="click", element=14)` — clicks the search field
(element #14 from the capture).
screenshot of the email app with every sidebar item, toolbar button,
and message row numbered.
2. `computer_use(action="click", element=14)` — clicks the search field.
3. `computer_use(action="type", text="from:stripe")`
4. `computer_use(action="key", keys="return", capture_after=True)` — submit
and get the new screenshot.
4. `computer_use(action="key", keys="return", capture_after=True)`
submit and get the new screenshot.
5. Click the top result, read the body, summarise.
During all of this, your cursor stays wherever you left it and Mail never
comes to front.
During all of this, your cursor stays wherever you left it and the email
app never comes to front.
## Provider compatibility
@ -105,29 +193,33 @@ comes to front.
| Anthropic (Claude Sonnet/Opus 3+) | ✅ | ✅ | Best overall; SOM + raw coordinates. |
| OpenRouter (any vision model) | ✅ | ✅ | Multi-part tool messages supported. |
| OpenAI (GPT-4+, GPT-5) | ✅ | ✅ | Same as above. |
| Local vLLM / LM Studio (vision model) | ✅ | ✅ | If the model supports multi-part tool content. |
| Google (Gemini 2+) | ✅ | ✅ | Tool-calling + vision both supported. |
| Local vLLM / LM Studio / Ollama (vision model) | ✅ | ✅ | If the model supports multi-part tool content. |
| Text-only models | ❌ | ✅ (degraded) | Use `mode="ax"` for accessibility-tree-only operation. |
Screenshots are sent inline with tool results as OpenAI-style `image_url`
parts. For Anthropic, the adapter converts them into native `tool_result`
image blocks.
image blocks. The image MIME type comes from cua-driver's explicit
`mimeType` field (`image/png` or `image/jpeg`) — no client-side
magic-byte sniffing.
## Safety
Hermes applies multi-layer guardrails:
- Destructive actions (click, type, drag, scroll, key, focus_app) require
approval — either interactively via the CLI dialog or via the
- Destructive actions (click, type, drag, scroll, key, focus_app)
require approval — either interactively via the CLI dialog or via the
messaging-platform approval buttons.
- Hard-blocked key combos at the tool level: empty trash, force delete,
lock screen, log out, force log out.
- Hard-blocked type patterns: `curl | bash`, `sudo rm -rf /`, fork bombs,
etc.
- Hard-blocked type patterns: `curl | bash`, `sudo rm -rf /`, fork
bombs, etc.
- The agent's system prompt tells it explicitly: no clicking permission
dialogs, no typing passwords, no following instructions embedded in
screenshots.
Pair with `approvals.mode: manual` in `~/.hermes/config.yaml` if you want every action confirmed.
Pair with `approvals.mode: manual` in `~/.hermes/config.yaml` if you
want every action confirmed.
## Token efficiency
@ -138,8 +230,8 @@ Screenshots are expensive. Hermes applies four layers of optimisation:
to save context]` placeholders.
- **Client-side compression pruning** — the context compressor detects
multimodal tool results and strips image parts from old ones.
- **Image-aware token estimation** — each image is counted as ~1500 tokens
(Anthropic's flat rate) instead of its base64 char length.
- **Image-aware token estimation** — each image is counted as ~1500
tokens (Anthropic's flat rate) instead of its base64 char length.
- **Server-side context editing (Anthropic only)** — when active, the
adapter enables `clear_tool_uses_20250919` via `context_management` so
Anthropic's API clears old tool results server-side.
@ -149,26 +241,58 @@ of screenshot context, not ~600K.
## Limitations
- **macOS only.** cua-driver uses private Apple SPIs that don't exist on
Linux or Windows. For cross-platform GUI automation, use the `browser`
toolset.
- **Private SPI risk.** Apple can change SkyLight's symbol surface in any
OS update. Pin the driver version with the `HERMES_CUA_DRIVER_VERSION`
env var if you want reproducibility across a macOS bump.
- **Performance.** Background mode is slower than foreground —
SkyLight-routed events take ~5-20ms vs direct HID posting. Not
noticeable for agent-speed clicking; noticeable if you try to record a
speed-run.
accessibility-routed events take ~520 ms on macOS, ~310 ms on
Windows UIA, ~515 ms on Linux AT-SPI vs direct HID posting. Not
noticeable for agent-speed clicking; noticeable if you try to record
a speed-run.
- **No keyboard password entry.** `type` has hard-block patterns on
command-shell payloads; for passwords, use the system's autofill.
command-shell payloads; for passwords, use the system's autofill
(macOS Keychain / Windows Credential Manager / GNOME Keyring /
KWallet).
- **Some apps don't expose an accessibility tree.** Modern UWP apps on
Windows, Electron < 28 on Linux, and a few macOS apps with custom
drawing (Logic, Final Cut, some games) have sparse or empty AX trees.
Fall back to pixel coordinates if the tree is empty — or skip the
task entirely.
- **Windows: elevated (admin) windows can't be driven from a normal
agent.** Windows UIPI (User Interface Privilege Isolation) enforces
integrity-level boundaries: a Medium-integrity process (the default
Hermes agent) cannot enumerate the UIA tree of, or inject mouse input
into, a window owned by a High-integrity (Administrator) process.
Symptom: `capture(mode='som')` returns 0 elements and `click(...)`
reports success while doing nothing, even though the screenshot
renders fine (GDI capture sits below the integrity check). Keyboard
events partially bypass UIPI, so Tab / Enter can still navigate an
elevated dialog. This is an OS constraint, not a cua-driver bug — it
affects every Windows automation stack. To drive elevated windows,
run the Hermes agent itself at High integrity (launch from an
elevated terminal); otherwise target non-elevated windows.
- **Platform-specific deployment gotchas:**
- **macOS** uses private SkyLight SPIs. Apple can change them in any
OS update. Hermes warns when the installed cua-driver is older than
the version it was tested against.
- **Windows** SSH sessions run in **Session 0**, which has no
interactive desktop. Drive Hermes from inside the RDP / console
session, or set up cua-driver's autostart Scheduled Task —
[windows-ssh](https://cua.ai/docs/how-to-guides/driver/windows-ssh)
has the recipe.
- **Linux** requires a reachable display server. Headless servers
need Xvfb (`Xvfb :99 -screen 0 1920x1080x24`) before
`computer_use` can capture or inject events. Pure Wayland sessions
need an XWayland bridge for screen capture (cua-driver's Wayland
inject path handles input independently).
For cross-platform GUI automation without the desktop overhead (and
without TCC / Session 0 / X11 setup), the `browser` toolset uses a
real headless Chromium and is the right answer for web-only tasks.
## Configuration
Override the driver binary path (tests / CI):
Override the driver binary path (tests / CI / local builds):
```
HERMES_CUA_DRIVER_CMD=/opt/homebrew/bin/cua-driver
HERMES_CUA_DRIVER_VERSION=0.5.0 # optional pin
HERMES_CUA_DRIVER_CMD=/path/to/your/cua-driver
```
Swap the backend entirely (for testing):
@ -177,25 +301,170 @@ Swap the backend entirely (for testing):
HERMES_COMPUTER_USE_BACKEND=noop # records calls, no side effects
```
### Telemetry
cua-driver ships with anonymous usage telemetry (PostHog) enabled by default
upstream. **Hermes disables it for you** — on every cua-driver invocation
(the MCP backend, `status`, `doctor`, and install) Hermes sets
`CUA_DRIVER_RS_TELEMETRY_ENABLED=0` in the driver's environment.
To opt back in (let cua-driver use its own default and send telemetry), set
this in `config.yaml`:
```yaml
computer_use:
cua_telemetry: true # default: false (telemetry off)
```
When it's on, `hermes computer-use doctor` reports `telemetry: enabled`;
when off (the default), it reports `telemetry: disabled via
CUA_DRIVER_RS_TELEMETRY_ENABLED`.
## Testing against a local cua-driver build
When you're developing cua-driver itself — or want to test an
unreleased fix — point Hermes at a binary you built from source instead
of the published release. Hermes resolves the driver with
`shutil.which("cua-driver")` and **does not enforce
`HERMES_CUA_DRIVER_VERSION`**, so a local build (reported as
`0.0.0-local-*`) is accepted as-is. Two approaches:
### Option A — `install-local` (build + put it on PATH)
From your `trycua/cua` checkout, run the upstream local installer. It
builds the Rust backend in release mode and drops `cua-driver` into the
same install layout the production installer uses, adding its bin dir
to your PATH:
```powershell
# Windows (PowerShell), from the cua repo root
./libs/cua-driver/scripts/install-local.ps1 -NoAutoStart
```
```bash
# macOS / Linux, from the cua repo root (defaults to a debug build without --release)
./libs/cua-driver/scripts/install-local.sh --release
```
- Windows stages the build under `%USERPROFILE%\.cua-driver\packages\…`
and junctions
`%LOCALAPPDATA%\Programs\Cua\cua-driver\bin` (added to your User
PATH) to it. macOS/Linux symlinks `cua-driver` into `~/.local/bin`
(override with `--bin-dir <path>`).
- `-NoAutoStart` skips registering the `cua-driver-serve` logon daemon
— you don't need it for Hermes testing (see notes).
Then open a fresh shell (so the PATH change is visible) and confirm:
```
cua-driver --version # local builds report 0.0.0-local-release
# Windows: (Get-Command cua-driver).Source
# macOS/Linux: which cua-driver
```
### Option B — point Hermes straight at the built binary (fastest loop)
Skip the install ceremony entirely: `cargo build` and set
`HERMES_CUA_DRIVER_CMD` to the resulting binary. Best for rapid
edit/build/test.
```bash
cargo build -p cua-driver # add --release for a release build; run from libs/cua-driver/rust
```
```
# Windows (.env)
HERMES_CUA_DRIVER_CMD=C:\path\to\cua\libs\cua-driver\rust\target\debug\cua-driver.exe
# macOS / Linux (.env)
HERMES_CUA_DRIVER_CMD=/path/to/cua/libs/cua-driver/rust/target/debug/cua-driver
```
### Confirm Hermes is using your build
- `hermes computer-use status` prints the resolved binary path and
version.
- `hermes computer-use doctor` confirms the binary is reachable and
exercises the full MCP path end-to-end.
- In a session, `computer_use(action="capture")` exercises the spawned
`cua-driver mcp` child process.
### Notes & gotchas
- **Hermes spawns its own `cua-driver mcp` child over stdio** — it does
*not* attach to the long-running `cua-driver serve` autostart daemon
or its named pipe. So the scheduled task / LaunchAgent is unnecessary
for testing (`-NoAutoStart` is fine). The autostart daemon and the
Windows UIAccess worker (`cua-driver-uia.exe`) only matter for
foreground-safe input on some apps (e.g. WPF); the standard tool
surface works through the stdio child. On Windows SSH sessions, the
autostart pattern IS needed — see the Limitations section.
- **Locked binary on Windows.** A running `cua-driver-serve` daemon can
hold `cua-driver.exe` and block an overwrite on rebuild.
`install-local.ps1` renames the locked binary out of the way
automatically; if you `cargo build` manually (Option B), stop it
first with `cua-driver autostart disable` (or `schtasks /End /TN
cua-driver-serve`).
- **Rebuild loop.** After editing cua-driver source, re-run
`install-local` (rebuilds, restages, flips the `current` junction)
for Option A, or just re-`cargo build` for Option B — no Hermes
change needed either way.
- **Local builds skip the version check.** Hermes warns when the
installed cua-driver is older than its per-OS tested baseline, but
exempts `0.0.0-local-*` dev builds — so your local build never
triggers that warning.
## Troubleshooting
**`computer_use backend unavailable: cua-driver is not installed`** — Run
`hermes computer-use install` to fetch the cua-driver binary, or run
`hermes tools` and enable the Computer Use toolset.
**First action when anything's off: run `hermes computer-use doctor`.**
The structured per-check matrix tells you (and any agent helping you
debug) exactly what's wrong.
Specific failure modes the doctor doesn't catch:
**`computer_use backend unavailable: cua-driver is not installed`** —
Run `hermes computer-use install` to fetch the cua-driver binary, or
run `hermes tools` and enable the Computer Use toolset.
**Clicks seem to have no effect** — Capture and verify. A modal you
didn't see may be blocking input. Dismiss it with `escape` or the close
button.
**Element indices are stale** — SOM indices are only valid until the
next `capture`. Re-capture after any state-changing action.
next `capture`. Re-capture after any state-changing action. The
wrapper carries opaque `element_token`s for stale detection — you'll
see an explicit error rather than a wrong click.
**"blocked pattern in type text"** — The text you tried to `type`
matches the dangerous-shell-pattern list. Break the command up or
reconsider.
**Empty captures on Linux** — `DISPLAY` not set, or you're on pure
Wayland without an XWayland bridge. `hermes computer-use doctor` will
flag this as `ax_capability: fail` with a `Set DISPLAY (X11)…` hint.
**Empty captures on Windows over SSH** — You're in Session 0 (the
services session). Drive from RDP / console directly, or set up the
autostart pattern — see
[cua.ai/docs/how-to-guides/driver/windows-ssh](https://cua.ai/docs/how-to-guides/driver/windows-ssh).
## See also
- [Universal skill: `macos-computer-use`](https://github.com/NousResearch/hermes-agent/blob/main/skills/apple/macos-computer-use/SKILL.md)
- **Hermes-side skill**`skills/computer-use/SKILL.md` — teaches the
Hermes `computer_use` action vocabulary; this is what the agent loads.
- **cua-driver skill pack** — for platform-specific deep dives
(macOS no-foreground contract, Windows UIA + Session 0, Linux AT-SPI
+ X11/Wayland, recording, browser pages), run
`cua-driver skills install` and read `MACOS.md` / `WINDOWS.md` /
`LINUX.md` / `RECORDING.md` / `WEB_APPS.md`. Once `cua-driver skills
install` autodetects Hermes (planned follow-up), this happens
automatically on install.
- **cua.ai/docs** — the cua-driver project's documentation:
- [What is computer use?](https://cua.ai/docs/explanation/what-is-computer-use) — concept intro
- [The no-foreground contract](https://cua.ai/docs/explanation/the-no-foreground-contract) — *why* background mode matters
- [Install reference](https://cua.ai/docs/how-to-guides/driver/install) — cross-platform install details
- [Personalize the agent cursor](https://cua.ai/docs/how-to-guides/driver/personalize-cursor) — built-in shapes, custom assets, runtime overrides
- [Drive Windows over SSH](https://cua.ai/docs/how-to-guides/driver/windows-ssh) — the Session 0 → Session 1+ autostart pattern
- [Keep cua-driver running](https://cua.ai/docs/how-to-guides/driver/keep-running) — autostart / daemon lifecycle
- [Connect your agent](https://cua.ai/docs/how-to-guides/driver/connect-your-agent) — register cua-driver with various harnesses (Hermes among them)
- [cua-driver source (trycua/cua)](https://github.com/trycua/cua)
- [Browser automation](./browser.md) for cross-platform web tasks.
- [Browser automation](./browser.md) for cross-platform web tasks where you don't need to drive native apps.

View file

@ -431,14 +431,14 @@ If you prefer JSX, use any bundler (esbuild, Vite, rollup) with React as an exte
├── dist/
│ ├── index.js # required — pre-built JS bundle (IIFE)
│ └── style.css # optional — custom CSS
└── plugin_api.py # optional — backend API routes (FastAPI)
└── plugin_api.py # bundled plugins only — backend API routes (FastAPI)
```
A single plugin directory can carry three orthogonal extensions:
- `plugin.yaml` + `__init__.py` — CLI/gateway plugin ([see plugins page](./plugins)).
- `dashboard/manifest.json` + `dashboard/dist/index.js` — dashboard UI plugin.
- `dashboard/plugin_api.py`dashboard backend routes.
- `dashboard/plugin_api.py`bundled plugins only; backend API routes.
None of them are required; include only the layers you need.
@ -743,7 +743,10 @@ Routes are mounted under `/api/plugins/<name>/`, so the above becomes:
- `GET /api/plugins/my-plugin/data`
- `POST /api/plugins/my-plugin/action`
Plugin API routes bypass session-token authentication since the dashboard server binds to localhost by default. **Don't expose the dashboard on a public interface with `--host 0.0.0.0` if you run untrusted plugins** — their routes become reachable too.
Security notes:
- Bundled plugin API routes bypass session-token authentication. The dashboard server binds to localhost by default, which mitigates the risks of this bypass.
- User-installed and project dashboard plugins may still extend the UI with static JS/CSS, but their Python `api` files are not auto-imported by the dashboard server. Backend routes are reserved for bundled plugins.
#### Accessing Hermes internals
@ -804,11 +807,14 @@ The dashboard scans three directories for `dashboard/manifest.json`:
| Priority | Directory | Source label |
|----------|-----------|--------------|
| 1 (wins on conflict) | `~/.hermes/plugins/<name>/dashboard/` | `user` |
| 2 | `<repo>/plugins/memory/<name>/dashboard/` | `bundled` |
| 2 | `<repo>/plugins/<name>/dashboard/` | `bundled` |
| 1 (wins on conflict) | `<repo>/plugins/memory/<name>/dashboard/` | `bundled` |
| 1 (wins on conflict) | `<repo>/plugins/<name>/dashboard/` | `bundled` |
| 2 | `~/.hermes/plugins/<name>/dashboard/` | `user` |
| 3 | `./.hermes/plugins/<name>/dashboard/` | `project` — only when `HERMES_ENABLE_PROJECT_PLUGINS` is set |
Bundled dashboard plugins win name conflicts because only bundled plugins may
register backend routes. Give user and project dashboard plugins unique names.
Discovery results are cached per dashboard process. After adding a new plugin, either:
```bash
@ -908,10 +914,11 @@ Check that the file is in `~/.hermes/dashboard-themes/` and ends in `.yaml` or `
The `sidebar` slot only renders when the active theme has `layoutVariant: cockpit`. Other slots always render. If you're registering into a slot with no hits, add `console.log` inside `registerSlot` to confirm the plugin bundle ran at all.
**Plugin backend routes return 404.**
1. Confirm the manifest has `"api": "plugin_api.py"` pointing to an existing file inside `dashboard/`.
2. Restart `hermes dashboard` — plugin API routes are mounted once at startup, **not** on rescan.
3. Check that `plugin_api.py` exports a module-level `router = APIRouter()`. Other export names are not picked up.
4. Tail `~/.hermes/logs/errors.log` for `Failed to load plugin <name> API routes` — import errors are logged there.
1. Confirm the plugin is bundled with Hermes. User-installed and project dashboard plugins can extend the UI, but their Python backend routes are not auto-imported.
2. Confirm the manifest has `"api": "plugin_api.py"` pointing to an existing file inside `dashboard/`.
3. Restart `hermes dashboard` — plugin API routes are mounted once at startup, **not** on rescan.
4. Check that `plugin_api.py` exports a module-level `router = APIRouter()`. Other export names are not picked up.
5. Tail `~/.hermes/logs/errors.log` for `Failed to load plugin <name> API routes` — import errors are logged there.
**Theme change drops my color overrides.**
`colorOverrides` are scoped to the active theme and cleared on theme switch — that's by design. If you want overrides that persist, put them in your theme's YAML, not in the live switcher.

View file

@ -40,13 +40,57 @@ What you'll see:
| Command | What it does |
|---|---|
| `/goal <text>` | Set (or replace) the standing goal. Kicks off the first turn immediately so you don't need to send a separate message. |
| `/goal draft <text>` | Draft a structured completion contract from a plain-language objective, then set it. See [Completion contracts](#completion-contracts). |
| `/goal show` | Print the active goal's completion contract. |
| `/goal` or `/goal status` | Show the current goal, its status, and turns used. |
| `/goal pause` | Stop the auto-continuation loop without clearing the goal. |
| `/goal resume` | Resume the loop (resets the turn counter back to zero). |
| `/goal clear` | Drop the goal entirely. |
| `/goal wait <pid> [reason]` | Park the loop on a background process — it stops re-poking the agent every turn while the process runs, and auto-resumes when it exits. |
| `/goal unwait` | Drop the wait barrier and resume the loop immediately. |
Works identically on the CLI and every gateway platform (Telegram, Discord, Slack, Matrix, Signal, WhatsApp, SMS, iMessage, Webhook, API server, and the web dashboard).
## Completion contracts
A bare `/goal <text>` works fine, but a *vague* goal makes for vague judging — the judge can only check what you told it to want. Codex's `/goal` guidance makes the same point: a durable objective works best when it names **what done means, how to prove it, what not to break, what's in scope, and when to stop**. Hermes adapts this as an optional **completion contract** layered on top of the existing goal loop.
A contract has five fields, all optional:
| Field | Meaning |
|---|---|
| `outcome` | The single end state that must be true when done. |
| `verification` | The specific test / command / artifact that *proves* the outcome. |
| `constraints` | What must not change or regress. |
| `boundaries` | Which files, dirs, tools, or systems are in scope. |
| `stop_when` | The condition under which Hermes should stop and ask for input. |
When a contract is set, both prompts change: the **continuation prompt** tells the agent to target the verification surface and respect the constraints, and the **judge prompt** decides `done` *only when the verification criterion is met with concrete evidence* (a command result, file excerpt, test output) — not a loose "looks done" claim. This directly tightens the most common `/goal` failure mode (premature completion or endless over-continuation on an underspecified objective).
### Two ways to set a contract
**1. Let Hermes draft it** (recommended — adapted from Codex's "let the agent draft the goal" tip):
```
/goal draft Migrate the auth service from session cookies to JWT
```
Hermes expands your one-liner into a full contract via the `goal_judge` auxiliary model, sets it, and shows you the result so you can review or tighten any field. If the aux model is unavailable, it falls back to a plain free-form goal — drafting never blocks setting a goal.
**2. Write it inline** with `field: value` lines:
```
/goal Migrate auth to JWT
verify: pytest tests/auth passes
constraints: keep the /login response shape unchanged
boundaries: only touch services/auth and its tests
stop when: a DB schema migration is required
```
The first non-field line(s) are the goal headline; recognized field prefixes (`verify:`, `verified by:`, `constraints:`, `preserve:`, `boundaries:`, `scope:`, `stop when:`, `blocked:`, …) populate the contract. A plain goal with an incidental colon (`Fix bug: the parser drops commas`) is **not** mangled — only known field prefixes are pulled out.
Use `/goal show` to review the active contract. Contracts persist in `SessionDB.state_meta` alongside the goal, so they survive `/resume`. Old goals from before this feature load unchanged (no contract). Contracts and `/subgoal` criteria compose: subgoals fold into the contract as extra criteria the judge must also satisfy.
## Adding criteria mid-goal: `/subgoal`
While a goal is active you can append extra acceptance criteria with `/subgoal <text>` without resetting the loop. Each call adds one numbered item to the goal's subgoal list; the **continuation prompt** the agent sees on the next turn includes the original goal plus an "Additional criteria the user added mid-loop" block, and the **judge prompt** is rewritten so the verdict must consider every subgoal — the goal isn't marked done until the original objective **and** every subgoal are met.
@ -62,6 +106,29 @@ Subgoals are persisted alongside the goal in `SessionDB.state_meta`, so they sur
Use this when you start a loop ("fix the failing tests") and notice partway through that you also want it to "and add a regression test for the bug you just patched" — `/subgoal add a regression test` tightens the success criteria without breaking the running loop.
## Parking on a background process: automatic, with a manual override
Some goals are gated on something that takes minutes and runs on its own — CI on a pushed PR, a long build, a test matrix, a deploy, a rate-limit cooldown. Without help, the goal loop would re-poke the agent every turn into "is it done yet?" busy-work while it waits.
**This is handled automatically.** Every turn, the judge is shown the agent's live background processes (the `terminal(background=true)` registry — pid, session id, command, uptime, recent output, and any `watch_patterns` / `notify_on_complete` trigger) alongside the goal and the agent's response. When the agent's progress is genuinely gated on one of them, the judge returns a **`wait`** verdict instead of `continue`, and the loop **parks**: the next turns are skipped (no judge call, no continuation, no turn consumed) until the wait is satisfied — then it resumes normally with the result in hand. The judge can also park on a **time** basis (`wait_for_seconds`) for backoff/cooldown waits. `/goal status` shows `⏳ Goal (parked …)` while parked.
The judge picks the right kind of wait from the process's own signal:
- **`wait_on_session <id>`** — releases when the process's *own trigger* fires: it exits, **or** (if it was started with `watch_patterns`) its pattern matches. This is the one for a long-lived watcher / server / poller that signals **mid-run** (e.g. a build process that prints `BUILD SUCCESSFUL` and keeps running, or a `notify_on_complete` watcher) and may never exit on its own.
- **`wait_on_pid <pid>`** — releases on process exit only.
- **`wait_for_seconds <n>`** — releases after a fixed delay.
You don't type anything for this — it's the judge's decision, made from the process context the loop hands it. The manual commands exist as an override:
| Command | What it does |
|---|---|
| `/goal wait <pid> [reason]` | Manually park the loop until the process with that PID exits. |
| `/goal unwait` | Clear any wait barrier (judge- or manually-set) and resume immediately. |
The barrier (pid- or time-based) is persisted with the goal in `SessionDB.state_meta`, so it survives `/resume`. `/goal pause`, `/goal resume`, and `/goal clear` all drop it. If the PID is already dead when the barrier is set (or dies while parked), or the time deadline passes, the barrier clears on the next check — a stale barrier can never wedge the loop.
Typical flow: the agent pushes a PR, starts a CI watcher with `terminal(background=true, notify_on_complete=true)`, and reports "watching CI." The judge sees the watcher process still running, returns `wait` on its pid, and the loop goes quiet — then picks back up the instant CI finishes and judges the goal against the actual result.
## Behavior details
### The judge
@ -94,7 +161,7 @@ Any real message you send while a goal is active takes priority over the continu
### Mid-run safety (gateway)
While an agent is already running, `/goal status`, `/goal pause`, and `/goal clear` are safe to run — they only touch control-plane state and don't interrupt the current turn. Setting a **new** goal mid-run (`/goal <new text>`) is rejected with a message telling you to `/stop` first, so the old continuation can't race the new one.
While an agent is already running, `/goal status`, `/goal pause`, `/goal clear`, `/goal wait`, and `/goal unwait` are safe to run — they only touch control-plane state and don't interrupt the current turn. Setting a **new** goal mid-run (`/goal <new text>`) is rejected with a message telling you to `/stop` first, so the old continuation can't race the new one.
### Persistence

View file

@ -61,6 +61,8 @@ AI-native cross-session user modeling with dialectic reasoning, session-scoped c
- `dialecticCadence` — how often the dialectic LLM fires (LLM call frequency)
- `dialecticDepth` — how many `.chat()` passes per dialectic invocation (13, depth of reasoning)
The auto-injected dialectic also scales its reasoning level by query length (longer query → deeper reasoning, capped at `reasoningLevelCap`); see [Query-Adaptive Reasoning Level](./honcho.md#query-adaptive-reasoning-level).
**Setup Wizard:**
```bash
hermes memory setup # select "honcho" — runs the Honcho-specific post-setup
@ -315,31 +317,55 @@ echo "OPENVIKING_API_KEY=..." >> ~/.hermes/.env
### Mem0
Server-side LLM fact extraction with semantic search, reranking, and automatic deduplication.
Server-side LLM fact extraction with semantic search, reranking, and automatic deduplication. Supports both Mem0 Platform (cloud) and OSS (self-hosted) modes.
| | |
|---|---|
| **Best for** | Hands-off memory management — Mem0 handles extraction automatically |
| **Requires** | `pip install mem0ai` + API key |
| **Data storage** | Mem0 Cloud |
| **Cost** | Mem0 pricing |
| **Requires** | `pip install mem0ai` + API key (platform) or LLM/vector store (OSS) |
| **Data storage** | Mem0 Cloud (platform) or self-hosted (OSS) |
| **Cost** | Mem0 pricing (platform) / free (OSS) |
**Tools:** `mem0_profile` (all stored memories), `mem0_search` (semantic search + reranking), `mem0_conclude` (store verbatim facts)
**Tools (5):** `mem0_list` (list all memories, paginated), `mem0_search` (semantic search with reranking in platform mode), `mem0_add` (store verbatim facts), `mem0_update` (update by ID), `mem0_delete` (delete by ID)
**Setup:**
**Setup (Platform):**
```bash
hermes memory setup # select "mem0"
hermes memory setup # select "mem0" → "Platform"
# Or manually:
hermes config set memory.provider mem0
echo "MEM0_API_KEY=your-key" >> ~/.hermes/.env
```
**Config:** `$HERMES_HOME/mem0.json`
**Setup (OSS):**
```bash
hermes memory setup # select "mem0" → "Open Source (self-hosted)"
# Or via flags:
hermes memory setup mem0 --mode oss --oss-llm openai --oss-llm-key sk-... --oss-vector qdrant
```
Preview without writing files:
```bash
hermes memory setup mem0 --mode oss --oss-llm-key sk-... --dry-run
```
**Config:** `$HERMES_HOME/mem0.json` (behavioral settings). Only the secret `MEM0_API_KEY` belongs in `~/.hermes/.env`.
| Key | Default | Description |
|-----|---------|-------------|
| `mode` | `platform` | `platform` (Mem0 Cloud) or `oss` (self-hosted) |
| `user_id` | `hermes-user` | User identifier |
| `agent_id` | `hermes` | Agent identifier |
| `rerank` | `true` | Rerank search results for relevance (platform mode only) |
**OSS supported providers:**
| Component | Providers |
|-----------|-----------|
| LLM | openai, ollama |
| Embedder | openai, ollama |
| Vector Store | qdrant (local/server), pgvector |
**Switching modes:** Re-run `hermes memory setup mem0 --mode <platform|oss>` or edit `mem0.json` directly.
---
@ -569,7 +595,7 @@ hermes memory setup
|----------|---------|------|-------|-------------|----------------|
| **Honcho** | Cloud | Paid | 5 | `honcho-ai` | Dialectic user modeling + session-scoped context |
| **OpenViking** | Self-hosted | Free | 5 | `openviking` + server | Filesystem hierarchy + tiered loading |
| **Mem0** | Cloud | Paid | 3 | `mem0ai` | Server-side LLM extraction |
| **Mem0** | Cloud/Self-hosted | Free/Paid | 5 | `mem0ai` | Server-side LLM extraction + OSS mode |
| **Hindsight** | Cloud/Local | Free/Paid | 3 | `hindsight-client` | Knowledge graph + reflect synthesis |
| **Holographic** | Local | Free | 2 | None | HRR algebra + trust scoring |
| **RetainDB** | Cloud | $20/mo | 5 | `requests` | Delta compression |

View file

@ -270,6 +270,31 @@ display:
> writes to your memory/skill stores, are unaffected by this setting. Set it
> per-platform via `display.platforms.<platform>.memory_notifications`.
## Running the review on a cheaper model (`auxiliary.background_review`)
The review runs on your **main chat model** by default, replaying the
conversation — which is already warm in the prompt cache, so it's cheap cache
reads. On an expensive main model you can run the review on a cheaper model
instead:
```yaml
auxiliary:
background_review:
provider: openrouter
model: google/gemini-3-flash-preview # auto (default) = main chat model
```
When you point it at a model **different** from your main one, the review runs
there for substantially lower cost (~35× in benchmarks). Because a different
model can't reuse your main model's prompt cache anyway, the fork automatically
replays a compact **digest** of the conversation (recent turns verbatim + a
summary of older ones) rather than the full transcript — minimizing what it
writes to the new cache. Capture holds: in testing, memory capture was
identical and skill capture near-identical to the main-model review.
Leave it at `auto` (or set it to your main model) and nothing changes — the
review keeps running on the main model with the full warm-cache replay.
## Controlling skill writes (`skills.write_approval`)
Skills use the same on/off gate, but the review UX differs because a

View file

@ -71,6 +71,42 @@ hermes chat --toolsets skills -q "What skills do you have?"
hermes chat --toolsets skills -q "Show me the axolotl skill"
```
## Learning a skill from sources (`/learn`)
`/learn` is the fast way to turn something you already know — or a pile of
reference material — into a reusable skill, without hand-writing the
`SKILL.md`. It is open-ended: point it at *anything you can describe* and the
agent gathers the material with the tools it already has, then authors a skill
that follows the [house authoring standards](#skillmd-format) (≤60-char
description, the standard section order, Hermes-tool framing, no invented
commands).
```bash
# A local SDK or doc directory — read with read_file / search_files
/learn the REST client in ~/projects/acme-sdk, focus on auth + pagination
# An online doc page — fetched with web_extract
/learn https://docs.example.com/api/quickstart
# The workflow you just walked the agent through in this conversation
/learn how I just deployed the staging server
# Pasted notes / a described procedure
/learn filing an expense: open the portal, New > Expense, attach the receipt, submit
```
Because the live agent does the sourcing, `/learn` works the same in the CLI,
the messaging gateway, the TUI, and the dashboard — and on any terminal backend
(local, Docker, remote), since there is no separate ingestion engine. In the
**dashboard**, the Skills page has a **Learn a skill** button that opens a panel
with a directory field, a URL field, and an open-ended text box; it composes a
`/learn` request and runs it in chat.
There is no model-tool footprint: `/learn` builds a standards-guided prompt and
hands it to the agent as a normal turn. The agent saves the result with the
`skill_manage` tool, so the [write-approval gate](#gating-agent-skill-writes-skillswrite_approval)
applies if you have it on.
## Progressive Disclosure
Skills use a token-efficient loading pattern:

View file

@ -109,7 +109,7 @@ Hermes 应用多层防护机制:
## 限制
- **仅限 macOS。** cua-driver 使用的私有 Apple SPI 在 Linux 或 Windows 上不存在。跨平台 GUI 自动化请使用 `browser` 工具集。
- **私有 SPI 风险。** Apple 可能在任何 OS 更新中更改 SkyLight 的符号接口。如需在 macOS 版本升级时保持可复现性,请通过 `HERMES_CUA_DRIVER_VERSION` 环境变量固定驱动版本
- **私有 SPI 风险。** Apple 可能在任何 OS 更新中更改 SkyLight 的符号接口。Hermes 始终安装最新版 cua-driver并在已安装的二进制文件低于其测试基线版本按操作系统分别设定时发出警告。没有版本固定开关——如需可复现的版本请将 `HERMES_CUA_DRIVER_CMD` 指向特定的二进制文件
- **性能。** 后台模式比前台模式慢——SkyLight 路由事件耗时约 520ms而直接 HID 投递更快。对于 Agent 速度的点击操作无明显影响;若尝试录制速通视频则会有感知。
- **不支持键盘输入密码。** `type` 对命令行 payload 有硬性屏蔽模式;密码请使用系统自动填充功能。
@ -119,7 +119,6 @@ Hermes 应用多层防护机制:
```
HERMES_CUA_DRIVER_CMD=/opt/homebrew/bin/cua-driver
HERMES_CUA_DRIVER_VERSION=0.5.0 # optional pin
```
完全替换后端(用于测试):