Merge remote-tracking branch 'origin/main' into bb/pets-merge

# Conflicts: # hermes_cli/commands.py # tui_gateway/server.py
2026-06-27 11:22:03 +00:00 · 2026-06-23 19:05:22 -05:00 · 2026-06-23 19:05:22 -05:00 · e495b33bf1
commit e495b33bf1
parent 5342eccf12 40fddc9e4c
251 changed files with 23395 additions and 2720 deletions
--- a/website/docs/user-guide/features/computer-use.md
+++ b/website/docs/user-guide/features/computer-use.md
@ -3,36 +3,45 @@ title: Computer Use
 sidebar_position: 16
 ---

-# Computer Use (macOS)
+# Computer Use

-Hermes Agent can drive your Mac's desktop — clicking, typing, scrolling,
-dragging — in the **background**. Your cursor doesn't move, keyboard focus
-doesn't change, and macOS doesn't switch Spaces on you. You and the agent
-co-work on the same machine.
+Hermes Agent can drive your desktop — clicking, typing, scrolling,
+dragging — in the **background** on **macOS, Windows, and Linux**. Your
+cursor doesn't move, keyboard focus doesn't change, and your virtual
+desktops / Spaces don't switch on you. You and the agent co-work on the
+same machine.

 Unlike most computer-use integrations, this works with **any tool-capable
-model** — Claude, GPT, Gemini, or an open model on a local vLLM endpoint.
-There's no Anthropic-native schema to worry about.
+model** — Claude, GPT, Gemini, or an open model on a local
+OpenAI-compatible endpoint. There's no Anthropic-native schema to worry
+about.

 ## How it works

-The `computer_use` toolset speaks MCP over stdio to [`cua-driver`](https://github.com/trycua/cua),
-a macOS driver that uses SkyLight private SPIs (`SLEventPostToPid`,
-`SLPSPostEventRecordTo`) and the `_AXObserverAddNotificationAndCheckRemote`
-accessibility SPI to:
+The `computer_use` toolset speaks MCP over stdio to
+[`cua-driver`](https://github.com/trycua/cua), an open-source background
+computer-use driver. Each platform uses the appropriate accessibility +
+input stack under the hood:

- Post synthesized events directly to target processes — no HID event tap,
-  no cursor warp.
- Flip AppKit active-state without raising windows — no Space switching.
- Keep Chromium/Electron accessibility trees alive when windows are
-  occluded.
+| Platform | Accessibility tree | Input dispatch |
+|---|---|---|
+| macOS | AX (private SkyLight SPIs) | `SLPSPostEventRecordTo` — pid-scoped, no cursor warp |
+| Windows | UIAutomation | `SendInput` + `PostMessage` — no focus steal |
+| Linux | AT-SPI (X11 + Wayland) | XTest (X11) / virtual-keyboard (Wayland) |

-That combination is what OpenAI's Codex "background computer-use" ships.
-cua-driver is the open-source equivalent.
+The result is the same on every platform: the agent can read the
+accessibility tree of any visible window AND post synthesized events
+without bringing it to front, switching virtual desktops, or moving the
+real OS cursor.
+
+For the underlying contract — *why* background mode matters, the
+no-foreground invariant, click-dispatch internals — see
+**[cua.ai/docs/explanation/the-no-foreground-contract](https://cua.ai/docs/explanation/the-no-foreground-contract)**.

 ## Enabling

-Pick whichever path is most convenient — both run the same upstream installer:
+Pick whichever path is most convenient — both run the same upstream
+installer:

 **Option 1: dedicated CLI command (most direct).**

@ -40,63 +49,142 @@ Pick whichever path is most convenient — both run the same upstream installer:
 hermes computer-use install
 ```

-This fetches and runs the upstream cua-driver installer:
-`curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.sh`.
-Use `hermes computer-use status` to verify the install.
+This fetches and runs the upstream cua-driver installer — `install.sh`
+on macOS/Linux, `install.ps1` on Windows. Use `hermes computer-use
+status` to verify the install.

 **Option 2: enable the toolset interactively.**

-1. Run `hermes tools`, pick `🖱️ Computer Use (macOS)` → `cua-driver (background)`.
+1. Run `hermes tools`, pick `🖱️  Computer Use (macOS/Windows/Linux)`.
 2. The setup runs the upstream installer (same as Option 1).

-After installing, regardless of which path you took:
+After installing, regardless of which path you took, grant the
+platform-appropriate prereqs:

-3. Grant macOS permissions when prompted:
-   - **System Settings → Privacy & Security → Accessibility** → allow the
-     terminal (or Hermes app).
-   - **System Settings → Privacy & Security → Screen Recording** → allow
-     the same.
-4. Start a session with the toolset enabled:
-   ```
-   hermes -t computer_use chat
-   ```
-   or add `computer_use` to your enabled toolsets in `~/.hermes/config.yaml`.
+| Platform | Prereqs |
+|---|---|
+| **macOS** | System Settings → Privacy & Security → **Accessibility** + **Screen Recording** → allow your terminal (or Hermes app). `hermes computer-use doctor` will tell you which permission is missing. |
+| **Windows** | None at install time. If you're driving over SSH (not RDP / console), you need the autostart pattern — see [cua.ai/docs/how-to-guides/driver/windows-ssh](https://cua.ai/docs/how-to-guides/driver/windows-ssh) for the Session 0 ↔ Session 1+ proxy. |
+| **Linux** | A reachable display server: `DISPLAY` set for X11, or `XDG_SESSION_TYPE=wayland`. Wayland sessions need an XWayland bridge for capture. AT-SPI must be on (default on GNOME/KDE/Xfce). |

-## Keeping cua-driver up to date
+Then start a session with the toolset enabled:

-The cua-driver project ships fixes regularly (e.g. v0.1.6 fixed a Safari
-window-focus bug for UTM workflows). Hermes refreshes the binary in two
-places so you don't get stuck on a stale release:
+```
+hermes -t computer_use chat
+```

- **`hermes update`** — when you update Hermes itself, if `cua-driver` is
-  on PATH the upstream installer re-runs at the end of the update.
-  No-op for non-macOS users and for users without cua-driver installed.
- **`hermes computer-use install --upgrade`** — manual force-refresh.
-  Re-runs the upstream installer regardless of whether cua-driver is
-  already installed. Use this when you want the latest fix without
-  waiting for the next agent update.
+or add `computer_use` to your enabled toolsets in `~/.hermes/config.yaml`.

-`hermes computer-use status` shows the installed version next to the
-binary path.
+## `hermes computer-use doctor` — your first triage stop
+
+`hermes computer-use doctor` runs cua-driver's structured
+`health_report` MCP tool and prints a per-check matrix. It's the single
+fastest way to find out *why* an action isn't working.
+
+```
+$ hermes computer-use doctor
+⚠️  cua-driver 0.5.8 on darwin — degraded
+  ✅ binary_version: cua-driver 0.5.8
+  ✅ platform_supported: macOS 26.4.1 (arm64)
+  ✅ session_active: MCP session is active.
+  ❌ bundle_identity: Process has no CFBundleIdentifier.
+      → Run the binary inside CuaDriver.app so TCC grants attribute correctly.
+  ✅ tcc_accessibility: Accessibility is granted.
+  ✅ tcc_screen_recording: Screen Recording is granted.
+  ✅ ax_capability: AX is trusted and reachable.
+  ✅ screen_capture_capability: ScreenCaptureKit reachable; 1 display(s) shareable.
+```
+
+- **Exit code 0** when overall is `ok` — everything's wired up.
+- **Exit code 1** when `degraded` or `failed` — at least one check failed; the hint on each failure tells you what to fix.
+- **Exit code 2** when the cua-driver binary itself isn't reachable.
+
+Useful flags:
+
+- `--include CHECK` — run only the listed checks (repeat for multiple)
+- `--skip CHECK` — skip a check (wins over `--include`)
+- `--json` — emit the raw structured payload, same shape as the
+  `tools/call health_report` MCP response
+
+The check matrix is platform-aware: `bundle_identity` / `tcc_*` are
+`skip` on Windows + Linux because those concepts don't apply.
+`ax_capability` checks AX on macOS, UIA on Windows, AT-SPI on Linux —
+each with the right diagnostic hint when it can't reach.
+
+## The agent cursor and sessions
+
+When the agent acts, you'll see a **tinted overlay cursor** glide
+across the screen to where each click / type / scroll lands. The real
+OS cursor never moves — the overlay is a visual cue that says "the
+agent is acting here." Each Hermes run declares its own cua-driver
+**session id** (something like `hermes-3a7b9c14d2e8`); the cursor's
+identity is keyed to that session, so concurrent runs / subagents each
+get their own cursor without stepping on each other.
+
+Tune the cursor with `cua-driver`'s CLI flags or the runtime
+`set_agent_cursor_style` MCP tool — see
+[cua.ai/docs/how-to-guides/driver/personalize-cursor](https://cua.ai/docs/how-to-guides/driver/personalize-cursor)
+for the full menu (built-in `arrow` vs `teardrop` silhouette, custom
+SVG / PNG / ICO via `--cursor-icon`, runtime gradient colors, bloom
+halo).
+
+## Going deeper — the cua-driver skill pack
+
+Hermes intentionally keeps its skill (`skills/computer-use/SKILL.md`)
+focused on the Hermes-side `computer_use` action vocabulary — the
+single source of truth the agent loads. For the deeper material —
+platform-specific deep dives, recording semantics, browser page
+interaction — point your agent harness at the cua-driver skill pack
+the cua-driver team ships and maintains directly:
+
+```
+cua-driver skills install
+```
+
+This symlinks the pack into your agent harness' skill directory. After
+running it, an agent gets access to:
+
+| File | Topic |
+|---|---|
+| `SKILL.md` | The cross-platform core (snapshot invariant, no-foreground contract, click dispatch, AX-tree mechanics) |
+| `MACOS.md` | macOS specifics: no-foreground contract, AXMenuBar navigation, SkyLight click dispatch, Apple Events JS bridge |
+| `WINDOWS.md` | Windows specifics: UIA tree, UWP / `ApplicationFrameHost` hosting, Session 0 isolation, autostart pattern |
+| `LINUX.md` | Linux specifics: AT-SPI tree, X11 / Wayland, terminal-emulator detection |
+| `RECORDING.md` | Trajectory + video recording semantics |
+| `WEB_APPS.md` | Browser-page interaction tips |
+| `TESTS.md` | Replay-by-trajectory workflow |
+
+These are **platform deep dives, not duplicates of the Hermes skill** —
+when an agent reports "on Windows, my click landed on the wrong
+element," it reads `WINDOWS.md` for the UIA / UWP context that
+explains why and what to do differently.
+
+`cua-driver skills status` shows what's installed and which agent
+harnesses it's linked into. Today the autodetect list covers Claude
+Code, Codex, OpenCode, OpenClaw, and Antigravity; **Hermes
+autodetection is planned as a follow-up in `trycua/cua`** — until
+then, run `cua-driver skills install` once and point your harness at
+the resulting `~/.cua-driver/skills/cua-driver` directory (or symlink
+it into your usual skill space).

 ## Quick example

 User prompt: *"Find my latest email from Stripe and summarise what they want me to do."*

-The agent's plan:
+The agent's plan (this is the same shape on macOS / Windows / Linux —
+the model substitutes the platform's idiomatic shortcut and app name):

 1. `computer_use(action="capture", mode="som", app="Mail")` — gets a
-   screenshot of Mail with every sidebar item, toolbar button, and message
-   row numbered.
-2. `computer_use(action="click", element=14)` — clicks the search field
-   (element #14 from the capture).
+   screenshot of the email app with every sidebar item, toolbar button,
+   and message row numbered.
+2. `computer_use(action="click", element=14)` — clicks the search field.
 3. `computer_use(action="type", text="from:stripe")`
-4. `computer_use(action="key", keys="return", capture_after=True)` — submit
-   and get the new screenshot.
+4. `computer_use(action="key", keys="return", capture_after=True)` —
+   submit and get the new screenshot.
 5. Click the top result, read the body, summarise.

-During all of this, your cursor stays wherever you left it and Mail never
-comes to front.
+During all of this, your cursor stays wherever you left it and the email
+app never comes to front.

 ## Provider compatibility

@ -105,29 +193,33 @@ comes to front.
 | Anthropic (Claude Sonnet/Opus 3+) | ✅ | ✅ | Best overall; SOM + raw coordinates. |
 | OpenRouter (any vision model) | ✅ | ✅ | Multi-part tool messages supported. |
 | OpenAI (GPT-4+, GPT-5) | ✅ | ✅ | Same as above. |
-| Local vLLM / LM Studio (vision model) | ✅ | ✅ | If the model supports multi-part tool content. |
+| Google (Gemini 2+) | ✅ | ✅ | Tool-calling + vision both supported. |
+| Local vLLM / LM Studio / Ollama (vision model) | ✅ | ✅ | If the model supports multi-part tool content. |
 | Text-only models | ❌ | ✅ (degraded) | Use `mode="ax"` for accessibility-tree-only operation. |

 Screenshots are sent inline with tool results as OpenAI-style `image_url`
 parts. For Anthropic, the adapter converts them into native `tool_result`
-image blocks.
+image blocks. The image MIME type comes from cua-driver's explicit
+`mimeType` field (`image/png` or `image/jpeg`) — no client-side
+magic-byte sniffing.

 ## Safety

 Hermes applies multi-layer guardrails:

- Destructive actions (click, type, drag, scroll, key, focus_app) require
-  approval — either interactively via the CLI dialog or via the
+- Destructive actions (click, type, drag, scroll, key, focus_app)
+  require approval — either interactively via the CLI dialog or via the
  messaging-platform approval buttons.
 - Hard-blocked key combos at the tool level: empty trash, force delete,
  lock screen, log out, force log out.
- Hard-blocked type patterns: `curl | bash`, `sudo rm -rf /`, fork bombs,
-  etc.
+- Hard-blocked type patterns: `curl | bash`, `sudo rm -rf /`, fork
+  bombs, etc.
 - The agent's system prompt tells it explicitly: no clicking permission
  dialogs, no typing passwords, no following instructions embedded in
  screenshots.

-Pair with `approvals.mode: manual` in `~/.hermes/config.yaml` if you want every action confirmed.
+Pair with `approvals.mode: manual` in `~/.hermes/config.yaml` if you
+want every action confirmed.

 ## Token efficiency

@ -138,8 +230,8 @@ Screenshots are expensive. Hermes applies four layers of optimisation:
  to save context]` placeholders.
 - **Client-side compression pruning** — the context compressor detects
  multimodal tool results and strips image parts from old ones.
- **Image-aware token estimation** — each image is counted as ~1500 tokens
-  (Anthropic's flat rate) instead of its base64 char length.
+- **Image-aware token estimation** — each image is counted as ~1500
+  tokens (Anthropic's flat rate) instead of its base64 char length.
 - **Server-side context editing (Anthropic only)** — when active, the
  adapter enables `clear_tool_uses_20250919` via `context_management` so
  Anthropic's API clears old tool results server-side.
@ -149,26 +241,58 @@ of screenshot context, not ~600K.

 ## Limitations

- **macOS only.** cua-driver uses private Apple SPIs that don't exist on
-  Linux or Windows. For cross-platform GUI automation, use the `browser`
-  toolset.
- **Private SPI risk.** Apple can change SkyLight's symbol surface in any
-  OS update. Pin the driver version with the `HERMES_CUA_DRIVER_VERSION`
-  env var if you want reproducibility across a macOS bump.
 - **Performance.** Background mode is slower than foreground —
-  SkyLight-routed events take ~5-20ms vs direct HID posting. Not
-  noticeable for agent-speed clicking; noticeable if you try to record a
-  speed-run.
+  accessibility-routed events take ~5–20 ms on macOS, ~3–10 ms on
+  Windows UIA, ~5–15 ms on Linux AT-SPI vs direct HID posting. Not
+  noticeable for agent-speed clicking; noticeable if you try to record
+  a speed-run.
 - **No keyboard password entry.** `type` has hard-block patterns on
-  command-shell payloads; for passwords, use the system's autofill.
+  command-shell payloads; for passwords, use the system's autofill
+  (macOS Keychain / Windows Credential Manager / GNOME Keyring /
+  KWallet).
+- **Some apps don't expose an accessibility tree.** Modern UWP apps on
+  Windows, Electron < 28 on Linux, and a few macOS apps with custom
+  drawing (Logic, Final Cut, some games) have sparse or empty AX trees.
+  Fall back to pixel coordinates if the tree is empty — or skip the
+  task entirely.
+- **Windows: elevated (admin) windows can't be driven from a normal
+  agent.** Windows UIPI (User Interface Privilege Isolation) enforces
+  integrity-level boundaries: a Medium-integrity process (the default
+  Hermes agent) cannot enumerate the UIA tree of, or inject mouse input
+  into, a window owned by a High-integrity (Administrator) process.
+  Symptom: `capture(mode='som')` returns 0 elements and `click(...)`
+  reports success while doing nothing, even though the screenshot
+  renders fine (GDI capture sits below the integrity check). Keyboard
+  events partially bypass UIPI, so Tab / Enter can still navigate an
+  elevated dialog. This is an OS constraint, not a cua-driver bug — it
+  affects every Windows automation stack. To drive elevated windows,
+  run the Hermes agent itself at High integrity (launch from an
+  elevated terminal); otherwise target non-elevated windows.
+- **Platform-specific deployment gotchas:**
+  - **macOS** uses private SkyLight SPIs. Apple can change them in any
+    OS update. Hermes warns when the installed cua-driver is older than
+    the version it was tested against.
+  - **Windows** SSH sessions run in **Session 0**, which has no
+    interactive desktop. Drive Hermes from inside the RDP / console
+    session, or set up cua-driver's autostart Scheduled Task —
+    [windows-ssh](https://cua.ai/docs/how-to-guides/driver/windows-ssh)
+    has the recipe.
+  - **Linux** requires a reachable display server. Headless servers
+    need Xvfb (`Xvfb :99 -screen 0 1920x1080x24`) before
+    `computer_use` can capture or inject events. Pure Wayland sessions
+    need an XWayland bridge for screen capture (cua-driver's Wayland
+    inject path handles input independently).
+
+For cross-platform GUI automation without the desktop overhead (and
+without TCC / Session 0 / X11 setup), the `browser` toolset uses a
+real headless Chromium and is the right answer for web-only tasks.

 ## Configuration

-Override the driver binary path (tests / CI):
+Override the driver binary path (tests / CI / local builds):

 ```
-HERMES_CUA_DRIVER_CMD=/opt/homebrew/bin/cua-driver
-HERMES_CUA_DRIVER_VERSION=0.5.0    # optional pin
+HERMES_CUA_DRIVER_CMD=/path/to/your/cua-driver
 ```

 Swap the backend entirely (for testing):
@ -177,25 +301,170 @@ Swap the backend entirely (for testing):
 HERMES_COMPUTER_USE_BACKEND=noop   # records calls, no side effects
 ```

+### Telemetry
+
+cua-driver ships with anonymous usage telemetry (PostHog) enabled by default
+upstream. **Hermes disables it for you** — on every cua-driver invocation
+(the MCP backend, `status`, `doctor`, and install) Hermes sets
+`CUA_DRIVER_RS_TELEMETRY_ENABLED=0` in the driver's environment.
+
+To opt back in (let cua-driver use its own default and send telemetry), set
+this in `config.yaml`:
+
+```yaml
+computer_use:
+  cua_telemetry: true   # default: false (telemetry off)
+```
+
+When it's on, `hermes computer-use doctor` reports `telemetry: enabled`;
+when off (the default), it reports `telemetry: disabled via
+CUA_DRIVER_RS_TELEMETRY_ENABLED`.
+
+## Testing against a local cua-driver build
+
+When you're developing cua-driver itself — or want to test an
+unreleased fix — point Hermes at a binary you built from source instead
+of the published release. Hermes resolves the driver with
+`shutil.which("cua-driver")` and **does not enforce
+`HERMES_CUA_DRIVER_VERSION`**, so a local build (reported as
+`0.0.0-local-*`) is accepted as-is. Two approaches:
+
+### Option A — `install-local` (build + put it on PATH)
+
+From your `trycua/cua` checkout, run the upstream local installer. It
+builds the Rust backend in release mode and drops `cua-driver` into the
+same install layout the production installer uses, adding its bin dir
+to your PATH:
+
+```powershell
+# Windows (PowerShell), from the cua repo root
+./libs/cua-driver/scripts/install-local.ps1 -NoAutoStart
+```
+
+```bash
+# macOS / Linux, from the cua repo root  (defaults to a debug build without --release)
+./libs/cua-driver/scripts/install-local.sh --release
+```
+
+- Windows stages the build under `%USERPROFILE%\.cua-driver\packages\…`
+  and junctions
+  `%LOCALAPPDATA%\Programs\Cua\cua-driver\bin` (added to your User
+  PATH) to it. macOS/Linux symlinks `cua-driver` into `~/.local/bin`
+  (override with `--bin-dir <path>`).
+- `-NoAutoStart` skips registering the `cua-driver-serve` logon daemon
+  — you don't need it for Hermes testing (see notes).
+
+Then open a fresh shell (so the PATH change is visible) and confirm:
+
+```
+cua-driver --version                 # local builds report 0.0.0-local-release
+# Windows:      (Get-Command cua-driver).Source
+# macOS/Linux:  which cua-driver
+```
+
+### Option B — point Hermes straight at the built binary (fastest loop)
+
+Skip the install ceremony entirely: `cargo build` and set
+`HERMES_CUA_DRIVER_CMD` to the resulting binary. Best for rapid
+edit/build/test.
+
+```bash
+cargo build -p cua-driver            # add --release for a release build; run from libs/cua-driver/rust
+```
+
+```
+# Windows (.env)
+HERMES_CUA_DRIVER_CMD=C:\path\to\cua\libs\cua-driver\rust\target\debug\cua-driver.exe
+# macOS / Linux (.env)
+HERMES_CUA_DRIVER_CMD=/path/to/cua/libs/cua-driver/rust/target/debug/cua-driver
+```
+
+### Confirm Hermes is using your build
+
+- `hermes computer-use status` prints the resolved binary path and
+  version.
+- `hermes computer-use doctor` confirms the binary is reachable and
+  exercises the full MCP path end-to-end.
+- In a session, `computer_use(action="capture")` exercises the spawned
+  `cua-driver mcp` child process.
+
+### Notes & gotchas
+
+- **Hermes spawns its own `cua-driver mcp` child over stdio** — it does
+  *not* attach to the long-running `cua-driver serve` autostart daemon
+  or its named pipe. So the scheduled task / LaunchAgent is unnecessary
+  for testing (`-NoAutoStart` is fine). The autostart daemon and the
+  Windows UIAccess worker (`cua-driver-uia.exe`) only matter for
+  foreground-safe input on some apps (e.g. WPF); the standard tool
+  surface works through the stdio child. On Windows SSH sessions, the
+  autostart pattern IS needed — see the Limitations section.
+- **Locked binary on Windows.** A running `cua-driver-serve` daemon can
+  hold `cua-driver.exe` and block an overwrite on rebuild.
+  `install-local.ps1` renames the locked binary out of the way
+  automatically; if you `cargo build` manually (Option B), stop it
+  first with `cua-driver autostart disable` (or `schtasks /End /TN
+  cua-driver-serve`).
+- **Rebuild loop.** After editing cua-driver source, re-run
+  `install-local` (rebuilds, restages, flips the `current` junction)
+  for Option A, or just re-`cargo build` for Option B — no Hermes
+  change needed either way.
+- **Local builds skip the version check.** Hermes warns when the
+  installed cua-driver is older than its per-OS tested baseline, but
+  exempts `0.0.0-local-*` dev builds — so your local build never
+  triggers that warning.
+
 ## Troubleshooting

-**`computer_use backend unavailable: cua-driver is not installed`** — Run
-`hermes computer-use install` to fetch the cua-driver binary, or run
-`hermes tools` and enable the Computer Use toolset.
+**First action when anything's off: run `hermes computer-use doctor`.**
+The structured per-check matrix tells you (and any agent helping you
+debug) exactly what's wrong.
+
+Specific failure modes the doctor doesn't catch:
+
+**`computer_use backend unavailable: cua-driver is not installed`** —
+Run `hermes computer-use install` to fetch the cua-driver binary, or
+run `hermes tools` and enable the Computer Use toolset.

 **Clicks seem to have no effect** — Capture and verify. A modal you
 didn't see may be blocking input. Dismiss it with `escape` or the close
 button.

 **Element indices are stale** — SOM indices are only valid until the
-next `capture`. Re-capture after any state-changing action.
+next `capture`. Re-capture after any state-changing action. The
+wrapper carries opaque `element_token`s for stale detection — you'll
+see an explicit error rather than a wrong click.

 **"blocked pattern in type text"** — The text you tried to `type`
 matches the dangerous-shell-pattern list. Break the command up or
 reconsider.

+**Empty captures on Linux** — `DISPLAY` not set, or you're on pure
+Wayland without an XWayland bridge. `hermes computer-use doctor` will
+flag this as `ax_capability: fail` with a `Set DISPLAY (X11)…` hint.
+
+**Empty captures on Windows over SSH** — You're in Session 0 (the
+services session). Drive from RDP / console directly, or set up the
+autostart pattern — see
+[cua.ai/docs/how-to-guides/driver/windows-ssh](https://cua.ai/docs/how-to-guides/driver/windows-ssh).
+
 ## See also

- [Universal skill: `macos-computer-use`](https://github.com/NousResearch/hermes-agent/blob/main/skills/apple/macos-computer-use/SKILL.md)
+- **Hermes-side skill** — `skills/computer-use/SKILL.md` — teaches the
+  Hermes `computer_use` action vocabulary; this is what the agent loads.
+- **cua-driver skill pack** — for platform-specific deep dives
+  (macOS no-foreground contract, Windows UIA + Session 0, Linux AT-SPI
+  + X11/Wayland, recording, browser pages), run
+  `cua-driver skills install` and read `MACOS.md` / `WINDOWS.md` /
+  `LINUX.md` / `RECORDING.md` / `WEB_APPS.md`. Once `cua-driver skills
+  install` autodetects Hermes (planned follow-up), this happens
+  automatically on install.
+- **cua.ai/docs** — the cua-driver project's documentation:
+  - [What is computer use?](https://cua.ai/docs/explanation/what-is-computer-use) — concept intro
+  - [The no-foreground contract](https://cua.ai/docs/explanation/the-no-foreground-contract) — *why* background mode matters
+  - [Install reference](https://cua.ai/docs/how-to-guides/driver/install) — cross-platform install details
+  - [Personalize the agent cursor](https://cua.ai/docs/how-to-guides/driver/personalize-cursor) — built-in shapes, custom assets, runtime overrides
+  - [Drive Windows over SSH](https://cua.ai/docs/how-to-guides/driver/windows-ssh) — the Session 0 → Session 1+ autostart pattern
+  - [Keep cua-driver running](https://cua.ai/docs/how-to-guides/driver/keep-running) — autostart / daemon lifecycle
+  - [Connect your agent](https://cua.ai/docs/how-to-guides/driver/connect-your-agent) — register cua-driver with various harnesses (Hermes among them)
 - [cua-driver source (trycua/cua)](https://github.com/trycua/cua)
- [Browser automation](./browser.md) for cross-platform web tasks.
+- [Browser automation](./browser.md) for cross-platform web tasks where you don't need to drive native apps.
--- a/website/docs/user-guide/features/extending-the-dashboard.md
+++ b/website/docs/user-guide/features/extending-the-dashboard.md
@ -431,14 +431,14 @@ If you prefer JSX, use any bundler (esbuild, Vite, rollup) with React as an exte
    ├── dist/
    │   ├── index.js         # required — pre-built JS bundle (IIFE)
    │   └── style.css        # optional — custom CSS
-    └── plugin_api.py        # optional — backend API routes (FastAPI)
+    └── plugin_api.py        # bundled plugins only — backend API routes (FastAPI)
 ```

 A single plugin directory can carry three orthogonal extensions:

 - `plugin.yaml` + `__init__.py` — CLI/gateway plugin ([see plugins page](./plugins)).
 - `dashboard/manifest.json` + `dashboard/dist/index.js` — dashboard UI plugin.
- `dashboard/plugin_api.py` — dashboard backend routes.
+- `dashboard/plugin_api.py` — bundled plugins only; backend API routes.

 None of them are required; include only the layers you need.

@ -743,7 +743,10 @@ Routes are mounted under `/api/plugins/<name>/`, so the above becomes:
 - `GET  /api/plugins/my-plugin/data`
 - `POST /api/plugins/my-plugin/action`

-Plugin API routes bypass session-token authentication since the dashboard server binds to localhost by default. **Don't expose the dashboard on a public interface with `--host 0.0.0.0` if you run untrusted plugins** — their routes become reachable too.
+Security notes:
+
+- Bundled plugin API routes bypass session-token authentication. The dashboard server binds to localhost by default, which mitigates the risks of this bypass.
+- User-installed and project dashboard plugins may still extend the UI with static JS/CSS, but their Python `api` files are not auto-imported by the dashboard server. Backend routes are reserved for bundled plugins.

 #### Accessing Hermes internals

@ -804,11 +807,14 @@ The dashboard scans three directories for `dashboard/manifest.json`:

 | Priority | Directory | Source label |
 |----------|-----------|--------------|
-| 1 (wins on conflict) | `~/.hermes/plugins/<name>/dashboard/` | `user` |
-| 2 | `<repo>/plugins/memory/<name>/dashboard/` | `bundled` |
-| 2 | `<repo>/plugins/<name>/dashboard/` | `bundled` |
+| 1 (wins on conflict) | `<repo>/plugins/memory/<name>/dashboard/` | `bundled` |
+| 1 (wins on conflict) | `<repo>/plugins/<name>/dashboard/` | `bundled` |
+| 2 | `~/.hermes/plugins/<name>/dashboard/` | `user` |
 | 3 | `./.hermes/plugins/<name>/dashboard/` | `project` — only when `HERMES_ENABLE_PROJECT_PLUGINS` is set |

+Bundled dashboard plugins win name conflicts because only bundled plugins may
+register backend routes. Give user and project dashboard plugins unique names.
+
 Discovery results are cached per dashboard process. After adding a new plugin, either:

 ```bash
@ -908,10 +914,11 @@ Check that the file is in `~/.hermes/dashboard-themes/` and ends in `.yaml` or `
 The `sidebar` slot only renders when the active theme has `layoutVariant: cockpit`. Other slots always render. If you're registering into a slot with no hits, add `console.log` inside `registerSlot` to confirm the plugin bundle ran at all.

 **Plugin backend routes return 404.**
-1. Confirm the manifest has `"api": "plugin_api.py"` pointing to an existing file inside `dashboard/`.
-2. Restart `hermes dashboard` — plugin API routes are mounted once at startup, **not** on rescan.
-3. Check that `plugin_api.py` exports a module-level `router = APIRouter()`. Other export names are not picked up.
-4. Tail `~/.hermes/logs/errors.log` for `Failed to load plugin <name> API routes` — import errors are logged there.
+1. Confirm the plugin is bundled with Hermes. User-installed and project dashboard plugins can extend the UI, but their Python backend routes are not auto-imported.
+2. Confirm the manifest has `"api": "plugin_api.py"` pointing to an existing file inside `dashboard/`.
+3. Restart `hermes dashboard` — plugin API routes are mounted once at startup, **not** on rescan.
+4. Check that `plugin_api.py` exports a module-level `router = APIRouter()`. Other export names are not picked up.
+5. Tail `~/.hermes/logs/errors.log` for `Failed to load plugin <name> API routes` — import errors are logged there.

 **Theme change drops my color overrides.**
 `colorOverrides` are scoped to the active theme and cleared on theme switch — that's by design. If you want overrides that persist, put them in your theme's YAML, not in the live switcher.
--- a/website/docs/user-guide/features/goals.md
+++ b/website/docs/user-guide/features/goals.md
@ -40,13 +40,57 @@ What you'll see:
 | Command | What it does |
 |---|---|
 | `/goal <text>` | Set (or replace) the standing goal. Kicks off the first turn immediately so you don't need to send a separate message. |
+| `/goal draft <text>` | Draft a structured completion contract from a plain-language objective, then set it. See [Completion contracts](#completion-contracts). |
+| `/goal show` | Print the active goal's completion contract. |
 | `/goal` or `/goal status` | Show the current goal, its status, and turns used. |
 | `/goal pause` | Stop the auto-continuation loop without clearing the goal. |
 | `/goal resume` | Resume the loop (resets the turn counter back to zero). |
 | `/goal clear` | Drop the goal entirely. |
+| `/goal wait <pid> [reason]` | Park the loop on a background process — it stops re-poking the agent every turn while the process runs, and auto-resumes when it exits. |
+| `/goal unwait` | Drop the wait barrier and resume the loop immediately. |

 Works identically on the CLI and every gateway platform (Telegram, Discord, Slack, Matrix, Signal, WhatsApp, SMS, iMessage, Webhook, API server, and the web dashboard).

+## Completion contracts
+
+A bare `/goal <text>` works fine, but a *vague* goal makes for vague judging — the judge can only check what you told it to want. Codex's `/goal` guidance makes the same point: a durable objective works best when it names **what done means, how to prove it, what not to break, what's in scope, and when to stop**. Hermes adapts this as an optional **completion contract** layered on top of the existing goal loop.
+
+A contract has five fields, all optional:
+
+| Field | Meaning |
+|---|---|
+| `outcome` | The single end state that must be true when done. |
+| `verification` | The specific test / command / artifact that *proves* the outcome. |
+| `constraints` | What must not change or regress. |
+| `boundaries` | Which files, dirs, tools, or systems are in scope. |
+| `stop_when` | The condition under which Hermes should stop and ask for input. |
+
+When a contract is set, both prompts change: the **continuation prompt** tells the agent to target the verification surface and respect the constraints, and the **judge prompt** decides `done` *only when the verification criterion is met with concrete evidence* (a command result, file excerpt, test output) — not a loose "looks done" claim. This directly tightens the most common `/goal` failure mode (premature completion or endless over-continuation on an underspecified objective).
+
+### Two ways to set a contract
+
+**1. Let Hermes draft it** (recommended — adapted from Codex's "let the agent draft the goal" tip):
+
+```
+/goal draft Migrate the auth service from session cookies to JWT
+```
+
+Hermes expands your one-liner into a full contract via the `goal_judge` auxiliary model, sets it, and shows you the result so you can review or tighten any field. If the aux model is unavailable, it falls back to a plain free-form goal — drafting never blocks setting a goal.
+
+**2. Write it inline** with `field: value` lines:
+
+```
+/goal Migrate auth to JWT
+verify: pytest tests/auth passes
+constraints: keep the /login response shape unchanged
+boundaries: only touch services/auth and its tests
+stop when: a DB schema migration is required
+```
+
+The first non-field line(s) are the goal headline; recognized field prefixes (`verify:`, `verified by:`, `constraints:`, `preserve:`, `boundaries:`, `scope:`, `stop when:`, `blocked:`, …) populate the contract. A plain goal with an incidental colon (`Fix bug: the parser drops commas`) is **not** mangled — only known field prefixes are pulled out.
+
+Use `/goal show` to review the active contract. Contracts persist in `SessionDB.state_meta` alongside the goal, so they survive `/resume`. Old goals from before this feature load unchanged (no contract). Contracts and `/subgoal` criteria compose: subgoals fold into the contract as extra criteria the judge must also satisfy.
+
 ## Adding criteria mid-goal: `/subgoal`

 While a goal is active you can append extra acceptance criteria with `/subgoal <text>` without resetting the loop. Each call adds one numbered item to the goal's subgoal list; the **continuation prompt** the agent sees on the next turn includes the original goal plus an "Additional criteria the user added mid-loop" block, and the **judge prompt** is rewritten so the verdict must consider every subgoal — the goal isn't marked done until the original objective **and** every subgoal are met.
@ -62,6 +106,29 @@ Subgoals are persisted alongside the goal in `SessionDB.state_meta`, so they sur

 Use this when you start a loop ("fix the failing tests") and notice partway through that you also want it to "and add a regression test for the bug you just patched" — `/subgoal add a regression test` tightens the success criteria without breaking the running loop.

+## Parking on a background process: automatic, with a manual override
+
+Some goals are gated on something that takes minutes and runs on its own — CI on a pushed PR, a long build, a test matrix, a deploy, a rate-limit cooldown. Without help, the goal loop would re-poke the agent every turn into "is it done yet?" busy-work while it waits.
+
+**This is handled automatically.** Every turn, the judge is shown the agent's live background processes (the `terminal(background=true)` registry — pid, session id, command, uptime, recent output, and any `watch_patterns` / `notify_on_complete` trigger) alongside the goal and the agent's response. When the agent's progress is genuinely gated on one of them, the judge returns a **`wait`** verdict instead of `continue`, and the loop **parks**: the next turns are skipped (no judge call, no continuation, no turn consumed) until the wait is satisfied — then it resumes normally with the result in hand. The judge can also park on a **time** basis (`wait_for_seconds`) for backoff/cooldown waits. `/goal status` shows `⏳ Goal (parked …)` while parked.
+
+The judge picks the right kind of wait from the process's own signal:
+
+- **`wait_on_session <id>`** — releases when the process's *own trigger* fires: it exits, **or** (if it was started with `watch_patterns`) its pattern matches. This is the one for a long-lived watcher / server / poller that signals **mid-run** (e.g. a build process that prints `BUILD SUCCESSFUL` and keeps running, or a `notify_on_complete` watcher) and may never exit on its own.
+- **`wait_on_pid <pid>`** — releases on process exit only.
+- **`wait_for_seconds <n>`** — releases after a fixed delay.
+
+You don't type anything for this — it's the judge's decision, made from the process context the loop hands it. The manual commands exist as an override:
+
+| Command | What it does |
+|---|---|
+| `/goal wait <pid> [reason]` | Manually park the loop until the process with that PID exits. |
+| `/goal unwait` | Clear any wait barrier (judge- or manually-set) and resume immediately. |
+
+The barrier (pid- or time-based) is persisted with the goal in `SessionDB.state_meta`, so it survives `/resume`. `/goal pause`, `/goal resume`, and `/goal clear` all drop it. If the PID is already dead when the barrier is set (or dies while parked), or the time deadline passes, the barrier clears on the next check — a stale barrier can never wedge the loop.
+
+Typical flow: the agent pushes a PR, starts a CI watcher with `terminal(background=true, notify_on_complete=true)`, and reports "watching CI." The judge sees the watcher process still running, returns `wait` on its pid, and the loop goes quiet — then picks back up the instant CI finishes and judges the goal against the actual result.
+
 ## Behavior details

 ### The judge
@ -94,7 +161,7 @@ Any real message you send while a goal is active takes priority over the continu

 ### Mid-run safety (gateway)

-While an agent is already running, `/goal status`, `/goal pause`, and `/goal clear` are safe to run — they only touch control-plane state and don't interrupt the current turn. Setting a **new** goal mid-run (`/goal <new text>`) is rejected with a message telling you to `/stop` first, so the old continuation can't race the new one.
+While an agent is already running, `/goal status`, `/goal pause`, `/goal clear`, `/goal wait`, and `/goal unwait` are safe to run — they only touch control-plane state and don't interrupt the current turn. Setting a **new** goal mid-run (`/goal <new text>`) is rejected with a message telling you to `/stop` first, so the old continuation can't race the new one.

 ### Persistence

--- a/website/docs/user-guide/features/memory-providers.md
+++ b/website/docs/user-guide/features/memory-providers.md
@ -61,6 +61,8 @@ AI-native cross-session user modeling with dialectic reasoning, session-scoped c
 - `dialecticCadence` — how often the dialectic LLM fires (LLM call frequency)
 - `dialecticDepth` — how many `.chat()` passes per dialectic invocation (1–3, depth of reasoning)

+The auto-injected dialectic also scales its reasoning level by query length (longer query → deeper reasoning, capped at `reasoningLevelCap`); see [Query-Adaptive Reasoning Level](./honcho.md#query-adaptive-reasoning-level).
+
 **Setup Wizard:**
 ```bash
 hermes memory setup        # select "honcho" — runs the Honcho-specific post-setup
@ -315,31 +317,55 @@ echo "OPENVIKING_API_KEY=..." >> ~/.hermes/.env

 ### Mem0

-Server-side LLM fact extraction with semantic search, reranking, and automatic deduplication.
+Server-side LLM fact extraction with semantic search, reranking, and automatic deduplication. Supports both Mem0 Platform (cloud) and OSS (self-hosted) modes.

 | | |
 |---|---|
 | **Best for** | Hands-off memory management — Mem0 handles extraction automatically |
-| **Requires** | `pip install mem0ai` + API key |
-| **Data storage** | Mem0 Cloud |
-| **Cost** | Mem0 pricing |
+| **Requires** | `pip install mem0ai` + API key (platform) or LLM/vector store (OSS) |
+| **Data storage** | Mem0 Cloud (platform) or self-hosted (OSS) |
+| **Cost** | Mem0 pricing (platform) / free (OSS) |

-**Tools:** `mem0_profile` (all stored memories), `mem0_search` (semantic search + reranking), `mem0_conclude` (store verbatim facts)
+**Tools (5):** `mem0_list` (list all memories, paginated), `mem0_search` (semantic search with reranking in platform mode), `mem0_add` (store verbatim facts), `mem0_update` (update by ID), `mem0_delete` (delete by ID)

-**Setup:**
+**Setup (Platform):**
 ```bash
-hermes memory setup    # select "mem0"
+hermes memory setup    # select "mem0" → "Platform"
 # Or manually:
 hermes config set memory.provider mem0
 echo "MEM0_API_KEY=your-key" >> ~/.hermes/.env
 ```

-**Config:** `$HERMES_HOME/mem0.json`
+**Setup (OSS):**
+```bash
+hermes memory setup    # select "mem0" → "Open Source (self-hosted)"
+# Or via flags:
+hermes memory setup mem0 --mode oss --oss-llm openai --oss-llm-key sk-... --oss-vector qdrant
+```
+
+Preview without writing files:
+```bash
+hermes memory setup mem0 --mode oss --oss-llm-key sk-... --dry-run
+```
+
+**Config:** `$HERMES_HOME/mem0.json` (behavioral settings). Only the secret `MEM0_API_KEY` belongs in `~/.hermes/.env`.

 | Key | Default | Description |
 |-----|---------|-------------|
+| `mode` | `platform` | `platform` (Mem0 Cloud) or `oss` (self-hosted) |
 | `user_id` | `hermes-user` | User identifier |
 | `agent_id` | `hermes` | Agent identifier |
+| `rerank` | `true` | Rerank search results for relevance (platform mode only) |
+
+**OSS supported providers:**
+
+| Component | Providers |
+|-----------|-----------|
+| LLM | openai, ollama |
+| Embedder | openai, ollama |
+| Vector Store | qdrant (local/server), pgvector |
+
+**Switching modes:** Re-run `hermes memory setup mem0 --mode <platform|oss>` or edit `mem0.json` directly.

 ---

@ -569,7 +595,7 @@ hermes memory setup
 |----------|---------|------|-------|-------------|----------------|
 | **Honcho** | Cloud | Paid | 5 | `honcho-ai` | Dialectic user modeling + session-scoped context |
 | **OpenViking** | Self-hosted | Free | 5 | `openviking` + server | Filesystem hierarchy + tiered loading |
-| **Mem0** | Cloud | Paid | 3 | `mem0ai` | Server-side LLM extraction |
+| **Mem0** | Cloud/Self-hosted | Free/Paid | 5 | `mem0ai` | Server-side LLM extraction + OSS mode |
 | **Hindsight** | Cloud/Local | Free/Paid | 3 | `hindsight-client` | Knowledge graph + reflect synthesis |
 | **Holographic** | Local | Free | 2 | None | HRR algebra + trust scoring |
 | **RetainDB** | Cloud | $20/mo | 5 | `requests` | Delta compression |
--- a/website/docs/user-guide/features/memory.md
+++ b/website/docs/user-guide/features/memory.md
@ -270,6 +270,31 @@ display:
 > writes to your memory/skill stores, are unaffected by this setting. Set it
 > per-platform via `display.platforms.<platform>.memory_notifications`.

+## Running the review on a cheaper model (`auxiliary.background_review`)
+
+The review runs on your **main chat model** by default, replaying the
+conversation — which is already warm in the prompt cache, so it's cheap cache
+reads. On an expensive main model you can run the review on a cheaper model
+instead:
+
+```yaml
+auxiliary:
+  background_review:
+    provider: openrouter
+    model: google/gemini-3-flash-preview   # auto (default) = main chat model
+```
+
+When you point it at a model **different** from your main one, the review runs
+there for substantially lower cost (~3–5× in benchmarks). Because a different
+model can't reuse your main model's prompt cache anyway, the fork automatically
+replays a compact **digest** of the conversation (recent turns verbatim + a
+summary of older ones) rather than the full transcript — minimizing what it
+writes to the new cache. Capture holds: in testing, memory capture was
+identical and skill capture near-identical to the main-model review.
+
+Leave it at `auto` (or set it to your main model) and nothing changes — the
+review keeps running on the main model with the full warm-cache replay.
+
 ## Controlling skill writes (`skills.write_approval`)

 Skills use the same on/off gate, but the review UX differs because a
--- a/website/docs/user-guide/features/skills.md
+++ b/website/docs/user-guide/features/skills.md
@ -71,6 +71,42 @@ hermes chat --toolsets skills -q "What skills do you have?"
 hermes chat --toolsets skills -q "Show me the axolotl skill"
 ```

+## Learning a skill from sources (`/learn`)
+
+`/learn` is the fast way to turn something you already know — or a pile of
+reference material — into a reusable skill, without hand-writing the
+`SKILL.md`. It is open-ended: point it at *anything you can describe* and the
+agent gathers the material with the tools it already has, then authors a skill
+that follows the [house authoring standards](#skillmd-format) (≤60-char
+description, the standard section order, Hermes-tool framing, no invented
+commands).
+
+```bash
+# A local SDK or doc directory — read with read_file / search_files
+/learn the REST client in ~/projects/acme-sdk, focus on auth + pagination
+
+# An online doc page — fetched with web_extract
+/learn https://docs.example.com/api/quickstart
+
+# The workflow you just walked the agent through in this conversation
+/learn how I just deployed the staging server
+
+# Pasted notes / a described procedure
+/learn filing an expense: open the portal, New > Expense, attach the receipt, submit
+```
+
+Because the live agent does the sourcing, `/learn` works the same in the CLI,
+the messaging gateway, the TUI, and the dashboard — and on any terminal backend
+(local, Docker, remote), since there is no separate ingestion engine. In the
+**dashboard**, the Skills page has a **Learn a skill** button that opens a panel
+with a directory field, a URL field, and an open-ended text box; it composes a
+`/learn` request and runs it in chat.
+
+There is no model-tool footprint: `/learn` builds a standards-guided prompt and
+hands it to the agent as a normal turn. The agent saves the result with the
+`skill_manage` tool, so the [write-approval gate](#gating-agent-skill-writes-skillswrite_approval)
+applies if you have it on.
+
 ## Progressive Disclosure

 Skills use a token-efficient loading pattern: