mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-27 11:22:03 +00:00
Merge remote-tracking branch 'origin/main' into bb/pets-merge
# Conflicts: # hermes_cli/commands.py # tui_gateway/server.py
This commit is contained in:
commit
e495b33bf1
251 changed files with 23395 additions and 2720 deletions
|
|
@ -3,36 +3,45 @@ title: Computer Use
|
|||
sidebar_position: 16
|
||||
---
|
||||
|
||||
# Computer Use (macOS)
|
||||
# Computer Use
|
||||
|
||||
Hermes Agent can drive your Mac's desktop — clicking, typing, scrolling,
|
||||
dragging — in the **background**. Your cursor doesn't move, keyboard focus
|
||||
doesn't change, and macOS doesn't switch Spaces on you. You and the agent
|
||||
co-work on the same machine.
|
||||
Hermes Agent can drive your desktop — clicking, typing, scrolling,
|
||||
dragging — in the **background** on **macOS, Windows, and Linux**. Your
|
||||
cursor doesn't move, keyboard focus doesn't change, and your virtual
|
||||
desktops / Spaces don't switch on you. You and the agent co-work on the
|
||||
same machine.
|
||||
|
||||
Unlike most computer-use integrations, this works with **any tool-capable
|
||||
model** — Claude, GPT, Gemini, or an open model on a local vLLM endpoint.
|
||||
There's no Anthropic-native schema to worry about.
|
||||
model** — Claude, GPT, Gemini, or an open model on a local
|
||||
OpenAI-compatible endpoint. There's no Anthropic-native schema to worry
|
||||
about.
|
||||
|
||||
## How it works
|
||||
|
||||
The `computer_use` toolset speaks MCP over stdio to [`cua-driver`](https://github.com/trycua/cua),
|
||||
a macOS driver that uses SkyLight private SPIs (`SLEventPostToPid`,
|
||||
`SLPSPostEventRecordTo`) and the `_AXObserverAddNotificationAndCheckRemote`
|
||||
accessibility SPI to:
|
||||
The `computer_use` toolset speaks MCP over stdio to
|
||||
[`cua-driver`](https://github.com/trycua/cua), an open-source background
|
||||
computer-use driver. Each platform uses the appropriate accessibility +
|
||||
input stack under the hood:
|
||||
|
||||
- Post synthesized events directly to target processes — no HID event tap,
|
||||
no cursor warp.
|
||||
- Flip AppKit active-state without raising windows — no Space switching.
|
||||
- Keep Chromium/Electron accessibility trees alive when windows are
|
||||
occluded.
|
||||
| Platform | Accessibility tree | Input dispatch |
|
||||
|---|---|---|
|
||||
| macOS | AX (private SkyLight SPIs) | `SLPSPostEventRecordTo` — pid-scoped, no cursor warp |
|
||||
| Windows | UIAutomation | `SendInput` + `PostMessage` — no focus steal |
|
||||
| Linux | AT-SPI (X11 + Wayland) | XTest (X11) / virtual-keyboard (Wayland) |
|
||||
|
||||
That combination is what OpenAI's Codex "background computer-use" ships.
|
||||
cua-driver is the open-source equivalent.
|
||||
The result is the same on every platform: the agent can read the
|
||||
accessibility tree of any visible window AND post synthesized events
|
||||
without bringing it to front, switching virtual desktops, or moving the
|
||||
real OS cursor.
|
||||
|
||||
For the underlying contract — *why* background mode matters, the
|
||||
no-foreground invariant, click-dispatch internals — see
|
||||
**[cua.ai/docs/explanation/the-no-foreground-contract](https://cua.ai/docs/explanation/the-no-foreground-contract)**.
|
||||
|
||||
## Enabling
|
||||
|
||||
Pick whichever path is most convenient — both run the same upstream installer:
|
||||
Pick whichever path is most convenient — both run the same upstream
|
||||
installer:
|
||||
|
||||
**Option 1: dedicated CLI command (most direct).**
|
||||
|
||||
|
|
@ -40,63 +49,142 @@ Pick whichever path is most convenient — both run the same upstream installer:
|
|||
hermes computer-use install
|
||||
```
|
||||
|
||||
This fetches and runs the upstream cua-driver installer:
|
||||
`curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.sh`.
|
||||
Use `hermes computer-use status` to verify the install.
|
||||
This fetches and runs the upstream cua-driver installer — `install.sh`
|
||||
on macOS/Linux, `install.ps1` on Windows. Use `hermes computer-use
|
||||
status` to verify the install.
|
||||
|
||||
**Option 2: enable the toolset interactively.**
|
||||
|
||||
1. Run `hermes tools`, pick `🖱️ Computer Use (macOS)` → `cua-driver (background)`.
|
||||
1. Run `hermes tools`, pick `🖱️ Computer Use (macOS/Windows/Linux)`.
|
||||
2. The setup runs the upstream installer (same as Option 1).
|
||||
|
||||
After installing, regardless of which path you took:
|
||||
After installing, regardless of which path you took, grant the
|
||||
platform-appropriate prereqs:
|
||||
|
||||
3. Grant macOS permissions when prompted:
|
||||
- **System Settings → Privacy & Security → Accessibility** → allow the
|
||||
terminal (or Hermes app).
|
||||
- **System Settings → Privacy & Security → Screen Recording** → allow
|
||||
the same.
|
||||
4. Start a session with the toolset enabled:
|
||||
```
|
||||
hermes -t computer_use chat
|
||||
```
|
||||
or add `computer_use` to your enabled toolsets in `~/.hermes/config.yaml`.
|
||||
| Platform | Prereqs |
|
||||
|---|---|
|
||||
| **macOS** | System Settings → Privacy & Security → **Accessibility** + **Screen Recording** → allow your terminal (or Hermes app). `hermes computer-use doctor` will tell you which permission is missing. |
|
||||
| **Windows** | None at install time. If you're driving over SSH (not RDP / console), you need the autostart pattern — see [cua.ai/docs/how-to-guides/driver/windows-ssh](https://cua.ai/docs/how-to-guides/driver/windows-ssh) for the Session 0 ↔ Session 1+ proxy. |
|
||||
| **Linux** | A reachable display server: `DISPLAY` set for X11, or `XDG_SESSION_TYPE=wayland`. Wayland sessions need an XWayland bridge for capture. AT-SPI must be on (default on GNOME/KDE/Xfce). |
|
||||
|
||||
## Keeping cua-driver up to date
|
||||
Then start a session with the toolset enabled:
|
||||
|
||||
The cua-driver project ships fixes regularly (e.g. v0.1.6 fixed a Safari
|
||||
window-focus bug for UTM workflows). Hermes refreshes the binary in two
|
||||
places so you don't get stuck on a stale release:
|
||||
```
|
||||
hermes -t computer_use chat
|
||||
```
|
||||
|
||||
- **`hermes update`** — when you update Hermes itself, if `cua-driver` is
|
||||
on PATH the upstream installer re-runs at the end of the update.
|
||||
No-op for non-macOS users and for users without cua-driver installed.
|
||||
- **`hermes computer-use install --upgrade`** — manual force-refresh.
|
||||
Re-runs the upstream installer regardless of whether cua-driver is
|
||||
already installed. Use this when you want the latest fix without
|
||||
waiting for the next agent update.
|
||||
or add `computer_use` to your enabled toolsets in `~/.hermes/config.yaml`.
|
||||
|
||||
`hermes computer-use status` shows the installed version next to the
|
||||
binary path.
|
||||
## `hermes computer-use doctor` — your first triage stop
|
||||
|
||||
`hermes computer-use doctor` runs cua-driver's structured
|
||||
`health_report` MCP tool and prints a per-check matrix. It's the single
|
||||
fastest way to find out *why* an action isn't working.
|
||||
|
||||
```
|
||||
$ hermes computer-use doctor
|
||||
⚠️ cua-driver 0.5.8 on darwin — degraded
|
||||
✅ binary_version: cua-driver 0.5.8
|
||||
✅ platform_supported: macOS 26.4.1 (arm64)
|
||||
✅ session_active: MCP session is active.
|
||||
❌ bundle_identity: Process has no CFBundleIdentifier.
|
||||
→ Run the binary inside CuaDriver.app so TCC grants attribute correctly.
|
||||
✅ tcc_accessibility: Accessibility is granted.
|
||||
✅ tcc_screen_recording: Screen Recording is granted.
|
||||
✅ ax_capability: AX is trusted and reachable.
|
||||
✅ screen_capture_capability: ScreenCaptureKit reachable; 1 display(s) shareable.
|
||||
```
|
||||
|
||||
- **Exit code 0** when overall is `ok` — everything's wired up.
|
||||
- **Exit code 1** when `degraded` or `failed` — at least one check failed; the hint on each failure tells you what to fix.
|
||||
- **Exit code 2** when the cua-driver binary itself isn't reachable.
|
||||
|
||||
Useful flags:
|
||||
|
||||
- `--include CHECK` — run only the listed checks (repeat for multiple)
|
||||
- `--skip CHECK` — skip a check (wins over `--include`)
|
||||
- `--json` — emit the raw structured payload, same shape as the
|
||||
`tools/call health_report` MCP response
|
||||
|
||||
The check matrix is platform-aware: `bundle_identity` / `tcc_*` are
|
||||
`skip` on Windows + Linux because those concepts don't apply.
|
||||
`ax_capability` checks AX on macOS, UIA on Windows, AT-SPI on Linux —
|
||||
each with the right diagnostic hint when it can't reach.
|
||||
|
||||
## The agent cursor and sessions
|
||||
|
||||
When the agent acts, you'll see a **tinted overlay cursor** glide
|
||||
across the screen to where each click / type / scroll lands. The real
|
||||
OS cursor never moves — the overlay is a visual cue that says "the
|
||||
agent is acting here." Each Hermes run declares its own cua-driver
|
||||
**session id** (something like `hermes-3a7b9c14d2e8`); the cursor's
|
||||
identity is keyed to that session, so concurrent runs / subagents each
|
||||
get their own cursor without stepping on each other.
|
||||
|
||||
Tune the cursor with `cua-driver`'s CLI flags or the runtime
|
||||
`set_agent_cursor_style` MCP tool — see
|
||||
[cua.ai/docs/how-to-guides/driver/personalize-cursor](https://cua.ai/docs/how-to-guides/driver/personalize-cursor)
|
||||
for the full menu (built-in `arrow` vs `teardrop` silhouette, custom
|
||||
SVG / PNG / ICO via `--cursor-icon`, runtime gradient colors, bloom
|
||||
halo).
|
||||
|
||||
## Going deeper — the cua-driver skill pack
|
||||
|
||||
Hermes intentionally keeps its skill (`skills/computer-use/SKILL.md`)
|
||||
focused on the Hermes-side `computer_use` action vocabulary — the
|
||||
single source of truth the agent loads. For the deeper material —
|
||||
platform-specific deep dives, recording semantics, browser page
|
||||
interaction — point your agent harness at the cua-driver skill pack
|
||||
the cua-driver team ships and maintains directly:
|
||||
|
||||
```
|
||||
cua-driver skills install
|
||||
```
|
||||
|
||||
This symlinks the pack into your agent harness' skill directory. After
|
||||
running it, an agent gets access to:
|
||||
|
||||
| File | Topic |
|
||||
|---|---|
|
||||
| `SKILL.md` | The cross-platform core (snapshot invariant, no-foreground contract, click dispatch, AX-tree mechanics) |
|
||||
| `MACOS.md` | macOS specifics: no-foreground contract, AXMenuBar navigation, SkyLight click dispatch, Apple Events JS bridge |
|
||||
| `WINDOWS.md` | Windows specifics: UIA tree, UWP / `ApplicationFrameHost` hosting, Session 0 isolation, autostart pattern |
|
||||
| `LINUX.md` | Linux specifics: AT-SPI tree, X11 / Wayland, terminal-emulator detection |
|
||||
| `RECORDING.md` | Trajectory + video recording semantics |
|
||||
| `WEB_APPS.md` | Browser-page interaction tips |
|
||||
| `TESTS.md` | Replay-by-trajectory workflow |
|
||||
|
||||
These are **platform deep dives, not duplicates of the Hermes skill** —
|
||||
when an agent reports "on Windows, my click landed on the wrong
|
||||
element," it reads `WINDOWS.md` for the UIA / UWP context that
|
||||
explains why and what to do differently.
|
||||
|
||||
`cua-driver skills status` shows what's installed and which agent
|
||||
harnesses it's linked into. Today the autodetect list covers Claude
|
||||
Code, Codex, OpenCode, OpenClaw, and Antigravity; **Hermes
|
||||
autodetection is planned as a follow-up in `trycua/cua`** — until
|
||||
then, run `cua-driver skills install` once and point your harness at
|
||||
the resulting `~/.cua-driver/skills/cua-driver` directory (or symlink
|
||||
it into your usual skill space).
|
||||
|
||||
## Quick example
|
||||
|
||||
User prompt: *"Find my latest email from Stripe and summarise what they want me to do."*
|
||||
|
||||
The agent's plan:
|
||||
The agent's plan (this is the same shape on macOS / Windows / Linux —
|
||||
the model substitutes the platform's idiomatic shortcut and app name):
|
||||
|
||||
1. `computer_use(action="capture", mode="som", app="Mail")` — gets a
|
||||
screenshot of Mail with every sidebar item, toolbar button, and message
|
||||
row numbered.
|
||||
2. `computer_use(action="click", element=14)` — clicks the search field
|
||||
(element #14 from the capture).
|
||||
screenshot of the email app with every sidebar item, toolbar button,
|
||||
and message row numbered.
|
||||
2. `computer_use(action="click", element=14)` — clicks the search field.
|
||||
3. `computer_use(action="type", text="from:stripe")`
|
||||
4. `computer_use(action="key", keys="return", capture_after=True)` — submit
|
||||
and get the new screenshot.
|
||||
4. `computer_use(action="key", keys="return", capture_after=True)` —
|
||||
submit and get the new screenshot.
|
||||
5. Click the top result, read the body, summarise.
|
||||
|
||||
During all of this, your cursor stays wherever you left it and Mail never
|
||||
comes to front.
|
||||
During all of this, your cursor stays wherever you left it and the email
|
||||
app never comes to front.
|
||||
|
||||
## Provider compatibility
|
||||
|
||||
|
|
@ -105,29 +193,33 @@ comes to front.
|
|||
| Anthropic (Claude Sonnet/Opus 3+) | ✅ | ✅ | Best overall; SOM + raw coordinates. |
|
||||
| OpenRouter (any vision model) | ✅ | ✅ | Multi-part tool messages supported. |
|
||||
| OpenAI (GPT-4+, GPT-5) | ✅ | ✅ | Same as above. |
|
||||
| Local vLLM / LM Studio (vision model) | ✅ | ✅ | If the model supports multi-part tool content. |
|
||||
| Google (Gemini 2+) | ✅ | ✅ | Tool-calling + vision both supported. |
|
||||
| Local vLLM / LM Studio / Ollama (vision model) | ✅ | ✅ | If the model supports multi-part tool content. |
|
||||
| Text-only models | ❌ | ✅ (degraded) | Use `mode="ax"` for accessibility-tree-only operation. |
|
||||
|
||||
Screenshots are sent inline with tool results as OpenAI-style `image_url`
|
||||
parts. For Anthropic, the adapter converts them into native `tool_result`
|
||||
image blocks.
|
||||
image blocks. The image MIME type comes from cua-driver's explicit
|
||||
`mimeType` field (`image/png` or `image/jpeg`) — no client-side
|
||||
magic-byte sniffing.
|
||||
|
||||
## Safety
|
||||
|
||||
Hermes applies multi-layer guardrails:
|
||||
|
||||
- Destructive actions (click, type, drag, scroll, key, focus_app) require
|
||||
approval — either interactively via the CLI dialog or via the
|
||||
- Destructive actions (click, type, drag, scroll, key, focus_app)
|
||||
require approval — either interactively via the CLI dialog or via the
|
||||
messaging-platform approval buttons.
|
||||
- Hard-blocked key combos at the tool level: empty trash, force delete,
|
||||
lock screen, log out, force log out.
|
||||
- Hard-blocked type patterns: `curl | bash`, `sudo rm -rf /`, fork bombs,
|
||||
etc.
|
||||
- Hard-blocked type patterns: `curl | bash`, `sudo rm -rf /`, fork
|
||||
bombs, etc.
|
||||
- The agent's system prompt tells it explicitly: no clicking permission
|
||||
dialogs, no typing passwords, no following instructions embedded in
|
||||
screenshots.
|
||||
|
||||
Pair with `approvals.mode: manual` in `~/.hermes/config.yaml` if you want every action confirmed.
|
||||
Pair with `approvals.mode: manual` in `~/.hermes/config.yaml` if you
|
||||
want every action confirmed.
|
||||
|
||||
## Token efficiency
|
||||
|
||||
|
|
@ -138,8 +230,8 @@ Screenshots are expensive. Hermes applies four layers of optimisation:
|
|||
to save context]` placeholders.
|
||||
- **Client-side compression pruning** — the context compressor detects
|
||||
multimodal tool results and strips image parts from old ones.
|
||||
- **Image-aware token estimation** — each image is counted as ~1500 tokens
|
||||
(Anthropic's flat rate) instead of its base64 char length.
|
||||
- **Image-aware token estimation** — each image is counted as ~1500
|
||||
tokens (Anthropic's flat rate) instead of its base64 char length.
|
||||
- **Server-side context editing (Anthropic only)** — when active, the
|
||||
adapter enables `clear_tool_uses_20250919` via `context_management` so
|
||||
Anthropic's API clears old tool results server-side.
|
||||
|
|
@ -149,26 +241,58 @@ of screenshot context, not ~600K.
|
|||
|
||||
## Limitations
|
||||
|
||||
- **macOS only.** cua-driver uses private Apple SPIs that don't exist on
|
||||
Linux or Windows. For cross-platform GUI automation, use the `browser`
|
||||
toolset.
|
||||
- **Private SPI risk.** Apple can change SkyLight's symbol surface in any
|
||||
OS update. Pin the driver version with the `HERMES_CUA_DRIVER_VERSION`
|
||||
env var if you want reproducibility across a macOS bump.
|
||||
- **Performance.** Background mode is slower than foreground —
|
||||
SkyLight-routed events take ~5-20ms vs direct HID posting. Not
|
||||
noticeable for agent-speed clicking; noticeable if you try to record a
|
||||
speed-run.
|
||||
accessibility-routed events take ~5–20 ms on macOS, ~3–10 ms on
|
||||
Windows UIA, ~5–15 ms on Linux AT-SPI vs direct HID posting. Not
|
||||
noticeable for agent-speed clicking; noticeable if you try to record
|
||||
a speed-run.
|
||||
- **No keyboard password entry.** `type` has hard-block patterns on
|
||||
command-shell payloads; for passwords, use the system's autofill.
|
||||
command-shell payloads; for passwords, use the system's autofill
|
||||
(macOS Keychain / Windows Credential Manager / GNOME Keyring /
|
||||
KWallet).
|
||||
- **Some apps don't expose an accessibility tree.** Modern UWP apps on
|
||||
Windows, Electron < 28 on Linux, and a few macOS apps with custom
|
||||
drawing (Logic, Final Cut, some games) have sparse or empty AX trees.
|
||||
Fall back to pixel coordinates if the tree is empty — or skip the
|
||||
task entirely.
|
||||
- **Windows: elevated (admin) windows can't be driven from a normal
|
||||
agent.** Windows UIPI (User Interface Privilege Isolation) enforces
|
||||
integrity-level boundaries: a Medium-integrity process (the default
|
||||
Hermes agent) cannot enumerate the UIA tree of, or inject mouse input
|
||||
into, a window owned by a High-integrity (Administrator) process.
|
||||
Symptom: `capture(mode='som')` returns 0 elements and `click(...)`
|
||||
reports success while doing nothing, even though the screenshot
|
||||
renders fine (GDI capture sits below the integrity check). Keyboard
|
||||
events partially bypass UIPI, so Tab / Enter can still navigate an
|
||||
elevated dialog. This is an OS constraint, not a cua-driver bug — it
|
||||
affects every Windows automation stack. To drive elevated windows,
|
||||
run the Hermes agent itself at High integrity (launch from an
|
||||
elevated terminal); otherwise target non-elevated windows.
|
||||
- **Platform-specific deployment gotchas:**
|
||||
- **macOS** uses private SkyLight SPIs. Apple can change them in any
|
||||
OS update. Hermes warns when the installed cua-driver is older than
|
||||
the version it was tested against.
|
||||
- **Windows** SSH sessions run in **Session 0**, which has no
|
||||
interactive desktop. Drive Hermes from inside the RDP / console
|
||||
session, or set up cua-driver's autostart Scheduled Task —
|
||||
[windows-ssh](https://cua.ai/docs/how-to-guides/driver/windows-ssh)
|
||||
has the recipe.
|
||||
- **Linux** requires a reachable display server. Headless servers
|
||||
need Xvfb (`Xvfb :99 -screen 0 1920x1080x24`) before
|
||||
`computer_use` can capture or inject events. Pure Wayland sessions
|
||||
need an XWayland bridge for screen capture (cua-driver's Wayland
|
||||
inject path handles input independently).
|
||||
|
||||
For cross-platform GUI automation without the desktop overhead (and
|
||||
without TCC / Session 0 / X11 setup), the `browser` toolset uses a
|
||||
real headless Chromium and is the right answer for web-only tasks.
|
||||
|
||||
## Configuration
|
||||
|
||||
Override the driver binary path (tests / CI):
|
||||
Override the driver binary path (tests / CI / local builds):
|
||||
|
||||
```
|
||||
HERMES_CUA_DRIVER_CMD=/opt/homebrew/bin/cua-driver
|
||||
HERMES_CUA_DRIVER_VERSION=0.5.0 # optional pin
|
||||
HERMES_CUA_DRIVER_CMD=/path/to/your/cua-driver
|
||||
```
|
||||
|
||||
Swap the backend entirely (for testing):
|
||||
|
|
@ -177,25 +301,170 @@ Swap the backend entirely (for testing):
|
|||
HERMES_COMPUTER_USE_BACKEND=noop # records calls, no side effects
|
||||
```
|
||||
|
||||
### Telemetry
|
||||
|
||||
cua-driver ships with anonymous usage telemetry (PostHog) enabled by default
|
||||
upstream. **Hermes disables it for you** — on every cua-driver invocation
|
||||
(the MCP backend, `status`, `doctor`, and install) Hermes sets
|
||||
`CUA_DRIVER_RS_TELEMETRY_ENABLED=0` in the driver's environment.
|
||||
|
||||
To opt back in (let cua-driver use its own default and send telemetry), set
|
||||
this in `config.yaml`:
|
||||
|
||||
```yaml
|
||||
computer_use:
|
||||
cua_telemetry: true # default: false (telemetry off)
|
||||
```
|
||||
|
||||
When it's on, `hermes computer-use doctor` reports `telemetry: enabled`;
|
||||
when off (the default), it reports `telemetry: disabled via
|
||||
CUA_DRIVER_RS_TELEMETRY_ENABLED`.
|
||||
|
||||
## Testing against a local cua-driver build
|
||||
|
||||
When you're developing cua-driver itself — or want to test an
|
||||
unreleased fix — point Hermes at a binary you built from source instead
|
||||
of the published release. Hermes resolves the driver with
|
||||
`shutil.which("cua-driver")` and **does not enforce
|
||||
`HERMES_CUA_DRIVER_VERSION`**, so a local build (reported as
|
||||
`0.0.0-local-*`) is accepted as-is. Two approaches:
|
||||
|
||||
### Option A — `install-local` (build + put it on PATH)
|
||||
|
||||
From your `trycua/cua` checkout, run the upstream local installer. It
|
||||
builds the Rust backend in release mode and drops `cua-driver` into the
|
||||
same install layout the production installer uses, adding its bin dir
|
||||
to your PATH:
|
||||
|
||||
```powershell
|
||||
# Windows (PowerShell), from the cua repo root
|
||||
./libs/cua-driver/scripts/install-local.ps1 -NoAutoStart
|
||||
```
|
||||
|
||||
```bash
|
||||
# macOS / Linux, from the cua repo root (defaults to a debug build without --release)
|
||||
./libs/cua-driver/scripts/install-local.sh --release
|
||||
```
|
||||
|
||||
- Windows stages the build under `%USERPROFILE%\.cua-driver\packages\…`
|
||||
and junctions
|
||||
`%LOCALAPPDATA%\Programs\Cua\cua-driver\bin` (added to your User
|
||||
PATH) to it. macOS/Linux symlinks `cua-driver` into `~/.local/bin`
|
||||
(override with `--bin-dir <path>`).
|
||||
- `-NoAutoStart` skips registering the `cua-driver-serve` logon daemon
|
||||
— you don't need it for Hermes testing (see notes).
|
||||
|
||||
Then open a fresh shell (so the PATH change is visible) and confirm:
|
||||
|
||||
```
|
||||
cua-driver --version # local builds report 0.0.0-local-release
|
||||
# Windows: (Get-Command cua-driver).Source
|
||||
# macOS/Linux: which cua-driver
|
||||
```
|
||||
|
||||
### Option B — point Hermes straight at the built binary (fastest loop)
|
||||
|
||||
Skip the install ceremony entirely: `cargo build` and set
|
||||
`HERMES_CUA_DRIVER_CMD` to the resulting binary. Best for rapid
|
||||
edit/build/test.
|
||||
|
||||
```bash
|
||||
cargo build -p cua-driver # add --release for a release build; run from libs/cua-driver/rust
|
||||
```
|
||||
|
||||
```
|
||||
# Windows (.env)
|
||||
HERMES_CUA_DRIVER_CMD=C:\path\to\cua\libs\cua-driver\rust\target\debug\cua-driver.exe
|
||||
# macOS / Linux (.env)
|
||||
HERMES_CUA_DRIVER_CMD=/path/to/cua/libs/cua-driver/rust/target/debug/cua-driver
|
||||
```
|
||||
|
||||
### Confirm Hermes is using your build
|
||||
|
||||
- `hermes computer-use status` prints the resolved binary path and
|
||||
version.
|
||||
- `hermes computer-use doctor` confirms the binary is reachable and
|
||||
exercises the full MCP path end-to-end.
|
||||
- In a session, `computer_use(action="capture")` exercises the spawned
|
||||
`cua-driver mcp` child process.
|
||||
|
||||
### Notes & gotchas
|
||||
|
||||
- **Hermes spawns its own `cua-driver mcp` child over stdio** — it does
|
||||
*not* attach to the long-running `cua-driver serve` autostart daemon
|
||||
or its named pipe. So the scheduled task / LaunchAgent is unnecessary
|
||||
for testing (`-NoAutoStart` is fine). The autostart daemon and the
|
||||
Windows UIAccess worker (`cua-driver-uia.exe`) only matter for
|
||||
foreground-safe input on some apps (e.g. WPF); the standard tool
|
||||
surface works through the stdio child. On Windows SSH sessions, the
|
||||
autostart pattern IS needed — see the Limitations section.
|
||||
- **Locked binary on Windows.** A running `cua-driver-serve` daemon can
|
||||
hold `cua-driver.exe` and block an overwrite on rebuild.
|
||||
`install-local.ps1` renames the locked binary out of the way
|
||||
automatically; if you `cargo build` manually (Option B), stop it
|
||||
first with `cua-driver autostart disable` (or `schtasks /End /TN
|
||||
cua-driver-serve`).
|
||||
- **Rebuild loop.** After editing cua-driver source, re-run
|
||||
`install-local` (rebuilds, restages, flips the `current` junction)
|
||||
for Option A, or just re-`cargo build` for Option B — no Hermes
|
||||
change needed either way.
|
||||
- **Local builds skip the version check.** Hermes warns when the
|
||||
installed cua-driver is older than its per-OS tested baseline, but
|
||||
exempts `0.0.0-local-*` dev builds — so your local build never
|
||||
triggers that warning.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**`computer_use backend unavailable: cua-driver is not installed`** — Run
|
||||
`hermes computer-use install` to fetch the cua-driver binary, or run
|
||||
`hermes tools` and enable the Computer Use toolset.
|
||||
**First action when anything's off: run `hermes computer-use doctor`.**
|
||||
The structured per-check matrix tells you (and any agent helping you
|
||||
debug) exactly what's wrong.
|
||||
|
||||
Specific failure modes the doctor doesn't catch:
|
||||
|
||||
**`computer_use backend unavailable: cua-driver is not installed`** —
|
||||
Run `hermes computer-use install` to fetch the cua-driver binary, or
|
||||
run `hermes tools` and enable the Computer Use toolset.
|
||||
|
||||
**Clicks seem to have no effect** — Capture and verify. A modal you
|
||||
didn't see may be blocking input. Dismiss it with `escape` or the close
|
||||
button.
|
||||
|
||||
**Element indices are stale** — SOM indices are only valid until the
|
||||
next `capture`. Re-capture after any state-changing action.
|
||||
next `capture`. Re-capture after any state-changing action. The
|
||||
wrapper carries opaque `element_token`s for stale detection — you'll
|
||||
see an explicit error rather than a wrong click.
|
||||
|
||||
**"blocked pattern in type text"** — The text you tried to `type`
|
||||
matches the dangerous-shell-pattern list. Break the command up or
|
||||
reconsider.
|
||||
|
||||
**Empty captures on Linux** — `DISPLAY` not set, or you're on pure
|
||||
Wayland without an XWayland bridge. `hermes computer-use doctor` will
|
||||
flag this as `ax_capability: fail` with a `Set DISPLAY (X11)…` hint.
|
||||
|
||||
**Empty captures on Windows over SSH** — You're in Session 0 (the
|
||||
services session). Drive from RDP / console directly, or set up the
|
||||
autostart pattern — see
|
||||
[cua.ai/docs/how-to-guides/driver/windows-ssh](https://cua.ai/docs/how-to-guides/driver/windows-ssh).
|
||||
|
||||
## See also
|
||||
|
||||
- [Universal skill: `macos-computer-use`](https://github.com/NousResearch/hermes-agent/blob/main/skills/apple/macos-computer-use/SKILL.md)
|
||||
- **Hermes-side skill** — `skills/computer-use/SKILL.md` — teaches the
|
||||
Hermes `computer_use` action vocabulary; this is what the agent loads.
|
||||
- **cua-driver skill pack** — for platform-specific deep dives
|
||||
(macOS no-foreground contract, Windows UIA + Session 0, Linux AT-SPI
|
||||
+ X11/Wayland, recording, browser pages), run
|
||||
`cua-driver skills install` and read `MACOS.md` / `WINDOWS.md` /
|
||||
`LINUX.md` / `RECORDING.md` / `WEB_APPS.md`. Once `cua-driver skills
|
||||
install` autodetects Hermes (planned follow-up), this happens
|
||||
automatically on install.
|
||||
- **cua.ai/docs** — the cua-driver project's documentation:
|
||||
- [What is computer use?](https://cua.ai/docs/explanation/what-is-computer-use) — concept intro
|
||||
- [The no-foreground contract](https://cua.ai/docs/explanation/the-no-foreground-contract) — *why* background mode matters
|
||||
- [Install reference](https://cua.ai/docs/how-to-guides/driver/install) — cross-platform install details
|
||||
- [Personalize the agent cursor](https://cua.ai/docs/how-to-guides/driver/personalize-cursor) — built-in shapes, custom assets, runtime overrides
|
||||
- [Drive Windows over SSH](https://cua.ai/docs/how-to-guides/driver/windows-ssh) — the Session 0 → Session 1+ autostart pattern
|
||||
- [Keep cua-driver running](https://cua.ai/docs/how-to-guides/driver/keep-running) — autostart / daemon lifecycle
|
||||
- [Connect your agent](https://cua.ai/docs/how-to-guides/driver/connect-your-agent) — register cua-driver with various harnesses (Hermes among them)
|
||||
- [cua-driver source (trycua/cua)](https://github.com/trycua/cua)
|
||||
- [Browser automation](./browser.md) for cross-platform web tasks.
|
||||
- [Browser automation](./browser.md) for cross-platform web tasks where you don't need to drive native apps.
|
||||
|
|
|
|||
|
|
@ -431,14 +431,14 @@ If you prefer JSX, use any bundler (esbuild, Vite, rollup) with React as an exte
|
|||
├── dist/
|
||||
│ ├── index.js # required — pre-built JS bundle (IIFE)
|
||||
│ └── style.css # optional — custom CSS
|
||||
└── plugin_api.py # optional — backend API routes (FastAPI)
|
||||
└── plugin_api.py # bundled plugins only — backend API routes (FastAPI)
|
||||
```
|
||||
|
||||
A single plugin directory can carry three orthogonal extensions:
|
||||
|
||||
- `plugin.yaml` + `__init__.py` — CLI/gateway plugin ([see plugins page](./plugins)).
|
||||
- `dashboard/manifest.json` + `dashboard/dist/index.js` — dashboard UI plugin.
|
||||
- `dashboard/plugin_api.py` — dashboard backend routes.
|
||||
- `dashboard/plugin_api.py` — bundled plugins only; backend API routes.
|
||||
|
||||
None of them are required; include only the layers you need.
|
||||
|
||||
|
|
@ -743,7 +743,10 @@ Routes are mounted under `/api/plugins/<name>/`, so the above becomes:
|
|||
- `GET /api/plugins/my-plugin/data`
|
||||
- `POST /api/plugins/my-plugin/action`
|
||||
|
||||
Plugin API routes bypass session-token authentication since the dashboard server binds to localhost by default. **Don't expose the dashboard on a public interface with `--host 0.0.0.0` if you run untrusted plugins** — their routes become reachable too.
|
||||
Security notes:
|
||||
|
||||
- Bundled plugin API routes bypass session-token authentication. The dashboard server binds to localhost by default, which mitigates the risks of this bypass.
|
||||
- User-installed and project dashboard plugins may still extend the UI with static JS/CSS, but their Python `api` files are not auto-imported by the dashboard server. Backend routes are reserved for bundled plugins.
|
||||
|
||||
#### Accessing Hermes internals
|
||||
|
||||
|
|
@ -804,11 +807,14 @@ The dashboard scans three directories for `dashboard/manifest.json`:
|
|||
|
||||
| Priority | Directory | Source label |
|
||||
|----------|-----------|--------------|
|
||||
| 1 (wins on conflict) | `~/.hermes/plugins/<name>/dashboard/` | `user` |
|
||||
| 2 | `<repo>/plugins/memory/<name>/dashboard/` | `bundled` |
|
||||
| 2 | `<repo>/plugins/<name>/dashboard/` | `bundled` |
|
||||
| 1 (wins on conflict) | `<repo>/plugins/memory/<name>/dashboard/` | `bundled` |
|
||||
| 1 (wins on conflict) | `<repo>/plugins/<name>/dashboard/` | `bundled` |
|
||||
| 2 | `~/.hermes/plugins/<name>/dashboard/` | `user` |
|
||||
| 3 | `./.hermes/plugins/<name>/dashboard/` | `project` — only when `HERMES_ENABLE_PROJECT_PLUGINS` is set |
|
||||
|
||||
Bundled dashboard plugins win name conflicts because only bundled plugins may
|
||||
register backend routes. Give user and project dashboard plugins unique names.
|
||||
|
||||
Discovery results are cached per dashboard process. After adding a new plugin, either:
|
||||
|
||||
```bash
|
||||
|
|
@ -908,10 +914,11 @@ Check that the file is in `~/.hermes/dashboard-themes/` and ends in `.yaml` or `
|
|||
The `sidebar` slot only renders when the active theme has `layoutVariant: cockpit`. Other slots always render. If you're registering into a slot with no hits, add `console.log` inside `registerSlot` to confirm the plugin bundle ran at all.
|
||||
|
||||
**Plugin backend routes return 404.**
|
||||
1. Confirm the manifest has `"api": "plugin_api.py"` pointing to an existing file inside `dashboard/`.
|
||||
2. Restart `hermes dashboard` — plugin API routes are mounted once at startup, **not** on rescan.
|
||||
3. Check that `plugin_api.py` exports a module-level `router = APIRouter()`. Other export names are not picked up.
|
||||
4. Tail `~/.hermes/logs/errors.log` for `Failed to load plugin <name> API routes` — import errors are logged there.
|
||||
1. Confirm the plugin is bundled with Hermes. User-installed and project dashboard plugins can extend the UI, but their Python backend routes are not auto-imported.
|
||||
2. Confirm the manifest has `"api": "plugin_api.py"` pointing to an existing file inside `dashboard/`.
|
||||
3. Restart `hermes dashboard` — plugin API routes are mounted once at startup, **not** on rescan.
|
||||
4. Check that `plugin_api.py` exports a module-level `router = APIRouter()`. Other export names are not picked up.
|
||||
5. Tail `~/.hermes/logs/errors.log` for `Failed to load plugin <name> API routes` — import errors are logged there.
|
||||
|
||||
**Theme change drops my color overrides.**
|
||||
`colorOverrides` are scoped to the active theme and cleared on theme switch — that's by design. If you want overrides that persist, put them in your theme's YAML, not in the live switcher.
|
||||
|
|
|
|||
|
|
@ -40,13 +40,57 @@ What you'll see:
|
|||
| Command | What it does |
|
||||
|---|---|
|
||||
| `/goal <text>` | Set (or replace) the standing goal. Kicks off the first turn immediately so you don't need to send a separate message. |
|
||||
| `/goal draft <text>` | Draft a structured completion contract from a plain-language objective, then set it. See [Completion contracts](#completion-contracts). |
|
||||
| `/goal show` | Print the active goal's completion contract. |
|
||||
| `/goal` or `/goal status` | Show the current goal, its status, and turns used. |
|
||||
| `/goal pause` | Stop the auto-continuation loop without clearing the goal. |
|
||||
| `/goal resume` | Resume the loop (resets the turn counter back to zero). |
|
||||
| `/goal clear` | Drop the goal entirely. |
|
||||
| `/goal wait <pid> [reason]` | Park the loop on a background process — it stops re-poking the agent every turn while the process runs, and auto-resumes when it exits. |
|
||||
| `/goal unwait` | Drop the wait barrier and resume the loop immediately. |
|
||||
|
||||
Works identically on the CLI and every gateway platform (Telegram, Discord, Slack, Matrix, Signal, WhatsApp, SMS, iMessage, Webhook, API server, and the web dashboard).
|
||||
|
||||
## Completion contracts
|
||||
|
||||
A bare `/goal <text>` works fine, but a *vague* goal makes for vague judging — the judge can only check what you told it to want. Codex's `/goal` guidance makes the same point: a durable objective works best when it names **what done means, how to prove it, what not to break, what's in scope, and when to stop**. Hermes adapts this as an optional **completion contract** layered on top of the existing goal loop.
|
||||
|
||||
A contract has five fields, all optional:
|
||||
|
||||
| Field | Meaning |
|
||||
|---|---|
|
||||
| `outcome` | The single end state that must be true when done. |
|
||||
| `verification` | The specific test / command / artifact that *proves* the outcome. |
|
||||
| `constraints` | What must not change or regress. |
|
||||
| `boundaries` | Which files, dirs, tools, or systems are in scope. |
|
||||
| `stop_when` | The condition under which Hermes should stop and ask for input. |
|
||||
|
||||
When a contract is set, both prompts change: the **continuation prompt** tells the agent to target the verification surface and respect the constraints, and the **judge prompt** decides `done` *only when the verification criterion is met with concrete evidence* (a command result, file excerpt, test output) — not a loose "looks done" claim. This directly tightens the most common `/goal` failure mode (premature completion or endless over-continuation on an underspecified objective).
|
||||
|
||||
### Two ways to set a contract
|
||||
|
||||
**1. Let Hermes draft it** (recommended — adapted from Codex's "let the agent draft the goal" tip):
|
||||
|
||||
```
|
||||
/goal draft Migrate the auth service from session cookies to JWT
|
||||
```
|
||||
|
||||
Hermes expands your one-liner into a full contract via the `goal_judge` auxiliary model, sets it, and shows you the result so you can review or tighten any field. If the aux model is unavailable, it falls back to a plain free-form goal — drafting never blocks setting a goal.
|
||||
|
||||
**2. Write it inline** with `field: value` lines:
|
||||
|
||||
```
|
||||
/goal Migrate auth to JWT
|
||||
verify: pytest tests/auth passes
|
||||
constraints: keep the /login response shape unchanged
|
||||
boundaries: only touch services/auth and its tests
|
||||
stop when: a DB schema migration is required
|
||||
```
|
||||
|
||||
The first non-field line(s) are the goal headline; recognized field prefixes (`verify:`, `verified by:`, `constraints:`, `preserve:`, `boundaries:`, `scope:`, `stop when:`, `blocked:`, …) populate the contract. A plain goal with an incidental colon (`Fix bug: the parser drops commas`) is **not** mangled — only known field prefixes are pulled out.
|
||||
|
||||
Use `/goal show` to review the active contract. Contracts persist in `SessionDB.state_meta` alongside the goal, so they survive `/resume`. Old goals from before this feature load unchanged (no contract). Contracts and `/subgoal` criteria compose: subgoals fold into the contract as extra criteria the judge must also satisfy.
|
||||
|
||||
## Adding criteria mid-goal: `/subgoal`
|
||||
|
||||
While a goal is active you can append extra acceptance criteria with `/subgoal <text>` without resetting the loop. Each call adds one numbered item to the goal's subgoal list; the **continuation prompt** the agent sees on the next turn includes the original goal plus an "Additional criteria the user added mid-loop" block, and the **judge prompt** is rewritten so the verdict must consider every subgoal — the goal isn't marked done until the original objective **and** every subgoal are met.
|
||||
|
|
@ -62,6 +106,29 @@ Subgoals are persisted alongside the goal in `SessionDB.state_meta`, so they sur
|
|||
|
||||
Use this when you start a loop ("fix the failing tests") and notice partway through that you also want it to "and add a regression test for the bug you just patched" — `/subgoal add a regression test` tightens the success criteria without breaking the running loop.
|
||||
|
||||
## Parking on a background process: automatic, with a manual override
|
||||
|
||||
Some goals are gated on something that takes minutes and runs on its own — CI on a pushed PR, a long build, a test matrix, a deploy, a rate-limit cooldown. Without help, the goal loop would re-poke the agent every turn into "is it done yet?" busy-work while it waits.
|
||||
|
||||
**This is handled automatically.** Every turn, the judge is shown the agent's live background processes (the `terminal(background=true)` registry — pid, session id, command, uptime, recent output, and any `watch_patterns` / `notify_on_complete` trigger) alongside the goal and the agent's response. When the agent's progress is genuinely gated on one of them, the judge returns a **`wait`** verdict instead of `continue`, and the loop **parks**: the next turns are skipped (no judge call, no continuation, no turn consumed) until the wait is satisfied — then it resumes normally with the result in hand. The judge can also park on a **time** basis (`wait_for_seconds`) for backoff/cooldown waits. `/goal status` shows `⏳ Goal (parked …)` while parked.
|
||||
|
||||
The judge picks the right kind of wait from the process's own signal:
|
||||
|
||||
- **`wait_on_session <id>`** — releases when the process's *own trigger* fires: it exits, **or** (if it was started with `watch_patterns`) its pattern matches. This is the one for a long-lived watcher / server / poller that signals **mid-run** (e.g. a build process that prints `BUILD SUCCESSFUL` and keeps running, or a `notify_on_complete` watcher) and may never exit on its own.
|
||||
- **`wait_on_pid <pid>`** — releases on process exit only.
|
||||
- **`wait_for_seconds <n>`** — releases after a fixed delay.
|
||||
|
||||
You don't type anything for this — it's the judge's decision, made from the process context the loop hands it. The manual commands exist as an override:
|
||||
|
||||
| Command | What it does |
|
||||
|---|---|
|
||||
| `/goal wait <pid> [reason]` | Manually park the loop until the process with that PID exits. |
|
||||
| `/goal unwait` | Clear any wait barrier (judge- or manually-set) and resume immediately. |
|
||||
|
||||
The barrier (pid- or time-based) is persisted with the goal in `SessionDB.state_meta`, so it survives `/resume`. `/goal pause`, `/goal resume`, and `/goal clear` all drop it. If the PID is already dead when the barrier is set (or dies while parked), or the time deadline passes, the barrier clears on the next check — a stale barrier can never wedge the loop.
|
||||
|
||||
Typical flow: the agent pushes a PR, starts a CI watcher with `terminal(background=true, notify_on_complete=true)`, and reports "watching CI." The judge sees the watcher process still running, returns `wait` on its pid, and the loop goes quiet — then picks back up the instant CI finishes and judges the goal against the actual result.
|
||||
|
||||
## Behavior details
|
||||
|
||||
### The judge
|
||||
|
|
@ -94,7 +161,7 @@ Any real message you send while a goal is active takes priority over the continu
|
|||
|
||||
### Mid-run safety (gateway)
|
||||
|
||||
While an agent is already running, `/goal status`, `/goal pause`, and `/goal clear` are safe to run — they only touch control-plane state and don't interrupt the current turn. Setting a **new** goal mid-run (`/goal <new text>`) is rejected with a message telling you to `/stop` first, so the old continuation can't race the new one.
|
||||
While an agent is already running, `/goal status`, `/goal pause`, `/goal clear`, `/goal wait`, and `/goal unwait` are safe to run — they only touch control-plane state and don't interrupt the current turn. Setting a **new** goal mid-run (`/goal <new text>`) is rejected with a message telling you to `/stop` first, so the old continuation can't race the new one.
|
||||
|
||||
### Persistence
|
||||
|
||||
|
|
|
|||
|
|
@ -61,6 +61,8 @@ AI-native cross-session user modeling with dialectic reasoning, session-scoped c
|
|||
- `dialecticCadence` — how often the dialectic LLM fires (LLM call frequency)
|
||||
- `dialecticDepth` — how many `.chat()` passes per dialectic invocation (1–3, depth of reasoning)
|
||||
|
||||
The auto-injected dialectic also scales its reasoning level by query length (longer query → deeper reasoning, capped at `reasoningLevelCap`); see [Query-Adaptive Reasoning Level](./honcho.md#query-adaptive-reasoning-level).
|
||||
|
||||
**Setup Wizard:**
|
||||
```bash
|
||||
hermes memory setup # select "honcho" — runs the Honcho-specific post-setup
|
||||
|
|
@ -315,31 +317,55 @@ echo "OPENVIKING_API_KEY=..." >> ~/.hermes/.env
|
|||
|
||||
### Mem0
|
||||
|
||||
Server-side LLM fact extraction with semantic search, reranking, and automatic deduplication.
|
||||
Server-side LLM fact extraction with semantic search, reranking, and automatic deduplication. Supports both Mem0 Platform (cloud) and OSS (self-hosted) modes.
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| **Best for** | Hands-off memory management — Mem0 handles extraction automatically |
|
||||
| **Requires** | `pip install mem0ai` + API key |
|
||||
| **Data storage** | Mem0 Cloud |
|
||||
| **Cost** | Mem0 pricing |
|
||||
| **Requires** | `pip install mem0ai` + API key (platform) or LLM/vector store (OSS) |
|
||||
| **Data storage** | Mem0 Cloud (platform) or self-hosted (OSS) |
|
||||
| **Cost** | Mem0 pricing (platform) / free (OSS) |
|
||||
|
||||
**Tools:** `mem0_profile` (all stored memories), `mem0_search` (semantic search + reranking), `mem0_conclude` (store verbatim facts)
|
||||
**Tools (5):** `mem0_list` (list all memories, paginated), `mem0_search` (semantic search with reranking in platform mode), `mem0_add` (store verbatim facts), `mem0_update` (update by ID), `mem0_delete` (delete by ID)
|
||||
|
||||
**Setup:**
|
||||
**Setup (Platform):**
|
||||
```bash
|
||||
hermes memory setup # select "mem0"
|
||||
hermes memory setup # select "mem0" → "Platform"
|
||||
# Or manually:
|
||||
hermes config set memory.provider mem0
|
||||
echo "MEM0_API_KEY=your-key" >> ~/.hermes/.env
|
||||
```
|
||||
|
||||
**Config:** `$HERMES_HOME/mem0.json`
|
||||
**Setup (OSS):**
|
||||
```bash
|
||||
hermes memory setup # select "mem0" → "Open Source (self-hosted)"
|
||||
# Or via flags:
|
||||
hermes memory setup mem0 --mode oss --oss-llm openai --oss-llm-key sk-... --oss-vector qdrant
|
||||
```
|
||||
|
||||
Preview without writing files:
|
||||
```bash
|
||||
hermes memory setup mem0 --mode oss --oss-llm-key sk-... --dry-run
|
||||
```
|
||||
|
||||
**Config:** `$HERMES_HOME/mem0.json` (behavioral settings). Only the secret `MEM0_API_KEY` belongs in `~/.hermes/.env`.
|
||||
|
||||
| Key | Default | Description |
|
||||
|-----|---------|-------------|
|
||||
| `mode` | `platform` | `platform` (Mem0 Cloud) or `oss` (self-hosted) |
|
||||
| `user_id` | `hermes-user` | User identifier |
|
||||
| `agent_id` | `hermes` | Agent identifier |
|
||||
| `rerank` | `true` | Rerank search results for relevance (platform mode only) |
|
||||
|
||||
**OSS supported providers:**
|
||||
|
||||
| Component | Providers |
|
||||
|-----------|-----------|
|
||||
| LLM | openai, ollama |
|
||||
| Embedder | openai, ollama |
|
||||
| Vector Store | qdrant (local/server), pgvector |
|
||||
|
||||
**Switching modes:** Re-run `hermes memory setup mem0 --mode <platform|oss>` or edit `mem0.json` directly.
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -569,7 +595,7 @@ hermes memory setup
|
|||
|----------|---------|------|-------|-------------|----------------|
|
||||
| **Honcho** | Cloud | Paid | 5 | `honcho-ai` | Dialectic user modeling + session-scoped context |
|
||||
| **OpenViking** | Self-hosted | Free | 5 | `openviking` + server | Filesystem hierarchy + tiered loading |
|
||||
| **Mem0** | Cloud | Paid | 3 | `mem0ai` | Server-side LLM extraction |
|
||||
| **Mem0** | Cloud/Self-hosted | Free/Paid | 5 | `mem0ai` | Server-side LLM extraction + OSS mode |
|
||||
| **Hindsight** | Cloud/Local | Free/Paid | 3 | `hindsight-client` | Knowledge graph + reflect synthesis |
|
||||
| **Holographic** | Local | Free | 2 | None | HRR algebra + trust scoring |
|
||||
| **RetainDB** | Cloud | $20/mo | 5 | `requests` | Delta compression |
|
||||
|
|
|
|||
|
|
@ -270,6 +270,31 @@ display:
|
|||
> writes to your memory/skill stores, are unaffected by this setting. Set it
|
||||
> per-platform via `display.platforms.<platform>.memory_notifications`.
|
||||
|
||||
## Running the review on a cheaper model (`auxiliary.background_review`)
|
||||
|
||||
The review runs on your **main chat model** by default, replaying the
|
||||
conversation — which is already warm in the prompt cache, so it's cheap cache
|
||||
reads. On an expensive main model you can run the review on a cheaper model
|
||||
instead:
|
||||
|
||||
```yaml
|
||||
auxiliary:
|
||||
background_review:
|
||||
provider: openrouter
|
||||
model: google/gemini-3-flash-preview # auto (default) = main chat model
|
||||
```
|
||||
|
||||
When you point it at a model **different** from your main one, the review runs
|
||||
there for substantially lower cost (~3–5× in benchmarks). Because a different
|
||||
model can't reuse your main model's prompt cache anyway, the fork automatically
|
||||
replays a compact **digest** of the conversation (recent turns verbatim + a
|
||||
summary of older ones) rather than the full transcript — minimizing what it
|
||||
writes to the new cache. Capture holds: in testing, memory capture was
|
||||
identical and skill capture near-identical to the main-model review.
|
||||
|
||||
Leave it at `auto` (or set it to your main model) and nothing changes — the
|
||||
review keeps running on the main model with the full warm-cache replay.
|
||||
|
||||
## Controlling skill writes (`skills.write_approval`)
|
||||
|
||||
Skills use the same on/off gate, but the review UX differs because a
|
||||
|
|
|
|||
|
|
@ -71,6 +71,42 @@ hermes chat --toolsets skills -q "What skills do you have?"
|
|||
hermes chat --toolsets skills -q "Show me the axolotl skill"
|
||||
```
|
||||
|
||||
## Learning a skill from sources (`/learn`)
|
||||
|
||||
`/learn` is the fast way to turn something you already know — or a pile of
|
||||
reference material — into a reusable skill, without hand-writing the
|
||||
`SKILL.md`. It is open-ended: point it at *anything you can describe* and the
|
||||
agent gathers the material with the tools it already has, then authors a skill
|
||||
that follows the [house authoring standards](#skillmd-format) (≤60-char
|
||||
description, the standard section order, Hermes-tool framing, no invented
|
||||
commands).
|
||||
|
||||
```bash
|
||||
# A local SDK or doc directory — read with read_file / search_files
|
||||
/learn the REST client in ~/projects/acme-sdk, focus on auth + pagination
|
||||
|
||||
# An online doc page — fetched with web_extract
|
||||
/learn https://docs.example.com/api/quickstart
|
||||
|
||||
# The workflow you just walked the agent through in this conversation
|
||||
/learn how I just deployed the staging server
|
||||
|
||||
# Pasted notes / a described procedure
|
||||
/learn filing an expense: open the portal, New > Expense, attach the receipt, submit
|
||||
```
|
||||
|
||||
Because the live agent does the sourcing, `/learn` works the same in the CLI,
|
||||
the messaging gateway, the TUI, and the dashboard — and on any terminal backend
|
||||
(local, Docker, remote), since there is no separate ingestion engine. In the
|
||||
**dashboard**, the Skills page has a **Learn a skill** button that opens a panel
|
||||
with a directory field, a URL field, and an open-ended text box; it composes a
|
||||
`/learn` request and runs it in chat.
|
||||
|
||||
There is no model-tool footprint: `/learn` builds a standards-guided prompt and
|
||||
hands it to the agent as a normal turn. The agent saves the result with the
|
||||
`skill_manage` tool, so the [write-approval gate](#gating-agent-skill-writes-skillswrite_approval)
|
||||
applies if you have it on.
|
||||
|
||||
## Progressive Disclosure
|
||||
|
||||
Skills use a token-efficient loading pattern:
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue