mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-25 00:51:20 +00:00
docs: comprehensive documentation update for recent features
New documentation: - DingTalk messaging platform setup guide (dingtalk.md) Updated existing docs: - quickstart.md: add Alibaba Cloud, Kilo Code, Vercel AI Gateway to provider table - configuration.md: add Alibaba Cloud provider, website blocklist config, light/dark theme mode, smart approvals (ask/smart/off) - environment-variables.md: add Mattermost, Matrix, DingTalk, Browser Use, DashScope env vars - browser.md: add Browser Use cloud provider, /browser connect CDP mode, multi-provider architecture, fix limitation section contradiction - slash-commands.md: add /tools enable/disable/list, /browser connect/disconnect/status - messaging/index.md: add DingTalk, Mattermost, Matrix to architecture diagram, platform toolset table, security allowlists, and Next Steps links - security.md: add website access policy (blocklist) documentation - sidebars.ts: add Mattermost, Matrix, DingTalk to Messaging Gateway sidebar
This commit is contained in:
parent
ba728f3e63
commit
d9b9987ad3
9 changed files with 315 additions and 143 deletions
|
|
@ -1,27 +1,30 @@
|
|||
---
|
||||
title: Browser Automation
|
||||
description: Control cloud browsers with Browserbase integration for web interaction, form filling, scraping, and more.
|
||||
description: Control browsers with multiple providers, local Chrome via CDP, or cloud browsers for web interaction, form filling, scraping, and more.
|
||||
sidebar_label: Browser
|
||||
sidebar_position: 5
|
||||
---
|
||||
|
||||
# Browser Automation
|
||||
|
||||
Hermes Agent includes a full browser automation toolset that can run in two modes:
|
||||
Hermes Agent includes a full browser automation toolset with multiple backend options:
|
||||
|
||||
- **Browserbase cloud mode** via [Browserbase](https://browserbase.com) for managed cloud browsers and anti-bot tooling
|
||||
- **Browser Use cloud mode** via [Browser Use](https://browser-use.com) as an alternative cloud browser provider
|
||||
- **Local Chrome via CDP** — connect browser tools to your own Chrome instance using `/browser connect`
|
||||
- **Local browser mode** via the `agent-browser` CLI and a local Chromium installation
|
||||
|
||||
In both modes, the agent can navigate websites, interact with page elements, fill forms, and extract information.
|
||||
In all modes, the agent can navigate websites, interact with page elements, fill forms, and extract information.
|
||||
|
||||
## Overview
|
||||
|
||||
The browser tools use the `agent-browser` CLI. In Browserbase mode, `agent-browser` connects to Browserbase cloud sessions. In local mode, it drives a local Chromium installation. Pages are represented as **accessibility trees** (text-based snapshots), making them ideal for LLM agents. Interactive elements get ref IDs (like `@e1`, `@e2`) that the agent uses for clicking and typing.
|
||||
Pages are represented as **accessibility trees** (text-based snapshots), making them ideal for LLM agents. Interactive elements get ref IDs (like `@e1`, `@e2`) that the agent uses for clicking and typing.
|
||||
|
||||
Key capabilities:
|
||||
|
||||
- **Cloud execution** — no local browser needed
|
||||
- **Built-in stealth** — random fingerprints, CAPTCHA solving, residential proxies
|
||||
- **Multi-provider cloud execution** — Browserbase or Browser Use, no local browser needed
|
||||
- **Local Chrome integration** — attach to your running Chrome via CDP for hands-on browsing
|
||||
- **Built-in stealth** — random fingerprints, CAPTCHA solving, residential proxies (Browserbase)
|
||||
- **Session isolation** — each task gets its own browser session
|
||||
- **Automatic cleanup** — inactive sessions are closed after a timeout
|
||||
- **Vision analysis** — screenshot + AI analysis for visual understanding
|
||||
|
|
@ -40,9 +43,48 @@ BROWSERBASE_PROJECT_ID=your-project-id-here
|
|||
|
||||
Get your credentials at [browserbase.com](https://browserbase.com).
|
||||
|
||||
### Browser Use cloud mode
|
||||
|
||||
To use Browser Use as your cloud browser provider, add:
|
||||
|
||||
```bash
|
||||
# Add to ~/.hermes/.env
|
||||
BROWSER_USE_API_KEY=***
|
||||
```
|
||||
|
||||
Get your API key at [browser-use.com](https://browser-use.com). Browser Use provides a cloud browser via its REST API. If both Browserbase and Browser Use credentials are set, Browserbase takes priority.
|
||||
|
||||
### Local Chrome via CDP (`/browser connect`)
|
||||
|
||||
Instead of a cloud provider, you can attach Hermes browser tools to your own running Chrome instance via the Chrome DevTools Protocol (CDP). This is useful when you want to see what the agent is doing in real-time, interact with pages that require your own cookies/sessions, or avoid cloud browser costs.
|
||||
|
||||
In the CLI, use:
|
||||
|
||||
```
|
||||
/browser connect # Connect to Chrome at ws://localhost:9222
|
||||
/browser connect ws://host:port # Connect to a specific CDP endpoint
|
||||
/browser status # Check current connection
|
||||
/browser disconnect # Detach and return to cloud/local mode
|
||||
```
|
||||
|
||||
If Chrome isn't already running with remote debugging, Hermes will attempt to auto-launch it with `--remote-debugging-port=9222`.
|
||||
|
||||
:::tip
|
||||
To start Chrome manually with CDP enabled:
|
||||
```bash
|
||||
# Linux
|
||||
google-chrome --remote-debugging-port=9222
|
||||
|
||||
# macOS
|
||||
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222
|
||||
```
|
||||
:::
|
||||
|
||||
When connected via CDP, all browser tools (`browser_navigate`, `browser_click`, etc.) operate on your live Chrome instance instead of spinning up a cloud session.
|
||||
|
||||
### Local browser mode
|
||||
|
||||
If you do **not** set Browserbase credentials, Hermes can still use the browser tools through a local Chromium install driven by `agent-browser`.
|
||||
If you do **not** set any cloud credentials and don't use `/browser connect`, Hermes can still use the browser tools through a local Chromium install driven by `agent-browser`.
|
||||
|
||||
### Optional Environment Variables
|
||||
|
||||
|
|
@ -232,10 +274,8 @@ If paid features aren't available on your plan, Hermes automatically falls back
|
|||
|
||||
## Limitations
|
||||
|
||||
- **Requires Browserbase account** — no local browser fallback
|
||||
- **Requires `agent-browser` CLI** — must be installed via npm
|
||||
- **Text-based interaction** — relies on accessibility tree, not pixel coordinates
|
||||
- **Snapshot size** — large pages may be truncated or LLM-summarized at 8000 characters
|
||||
- **Session timeout** — sessions expire based on your Browserbase plan settings
|
||||
- **Cost** — each session consumes Browserbase credits; use `browser_close` when done
|
||||
- **Session timeout** — cloud sessions expire based on your provider's plan settings
|
||||
- **Cost** — cloud sessions consume provider credits; use `browser_close` when done. Use `/browser connect` for free local browsing.
|
||||
- **No file downloads** — cannot download files from the browser
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue