docs: expand Docusaurus coverage across CLI, tools, skills, and skins (#1232)

- add code-derived reference pages for slash commands, tools, toolsets,
  bundled skills, and official optional skills
- document the skin system and link visual theming separately from
  conversational personality
- refresh quickstart, configuration, environment variable, and messaging
  docs to match current provider, gateway, and browser behavior
- fix stale command, session, and Home Assistant configuration guidance
Teknium 2026-03-13 21:34:41 -07:00 committed by GitHub
parent 2bf6b7ad1a
commit 984f00e0b0
26 changed files with 1228 additions and 397 deletions


@@ -7,11 +7,16 @@ sidebar_position: 5
# Browser Automation
Hermes Agent includes a full browser automation toolset powered by [Browserbase](https://browserbase.com), enabling the agent to navigate websites, interact with page elements, fill forms, and extract information — all running in cloud-hosted browsers with built-in anti-bot stealth features.
Hermes Agent includes a full browser automation toolset that can run in two modes:
- **Browserbase cloud mode** via [Browserbase](https://browserbase.com) for managed cloud browsers and anti-bot tooling
- **Local browser mode** via the `agent-browser` CLI and a local Chromium installation
In both modes, the agent can navigate websites, interact with page elements, fill forms, and extract information.
## Overview
The browser tools use the `agent-browser` CLI with Browserbase cloud execution. Pages are represented as **accessibility trees** (text-based snapshots), making them ideal for LLM agents. Interactive elements get ref IDs (like `@e1`, `@e2`) that the agent uses for clicking and typing.
The browser tools use the `agent-browser` CLI. In Browserbase mode, `agent-browser` connects to Browserbase cloud sessions. In local mode, it drives a local Chromium installation. Pages are represented as **accessibility trees** (text-based snapshots), making them ideal for LLM agents. Interactive elements get ref IDs (like `@e1`, `@e2`) that the agent uses for clicking and typing.
Key capabilities:
@@ -23,16 +28,22 @@ Key capabilities:
## Setup
### Required Environment Variables
### Browserbase cloud mode
To use Browserbase-managed cloud browsers, add:
```bash
# Add to ~/.hermes/.env
BROWSERBASE_API_KEY=your-api-key-here
BROWSERBASE_API_KEY=***
BROWSERBASE_PROJECT_ID=your-project-id-here
```
Get your credentials at [browserbase.com](https://browserbase.com).
### Local browser mode
If you do **not** set Browserbase credentials, Hermes can still use the browser tools through a local Chromium install driven by `agent-browser`.
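As a rough illustration of the rule above, the following shell sketch reports which mode a given environment would imply. The `browser_mode` function is a hypothetical helper, not part of Hermes or `agent-browser`, and it assumes that cloud mode needs both `BROWSERBASE_API_KEY` and `BROWSERBASE_PROJECT_ID` to be non-empty. How Hermes treats a partially configured environment is not stated here.

```bash
# Hypothetical helper for illustration only; Hermes performs its own
# mode selection internally and may apply different rules.
browser_mode() {
  # Assumption: cloud mode requires both Browserbase variables.
  if [ -n "${BROWSERBASE_API_KEY:-}" ] && [ -n "${BROWSERBASE_PROJECT_ID:-}" ]; then
    echo "browserbase"
  else
    echo "local"
  fi
}

# Print the mode implied by the current shell environment.
browser_mode
```

The check only sees variables present in the current shell, so values stored in `~/.hermes/.env` would need to be loaded into the shell first.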
### Optional Environment Variables
```bash