mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-18 04:41:56 +00:00

docs: deep audit — fix stale config keys, missing commands, and registry drift (#22784)

* docs: deep audit — fix stale config keys, missing commands, and registry drift

Cross-checked ~80 high-impact docs pages (getting-started, reference, top-level
user-guide, user-guide/features) against the live registries:

  hermes_cli/commands.py    COMMAND_REGISTRY (slash commands)
  hermes_cli/auth.py        PROVIDER_REGISTRY (providers)
  hermes_cli/config.py      DEFAULT_CONFIG (config keys)
  toolsets.py               TOOLSETS (toolsets)
  tools/registry.py         get_all_tool_names() (tools)
  python -m hermes_cli.main <subcmd> --help (CLI args)

reference/
- cli-commands.md: drop duplicate hermes fallback row + duplicate section,
  add stepfun/lmstudio to --provider enum, expand auth/mcp/curator subcommand
  lists to match --help output (status/logout/spotify, login, archive/prune/
  list-archived).
- slash-commands.md: add missing /sessions and /reload-skills entries +
  correct the cross-platform Notes line.
- tools-reference.md: drop bogus '68 tools' headline, drop fictional
  'browser-cdp toolset' (these tools live in 'browser' and are runtime-gated),
  add missing 'kanban' and 'video' toolset sections, fix MCP example to use
  the real mcp_<server>_<tool> prefix.
- toolsets-reference.md: list browser_cdp/browser_dialog inside the 'browser'
  row, add missing 'kanban' and 'video' toolset rows, drop the stale
  '38 tools' count for hermes-cli.
- profile-commands.md: add missing install/update/info subcommands, document
  fish completion.
- environment-variables.md: dedupe GMI_API_KEY/GMI_BASE_URL rows (kept the
  one with the correct gmi-serving.com default).
- faq.md: Anthropic/Google/OpenAI examples — direct providers exist (not just
  via OpenRouter), refresh the OpenAI model list.

getting-started/
- installation.md: PortableGit (not MinGit) is what the Windows installer
  fetches; document the 32-bit MinGit fallback.
- installation.md / termux.md: installer prefers .[termux-all] then falls
  back to .[termux].
- nix-setup.md: Python 3.12 (not 3.11), Node.js 22 (not 20); fix invalid
  'nix flake update --flake' invocation.
- updating.md: 'hermes backup restore --state pre-update' doesn't exist —
  point at the snapshot/quick-snapshot flow; correct config key
  'updates.pre_update_backup' (was 'update.backup').

user-guide/
- configuration.md: api_max_retries default 3 (not 2); display.runtime_footer
  is the real key (not display.runtime_metadata_footer); checkpoints defaults
  enabled=false / max_snapshots=20 (not true / 50).
- configuring-models.md: 'hermes model list' / 'hermes model set ...' don't
  exist — hermes model is interactive only.
- tui.md: busy_indicator -> tui_status_indicator with values
  kaomoji|emoji|unicode|ascii (not kawaii|minimal|dots|wings|none).
- security.md: SSH backend keys (TERMINAL_SSH_HOST/USER/KEY) live in .env,
  not config.yaml.
- windows-wsl-quickstart.md: there is no 'hermes api' subcommand — the
  OpenAI-compatible API server runs inside hermes gateway.

user-guide/features/
- computer-use.md: approvals.mode (not security.approval_level); fix broken
  ./browser-use.md link to ./browser.md.
- fallback-providers.md: top-level fallback_providers (not
  model.fallback_providers); the picker is subcommand-based, not modal.
- api-server.md: API_SERVER_* are env vars — write to per-profile .env,
  not 'hermes config set' which targets YAML.
- web-search.md: drop web_crawl as a registered tool (it isn't); deep-crawl
  modes are exposed through web_extract.
- kanban.md: failure_limit default is 2, not '~5'.
- plugins.md: drop hard-coded '33 providers' count.
- honcho.md: fix unclosed quote in echo HONCHO_API_KEY snippet; document
  that 'hermes honcho' subcommand is gated on memory.provider=honcho;
  reconcile subcommand list with actual --help output.
- memory-providers.md: legacy 'hermes honcho setup' redirect documented.

Verified via 'npm run build' — site builds cleanly; broken-link count went
from 149 to 146 (no regressions, fixed a few in passing).

* docs: round 2 audit fixes + regenerate skill catalogs

Follow-up to the previous commit on this branch:

Round 2 manual fixes:
- quickstart.md: KIMI_CODING_API_KEY mentioned alongside KIMI_API_KEY;
  voice-mode and ACP install commands rewritten — bare 'pip install ...'
  doesn't work for curl-installed setups (no pip on PATH, not in repo
  dir); replaced with 'cd ~/.hermes/hermes-agent && uv pip install -e
  ".[voice]"'. ACP already ships in [all] so the curl install includes it.
- cli.md / configuration.md: 'auxiliary.compression.model' shown as
  'google/gemini-3-flash-preview' (the doc's own claimed default);
  actual default is empty (= use main model). Reworded as 'leave empty
  (default) or pin a cheap model'.
- built-in-plugins.md: added the bundled 'kanban/dashboard' plugin row
  that was missing from the table.

Regenerated skill catalogs:
- ran website/scripts/generate-skill-docs.py to refresh all 163 per-skill
  pages and both reference catalogs (skills-catalog.md,
  optional-skills-catalog.md). This adds the entries that were genuinely
  missing — productivity/teams-meeting-pipeline (bundled),
  optional/finance/* (entire category — 7 skills:
  3-statement-model, comps-analysis, dcf-model, excel-author, lbo-model,
  merger-model, pptx-author), creative/hyperframes,
  creative/kanban-video-orchestrator, devops/watchers,
  productivity/shop-app, research/searxng-search,
  apple/macos-computer-use — and rewrites every other per-skill page from
  the current SKILL.md. Most diffs are tiny (one line of refreshed
  metadata).

Validation:
- 'npm run build' succeeded.
- Broken-link count moved 146 -> 155 — the +9 are zh-Hans translation
  shells that lag every newly-added skill page (pre-existing pattern).
  No regressions on any en/ page.

2026-05-09 13:19:51 -07:00


title	sidebar_label	description
Page Agent	Page Agent	Embed alibaba/page-agent into your own web application — a pure-JavaScript in-page GUI agent that ships as a single <script> tag or npm package and lets end-...

{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}

Page Agent

Embed alibaba/page-agent into your own web application — a pure-JavaScript in-page GUI agent that ships as a single <script> tag or npm package and lets end-users of your site drive the UI with natural language ("click login, fill username as John"). No Python, no headless browser, no extension required. Use this skill when the user is a web developer who wants to add an AI copilot to their SaaS / admin panel / B2B tool, make a legacy web app accessible via natural language, or evaluate page-agent against a local (Ollama) or cloud (Qwen / OpenAI / OpenRouter) LLM. NOT for server-side browser automation — point those users to Hermes' built-in browser tool instead.

Skill metadata


Source	Optional — install with `hermes skills install official/web-development/page-agent`
Path	`optional-skills/web-development/page-agent`
Version	`1.0.0`
Author	Hermes Agent
License	MIT
Platforms	linux, macos, windows
Tags	`web`, `javascript`, `agent`, `browser`, `gui`, `alibaba`, `embed`, `copilot`, `saas`

Reference: full SKILL.md

:::info
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
:::

page-agent

alibaba/page-agent (https://github.com/alibaba/page-agent, 17k+ stars, MIT) is an in-page GUI agent written in TypeScript. It lives inside a webpage, reads the DOM as text (no screenshots, no multi-modal LLM), and executes natural-language instructions like "click the login button, then fill username as John" against the current page. Pure client-side — the host site just includes a script and passes an OpenAI-compatible LLM endpoint.

When to use this skill

Load this skill when a user wants to:

Ship an AI copilot inside their own web app (SaaS, admin panel, B2B tool, ERP, CRM) — "users on my dashboard should be able to type 'create invoice for Acme Corp and email it' instead of clicking through five screens"
Modernize a legacy web app without rewriting the frontend — page-agent drops on top of existing DOM
Add accessibility via natural language — voice / screen-reader users drive the UI by describing what they want
Demo or evaluate page-agent against a local (Ollama) or hosted (Qwen, OpenAI, OpenRouter) LLM
Build interactive training / product demos — let an AI walk a user through "how to submit an expense report" live in the real UI

When NOT to use this skill

User wants Hermes itself to drive a browser → use Hermes' built-in browser tool (Browserbase / Camofox). page-agent is the opposite direction.
User wants cross-tab automation without embedding → use Playwright, browser-use, or the page-agent Chrome extension
User needs visual grounding / screenshots → page-agent is text-DOM only; use a multimodal browser agent instead

Prerequisites

Node 22.13+ or 24+, npm 10+ (docs claim 11+ but 10.9 works fine)
An OpenAI-compatible LLM endpoint: Qwen (DashScope), OpenAI, Ollama, OpenRouter, or anything speaking /v1/chat/completions
Browser with devtools (for debugging)
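The Node requirement can be checked before running anything else. A minimal sketch, assuming the engines range quoted under Pitfalls (^22.13.0 || >=24); the helper name nodeVersionSupported is hypothetical, and the comparison is hand-rolled rather than taken from a semver library, so prerelease tags are ignored:

```javascript
// Hypothetical helper: true when a Node version string satisfies
// "^22.13.0 || >=24", i.e. 22.13.0 <= v < 23.0.0, or v >= 24.0.0.
// Prerelease suffixes (e.g. "-rc.1") are ignored in this sketch.
function nodeVersionSupported(version) {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!match) return false;
  const [major, minor] = match.slice(1).map(Number);
  if (major >= 24) return true;
  return major === 22 && minor >= 13;
}

// Usage: warn early instead of waiting for `npm ci` to fail.
if (!nodeVersionSupported(process.version)) {
  console.warn(`Node ${process.version} is outside ^22.13.0 || >=24`);
}
```

Note that Node 23 is rejected: the caret range stops at 23.0.0 and the second clause only starts at 24.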

Path 1 — 30-second demo via CDN (no install)

Fastest way to see it work. Uses alibaba's free testing LLM proxy — for evaluation only, subject to their terms.

Add this to any HTML page (to try it on a page you don't control, use the bookmarklet form below):

<script src="https://cdn.jsdelivr.net/npm/page-agent@1.8.0/dist/iife/page-agent.demo.js" crossorigin="true"></script>

A panel appears. Type an instruction. Done.

Bookmarklet form (drop into bookmarks bar, click on any page):

javascript:(function(){var s=document.createElement('script');s.src='https://cdn.jsdelivr.net/npm/page-agent@1.8.0/dist/iife/page-agent.demo.js';document.head.appendChild(s);})();

Path 2 — npm install into your own web app (production use)

Inside an existing web project (React / Vue / Svelte / plain):

npm install page-agent

Wire it up with your own LLM endpoint — never ship the demo CDN to real users:

import { PageAgent } from 'page-agent'

const agent = new PageAgent({
    model: 'qwen3.5-plus',
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    apiKey: process.env.LLM_API_KEY,   // never hardcode; a bundler still inlines this value into client JS (see Security)
    language: 'en-US',
})

// Show the panel for end users:
agent.panel.show()

// Or drive it programmatically:
await agent.execute('Click submit button, then fill username as John')

Provider examples (any OpenAI-compatible endpoint works):

Provider	`baseURL`	`model`
Qwen / DashScope	`https://dashscope.aliyuncs.com/compatible-mode/v1`	`qwen3.5-plus`
OpenAI	`https://api.openai.com/v1`	`gpt-4o-mini`
Ollama (local)	`http://localhost:11434/v1`	`qwen3:14b`
OpenRouter	`https://openrouter.ai/api/v1`	`anthropic/claude-sonnet-4.6`
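The table above can be kept as data so that switching providers is a one-word change. A minimal sketch: the preset keys, the PROVIDER_PRESETS object and the pageAgentOptions helper are illustrative names, not part of page-agent. Only the model, baseURL, apiKey and language fields come from this document.

```javascript
// Presets copied from the provider table. The object shape and key names
// are assumptions made for this example.
const PROVIDER_PRESETS = {
  dashscope: { baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1', model: 'qwen3.5-plus' },
  openai: { baseURL: 'https://api.openai.com/v1', model: 'gpt-4o-mini' },
  ollama: { baseURL: 'http://localhost:11434/v1', model: 'qwen3:14b' },
  openrouter: { baseURL: 'https://openrouter.ai/api/v1', model: 'anthropic/claude-sonnet-4.6' },
};

// Hypothetical helper: builds the options object for `new PageAgent({...})`.
function pageAgentOptions(provider, apiKey, language = 'en-US') {
  if (!Object.hasOwn(PROVIDER_PRESETS, provider)) {
    throw new Error(`Unknown provider: ${provider}`);
  }
  return { ...PROVIDER_PRESETS[provider], apiKey, language };
}

// Usage (in an app that has imported PageAgent):
//   const agent = new PageAgent(pageAgentOptions('ollama', 'NA'))
```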

Key config fields (passed to new PageAgent({...})):

model, baseURL, apiKey — LLM connection
language — UI language (en-US, zh-CN, etc.)
Allowlist and data-masking hooks exist for locking down what the agent can touch — see https://alibaba.github.io/page-agent/ for the full option list

Security. Don't put your apiKey in client-side code for a real deployment — proxy LLM calls through your backend and point baseURL at your proxy. The demo CDN exists because alibaba runs that proxy for evaluation.
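One way to follow that advice is a small Node server that holds the key and forwards chat requests. This is a sketch under stated assumptions, not a hardened implementation: it reuses the LLM_BASE_URL and LLM_API_KEY variable names from Path 3 for the server's own environment, assumes page-agent calls the /chat/completions path under baseURL, buffers responses instead of streaming them, and omits CORS headers, user authentication and rate limiting, all of which a real deployment would need. The function names are hypothetical.

```javascript
// Builds the upstream request for an incoming browser call. Kept pure so the
// key handling can be checked without any network access.
function buildUpstreamRequest(path, body, env) {
  const base = env.LLM_BASE_URL.replace(/\/+$/, ''); // tolerate trailing slash
  return {
    url: `${base}${path}`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${env.LLM_API_KEY}`, // key never leaves the server
      },
      body,
    },
  };
}

// Minimal proxy (Node 18+ for global fetch). The browser points baseURL at
// http://<host>:<port>/v1 and carries no real key.
async function startProxy(port, env = process.env) {
  const http = await import('node:http');
  const server = http.createServer(async (req, res) => {
    if (req.method !== 'POST' || req.url !== '/v1/chat/completions') {
      res.writeHead(404).end();
      return;
    }
    req.setEncoding('utf8');
    let body = '';
    for await (const chunk of req) body += chunk;
    try {
      const { url, init } = buildUpstreamRequest('/chat/completions', body, env);
      const upstream = await fetch(url, init);
      res.writeHead(upstream.status, { 'Content-Type': 'application/json' });
      res.end(await upstream.text());
    } catch (err) {
      res.writeHead(502).end(); // upstream unreachable or misconfigured
    }
  });
  server.listen(port);
  return server;
}

// Usage: startProxy(8787), then set baseURL to 'http://localhost:8787/v1'.
```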

Path 3 — clone the source repo (contributing, or hacking on it)

Use this when the user wants to modify page-agent itself, test it against arbitrary sites via a local IIFE bundle, or develop the browser extension.

git clone https://github.com/alibaba/page-agent.git
cd page-agent
npm ci              # exact lockfile install (or `npm i` to allow updates)

Create .env in the repo root with an LLM endpoint. Example:

LLM_MODEL_NAME=gpt-4o-mini
LLM_API_KEY=sk-...
LLM_BASE_URL=https://api.openai.com/v1

Ollama flavor:

LLM_BASE_URL=http://localhost:11434/v1
LLM_API_KEY=NA
LLM_MODEL_NAME=qwen3:14b
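A quick pre-flight check can catch a missing variable before the dev server starts. A minimal sketch, assuming all three variables shown in the examples above are required; the parser is deliberately simple (no quoting, export prefixes or multiline values) and the function names are hypothetical:

```javascript
const REQUIRED_KEYS = ['LLM_MODEL_NAME', 'LLM_API_KEY', 'LLM_BASE_URL'];

// Parses plain KEY=value lines, skipping blank lines and # comments.
function parseEnv(text) {
  const env = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq <= 0) continue; // no key before "=", or no "=" at all
    env[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return env;
}

// Returns the required keys that are absent or empty.
function missingEnvKeys(env) {
  return REQUIRED_KEYS.filter((key) => !env[key]);
}

// Usage (Node): read the file, then report gaps.
//   const text = require('node:fs').readFileSync('.env', 'utf8')
//   console.log(missingEnvKeys(parseEnv(text)))
```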

Common commands:

npm start           # docs/website dev server
npm run build       # build every package
npm run dev:demo    # serve IIFE bundle at http://localhost:5174/page-agent.demo.js
npm run dev:ext     # develop the browser extension (WXT + React)
npm run build:ext   # build the extension

Test on any website using the local IIFE bundle. Add this bookmarklet:

javascript:(function(){var s=document.createElement('script');s.src=`http://localhost:5174/page-agent.demo.js?t=${Math.random()}`;s.onload=()=>console.log('PageAgent ready!');document.head.appendChild(s);})();

Then run npm run dev:demo and click the bookmarklet on any page to inject the local build. The bundle rebuilds automatically on save.

Warning: your .env LLM_API_KEY is inlined into the IIFE bundle during dev builds. Don't share the bundle. Don't commit it. Don't paste the URL into Slack. (Verified: grepping the public dev bundle returns the literal values from .env.)
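The grep check mentioned in the warning can be scripted. A minimal sketch, with a hypothetical helper name and an arbitrary length threshold so that placeholder values such as NA are not flagged:

```javascript
// Returns the names of env entries whose values appear verbatim in the bundle.
// Values shorter than `minLength` are skipped to avoid noisy matches.
function findLeakedSecrets(bundleText, env, minLength = 8) {
  return Object.entries(env)
    .filter(([, value]) => typeof value === 'string' && value.length >= minLength)
    .filter(([, value]) => bundleText.includes(value))
    .map(([name]) => name);
}

// Usage (Node 18+, dev server running), with `env` parsed from your .env:
//   const bundle = await (await fetch('http://localhost:5174/page-agent.demo.js')).text()
//   console.log(findLeakedSecrets(bundle, env))
```

A non-empty result means the bundle should be treated as a secret itself.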

Repo layout (Path 3)

Monorepo with npm workspaces. Key packages:

Package	Path	Purpose
`page-agent`	`packages/page-agent/`	Main entry with UI panel
`@page-agent/core`	`packages/core/`	Core agent logic, no UI
`@page-agent/mcp`	`packages/mcp/`	MCP server (beta)
—	`packages/llms/`	LLM client
—	`packages/page-controller/`	DOM ops + visual feedback
—	`packages/ui/`	Panel + i18n
—	`packages/extension/`	Chrome/Firefox extension
—	`packages/website/`	Docs + landing site

Verifying it works

After Path 1 or Path 2:

Open the page in a browser with devtools open
You should see a floating panel. If not, check the console for errors (most common: CORS on the LLM endpoint, wrong baseURL, or a bad API key)
Type a simple instruction matching something visible on the page ("click the Login link")
Watch the Network tab — you should see a request to your baseURL

After Path 3:

npm run dev:demo prints Accepting connections at http://localhost:5174
curl -I http://localhost:5174/page-agent.demo.js returns HTTP/1.1 200 OK with Content-Type: application/javascript
Click the bookmarklet on any site; panel appears
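The curl check above can also be expressed as a small script. A minimal sketch, assuming Node 18+ for the global fetch; checkBundle is a hypothetical name, and the fetch implementation is injectable so the logic can be exercised without a running server:

```javascript
// Sends a HEAD request and reports whether the bundle looks servable:
// HTTP 200 with a JavaScript content type.
async function checkBundle(url, fetchImpl = fetch) {
  const res = await fetchImpl(url, { method: 'HEAD' });
  const type = res.headers.get('content-type') || '';
  return {
    ok: res.status === 200 && /javascript/i.test(type),
    status: res.status,
    type,
  };
}

// Usage:
//   checkBundle('http://localhost:5174/page-agent.demo.js').then(console.log)
```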

Pitfalls

Demo CDN in production — don't. It's rate-limited, uses alibaba's free proxy, and their terms forbid production use.
API key exposure — any key passed to new PageAgent({apiKey: ...}) ships in your JS bundle. Always proxy through your own backend for real deployments.
Non-OpenAI-compatible endpoints fail silently or with cryptic errors. If your provider needs native Anthropic/Gemini formatting, use an OpenAI-compatibility proxy (LiteLLM, OpenRouter) in front.
CSP blocks — sites with strict Content-Security-Policy may refuse to load the CDN script or disallow inline eval. In that case, self-host from your origin.
Restart dev server after editing .env in Path 3 — Vite only reads env at startup.
Node version — the repo declares ^22.13.0 || >=24. Node 20 will fail npm ci with engine errors.
npm 10 vs 11 — docs say npm 11+; npm 10.9 actually works fine.
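Because incompatible endpoints tend to fail quietly, it can help to inspect one raw response before wiring the endpoint into page-agent. A minimal sketch that checks only the basic OpenAI chat-completions shape (a non-empty choices array whose first entry has a message); the helper name is hypothetical and this is not how page-agent itself validates responses:

```javascript
// Returns a list of problems found in a parsed /v1/chat/completions response.
// An empty list means the basic OpenAI-style shape is present.
function chatCompletionProblems(payload) {
  if (!payload || typeof payload !== 'object') {
    return ['response is not a JSON object'];
  }
  const problems = [];
  if (payload.error) {
    problems.push(`provider error: ${JSON.stringify(payload.error)}`);
  }
  if (!Array.isArray(payload.choices) || payload.choices.length === 0) {
    problems.push('missing non-empty "choices" array');
  } else {
    const message = payload.choices[0]?.message;
    if (typeof message !== 'object' || message === null) {
      problems.push('choices[0].message is missing');
    }
  }
  return problems;
}

// Usage: POST a one-message request to `${baseURL}/chat/completions`,
// parse the JSON body, and log chatCompletionProblems(body).
```

A response in a provider's native format, with no choices array, is reported as a problem rather than passing through unnoticed.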

Reference

Repo: https://github.com/alibaba/page-agent
Docs: https://alibaba.github.io/page-agent/
License: MIT (built on browser-use's DOM processing internals, Copyright 2024 Gregor Zunic)
