diff --git a/optional-skills/web-development/DESCRIPTION.md b/optional-skills/web-development/DESCRIPTION.md
new file mode 100644
index 000000000..588817bbc
--- /dev/null
+++ b/optional-skills/web-development/DESCRIPTION.md
@@ -0,0 +1,5 @@
+# Web Development
+
+Optional skills for client-side web development workflows: embedding agents, copilots, and AI-native UX patterns into user-facing web apps.
+
+These are distinct from Hermes' own browser automation (Browserbase, Camofox), which operates *on* websites from outside. Web-development skills here help users build *into* their own websites.
diff --git a/optional-skills/web-development/page-agent/SKILL.md b/optional-skills/web-development/page-agent/SKILL.md
new file mode 100644
index 000000000..caab19901
--- /dev/null
+++ b/optional-skills/web-development/page-agent/SKILL.md
@@ -0,0 +1,189 @@
+---
+name: page-agent
+description: Embed alibaba/page-agent into your own web application, a pure-JavaScript in-page GUI agent that ships as a single
+```
+
+A panel appears. Type an instruction. Done.
+
+Bookmarklet form (drop into bookmarks bar, click on any page):
+
+```javascript
+javascript:(function(){var s=document.createElement('script');s.src='https://cdn.jsdelivr.net/npm/page-agent@1.8.0/dist/iife/page-agent.demo.js';document.head.appendChild(s);})();
+```
+
+## Path 2: npm install into your own web app (production use)
+
+Inside an existing web project (React / Vue / Svelte / plain):
+
+```bash
+npm install page-agent
+```
+
+Wire it up with your own LLM endpoint (**never ship the demo CDN to real users**):
+
+```javascript
+import { PageAgent } from 'page-agent'
+
+const agent = new PageAgent({
+  model: 'qwen3.5-plus',
+  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
+  apiKey: process.env.LLM_API_KEY, // never hardcode; see Security below
+  language: 'en-US',
+})
+
+// Show the panel for end users:
+agent.panel.show()
+
+// Or drive it programmatically:
+await agent.execute('Click submit button, then fill username as John')
+```
+
+Provider examples (any OpenAI-compatible endpoint works):
+
+| Provider | `baseURL` | `model` |
+|----------|-----------|---------|
+| Qwen / DashScope | `https://dashscope.aliyuncs.com/compatible-mode/v1` | `qwen3.5-plus` |
+| OpenAI | `https://api.openai.com/v1` | `gpt-4o-mini` |
+| Ollama (local) | `http://localhost:11434/v1` | `qwen3:14b` |
+| OpenRouter | `https://openrouter.ai/api/v1` | `anthropic/claude-sonnet-4.6` |
+
+**Key config fields** (passed to `new PageAgent({...})`):
+
+- `model`, `baseURL`, `apiKey`: LLM connection
+- `language`: UI language (`en-US`, `zh-CN`, etc.)
+- Allowlist and data-masking hooks exist for locking down what the agent can touch; see https://alibaba.github.io/page-agent/ for the full option list
+
+**Security.** Don't put your `apiKey` in client-side code for a real deployment. Instead, proxy LLM calls through your backend and point `baseURL` at your proxy. The demo CDN build works only because Alibaba runs such a proxy for evaluation.
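The proxy recommended in the Security note can be very small. The following is a minimal sketch, not part of page-agent: it assumes Node 18+ (for the global `fetch`), assumes the client requests `<baseURL>/chat/completions` as OpenAI-compatible clients normally do, and buffers the whole upstream reply rather than streaming it. The names `buildUpstreamRequest`, `createProxyServer`, and the `/v1/chat/completions` route are hypothetical, chosen for illustration.

```javascript
// Backend proxy sketch: the browser talks to this server, and only the
// server knows LLM_API_KEY. All names here are illustrative assumptions.
const http = require('node:http')

const UPSTREAM_BASE_URL = process.env.LLM_BASE_URL || 'https://api.openai.com/v1'

// Pure helper: map an incoming proxy path and body onto the upstream call.
// Returns null for any route other than chat completions.
function buildUpstreamRequest(path, body, apiKey, baseURL = UPSTREAM_BASE_URL) {
  if (path !== '/v1/chat/completions') return null
  return {
    url: baseURL.replace(/\/$/, '') + '/chat/completions',
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // The server-side key is attached here; nothing from the client is trusted.
        Authorization: `Bearer ${apiKey}`,
      },
      body,
    },
  }
}

// Builds the HTTP server without starting it, so the caller decides the port.
function createProxyServer(apiKey = process.env.LLM_API_KEY) {
  return http.createServer(async (req, res) => {
    // Collect the request body (IncomingMessage is async-iterable).
    const chunks = []
    for await (const chunk of req) chunks.push(chunk)
    const body = Buffer.concat(chunks).toString('utf8')

    const upstream = req.method === 'POST'
      ? buildUpstreamRequest(req.url, body, apiKey)
      : null
    if (!upstream) {
      res.writeHead(404)
      res.end()
      return
    }

    try {
      const reply = await fetch(upstream.url, upstream.init)
      res.writeHead(reply.status, {
        'Content-Type': reply.headers.get('content-type') || 'application/json',
      })
      res.end(Buffer.from(await reply.arrayBuffer()))
    } catch (err) {
      res.writeHead(502, { 'Content-Type': 'application/json' })
      res.end(JSON.stringify({ error: 'upstream request failed' }))
    }
  })
}

// Usage (assumed port): createProxyServer().listen(8787)
```

With something like this running, the browser-side `baseURL` would point at the proxy (for example `http://localhost:8787/v1` during development), and the real key stays out of the bundle. A real deployment would also want its own authentication and rate limiting in front of the route, which this sketch omits.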
+
+## Path 3: clone the source repo (contributing, or hacking on it)
+
+Use this when the user wants to modify page-agent itself, test it against arbitrary sites via a local IIFE bundle, or develop the browser extension.
+
+```bash
+git clone https://github.com/alibaba/page-agent.git
+cd page-agent
+npm ci   # exact lockfile install (or `npm i` to allow updates)
+```
+
+Create `.env` in the repo root with an LLM endpoint. Example:
+
+```
+LLM_MODEL_NAME=gpt-4o-mini
+LLM_API_KEY=sk-...
+LLM_BASE_URL=https://api.openai.com/v1
+```
+
+Ollama flavor:
+
+```
+LLM_BASE_URL=http://localhost:11434/v1
+LLM_API_KEY=NA
+LLM_MODEL_NAME=qwen3:14b
+```
+
+Common commands:
+
+```bash
+npm start           # docs/website dev server
+npm run build       # build every package
+npm run dev:demo    # serve IIFE bundle at http://localhost:5174/page-agent.demo.js
+npm run dev:ext     # develop the browser extension (WXT + React)
+npm run build:ext   # build the extension
+```
+
+**Test on any website** using the local IIFE bundle. Add this bookmarklet:
+
+```javascript
+javascript:(function(){var s=document.createElement('script');s.src=`http://localhost:5174/page-agent.demo.js?t=${Math.random()}`;s.onload=()=>console.log('PageAgent ready!');document.head.appendChild(s);})();
+```
+
+Then run `npm run dev:demo` and click the bookmarklet on any page; the local build is injected and rebuilds automatically on save.
+
+**Warning:** your `.env` `LLM_API_KEY` is inlined into the IIFE bundle during dev builds. Don't share the bundle. Don't commit it. Don't paste the URL into Slack. (Verified: grepping the public dev bundle returns the literal values from `.env`.)
+
+## Repo layout (Path 3)
+
+Monorepo with npm workspaces. Key packages:
+
+| Package | Path | Purpose |
+|---------|------|---------|
+| `page-agent` | `packages/page-agent/` | Main entry with UI panel |
+| `@page-agent/core` | `packages/core/` | Core agent logic, no UI |
+| `@page-agent/mcp` | `packages/mcp/` | MCP server (beta) |
+| - | `packages/llms/` | LLM client |
+| - | `packages/page-controller/` | DOM ops + visual feedback |
+| - | `packages/ui/` | Panel + i18n |
+| - | `packages/extension/` | Chrome/Firefox extension |
+| - | `packages/website/` | Docs + landing site |
+
+## Verifying it works
+
+After Path 1 or Path 2:
+1. Open the page in a browser with devtools open
+2. You should see a floating panel. If not, check the console for errors (most common: CORS on the LLM endpoint, wrong `baseURL`, or a bad API key)
+3. Type a simple instruction matching something visible on the page ("click the Login link")
+4. Watch the Network tab; you should see a request to your `baseURL`
+
+After Path 3:
+1. `npm run dev:demo` prints `Accepting connections at http://localhost:5174`
+2. `curl -I http://localhost:5174/page-agent.demo.js` returns `HTTP/1.1 200 OK` with `Content-Type: application/javascript`
+3. Click the bookmarklet on any site; the panel appears
+
+## Pitfalls
+
+- **Demo CDN in production**: don't. It's rate-limited, uses Alibaba's free proxy, and their terms forbid production use.
+- **API key exposure**: any key passed to `new PageAgent({apiKey: ...})` ships in your JS bundle. Always proxy through your own backend for real deployments.
+- **Non-OpenAI-compatible endpoints** fail silently or with cryptic errors. If your provider needs native Anthropic/Gemini formatting, use an OpenAI-compatibility proxy (LiteLLM, OpenRouter) in front.
+- **CSP blocks**: sites with a strict Content-Security-Policy may refuse to load the CDN script or disallow inline eval. In that case, self-host from your origin.
+- **Restart dev server** after editing `.env` in Path 3; Vite only reads env at startup.
+- **Node version**: the repo declares `^22.13.0 || >=24`. Node 20 will fail `npm ci` with engine errors.
+- **npm 10 vs 11**: docs say npm 11+; npm 10.9 actually works fine.
+
+## Reference
+
+- Repo: https://github.com/alibaba/page-agent
+- Docs: https://alibaba.github.io/page-agent/
+- License: MIT (built on browser-use's DOM processing internals, Copyright 2024 Gregor Zunic)