feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill

New browser capabilities and a built-in skill for agent-driven web QA.

## New tool: browser_console

Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.

## Enhanced tool: browser_vision(annotate=True)

New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.

## Config: browser.record_sessions

Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default

## Built-in skill: dogfood

Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
   (Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence

Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template

## Tests

21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation

Addresses #315.
This commit is contained in:
teknium1 2026-03-08 21:02:14 -07:00
parent 0c4cff352a
commit a8bf414f4a
11 changed files with 835 additions and 9 deletions

View file

@ -69,7 +69,7 @@ hermes-agent/
│ ├── file_tools.py # File read/write/search/patch tools
│ ├── file_operations.py # File operations helpers
│ ├── web_tools.py # Firecrawl search/extract
│ ├── browser_tool.py # Browserbase browser automation
│ ├── browser_tool.py # Browserbase browser automation (browser_console, session recording)
│ ├── vision_tools.py # Image analysis via auxiliary LLM
│ ├── image_generation_tool.py # FLUX image generation via fal.ai
│ ├── tts_tool.py # Text-to-speech
@ -113,7 +113,7 @@ hermes-agent/
├── cron/ # Scheduler implementation
├── environments/ # RL training environments (Atropos integration)
├── honcho_integration/ # Honcho client & session management
├── skills/ # Bundled skill sources
├── skills/ # Bundled skill sources (includes dogfood QA testing)
├── optional-skills/ # Official optional skills (not activated by default)
├── scripts/ # Install scripts, utilities
├── tests/ # Full pytest suite (~2300+ tests)