Mirror of https://github.com/NousResearch/hermes-agent.git, synced 2026-04-25 00:51:20 +00:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.

## New tool: browser_console

Returns console messages (log/warn/error/info) AND uncaught JavaScript exceptions in a single call. Uses agent-browser's 'console' and 'errors' commands through the existing session plumbing. Supports --clear to reset buffers. Verified working in both local and Browserbase cloud modes.

## Enhanced tool: browser_vision(annotate=True)

New boolean parameter on browser_vision. When true, agent-browser overlays numbered [N] labels on interactive elements — each [N] maps to ref @eN. Annotation data (element name, role, bounding box) is returned alongside the vision analysis. Useful for QA reports and spatial reasoning.

## Config: browser.record_sessions

Auto-record browser sessions as WebM video files when enabled:

- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default

## Built-in skill: dogfood

Systematic exploratory QA testing for web applications. Teaches the agent a 5-phase workflow:

1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category (Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence

Includes:

- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template

## Tests

21 new tests covering:

- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation

Addresses #315.
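As a sketch of what consuming the combined payload looks like: the field names (`console_messages`, `js_errors`, `total_messages`, `total_errors`) are the shapes asserted in the new tests below; the sample data here is invented for illustration.

```python
import json

# Invented sample mirroring the result shape asserted in the new tests.
# browser_console() returns a JSON string combining console output and
# uncaught JS exceptions in one payload.
raw = json.dumps({
    "success": True,
    "total_messages": 2,
    "total_errors": 1,
    "console_messages": [
        {"text": "hello", "type": "log", "timestamp": 1},
        {"text": "oops", "type": "error", "timestamp": 2},
    ],
    "js_errors": [{"message": "Uncaught TypeError", "timestamp": 3}],
})

result = json.loads(raw)
# High-value QA findings: error-level console output plus uncaught exceptions.
findings = [m["text"] for m in result["console_messages"] if m["type"] == "error"]
findings += [e["message"] for e in result["js_errors"]]
print(findings)  # ['oops', 'Uncaught TypeError']
```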
This commit is contained in:
parent
0c4cff352a
commit
a8bf414f4a
11 changed files with 835 additions and 9 deletions
@@ -69,7 +69,7 @@ hermes-agent/
 │   ├── file_tools.py            # File read/write/search/patch tools
 │   ├── file_operations.py       # File operations helpers
 │   ├── web_tools.py             # Firecrawl search/extract
-│   ├── browser_tool.py          # Browserbase browser automation
+│   ├── browser_tool.py          # Browserbase browser automation (browser_console, session recording)
 │   ├── vision_tools.py          # Image analysis via auxiliary LLM
 │   ├── image_generation_tool.py # FLUX image generation via fal.ai
 │   ├── tts_tool.py              # Text-to-speech
@@ -113,7 +113,7 @@ hermes-agent/
 ├── cron/                # Scheduler implementation
 ├── environments/        # RL training environments (Atropos integration)
 ├── honcho_integration/  # Honcho client & session management
-├── skills/              # Bundled skill sources
+├── skills/              # Bundled skill sources (includes dogfood QA testing)
 ├── optional-skills/     # Official optional skills (not activated by default)
 ├── scripts/             # Install scripts, utilities
 ├── tests/               # Full pytest suite (~2300+ tests)
cli.py (1 change)

@@ -161,6 +161,7 @@ def load_cli_config() -> Dict[str, Any]:
         },
         "browser": {
             "inactivity_timeout": 120,  # Auto-cleanup inactive browser sessions after 2 min
+            "record_sessions": False,  # Auto-record browser sessions as WebM videos
         },
         "compression": {
             "enabled": True,  # Auto-compress when approaching context limit
@@ -81,6 +81,7 @@ DEFAULT_CONFIG = {

     "browser": {
         "inactivity_timeout": 120,
+        "record_sessions": False,  # Auto-record browser sessions as WebM videos
     },

     "compression": {
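A caller reading the new flag would look it up through the nested config dict shown in the hunks above; a minimal sketch, using a hypothetical pre-upgrade config dict to show the safe default:

```python
# Sketch: reading browser.record_sessions from a merged config dict.
# An older config written before this change lacks the key entirely,
# so the lookup must default to False (the documented default).
config = {"browser": {"inactivity_timeout": 120}}  # pre-upgrade config
record = config.get("browser", {}).get("record_sessions", False)
print(record)  # False
```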
skills/dogfood/SKILL.md (new file, 162 lines)

@@ -0,0 +1,162 @@
---
name: dogfood
description: Systematic exploratory QA testing of web applications — find bugs, capture evidence, and generate structured reports
version: 1.0.0
metadata:
  hermes:
    tags: [qa, testing, browser, web, dogfood]
    related_skills: []
---

# Dogfood: Systematic Web Application QA Testing

## Overview

This skill guides you through systematic exploratory QA testing of web applications using the browser toolset. You will navigate the application, interact with elements, capture evidence of issues, and produce a structured bug report.

## Prerequisites

- Browser toolset must be available (`browser_navigate`, `browser_snapshot`, `browser_click`, `browser_type`, `browser_vision`, `browser_console`, `browser_scroll`, `browser_back`, `browser_press`, `browser_close`)
- A target URL and testing scope from the user

## Inputs

The user provides:

1. **Target URL** — the entry point for testing
2. **Scope** — what areas/features to focus on (or "full site" for comprehensive testing)
3. **Output directory** (optional) — where to save screenshots and the report (default: `./dogfood-output`)

## Workflow

Follow this 5-phase systematic workflow:

### Phase 1: Plan

1. Create the output directory structure:
   ```
   {output_dir}/
   ├── screenshots/   # Evidence screenshots
   └── report.md      # Final report (generated in Phase 5)
   ```
2. Identify the testing scope based on user input.
3. Build a rough sitemap by planning which pages and features to test:
   - Landing/home page
   - Navigation links (header, footer, sidebar)
   - Key user flows (sign up, login, search, checkout, etc.)
   - Forms and interactive elements
   - Edge cases (empty states, error pages, 404s)

### Phase 2: Explore

For each page or feature in your plan:

1. **Navigate** to the page:
   ```
   browser_navigate(url="https://example.com/page")
   ```

2. **Take a snapshot** to understand the DOM structure:
   ```
   browser_snapshot()
   ```

3. **Check the console** for JavaScript errors:
   ```
   browser_console(clear=true)
   ```
   Do this after every navigation and after every significant interaction. Silent JS errors are high-value findings.

4. **Take an annotated screenshot** to visually assess the page and identify interactive elements:
   ```
   browser_vision(question="Describe the page layout, identify any visual issues, broken elements, or accessibility concerns", annotate=true)
   ```
   The `annotate=true` flag overlays numbered `[N]` labels on interactive elements. Each `[N]` maps to ref `@eN` for subsequent browser commands.

5. **Test interactive elements** systematically:
   - Click buttons and links: `browser_click(ref="@eN")`
   - Fill forms: `browser_type(ref="@eN", text="test input")`
   - Test keyboard navigation: `browser_press(key="Tab")`, `browser_press(key="Enter")`
   - Scroll through content: `browser_scroll(direction="down")`
   - Test form validation with invalid inputs
   - Test empty submissions

6. **After each interaction**, check for:
   - Console errors: `browser_console()`
   - Visual changes: `browser_vision(question="What changed after the interaction?")`
   - Expected vs actual behavior

### Phase 3: Collect Evidence

For every issue found:

1. **Take a screenshot** showing the issue:
   ```
   browser_vision(question="Capture and describe the issue visible on this page", annotate=false)
   ```
   Save the `screenshot_path` from the response — you will reference it in the report.

2. **Record the details**:
   - URL where the issue occurs
   - Steps to reproduce
   - Expected behavior
   - Actual behavior
   - Console errors (if any)
   - Screenshot path

3. **Classify the issue** using the issue taxonomy (see `references/issue-taxonomy.md`):
   - Severity: Critical / High / Medium / Low
   - Category: Functional / Visual / Accessibility / Console / UX / Content

### Phase 4: Categorize

1. Review all collected issues.
2. De-duplicate — merge issues that are the same bug manifesting in different places.
3. Assign final severity and category to each issue.
4. Sort by severity (Critical first, then High, Medium, Low).
5. Count issues by severity and category for the executive summary.

### Phase 5: Report

Generate the final report using the template at `templates/dogfood-report-template.md`.

The report must include:

1. **Executive summary** with total issue count, breakdown by severity, and testing scope
2. **Per-issue sections** with:
   - Issue number and title
   - Severity and category badges
   - URL where observed
   - Description of the issue
   - Steps to reproduce
   - Expected vs actual behavior
   - Screenshot references (use `MEDIA:<screenshot_path>` for inline images)
   - Console errors if relevant
3. **Summary table** of all issues
4. **Testing notes** — what was tested, what was not, any blockers

Save the report to `{output_dir}/report.md`.

## Tools Reference

| Tool | Purpose |
|------|---------|
| `browser_navigate` | Go to a URL |
| `browser_snapshot` | Get DOM text snapshot (accessibility tree) |
| `browser_click` | Click an element by ref (`@eN`) or text |
| `browser_type` | Type into an input field |
| `browser_scroll` | Scroll up/down on the page |
| `browser_back` | Go back in browser history |
| `browser_press` | Press a keyboard key |
| `browser_vision` | Screenshot + AI analysis; use `annotate=true` for element labels |
| `browser_console` | Get JS console output and errors |
| `browser_close` | Close the browser session |

## Tips

- **Always check `browser_console()` after navigating and after significant interactions.** Silent JS errors are among the most valuable findings.
- **Use `annotate=true` with `browser_vision`** when you need to reason about interactive element positions or when the snapshot refs are unclear.
- **Test with both valid and invalid inputs** — form validation bugs are common.
- **Scroll through long pages** — content below the fold may have rendering issues.
- **Test navigation flows** — click through multi-step processes end-to-end.
- **Check responsive behavior** by noting any layout issues visible in screenshots.
- **Don't forget edge cases**: empty states, very long text, special characters, rapid clicking.
- When reporting screenshots to the user, include `MEDIA:<screenshot_path>` so they can see the evidence inline.
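The per-page loop the skill describes in Phase 2 can be sketched as straight-line tool calls. The `browser_*` names are the real tools from the skill's Tools Reference, stubbed here so the sketch runs standalone; `explore_page` and the `calls` log are illustrative inventions.

```python
calls = []  # record the order of tool invocations for this sketch

# Stubs standing in for the real browser toolset.
def browser_navigate(url): calls.append("navigate")
def browser_snapshot(): calls.append("snapshot")
def browser_console(clear=False): calls.append("console"); return {"js_errors": []}
def browser_vision(question, annotate=False): calls.append("vision")

def explore_page(url):
    """One Phase 2 iteration: navigate, snapshot, console check, annotated look."""
    browser_navigate(url)
    browser_snapshot()
    browser_console(clear=True)  # reset buffers so findings are per-page
    browser_vision("Describe layout and visual issues", annotate=True)

explore_page("https://example.com/page")
print(calls)  # ['navigate', 'snapshot', 'console', 'vision']
```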
skills/dogfood/references/issue-taxonomy.md (new file, 109 lines)

@@ -0,0 +1,109 @@
# Issue Taxonomy

Use this taxonomy to classify issues found during dogfood QA testing.

## Severity Levels

### Critical

The issue makes a core feature completely unusable or causes data loss.

**Examples:**
- Application crashes or shows a blank white page
- Form submission silently loses user data
- Authentication is completely broken (can't log in at all)
- Payment flow fails and charges the user without completing the order
- Security vulnerability (e.g., XSS, exposed credentials in console)

### High

The issue significantly impairs functionality but a workaround may exist.

**Examples:**
- A key button does nothing when clicked (but refreshing fixes it)
- Search returns no results for valid queries
- Form validation rejects valid input
- Page loads but critical content is missing or garbled
- Navigation link leads to a 404 or wrong page
- Uncaught JavaScript exceptions in the console on core pages

### Medium

The issue is noticeable and affects user experience but doesn't block core functionality.

**Examples:**
- Layout is misaligned or overlapping on certain screen sections
- Images fail to load (broken image icons)
- Slow performance (visible loading delays > 3 seconds)
- Form field lacks proper validation feedback (no error message on bad input)
- Console warnings that suggest deprecated or misconfigured features
- Inconsistent styling between similar pages

### Low

Minor polish issues that don't affect functionality.

**Examples:**
- Typos or grammatical errors in text content
- Minor spacing or alignment inconsistencies
- Placeholder text left in production ("Lorem ipsum")
- Favicon missing
- Console info/debug messages that shouldn't be in production
- Subtle color contrast issues that don't fail WCAG requirements

## Categories

### Functional

Issues where features don't work as expected.

- Buttons/links that don't respond
- Forms that don't submit or submit incorrectly
- Broken user flows (can't complete a multi-step process)
- Incorrect data displayed
- Features that work partially

### Visual

Issues with the visual presentation of the page.

- Layout problems (overlapping elements, broken grids)
- Broken images or missing media
- Styling inconsistencies
- Responsive design failures
- Z-index issues (elements hidden behind others)
- Text overflow or truncation

### Accessibility

Issues that prevent or hinder access for users with disabilities.

- Missing alt text on meaningful images
- Poor color contrast (fails WCAG AA)
- Elements not reachable via keyboard navigation
- Missing form labels or ARIA attributes
- Focus indicators missing or unclear
- Screen reader incompatible content

### Console

Issues detected through JavaScript console output.

- Uncaught exceptions and unhandled promise rejections
- Failed network requests (4xx, 5xx errors in console)
- Deprecation warnings
- CORS errors
- Mixed content warnings (HTTP resources on HTTPS page)
- Excessive console.log output left from development

### UX (User Experience)

Issues where functionality works but the experience is poor.

- Confusing navigation or information architecture
- Missing loading indicators (user doesn't know something is happening)
- No feedback after user actions (e.g., button click with no visible result)
- Inconsistent interaction patterns
- Missing confirmation dialogs for destructive actions
- Poor error messages that don't help the user recover

### Content

Issues with the text, media, or information on the page.

- Typos and grammatical errors
- Placeholder/dummy content in production
- Outdated information
- Missing content (empty sections)
- Broken or dead links to external resources
- Incorrect or misleading labels
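One way to mechanize the console-related rows of this taxonomy is a default-severity lookup. This mapping is purely illustrative (the skill classifies by judgment, not a lookup table); it follows the taxonomy above, where uncaught exceptions on core pages rate High, console warnings Medium, and stray info/debug output Low.

```python
# Illustrative default severity per console entry type, per the taxonomy above.
SEVERITY_BY_CONSOLE_TYPE = {
    "exception": "High",   # uncaught JS exceptions on core pages
    "error": "High",
    "warning": "Medium",   # deprecation / misconfiguration warnings
    "info": "Low",         # debug output left in production
    "log": "Low",
}

def default_severity(console_type: str) -> str:
    """Fall back to Medium for unrecognized types (needs human judgment)."""
    return SEVERITY_BY_CONSOLE_TYPE.get(console_type, "Medium")

print(default_severity("exception"), default_severity("log"))  # High Low
```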
skills/dogfood/templates/dogfood-report-template.md (new file, 86 lines)

@@ -0,0 +1,86 @@
# Dogfood QA Report

**Target:** {target_url}
**Date:** {date}
**Scope:** {scope_description}
**Tester:** Hermes Agent (automated exploratory QA)

---

## Executive Summary

| Severity | Count |
|----------|-------|
| 🔴 Critical | {critical_count} |
| 🟠 High | {high_count} |
| 🟡 Medium | {medium_count} |
| 🔵 Low | {low_count} |
| **Total** | **{total_count}** |

**Overall Assessment:** {one_sentence_assessment}

---

## Issues

<!-- Repeat this section for each issue found, sorted by severity (Critical first) -->

### Issue #{issue_number}: {issue_title}

| Field | Value |
|-------|-------|
| **Severity** | {severity} |
| **Category** | {category} |
| **URL** | {url_where_found} |

**Description:**
{detailed_description_of_the_issue}

**Steps to Reproduce:**
1. {step_1}
2. {step_2}
3. {step_3}

**Expected Behavior:**
{what_should_happen}

**Actual Behavior:**
{what_actually_happens}

**Screenshot:**
MEDIA:{screenshot_path}

**Console Errors** (if applicable):
```
{console_error_output}
```

---

<!-- End of per-issue section -->

## Issues Summary Table

| # | Title | Severity | Category | URL |
|---|-------|----------|----------|-----|
| {n} | {title} | {severity} | {category} | {url} |

## Testing Coverage

### Pages Tested
- {list_of_pages_visited}

### Features Tested
- {list_of_features_exercised}

### Not Tested / Out of Scope
- {areas_not_covered_and_why}

### Blockers
- {any_issues_that_prevented_testing_certain_areas}

---

## Notes

{any_additional_observations_or_recommendations}
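Filling the template's `{named}` placeholders is plain `str.format` substitution, one dict per issue. The fragment below is a trimmed, hypothetical excerpt of the per-issue section (the real template has many more fields), with invented sample issue data.

```python
# Hypothetical trimmed fragment of the per-issue section above.
fragment = (
    "### Issue #{issue_number}: {issue_title}\n"
    "| **Severity** | {severity} |\n"
    "| **Category** | {category} |\n"
)

# Invented sample issue record.
issue = {
    "issue_number": 1,
    "issue_title": "Broken checkout button",
    "severity": "Critical",
    "category": "Functional",
}
print(fragment.format(**issue))
```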
276
tests/tools/test_browser_console.py
Normal file
276
tests/tools/test_browser_console.py
Normal file
|
|
@ -0,0 +1,276 @@
|
||||||
|
"""Tests for browser_console tool and browser_vision annotate param."""
|
||||||
|
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
from unittest.mock import patch, MagicMock
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", ".."))
|
||||||
|
|
||||||
|
|
||||||
|
# ── browser_console ──────────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
class TestBrowserConsole:
|
||||||
|
"""browser_console() returns console messages + JS errors in one call."""
|
||||||
|
|
||||||
|
def test_returns_console_messages_and_errors(self):
|
||||||
|
from tools.browser_tool import browser_console
|
||||||
|
|
||||||
|
console_response = {
|
||||||
|
"success": True,
|
||||||
|
"data": {
|
||||||
|
"messages": [
|
||||||
|
{"text": "hello", "type": "log", "timestamp": 1},
|
||||||
|
{"text": "oops", "type": "error", "timestamp": 2},
|
||||||
|
]
|
||||||
|
},
|
||||||
|
}
|
||||||
|
errors_response = {
|
||||||
|
"success": True,
|
||||||
|
"data": {
|
||||||
|
"errors": [
|
||||||
|
{"message": "Uncaught TypeError", "timestamp": 3},
|
||||||
|
]
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
with patch("tools.browser_tool._run_browser_command") as mock_cmd:
|
||||||
|
mock_cmd.side_effect = [console_response, errors_response]
|
||||||
|
result = json.loads(browser_console(task_id="test"))
|
||||||
|
|
||||||
|
assert result["success"] is True
|
||||||
|
assert result["total_messages"] == 2
|
||||||
|
assert result["total_errors"] == 1
|
||||||
|
assert result["console_messages"][0]["text"] == "hello"
|
||||||
|
assert result["console_messages"][1]["text"] == "oops"
|
||||||
|
assert result["js_errors"][0]["message"] == "Uncaught TypeError"
|
||||||
|
|
||||||
|
def test_passes_clear_flag(self):
|
||||||
|
from tools.browser_tool import browser_console
|
||||||
|
|
||||||
|
empty = {"success": True, "data": {"messages": [], "errors": []}}
|
||||||
|
with patch("tools.browser_tool._run_browser_command", return_value=empty) as mock_cmd:
|
||||||
|
browser_console(clear=True, task_id="test")
|
||||||
|
|
||||||
|
calls = mock_cmd.call_args_list
|
||||||
|
# Both console and errors should get --clear
|
||||||
|
assert calls[0][0] == ("test", "console", ["--clear"])
|
||||||
|
assert calls[1][0] == ("test", "errors", ["--clear"])
|
||||||
|
|
||||||
|
def test_no_clear_by_default(self):
|
||||||
|
from tools.browser_tool import browser_console
|
||||||
|
|
||||||
|
empty = {"success": True, "data": {"messages": [], "errors": []}}
|
||||||
|
with patch("tools.browser_tool._run_browser_command", return_value=empty) as mock_cmd:
|
||||||
|
browser_console(task_id="test")
|
||||||
|
|
||||||
|
calls = mock_cmd.call_args_list
|
||||||
|
assert calls[0][0] == ("test", "console", [])
|
||||||
|
assert calls[1][0] == ("test", "errors", [])
|
||||||
|
|
||||||
|
def test_empty_console_and_errors(self):
|
||||||
|
from tools.browser_tool import browser_console
|
||||||
|
|
||||||
|
empty = {"success": True, "data": {"messages": [], "errors": []}}
|
||||||
|
with patch("tools.browser_tool._run_browser_command", return_value=empty):
|
||||||
|
result = json.loads(browser_console(task_id="test"))
|
||||||
|
|
||||||
|
assert result["total_messages"] == 0
|
||||||
|
assert result["total_errors"] == 0
|
||||||
|
assert result["console_messages"] == []
|
||||||
|
assert result["js_errors"] == []
|
||||||
|
|
||||||
|
def test_handles_failed_commands(self):
|
||||||
|
from tools.browser_tool import browser_console
|
||||||
|
|
||||||
|
failed = {"success": False, "error": "No session"}
|
||||||
|
with patch("tools.browser_tool._run_browser_command", return_value=failed):
|
||||||
|
result = json.loads(browser_console(task_id="test"))
|
||||||
|
|
||||||
|
# Should still return success with empty data
|
||||||
|
assert result["success"] is True
|
||||||
|
assert result["total_messages"] == 0
|
||||||
|
assert result["total_errors"] == 0
|
||||||
|
|
||||||
|
|
||||||
|
# ── browser_console schema ───────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
class TestBrowserConsoleSchema:
|
||||||
|
"""browser_console is properly registered in the tool registry."""
|
||||||
|
|
||||||
|
def test_schema_in_browser_schemas(self):
|
||||||
|
from tools.browser_tool import BROWSER_TOOL_SCHEMAS
|
||||||
|
|
||||||
|
names = [s["name"] for s in BROWSER_TOOL_SCHEMAS]
|
||||||
|
assert "browser_console" in names
|
||||||
|
|
||||||
|
def test_schema_has_clear_param(self):
|
||||||
|
from tools.browser_tool import BROWSER_TOOL_SCHEMAS
|
||||||
|
|
||||||
|
schema = next(s for s in BROWSER_TOOL_SCHEMAS if s["name"] == "browser_console")
|
||||||
|
props = schema["parameters"]["properties"]
|
||||||
|
assert "clear" in props
|
||||||
|
assert props["clear"]["type"] == "boolean"
|
||||||
|
|
||||||
|
|
||||||
|
# ── browser_vision annotate ──────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
class TestBrowserVisionAnnotate:
|
||||||
|
"""browser_vision supports annotate parameter."""
|
||||||
|
|
||||||
|
def test_schema_has_annotate_param(self):
|
||||||
|
from tools.browser_tool import BROWSER_TOOL_SCHEMAS
|
||||||
|
|
||||||
|
schema = next(s for s in BROWSER_TOOL_SCHEMAS if s["name"] == "browser_vision")
|
||||||
|
props = schema["parameters"]["properties"]
|
||||||
|
assert "annotate" in props
|
||||||
|
assert props["annotate"]["type"] == "boolean"
|
||||||
|
|
||||||
|
def test_annotate_false_no_flag(self):
|
||||||
|
"""Without annotate, screenshot command has no --annotate flag."""
|
||||||
|
from tools.browser_tool import browser_vision
|
||||||
|
|
||||||
|
with (
|
||||||
|
patch("tools.browser_tool._run_browser_command") as mock_cmd,
|
||||||
|
patch("tools.browser_tool._aux_vision_client") as mock_client,
|
||||||
|
patch("tools.browser_tool._DEFAULT_VISION_MODEL", "test-model"),
|
||||||
|
patch("tools.browser_tool._get_vision_model", return_value="test-model"),
|
||||||
|
):
|
||||||
|
mock_cmd.return_value = {"success": True, "data": {}}
|
||||||
|
# Will fail at screenshot file read, but we can check the command
|
||||||
|
try:
|
||||||
|
browser_vision("test", annotate=False, task_id="test")
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
|
||||||
|
if mock_cmd.called:
|
||||||
|
args = mock_cmd.call_args[0]
|
||||||
|
cmd_args = args[2] if len(args) > 2 else []
|
||||||
|
assert "--annotate" not in cmd_args
|
||||||
|
|
||||||
|
def test_annotate_true_adds_flag(self):
|
||||||
|
"""With annotate=True, screenshot command includes --annotate."""
|
||||||
|
from tools.browser_tool import browser_vision
|
||||||
|
|
||||||
|
with (
|
||||||
|
patch("tools.browser_tool._run_browser_command") as mock_cmd,
|
||||||
|
patch("tools.browser_tool._aux_vision_client") as mock_client,
|
||||||
|
patch("tools.browser_tool._DEFAULT_VISION_MODEL", "test-model"),
|
||||||
|
patch("tools.browser_tool._get_vision_model", return_value="test-model"),
|
||||||
|
):
|
||||||
|
mock_cmd.return_value = {"success": True, "data": {}}
|
||||||
|
try:
|
||||||
|
browser_vision("test", annotate=True, task_id="test")
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
|
||||||
|
if mock_cmd.called:
|
||||||
|
args = mock_cmd.call_args[0]
|
||||||
|
cmd_args = args[2] if len(args) > 2 else []
|
||||||
|
assert "--annotate" in cmd_args
|
||||||
|
|
||||||
|
|
||||||
|
# ── auto-recording config ────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
class TestRecordSessionsConfig:
|
||||||
|
"""browser.record_sessions config option."""
|
||||||
|
|
||||||
|
def test_default_config_has_record_sessions(self):
|
||||||
|
from hermes_cli.config import DEFAULT_CONFIG
|
||||||
|
|
||||||
|
browser_cfg = DEFAULT_CONFIG.get("browser", {})
|
||||||
|
assert "record_sessions" in browser_cfg
|
||||||
|
assert browser_cfg["record_sessions"] is False
|
||||||
|
|
||||||
|
def test_maybe_start_recording_disabled(self):
|
||||||
|
"""Recording doesn't start when config says record_sessions: false."""
|
||||||
|
from tools.browser_tool import _maybe_start_recording, _recording_sessions
|
||||||
|
|
||||||
|
with (
|
||||||
|
patch("tools.browser_tool._run_browser_command") as mock_cmd,
|
||||||
|
patch("builtins.open", side_effect=FileNotFoundError),
|
||||||
|
):
|
||||||
|
_maybe_start_recording("test-task")
|
||||||
|
|
||||||
|
mock_cmd.assert_not_called()
|
||||||
|
assert "test-task" not in _recording_sessions
|
||||||
|
|
||||||
|
def test_maybe_stop_recording_noop_when_not_recording(self):
|
||||||
|
"""Stopping when not recording is a no-op."""
|
||||||
|
from tools.browser_tool import _maybe_stop_recording, _recording_sessions
|
||||||
|
|
||||||
|
_recording_sessions.discard("test-task") # ensure not in set
|
||||||
|
with patch("tools.browser_tool._run_browser_command") as mock_cmd:
|
||||||
|
_maybe_stop_recording("test-task")
|
||||||
|
|
||||||
|
mock_cmd.assert_not_called()
|
||||||
|
|
||||||
|
|
||||||
|
# ── dogfood skill files ──────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
class TestDogfoodSkill:
|
||||||
|
"""Dogfood skill files exist and have correct structure."""
|
+    @pytest.fixture(autouse=True)
+    def _skill_dir(self):
+        # Use the actual repo skills dir (not temp)
+        self.skill_dir = os.path.join(
+            os.path.dirname(__file__), "..", "..", "skills", "dogfood"
+        )
+
+    def test_skill_md_exists(self):
+        assert os.path.exists(os.path.join(self.skill_dir, "SKILL.md"))
+
+    def test_taxonomy_exists(self):
+        assert os.path.exists(
+            os.path.join(self.skill_dir, "references", "issue-taxonomy.md")
+        )
+
+    def test_report_template_exists(self):
+        assert os.path.exists(
+            os.path.join(self.skill_dir, "templates", "dogfood-report-template.md")
+        )
+
+    def test_skill_md_has_frontmatter(self):
+        with open(os.path.join(self.skill_dir, "SKILL.md")) as f:
+            content = f.read()
+        assert content.startswith("---")
+        assert "name: dogfood" in content
+        assert "description:" in content
+
+    def test_skill_references_browser_console(self):
+        with open(os.path.join(self.skill_dir, "SKILL.md")) as f:
+            content = f.read()
+        assert "browser_console" in content
+
+    def test_skill_references_annotate(self):
+        with open(os.path.join(self.skill_dir, "SKILL.md")) as f:
+            content = f.read()
+        assert "annotate" in content
+
+    def test_taxonomy_has_severity_levels(self):
+        with open(
+            os.path.join(self.skill_dir, "references", "issue-taxonomy.md")
+        ) as f:
+            content = f.read()
+        assert "Critical" in content
+        assert "High" in content
+        assert "Medium" in content
+        assert "Low" in content
+
+    def test_taxonomy_has_categories(self):
+        with open(
+            os.path.join(self.skill_dir, "references", "issue-taxonomy.md")
+        ) as f:
+            content = f.read()
+        assert "Functional" in content
+        assert "Visual" in content
+        assert "Accessibility" in content
+        assert "Console" in content
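The frontmatter assertions above reduce to simple string checks on the file contents; a standalone sketch with an inline example document (illustrative text, not the real SKILL.md):

```python
# Hypothetical minimal SKILL.md content, for illustration only
skill_md = """---
name: dogfood
description: Systematic exploratory QA testing for web applications
---

Use browser_console and annotated screenshots during the Explore phase.
"""

def has_valid_frontmatter(content: str) -> bool:
    # Mirrors the assertions in test_skill_md_has_frontmatter
    return (
        content.startswith("---")
        and "name: dogfood" in content
        and "description:" in content
    )

print(has_valid_frontmatter(skill_md))  # True
```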
@@ -144,6 +144,7 @@ def _socket_safe_tmpdir() -> str:
 # Track active sessions per task
 # Stores: session_name (always), bb_session_id + cdp_url (cloud mode only)
 _active_sessions: Dict[str, Dict[str, str]] = {}  # task_id -> {session_name, ...}
+_recording_sessions: set = set()  # task_ids with active recordings
 
 # Flag to track if cleanup has been done
 _cleanup_done = False
@@ -478,11 +479,31 @@ BROWSER_TOOL_SCHEMAS = [
                 "question": {
                     "type": "string",
                     "description": "What you want to know about the page visually. Be specific about what you're looking for."
+                },
+                "annotate": {
+                    "type": "boolean",
+                    "default": False,
+                    "description": "If true, overlay numbered [N] labels on interactive elements. Each [N] maps to ref @eN for subsequent browser commands. Useful for QA and spatial reasoning about page layout."
                 }
             },
             "required": ["question"]
         }
     },
+    {
+        "name": "browser_console",
+        "description": "Get browser console output and JavaScript errors from the current page. Returns console.log/warn/error/info messages and uncaught JS exceptions. Use this to detect silent JavaScript errors, failed API calls, and application warnings. Requires browser_navigate to be called first.",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "clear": {
+                    "type": "boolean",
+                    "default": False,
+                    "description": "If true, clear the message buffers after reading"
+                }
+            },
+            "required": []
+        }
+    },
 ]
@@ -998,9 +1019,10 @@ def browser_navigate(url: str, task_id: Optional[str] = None) -> str:
     session_info = _get_session_info(effective_task_id)
     is_first_nav = session_info.get("_first_nav", True)
 
-    # Mark that we've done at least one navigation
+    # Auto-start recording if configured and this is first navigation
     if is_first_nav:
         session_info["_first_nav"] = False
+        _maybe_start_recording(effective_task_id)
 
     result = _run_browser_command(effective_task_id, "open", [url], timeout=60)
@@ -1264,6 +1286,10 @@ def browser_close(task_id: Optional[str] = None) -> str:
         JSON string with close result
     """
     effective_task_id = task_id or "default"
 
+    # Stop auto-recording before closing
+    _maybe_stop_recording(effective_task_id)
+
     result = _run_browser_command(effective_task_id, "close", [])
 
     # Close the backend session (Browserbase API in cloud mode, nothing extra in local mode)
@@ -1294,6 +1320,103 @@ def browser_close(task_id: Optional[str] = None) -> str:
     }, ensure_ascii=False)
 
 
+def browser_console(clear: bool = False, task_id: Optional[str] = None) -> str:
+    """Get browser console messages and JavaScript errors.
+
+    Returns both console output (log/warn/error/info from the page's JS)
+    and uncaught exceptions (crashes, unhandled promise rejections).
+
+    Args:
+        clear: If True, clear the message/error buffers after reading
+        task_id: Task identifier for session isolation
+
+    Returns:
+        JSON string with console messages and JS errors
+    """
+    effective_task_id = task_id or "default"
+
+    console_args = ["--clear"] if clear else []
+    error_args = ["--clear"] if clear else []
+
+    console_result = _run_browser_command(effective_task_id, "console", console_args)
+    errors_result = _run_browser_command(effective_task_id, "errors", error_args)
+
+    messages = []
+    if console_result.get("success"):
+        for msg in console_result.get("data", {}).get("messages", []):
+            messages.append({
+                "type": msg.get("type", "log"),
+                "text": msg.get("text", ""),
+                "source": "console",
+            })
+
+    errors = []
+    if errors_result.get("success"):
+        for err in errors_result.get("data", {}).get("errors", []):
+            errors.append({
+                "message": err.get("message", ""),
+                "source": "exception",
+            })
+
+    return json.dumps({
+        "success": True,
+        "console_messages": messages,
+        "js_errors": errors,
+        "total_messages": len(messages),
+        "total_errors": len(errors),
+    }, ensure_ascii=False)
+
+
+def _maybe_start_recording(task_id: str):
+    """Start recording if browser.record_sessions is enabled in config."""
+    if task_id in _recording_sessions:
+        return
+    try:
+        hermes_home = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
+        config_path = hermes_home / "config.yaml"
+        record_enabled = False
+        if config_path.exists():
+            import yaml
+            with open(config_path) as f:
+                cfg = yaml.safe_load(f) or {}
+            record_enabled = cfg.get("browser", {}).get("record_sessions", False)
+
+        if not record_enabled:
+            return
+
+        recordings_dir = hermes_home / "browser_recordings"
+        recordings_dir.mkdir(parents=True, exist_ok=True)
+        _cleanup_old_recordings(max_age_hours=72)
+
+        import time
+        timestamp = time.strftime("%Y%m%d_%H%M%S")
+        recording_path = recordings_dir / f"session_{timestamp}_{task_id[:16]}.webm"
+
+        result = _run_browser_command(task_id, "record", ["start", str(recording_path)])
+        if result.get("success"):
+            _recording_sessions.add(task_id)
+            logger.info("Auto-recording browser session %s to %s", task_id, recording_path)
+        else:
+            logger.debug("Could not start auto-recording: %s", result.get("error"))
+    except Exception as e:
+        logger.debug("Auto-recording setup failed: %s", e)
+
+
+def _maybe_stop_recording(task_id: str):
+    """Stop recording if one is active for this session."""
+    if task_id not in _recording_sessions:
+        return
+    try:
+        result = _run_browser_command(task_id, "record", ["stop"])
+        if result.get("success"):
+            path = result.get("data", {}).get("path", "")
+            logger.info("Saved browser recording for session %s: %s", task_id, path)
+    except Exception as e:
+        logger.debug("Could not stop recording for %s: %s", task_id, e)
+    finally:
+        _recording_sessions.discard(task_id)
+
+
 def browser_get_images(task_id: Optional[str] = None) -> str:
     """
     Get all images on the current page.
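For reference, the JSON envelope that `browser_console` returns can be consumed like this (a standalone sketch; the message and error values are fabricated examples, not real browser output):

```python
import json

# Example payload in the shape browser_console() returns (values are made up)
payload = json.dumps({
    "success": True,
    "console_messages": [
        {"type": "log", "text": "app booted", "source": "console"},
        {"type": "error", "text": "Failed to fetch /api/user", "source": "console"},
    ],
    "js_errors": [
        {"message": "Uncaught TypeError: x is undefined", "source": "exception"},
    ],
    "total_messages": 2,
    "total_errors": 1,
})

data = json.loads(payload)
# Surface only the signals a QA report cares about
console_errors = [m["text"] for m in data["console_messages"] if m["type"] == "error"]
exceptions = [e["message"] for e in data["js_errors"]]
print(console_errors)  # ['Failed to fetch /api/user']
print(exceptions)      # ['Uncaught TypeError: x is undefined']
```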
@@ -1348,7 +1471,7 @@ def browser_get_images(task_id: Optional[str] = None) -> str:
     }, ensure_ascii=False)
 
 
-def browser_vision(question: str, task_id: Optional[str] = None) -> str:
+def browser_vision(question: str, annotate: bool = False, task_id: Optional[str] = None) -> str:
     """
     Take a screenshot of the current page and analyze it with vision AI.
@@ -1362,6 +1485,7 @@ def browser_vision(question: str, task_id: Optional[str] = None) -> str:
     Args:
         question: What you want to know about the page visually
+        annotate: If True, overlay numbered [N] labels on interactive elements
         task_id: Task identifier for session isolation
 
     Returns:
@@ -1393,10 +1517,13 @@ def browser_vision(question: str, task_id: Optional[str] = None) -> str:
     _cleanup_old_screenshots(screenshots_dir, max_age_hours=24)
 
     # Take screenshot using agent-browser
+    screenshot_args = [str(screenshot_path)]
+    if annotate:
+        screenshot_args.insert(0, "--annotate")
     result = _run_browser_command(
         effective_task_id,
         "screenshot",
-        [str(screenshot_path)],
+        screenshot_args,
         timeout=30
     )
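The flag handling above prepends `--annotate` ahead of the output path; a minimal standalone sketch of the same logic (the helper name is hypothetical, not part of the diff):

```python
def build_screenshot_args(path: str, annotate: bool) -> list:
    # Mirrors browser_vision: start with the output path, then
    # insert the --annotate flag at the front when requested
    args = [path]
    if annotate:
        args.insert(0, "--annotate")
    return args

print(build_screenshot_args("/tmp/shot.png", True))   # ['--annotate', '/tmp/shot.png']
print(build_screenshot_args("/tmp/shot.png", False))  # ['/tmp/shot.png']
```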
@@ -1456,11 +1583,15 @@ def browser_vision(question: str, task_id: Optional[str] = None) -> str:
         )
 
         analysis = response.choices[0].message.content
-        return json.dumps({
+        response_data = {
             "success": True,
             "analysis": analysis,
             "screenshot_path": str(screenshot_path),
-        }, ensure_ascii=False)
+        }
+        # Include annotation data if annotated screenshot was taken
+        if annotate and result.get("data", {}).get("annotations"):
+            response_data["annotations"] = result["data"]["annotations"]
+        return json.dumps(response_data, ensure_ascii=False)
 
     except Exception as e:
         # Keep the screenshot if it was captured successfully — the failure is
@@ -1490,6 +1621,25 @@ def _cleanup_old_screenshots(screenshots_dir, max_age_hours=24):
         pass  # Non-critical — don't fail the screenshot operation
 
 
+def _cleanup_old_recordings(max_age_hours=72):
+    """Remove browser recordings older than max_age_hours to prevent disk bloat."""
+    import time
+    try:
+        hermes_home = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
+        recordings_dir = hermes_home / "browser_recordings"
+        if not recordings_dir.exists():
+            return
+        cutoff = time.time() - (max_age_hours * 3600)
+        for f in recordings_dir.glob("session_*.webm"):
+            try:
+                if f.stat().st_mtime < cutoff:
+                    f.unlink()
+            except Exception:
+                pass
+    except Exception:
+        pass
+
+
 # ============================================================================
 # Cleanup and Management Functions
 # ============================================================================
@@ -1561,6 +1711,9 @@ def cleanup_browser(task_id: Optional[str] = None) -> None:
     bb_session_id = session_info.get("bb_session_id", "unknown")
     logger.debug("Found session for task %s: bb_session_id=%s", task_id, bb_session_id)
 
+    # Stop auto-recording before closing (saves the file)
+    _maybe_stop_recording(task_id)
+
     # Try to close via agent-browser first (needs session in _active_sessions)
     try:
         _run_browser_command(task_id, "close", [], timeout=10)
@@ -1776,6 +1929,13 @@ registry.register(
     name="browser_vision",
     toolset="browser",
     schema=_BROWSER_SCHEMA_MAP["browser_vision"],
-    handler=lambda args, **kw: browser_vision(question=args.get("question", ""), task_id=kw.get("task_id")),
+    handler=lambda args, **kw: browser_vision(question=args.get("question", ""), annotate=args.get("annotate", False), task_id=kw.get("task_id")),
+    check_fn=check_browser_requirements,
+)
+registry.register(
+    name="browser_console",
+    toolset="browser",
+    schema=_BROWSER_SCHEMA_MAP["browser_console"],
+    handler=lambda args, **kw: browser_console(clear=args.get("clear", False), task_id=kw.get("task_id")),
     check_fn=check_browser_requirements,
 )
@@ -620,6 +620,16 @@ code_execution:
   max_tool_calls: 50  # Max tool calls within code execution
 ```
 
+## Browser
+
+Configure browser automation behavior:
+
+```yaml
+browser:
+  inactivity_timeout: 120  # Seconds before auto-closing idle sessions
+  record_sessions: false   # Auto-record browser sessions as WebM videos to ~/.hermes/browser_recordings/
+```
+
 ## Delegation
 
 Configure subagent behavior for the delegate tool:
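The `record_sessions` lookup that `_maybe_start_recording` performs reduces to a chained `.get()` with a disabled default; a standalone sketch using a plain dict in place of the parsed YAML (values are illustrative):

```python
# What yaml.safe_load would produce for the config snippet above
cfg = {"browser": {"inactivity_timeout": 120, "record_sessions": False}}

# Same lookup _maybe_start_recording uses: missing section or key
# falls back to disabled, so recording is strictly opt-in
record_enabled = cfg.get("browser", {}).get("record_sessions", False)
print(record_enabled)  # False

# An empty config behaves the same way
print({}.get("browser", {}).get("record_sessions", False))  # False
```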
@@ -142,6 +142,16 @@ What does the chart on this page show?
 
 Screenshots are stored in `~/.hermes/browser_screenshots/` and automatically cleaned up after 24 hours.
 
+### `browser_console`
+
+Get browser console output (log/warn/error messages) and uncaught JavaScript exceptions from the current page. Essential for detecting silent JS errors that don't appear in the accessibility tree.
+
+```
+Check the browser console for any JavaScript errors
+```
+
+Use `clear=True` to clear the console after reading, so subsequent calls only show new messages.
+
 ### `browser_close`
 
 Close the browser session and release resources. Call this when done to free up Browserbase session quota.
@@ -175,6 +185,17 @@ Agent workflow:
 4. browser_close()
 ```
 
+## Session Recording
+
+Automatically record browser sessions as WebM video files:
+
+```yaml
+browser:
+  record_sessions: true  # default: false
+```
+
+When enabled, recording starts automatically on the first `browser_navigate` and saves to `~/.hermes/browser_recordings/` when the session closes. Works in both local and cloud (Browserbase) modes. Recordings older than 72 hours are automatically cleaned up.
+
 ## Stealth Features
 
 Browserbase provides automatic stealth capabilities:
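Recording files follow the `session_<timestamp>_<task_id>.webm` scheme used by `_maybe_start_recording`; a standalone sketch pinned to a fixed time for reproducibility (the task id is hypothetical):

```python
import time

task_id = "task_abc123"  # hypothetical task identifier
# Same strftime format as _maybe_start_recording, but evaluated at
# the Unix epoch so the output is deterministic
timestamp = time.strftime("%Y%m%d_%H%M%S", time.gmtime(0))
filename = f"session_{timestamp}_{task_id[:16]}.webm"  # task id truncated to 16 chars
print(filename)  # session_19700101_000000_task_abc123.webm
```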
@@ -15,7 +15,7 @@ Tools are functions that extend the agent's capabilities. They're organized into
 | **Web** | `web_search`, `web_extract` | Search the web, extract page content |
 | **Terminal** | `terminal`, `process` | Execute commands (local/docker/singularity/modal/daytona/ssh backends), manage background processes |
 | **File** | `read_file`, `write_file`, `patch`, `search_files` | Read, write, edit, and search files |
-| **Browser** | `browser_navigate`, `browser_click`, `browser_type`, etc. | Full browser automation via Browserbase |
+| **Browser** | `browser_navigate`, `browser_click`, `browser_type`, `browser_console`, etc. | Full browser automation via Browserbase |
 | **Vision** | `vision_analyze` | Image analysis via multimodal models |
 | **Image Gen** | `image_generate` | Generate images (FLUX via FAL) |
 | **TTS** | `text_to_speech` | Text-to-speech (Edge TTS / ElevenLabs / OpenAI) |