Mirror of https://github.com/NousResearch/hermes-agent.git, synced 2026-04-25 00:51:20 +00:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.

## New tool: browser_console

Returns console messages (log/warn/error/info) AND uncaught JavaScript exceptions in a single call. Uses agent-browser's 'console' and 'errors' commands through the existing session plumbing. Supports --clear to reset buffers. Verified working in both local and Browserbase cloud modes.

## Enhanced tool: browser_vision(annotate=True)

New boolean parameter on browser_vision. When true, agent-browser overlays numbered [N] labels on interactive elements — each [N] maps to ref @eN. Annotation data (element name, role, bounding box) is returned alongside the vision analysis. Useful for QA reports and spatial reasoning.

## Config: browser.record_sessions

Auto-record browser sessions as WebM video files when enabled:

- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default

## Built-in skill: dogfood

Systematic exploratory QA testing for web applications. Teaches the agent a 5-phase workflow:

1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category (Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence

Includes:

- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template

## Tests

21 new tests covering:

- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation

Addresses #315.
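As a sketch of what consuming the combined payload looks like: the field names (`console_messages`, `js_errors`, `total_messages`, `total_errors`) are the shapes asserted in the new tests below; the sample data here is invented for illustration.

```python
import json

# Invented sample mirroring the result shape asserted in the new tests.
# browser_console() returns a JSON string combining console output and
# uncaught JS exceptions in one payload.
raw = json.dumps({
    "success": True,
    "total_messages": 2,
    "total_errors": 1,
    "console_messages": [
        {"text": "hello", "type": "log", "timestamp": 1},
        {"text": "oops", "type": "error", "timestamp": 2},
    ],
    "js_errors": [{"message": "Uncaught TypeError", "timestamp": 3}],
})

result = json.loads(raw)
# High-value QA findings: error-level console output plus uncaught exceptions.
findings = [m["text"] for m in result["console_messages"] if m["type"] == "error"]
findings += [e["message"] for e in result["js_errors"]]
print(findings)  # ['oops', 'Uncaught TypeError']
```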
This commit is contained in:
parent
0c4cff352a
commit
a8bf414f4a
11 changed files with 835 additions and 9 deletions
@@ -69,7 +69,7 @@ hermes-agent/
 │   ├── file_tools.py            # File read/write/search/patch tools
 │   ├── file_operations.py       # File operations helpers
 │   ├── web_tools.py             # Firecrawl search/extract
-│   ├── browser_tool.py          # Browserbase browser automation
+│   ├── browser_tool.py          # Browserbase browser automation (browser_console, session recording)
 │   ├── vision_tools.py          # Image analysis via auxiliary LLM
 │   ├── image_generation_tool.py # FLUX image generation via fal.ai
 │   ├── tts_tool.py              # Text-to-speech
@@ -113,7 +113,7 @@ hermes-agent/
 ├── cron/                # Scheduler implementation
 ├── environments/        # RL training environments (Atropos integration)
 ├── honcho_integration/  # Honcho client & session management
-├── skills/              # Bundled skill sources
+├── skills/              # Bundled skill sources (includes dogfood QA testing)
 ├── optional-skills/     # Official optional skills (not activated by default)
 ├── scripts/             # Install scripts, utilities
 ├── tests/               # Full pytest suite (~2300+ tests)
cli.py (1 change)

@@ -161,6 +161,7 @@ def load_cli_config() -> Dict[str, Any]:
         },
         "browser": {
             "inactivity_timeout": 120,  # Auto-cleanup inactive browser sessions after 2 min
+            "record_sessions": False,  # Auto-record browser sessions as WebM videos
         },
         "compression": {
             "enabled": True,  # Auto-compress when approaching context limit
@@ -81,6 +81,7 @@ DEFAULT_CONFIG = {

     "browser": {
         "inactivity_timeout": 120,
+        "record_sessions": False,  # Auto-record browser sessions as WebM videos
     },

     "compression": {
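A caller reading the new flag would look it up through the nested config dict shown in the hunks above; a minimal sketch, using a hypothetical pre-upgrade config dict to show the safe default:

```python
# Sketch: reading browser.record_sessions from a merged config dict.
# An older config written before this change lacks the key entirely,
# so the lookup must default to False (the documented default).
config = {"browser": {"inactivity_timeout": 120}}  # pre-upgrade config
record = config.get("browser", {}).get("record_sessions", False)
print(record)  # False
```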
skills/dogfood/SKILL.md (new file, 162 lines)

@@ -0,0 +1,162 @@
---
name: dogfood
description: Systematic exploratory QA testing of web applications — find bugs, capture evidence, and generate structured reports
version: 1.0.0
metadata:
  hermes:
    tags: [qa, testing, browser, web, dogfood]
    related_skills: []
---

# Dogfood: Systematic Web Application QA Testing

## Overview

This skill guides you through systematic exploratory QA testing of web applications using the browser toolset. You will navigate the application, interact with elements, capture evidence of issues, and produce a structured bug report.

## Prerequisites

- Browser toolset must be available (`browser_navigate`, `browser_snapshot`, `browser_click`, `browser_type`, `browser_vision`, `browser_console`, `browser_scroll`, `browser_back`, `browser_press`, `browser_close`)
- A target URL and testing scope from the user

## Inputs

The user provides:

1. **Target URL** — the entry point for testing
2. **Scope** — what areas/features to focus on (or "full site" for comprehensive testing)
3. **Output directory** (optional) — where to save screenshots and the report (default: `./dogfood-output`)

## Workflow

Follow this 5-phase systematic workflow:

### Phase 1: Plan

1. Create the output directory structure:
   ```
   {output_dir}/
   ├── screenshots/   # Evidence screenshots
   └── report.md      # Final report (generated in Phase 5)
   ```
2. Identify the testing scope based on user input.
3. Build a rough sitemap by planning which pages and features to test:
   - Landing/home page
   - Navigation links (header, footer, sidebar)
   - Key user flows (sign up, login, search, checkout, etc.)
   - Forms and interactive elements
   - Edge cases (empty states, error pages, 404s)

### Phase 2: Explore

For each page or feature in your plan:

1. **Navigate** to the page:
   ```
   browser_navigate(url="https://example.com/page")
   ```

2. **Take a snapshot** to understand the DOM structure:
   ```
   browser_snapshot()
   ```

3. **Check the console** for JavaScript errors:
   ```
   browser_console(clear=true)
   ```
   Do this after every navigation and after every significant interaction. Silent JS errors are high-value findings.

4. **Take an annotated screenshot** to visually assess the page and identify interactive elements:
   ```
   browser_vision(question="Describe the page layout, identify any visual issues, broken elements, or accessibility concerns", annotate=true)
   ```
   The `annotate=true` flag overlays numbered `[N]` labels on interactive elements. Each `[N]` maps to ref `@eN` for subsequent browser commands.

5. **Test interactive elements** systematically:
   - Click buttons and links: `browser_click(ref="@eN")`
   - Fill forms: `browser_type(ref="@eN", text="test input")`
   - Test keyboard navigation: `browser_press(key="Tab")`, `browser_press(key="Enter")`
   - Scroll through content: `browser_scroll(direction="down")`
   - Test form validation with invalid inputs
   - Test empty submissions

6. **After each interaction**, check for:
   - Console errors: `browser_console()`
   - Visual changes: `browser_vision(question="What changed after the interaction?")`
   - Expected vs actual behavior

### Phase 3: Collect Evidence

For every issue found:

1. **Take a screenshot** showing the issue:
   ```
   browser_vision(question="Capture and describe the issue visible on this page", annotate=false)
   ```
   Save the `screenshot_path` from the response — you will reference it in the report.

2. **Record the details**:
   - URL where the issue occurs
   - Steps to reproduce
   - Expected behavior
   - Actual behavior
   - Console errors (if any)
   - Screenshot path

3. **Classify the issue** using the issue taxonomy (see `references/issue-taxonomy.md`):
   - Severity: Critical / High / Medium / Low
   - Category: Functional / Visual / Accessibility / Console / UX / Content

### Phase 4: Categorize

1. Review all collected issues.
2. De-duplicate — merge issues that are the same bug manifesting in different places.
3. Assign final severity and category to each issue.
4. Sort by severity (Critical first, then High, Medium, Low).
5. Count issues by severity and category for the executive summary.

### Phase 5: Report

Generate the final report using the template at `templates/dogfood-report-template.md`.

The report must include:

1. **Executive summary** with total issue count, breakdown by severity, and testing scope
2. **Per-issue sections** with:
   - Issue number and title
   - Severity and category badges
   - URL where observed
   - Description of the issue
   - Steps to reproduce
   - Expected vs actual behavior
   - Screenshot references (use `MEDIA:<screenshot_path>` for inline images)
   - Console errors if relevant
3. **Summary table** of all issues
4. **Testing notes** — what was tested, what was not, any blockers

Save the report to `{output_dir}/report.md`.

## Tools Reference

| Tool | Purpose |
|------|---------|
| `browser_navigate` | Go to a URL |
| `browser_snapshot` | Get DOM text snapshot (accessibility tree) |
| `browser_click` | Click an element by ref (`@eN`) or text |
| `browser_type` | Type into an input field |
| `browser_scroll` | Scroll up/down on the page |
| `browser_back` | Go back in browser history |
| `browser_press` | Press a keyboard key |
| `browser_vision` | Screenshot + AI analysis; use `annotate=true` for element labels |
| `browser_console` | Get JS console output and errors |
| `browser_close` | Close the browser session |

## Tips

- **Always check `browser_console()` after navigating and after significant interactions.** Silent JS errors are among the most valuable findings.
- **Use `annotate=true` with `browser_vision`** when you need to reason about interactive element positions or when the snapshot refs are unclear.
- **Test with both valid and invalid inputs** — form validation bugs are common.
- **Scroll through long pages** — content below the fold may have rendering issues.
- **Test navigation flows** — click through multi-step processes end-to-end.
- **Check responsive behavior** by noting any layout issues visible in screenshots.
- **Don't forget edge cases**: empty states, very long text, special characters, rapid clicking.
- When reporting screenshots to the user, include `MEDIA:<screenshot_path>` so they can see the evidence inline.
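The per-page loop the skill describes in Phase 2 can be sketched as straight-line tool calls. The `browser_*` names are the real tools from the skill's Tools Reference, stubbed here so the sketch runs standalone; `explore_page` and the `calls` log are illustrative inventions.

```python
calls = []  # record the order of tool invocations for this sketch

# Stubs standing in for the real browser toolset.
def browser_navigate(url): calls.append("navigate")
def browser_snapshot(): calls.append("snapshot")
def browser_console(clear=False): calls.append("console"); return {"js_errors": []}
def browser_vision(question, annotate=False): calls.append("vision")

def explore_page(url):
    """One Phase 2 iteration: navigate, snapshot, console check, annotated look."""
    browser_navigate(url)
    browser_snapshot()
    browser_console(clear=True)  # reset buffers so findings are per-page
    browser_vision("Describe layout and visual issues", annotate=True)

explore_page("https://example.com/page")
print(calls)  # ['navigate', 'snapshot', 'console', 'vision']
```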
skills/dogfood/references/issue-taxonomy.md (new file, 109 lines)

@@ -0,0 +1,109 @@
# Issue Taxonomy

Use this taxonomy to classify issues found during dogfood QA testing.

## Severity Levels

### Critical

The issue makes a core feature completely unusable or causes data loss.

**Examples:**
- Application crashes or shows a blank white page
- Form submission silently loses user data
- Authentication is completely broken (can't log in at all)
- Payment flow fails and charges the user without completing the order
- Security vulnerability (e.g., XSS, exposed credentials in console)

### High

The issue significantly impairs functionality but a workaround may exist.

**Examples:**
- A key button does nothing when clicked (but refreshing fixes it)
- Search returns no results for valid queries
- Form validation rejects valid input
- Page loads but critical content is missing or garbled
- Navigation link leads to a 404 or wrong page
- Uncaught JavaScript exceptions in the console on core pages

### Medium

The issue is noticeable and affects user experience but doesn't block core functionality.

**Examples:**
- Layout is misaligned or overlapping on certain screen sections
- Images fail to load (broken image icons)
- Slow performance (visible loading delays > 3 seconds)
- Form field lacks proper validation feedback (no error message on bad input)
- Console warnings that suggest deprecated or misconfigured features
- Inconsistent styling between similar pages

### Low

Minor polish issues that don't affect functionality.

**Examples:**
- Typos or grammatical errors in text content
- Minor spacing or alignment inconsistencies
- Placeholder text left in production ("Lorem ipsum")
- Favicon missing
- Console info/debug messages that shouldn't be in production
- Subtle color contrast issues that don't fail WCAG requirements

## Categories

### Functional

Issues where features don't work as expected.

- Buttons/links that don't respond
- Forms that don't submit or submit incorrectly
- Broken user flows (can't complete a multi-step process)
- Incorrect data displayed
- Features that work partially

### Visual

Issues with the visual presentation of the page.

- Layout problems (overlapping elements, broken grids)
- Broken images or missing media
- Styling inconsistencies
- Responsive design failures
- Z-index issues (elements hidden behind others)
- Text overflow or truncation

### Accessibility

Issues that prevent or hinder access for users with disabilities.

- Missing alt text on meaningful images
- Poor color contrast (fails WCAG AA)
- Elements not reachable via keyboard navigation
- Missing form labels or ARIA attributes
- Focus indicators missing or unclear
- Screen reader incompatible content

### Console

Issues detected through JavaScript console output.

- Uncaught exceptions and unhandled promise rejections
- Failed network requests (4xx, 5xx errors in console)
- Deprecation warnings
- CORS errors
- Mixed content warnings (HTTP resources on HTTPS page)
- Excessive console.log output left from development

### UX (User Experience)

Issues where functionality works but the experience is poor.

- Confusing navigation or information architecture
- Missing loading indicators (user doesn't know something is happening)
- No feedback after user actions (e.g., button click with no visible result)
- Inconsistent interaction patterns
- Missing confirmation dialogs for destructive actions
- Poor error messages that don't help the user recover

### Content

Issues with the text, media, or information on the page.

- Typos and grammatical errors
- Placeholder/dummy content in production
- Outdated information
- Missing content (empty sections)
- Broken or dead links to external resources
- Incorrect or misleading labels
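One way to mechanize the console-related rows of this taxonomy is a default-severity lookup. This mapping is purely illustrative (the skill classifies by judgment, not a lookup table); it follows the taxonomy above, where uncaught exceptions on core pages rate High, console warnings Medium, and stray info/debug output Low.

```python
# Illustrative default severity per console entry type, per the taxonomy above.
SEVERITY_BY_CONSOLE_TYPE = {
    "exception": "High",   # uncaught JS exceptions on core pages
    "error": "High",
    "warning": "Medium",   # deprecation / misconfiguration warnings
    "info": "Low",         # debug output left in production
    "log": "Low",
}

def default_severity(console_type: str) -> str:
    """Fall back to Medium for unrecognized types (needs human judgment)."""
    return SEVERITY_BY_CONSOLE_TYPE.get(console_type, "Medium")

print(default_severity("exception"), default_severity("log"))  # High Low
```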
skills/dogfood/templates/dogfood-report-template.md (new file, 86 lines)

@@ -0,0 +1,86 @@
# Dogfood QA Report

**Target:** {target_url}
**Date:** {date}
**Scope:** {scope_description}
**Tester:** Hermes Agent (automated exploratory QA)

---

## Executive Summary

| Severity | Count |
|----------|-------|
| 🔴 Critical | {critical_count} |
| 🟠 High | {high_count} |
| 🟡 Medium | {medium_count} |
| 🔵 Low | {low_count} |
| **Total** | **{total_count}** |

**Overall Assessment:** {one_sentence_assessment}

---

## Issues

<!-- Repeat this section for each issue found, sorted by severity (Critical first) -->

### Issue #{issue_number}: {issue_title}

| Field | Value |
|-------|-------|
| **Severity** | {severity} |
| **Category** | {category} |
| **URL** | {url_where_found} |

**Description:**
{detailed_description_of_the_issue}

**Steps to Reproduce:**
1. {step_1}
2. {step_2}
3. {step_3}

**Expected Behavior:**
{what_should_happen}

**Actual Behavior:**
{what_actually_happens}

**Screenshot:**
MEDIA:{screenshot_path}

**Console Errors** (if applicable):
```
{console_error_output}
```

---

<!-- End of per-issue section -->

## Issues Summary Table

| # | Title | Severity | Category | URL |
|---|-------|----------|----------|-----|
| {n} | {title} | {severity} | {category} | {url} |

## Testing Coverage

### Pages Tested
- {list_of_pages_visited}

### Features Tested
- {list_of_features_exercised}

### Not Tested / Out of Scope
- {areas_not_covered_and_why}

### Blockers
- {any_issues_that_prevented_testing_certain_areas}

---

## Notes

{any_additional_observations_or_recommendations}
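Filling the template's `{named}` placeholders is plain `str.format` substitution, one dict per issue. The fragment below is a trimmed, hypothetical excerpt of the per-issue section (the real template has many more fields), with invented sample issue data.

```python
# Hypothetical trimmed fragment of the per-issue section above.
fragment = (
    "### Issue #{issue_number}: {issue_title}\n"
    "| **Severity** | {severity} |\n"
    "| **Category** | {category} |\n"
)

# Invented sample issue record.
issue = {
    "issue_number": 1,
    "issue_title": "Broken checkout button",
    "severity": "Critical",
    "category": "Functional",
}
print(fragment.format(**issue))
```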
276
tests/tools/test_browser_console.py
Normal file
276
tests/tools/test_browser_console.py
Normal file
|
|
@ -0,0 +1,276 @@
|
||||||
|
"""Tests for browser_console tool and browser_vision annotate param."""
|
||||||
|
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
from unittest.mock import patch, MagicMock
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", ".."))
|
||||||
|
|
||||||
|
|
||||||
|
# ── browser_console ──────────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
class TestBrowserConsole:
|
||||||
|
"""browser_console() returns console messages + JS errors in one call."""
|
||||||
|
|
||||||
|
def test_returns_console_messages_and_errors(self):
|
||||||
|
from tools.browser_tool import browser_console
|
||||||
|
|
||||||
|
console_response = {
|
||||||
|
"success": True,
|
||||||
|
"data": {
|
||||||
|
"messages": [
|
||||||
|
{"text": "hello", "type": "log", "timestamp": 1},
|
||||||
|
{"text": "oops", "type": "error", "timestamp": 2},
|
||||||
|
]
|
||||||
|
},
|
||||||
|
}
|
||||||
|
errors_response = {
|
||||||
|
"success": True,
|
||||||
|
"data": {
|
||||||
|
"errors": [
|
||||||
|
{"message": "Uncaught TypeError", "timestamp": 3},
|
||||||
|
]
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
with patch("tools.browser_tool._run_browser_command") as mock_cmd:
|
||||||
|
mock_cmd.side_effect = [console_response, errors_response]
|
||||||
|
result = json.loads(browser_console(task_id="test"))
|
||||||
|
|
||||||
|
assert result["success"] is True
|
||||||
|
assert result["total_messages"] == 2
|
||||||
|
assert result["total_errors"] == 1
|
||||||
|
assert result["console_messages"][0]["text"] == "hello"
|
||||||
|
assert result["console_messages"][1]["text"] == "oops"
|
||||||
|
assert result["js_errors"][0]["message"] == "Uncaught TypeError"
|
||||||
|
|
||||||
|
def test_passes_clear_flag(self):
|
||||||
|
from tools.browser_tool import browser_console
|
||||||
|
|
||||||
|
empty = {"success": True, "data": {"messages": [], "errors": []}}
|
||||||
|
with patch("tools.browser_tool._run_browser_command", return_value=empty) as mock_cmd:
|
||||||
|
browser_console(clear=True, task_id="test")
|
||||||
|
|
||||||
|
calls = mock_cmd.call_args_list
|
||||||
|
# Both console and errors should get --clear
|
||||||
|
assert calls[0][0] == ("test", "console", ["--clear"])
|
||||||
|
assert calls[1][0] == ("test", "errors", ["--clear"])
|
||||||
|
|
||||||
|
def test_no_clear_by_default(self):
|
||||||
|
from tools.browser_tool import browser_console
|
||||||
|
|
||||||
|
empty = {"success": True, "data": {"messages": [], "errors": []}}
|
||||||
|
with patch("tools.browser_tool._run_browser_command", return_value=empty) as mock_cmd:
|
||||||
|
browser_console(task_id="test")
|
||||||
|
|
||||||
|
calls = mock_cmd.call_args_list
|
||||||
|
assert calls[0][0] == ("test", "console", [])
|
||||||
|
assert calls[1][0] == ("test", "errors", [])
|
||||||
|
|
||||||
|
def test_empty_console_and_errors(self):
|
||||||
|
from tools.browser_tool import browser_console
|
||||||
|
|
||||||
|
empty = {"success": True, "data": {"messages": [], "errors": []}}
|
||||||
|
with patch("tools.browser_tool._run_browser_command", return_value=empty):
|
||||||
|
result = json.loads(browser_console(task_id="test"))
|
||||||
|
|
||||||
|
assert result["total_messages"] == 0
|
||||||
|
assert result["total_errors"] == 0
|
||||||
|
assert result["console_messages"] == []
|
||||||
|
assert result["js_errors"] == []
|
||||||
|
|
||||||
|
def test_handles_failed_commands(self):
|
||||||
|
from tools.browser_tool import browser_console
|
||||||
|
|
||||||
|
failed = {"success": False, "error": "No session"}
|
||||||
|
with patch("tools.browser_tool._run_browser_command", return_value=failed):
|
||||||
|
result = json.loads(browser_console(task_id="test"))
|
||||||
|
|
||||||
|
# Should still return success with empty data
|
||||||
|
assert result["success"] is True
|
||||||
|
assert result["total_messages"] == 0
|
||||||
|
assert result["total_errors"] == 0
|
||||||
|
|
||||||
|
|
||||||
|
# ── browser_console schema ───────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
class TestBrowserConsoleSchema:
|
||||||
|
"""browser_console is properly registered in the tool registry."""
|
||||||
|
|
||||||
|
def test_schema_in_browser_schemas(self):
|
||||||
|
from tools.browser_tool import BROWSER_TOOL_SCHEMAS
|
||||||
|
|
||||||
|
names = [s["name"] for s in BROWSER_TOOL_SCHEMAS]
|
||||||
|
assert "browser_console" in names
|
||||||
|
|
||||||
|
def test_schema_has_clear_param(self):
|
||||||
|
from tools.browser_tool import BROWSER_TOOL_SCHEMAS
|
||||||
|
|
||||||
|
schema = next(s for s in BROWSER_TOOL_SCHEMAS if s["name"] == "browser_console")
|
||||||
|
props = schema["parameters"]["properties"]
|
||||||
|
assert "clear" in props
|
||||||
|
assert props["clear"]["type"] == "boolean"
|
||||||
|
|
||||||
|
|
||||||
|
# ── browser_vision annotate ──────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
class TestBrowserVisionAnnotate:
|
||||||
|
"""browser_vision supports annotate parameter."""
|
||||||
|
|
||||||
|
def test_schema_has_annotate_param(self):
|
||||||
|
from tools.browser_tool import BROWSER_TOOL_SCHEMAS
|
||||||
|
|
||||||
|
schema = next(s for s in BROWSER_TOOL_SCHEMAS if s["name"] == "browser_vision")
|
||||||
|
props = schema["parameters"]["properties"]
|
||||||
|
assert "annotate" in props
|
||||||
|
assert props["annotate"]["type"] == "boolean"
|
||||||
|
|
||||||
|
def test_annotate_false_no_flag(self):
|
||||||
|
"""Without annotate, screenshot command has no --annotate flag."""
|
||||||
|
from tools.browser_tool import browser_vision
|
||||||
|
|
||||||
|
with (
|
||||||
|
patch("tools.browser_tool._run_browser_command") as mock_cmd,
|
||||||
|
patch("tools.browser_tool._aux_vision_client") as mock_client,
|
||||||
|
patch("tools.browser_tool._DEFAULT_VISION_MODEL", "test-model"),
|
||||||
|
patch("tools.browser_tool._get_vision_model", return_value="test-model"),
|
||||||
|
):
|
||||||
|
mock_cmd.return_value = {"success": True, "data": {}}
|
||||||
|
# Will fail at screenshot file read, but we can check the command
|
||||||
|
try:
|
||||||
|
browser_vision("test", annotate=False, task_id="test")
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
|
||||||
|
if mock_cmd.called:
|
||||||
|
args = mock_cmd.call_args[0]
|
||||||
|
cmd_args = args[2] if len(args) > 2 else []
|
||||||
|
assert "--annotate" not in cmd_args
|
||||||
|
|
||||||
|
def test_annotate_true_adds_flag(self):
|
||||||
|
"""With annotate=True, screenshot command includes --annotate."""
|
||||||
|
from tools.browser_tool import browser_vision
|
||||||
|
|
||||||
|
with (
|
||||||
|
patch("tools.browser_tool._run_browser_command") as mock_cmd,
|
||||||
|
patch("tools.browser_tool._aux_vision_client") as mock_client,
|
||||||
|
patch("tools.browser_tool._DEFAULT_VISION_MODEL", "test-model"),
|
||||||
|
patch("tools.browser_tool._get_vision_model", return_value="test-model"),
|
||||||
|
):
|
||||||
|
mock_cmd.return_value = {"success": True, "data": {}}
|
||||||
|
try:
|
||||||
|
browser_vision("test", annotate=True, task_id="test")
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
|
||||||
|
if mock_cmd.called:
|
||||||
|
args = mock_cmd.call_args[0]
|
||||||
|
cmd_args = args[2] if len(args) > 2 else []
|
||||||
|
assert "--annotate" in cmd_args
|
||||||
|
|
||||||
|
|
||||||
|
# ── auto-recording config ────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
class TestRecordSessionsConfig:
|
||||||
|
"""browser.record_sessions config option."""
|
||||||
|
|
||||||
|
def test_default_config_has_record_sessions(self):
|
||||||
|
from hermes_cli.config import DEFAULT_CONFIG
|
||||||
|
|
||||||
|
browser_cfg = DEFAULT_CONFIG.get("browser", {})
|
||||||
|
assert "record_sessions" in browser_cfg
|
||||||
|
assert browser_cfg["record_sessions"] is False
|
||||||
|
|
||||||
|
def test_maybe_start_recording_disabled(self):
|
||||||
|
"""Recording doesn't start when config says record_sessions: false."""
|
||||||
|
from tools.browser_tool import _maybe_start_recording, _recording_sessions
|
||||||
|
|
||||||
|
with (
|
||||||
|
patch("tools.browser_tool._run_browser_command") as mock_cmd,
|
||||||
|
patch("builtins.open", side_effect=FileNotFoundError),
|
||||||
|
):
|
||||||
|
_maybe_start_recording("test-task")
|
||||||
|
|
||||||
|
mock_cmd.assert_not_called()
|
||||||
|
assert "test-task" not in _recording_sessions
|
||||||
|
|
||||||
|
def test_maybe_stop_recording_noop_when_not_recording(self):
|
||||||
|
"""Stopping when not recording is a no-op."""
|
||||||
|
from tools.browser_tool import _maybe_stop_recording, _recording_sessions
|
||||||
|
|
||||||
|
_recording_sessions.discard("test-task") # ensure not in set
|
||||||
|
with patch("tools.browser_tool._run_browser_command") as mock_cmd:
|
||||||
|
_maybe_stop_recording("test-task")
|
||||||
|
|
||||||
|
mock_cmd.assert_not_called()
|
||||||
|
|
||||||
|
|
||||||
|
# ── dogfood skill files ──────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
class TestDogfoodSkill:
|
||||||
|
"""Dogfood skill files exist and have correct structure."""
|
+    @pytest.fixture(autouse=True)
+    def _skill_dir(self):
+        # Use the actual repo skills dir (not temp)
+        self.skill_dir = os.path.join(
+            os.path.dirname(__file__), "..", "..", "skills", "dogfood"
+        )
+
+    def test_skill_md_exists(self):
+        assert os.path.exists(os.path.join(self.skill_dir, "SKILL.md"))
+
+    def test_taxonomy_exists(self):
+        assert os.path.exists(
+            os.path.join(self.skill_dir, "references", "issue-taxonomy.md")
+        )
+
+    def test_report_template_exists(self):
+        assert os.path.exists(
+            os.path.join(self.skill_dir, "templates", "dogfood-report-template.md")
+        )
+
+    def test_skill_md_has_frontmatter(self):
+        with open(os.path.join(self.skill_dir, "SKILL.md")) as f:
+            content = f.read()
+        assert content.startswith("---")
+        assert "name: dogfood" in content
+        assert "description:" in content
+
+    def test_skill_references_browser_console(self):
+        with open(os.path.join(self.skill_dir, "SKILL.md")) as f:
+            content = f.read()
+        assert "browser_console" in content
+
+    def test_skill_references_annotate(self):
+        with open(os.path.join(self.skill_dir, "SKILL.md")) as f:
+            content = f.read()
+        assert "annotate" in content
+
+    def test_taxonomy_has_severity_levels(self):
+        with open(
+            os.path.join(self.skill_dir, "references", "issue-taxonomy.md")
+        ) as f:
+            content = f.read()
+        assert "Critical" in content
+        assert "High" in content
+        assert "Medium" in content
+        assert "Low" in content
+
+    def test_taxonomy_has_categories(self):
+        with open(
+            os.path.join(self.skill_dir, "references", "issue-taxonomy.md")
+        ) as f:
+            content = f.read()
+        assert "Functional" in content
+        assert "Visual" in content
+        assert "Accessibility" in content
+        assert "Console" in content
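The frontmatter assertions above reduce to simple string checks on the file contents; a standalone sketch with an inline example document (illustrative text, not the real SKILL.md):

```python
# Hypothetical minimal SKILL.md content, for illustration only
skill_md = """---
name: dogfood
description: Systematic exploratory QA testing for web applications
---

Use browser_console and annotated screenshots during the Explore phase.
"""

def has_valid_frontmatter(content: str) -> bool:
    # Mirrors the assertions in test_skill_md_has_frontmatter
    return (
        content.startswith("---")
        and "name: dogfood" in content
        and "description:" in content
    )

print(has_valid_frontmatter(skill_md))  # True
```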
@@ -144,6 +144,7 @@ def _socket_safe_tmpdir() -> str:
 # Track active sessions per task
 # Stores: session_name (always), bb_session_id + cdp_url (cloud mode only)
 _active_sessions: Dict[str, Dict[str, str]] = {}  # task_id -> {session_name, ...}
+_recording_sessions: set = set()  # task_ids with active recordings
 
 # Flag to track if cleanup has been done
 _cleanup_done = False
@@ -478,11 +479,31 @@ BROWSER_TOOL_SCHEMAS = [
                 "question": {
                     "type": "string",
                     "description": "What you want to know about the page visually. Be specific about what you're looking for."
+                },
+                "annotate": {
+                    "type": "boolean",
+                    "default": False,
+                    "description": "If true, overlay numbered [N] labels on interactive elements. Each [N] maps to ref @eN for subsequent browser commands. Useful for QA and spatial reasoning about page layout."
                 }
             },
             "required": ["question"]
         }
     },
+    {
+        "name": "browser_console",
+        "description": "Get browser console output and JavaScript errors from the current page. Returns console.log/warn/error/info messages and uncaught JS exceptions. Use this to detect silent JavaScript errors, failed API calls, and application warnings. Requires browser_navigate to be called first.",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "clear": {
+                    "type": "boolean",
+                    "default": False,
+                    "description": "If true, clear the message buffers after reading"
+                }
+            },
+            "required": []
+        }
+    },
 ]
@@ -998,9 +1019,10 @@ def browser_navigate(url: str, task_id: Optional[str] = None) -> str:
     session_info = _get_session_info(effective_task_id)
     is_first_nav = session_info.get("_first_nav", True)
 
-    # Mark that we've done at least one navigation
+    # Auto-start recording if configured and this is first navigation
     if is_first_nav:
         session_info["_first_nav"] = False
+        _maybe_start_recording(effective_task_id)
 
     result = _run_browser_command(effective_task_id, "open", [url], timeout=60)
@@ -1264,6 +1286,10 @@ def browser_close(task_id: Optional[str] = None) -> str:
         JSON string with close result
     """
     effective_task_id = task_id or "default"
 
+    # Stop auto-recording before closing
+    _maybe_stop_recording(effective_task_id)
+
     result = _run_browser_command(effective_task_id, "close", [])
 
     # Close the backend session (Browserbase API in cloud mode, nothing extra in local mode)
@@ -1294,6 +1320,103 @@ def browser_close(task_id: Optional[str] = None) -> str:
     }, ensure_ascii=False)
 
 
+def browser_console(clear: bool = False, task_id: Optional[str] = None) -> str:
+    """Get browser console messages and JavaScript errors.
+
+    Returns both console output (log/warn/error/info from the page's JS)
+    and uncaught exceptions (crashes, unhandled promise rejections).
+
+    Args:
+        clear: If True, clear the message/error buffers after reading
+        task_id: Task identifier for session isolation
+
+    Returns:
+        JSON string with console messages and JS errors
+    """
+    effective_task_id = task_id or "default"
+
+    console_args = ["--clear"] if clear else []
+    error_args = ["--clear"] if clear else []
+
+    console_result = _run_browser_command(effective_task_id, "console", console_args)
+    errors_result = _run_browser_command(effective_task_id, "errors", error_args)
+
+    messages = []
+    if console_result.get("success"):
+        for msg in console_result.get("data", {}).get("messages", []):
+            messages.append({
+                "type": msg.get("type", "log"),
+                "text": msg.get("text", ""),
+                "source": "console",
+            })
+
+    errors = []
+    if errors_result.get("success"):
+        for err in errors_result.get("data", {}).get("errors", []):
+            errors.append({
+                "message": err.get("message", ""),
+                "source": "exception",
+            })
+
+    return json.dumps({
+        "success": True,
+        "console_messages": messages,
+        "js_errors": errors,
+        "total_messages": len(messages),
+        "total_errors": len(errors),
+    }, ensure_ascii=False)
+
+
+def _maybe_start_recording(task_id: str):
+    """Start recording if browser.record_sessions is enabled in config."""
+    if task_id in _recording_sessions:
+        return
+    try:
+        hermes_home = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
+        config_path = hermes_home / "config.yaml"
+        record_enabled = False
+        if config_path.exists():
+            import yaml
+            with open(config_path) as f:
+                cfg = yaml.safe_load(f) or {}
+            record_enabled = cfg.get("browser", {}).get("record_sessions", False)
+
+        if not record_enabled:
+            return
+
+        recordings_dir = hermes_home / "browser_recordings"
+        recordings_dir.mkdir(parents=True, exist_ok=True)
+        _cleanup_old_recordings(max_age_hours=72)
+
+        import time
+        timestamp = time.strftime("%Y%m%d_%H%M%S")
+        recording_path = recordings_dir / f"session_{timestamp}_{task_id[:16]}.webm"
+
+        result = _run_browser_command(task_id, "record", ["start", str(recording_path)])
+        if result.get("success"):
+            _recording_sessions.add(task_id)
+            logger.info("Auto-recording browser session %s to %s", task_id, recording_path)
+        else:
+            logger.debug("Could not start auto-recording: %s", result.get("error"))
+    except Exception as e:
+        logger.debug("Auto-recording setup failed: %s", e)
+
+
+def _maybe_stop_recording(task_id: str):
+    """Stop recording if one is active for this session."""
+    if task_id not in _recording_sessions:
+        return
+    try:
+        result = _run_browser_command(task_id, "record", ["stop"])
+        if result.get("success"):
+            path = result.get("data", {}).get("path", "")
+            logger.info("Saved browser recording for session %s: %s", task_id, path)
+    except Exception as e:
+        logger.debug("Could not stop recording for %s: %s", task_id, e)
+    finally:
+        _recording_sessions.discard(task_id)
+
+
 def browser_get_images(task_id: Optional[str] = None) -> str:
     """
     Get all images on the current page.
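For reference, the JSON envelope that `browser_console` returns can be consumed like this (a standalone sketch; the message and error values are fabricated examples, not real browser output):

```python
import json

# Example payload in the shape browser_console() returns (values are made up)
payload = json.dumps({
    "success": True,
    "console_messages": [
        {"type": "log", "text": "app booted", "source": "console"},
        {"type": "error", "text": "Failed to fetch /api/user", "source": "console"},
    ],
    "js_errors": [
        {"message": "Uncaught TypeError: x is undefined", "source": "exception"},
    ],
    "total_messages": 2,
    "total_errors": 1,
})

data = json.loads(payload)
# Surface only the signals a QA report cares about
console_errors = [m["text"] for m in data["console_messages"] if m["type"] == "error"]
exceptions = [e["message"] for e in data["js_errors"]]
print(console_errors)  # ['Failed to fetch /api/user']
print(exceptions)      # ['Uncaught TypeError: x is undefined']
```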
@@ -1348,7 +1471,7 @@ def browser_get_images(task_id: Optional[str] = None) -> str:
     }, ensure_ascii=False)
 
 
-def browser_vision(question: str, task_id: Optional[str] = None) -> str:
+def browser_vision(question: str, annotate: bool = False, task_id: Optional[str] = None) -> str:
     """
     Take a screenshot of the current page and analyze it with vision AI.
@@ -1362,6 +1485,7 @@ def browser_vision(question: str, task_id: Optional[str] = None) -> str:
     Args:
         question: What you want to know about the page visually
+        annotate: If True, overlay numbered [N] labels on interactive elements
         task_id: Task identifier for session isolation
 
     Returns:
@@ -1393,10 +1517,13 @@ def browser_vision(question: str, task_id: Optional[str] = None) -> str:
     _cleanup_old_screenshots(screenshots_dir, max_age_hours=24)
 
     # Take screenshot using agent-browser
+    screenshot_args = [str(screenshot_path)]
+    if annotate:
+        screenshot_args.insert(0, "--annotate")
     result = _run_browser_command(
         effective_task_id,
         "screenshot",
-        [str(screenshot_path)],
+        screenshot_args,
         timeout=30
     )
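The flag handling above prepends `--annotate` ahead of the output path; a minimal standalone sketch of the same logic (the helper name is hypothetical, not part of the diff):

```python
def build_screenshot_args(path: str, annotate: bool) -> list:
    # Mirrors browser_vision: start with the output path, then
    # insert the --annotate flag at the front when requested
    args = [path]
    if annotate:
        args.insert(0, "--annotate")
    return args

print(build_screenshot_args("/tmp/shot.png", True))   # ['--annotate', '/tmp/shot.png']
print(build_screenshot_args("/tmp/shot.png", False))  # ['/tmp/shot.png']
```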
@@ -1456,11 +1583,15 @@ def browser_vision(question: str, task_id: Optional[str] = None) -> str:
         )
 
         analysis = response.choices[0].message.content
-        return json.dumps({
+        response_data = {
             "success": True,
             "analysis": analysis,
             "screenshot_path": str(screenshot_path),
-        }, ensure_ascii=False)
+        }
+        # Include annotation data if annotated screenshot was taken
+        if annotate and result.get("data", {}).get("annotations"):
+            response_data["annotations"] = result["data"]["annotations"]
+        return json.dumps(response_data, ensure_ascii=False)
 
     except Exception as e:
         # Keep the screenshot if it was captured successfully — the failure is
@@ -1490,6 +1621,25 @@ def _cleanup_old_screenshots(screenshots_dir, max_age_hours=24):
         pass  # Non-critical — don't fail the screenshot operation
 
 
+def _cleanup_old_recordings(max_age_hours=72):
+    """Remove browser recordings older than max_age_hours to prevent disk bloat."""
+    import time
+    try:
+        hermes_home = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
+        recordings_dir = hermes_home / "browser_recordings"
+        if not recordings_dir.exists():
+            return
+        cutoff = time.time() - (max_age_hours * 3600)
+        for f in recordings_dir.glob("session_*.webm"):
+            try:
+                if f.stat().st_mtime < cutoff:
+                    f.unlink()
+            except Exception:
+                pass
+    except Exception:
+        pass
+
+
 # ============================================================================
 # Cleanup and Management Functions
 # ============================================================================
@@ -1561,6 +1711,9 @@ def cleanup_browser(task_id: Optional[str] = None) -> None:
     bb_session_id = session_info.get("bb_session_id", "unknown")
     logger.debug("Found session for task %s: bb_session_id=%s", task_id, bb_session_id)
 
+    # Stop auto-recording before closing (saves the file)
+    _maybe_stop_recording(task_id)
+
     # Try to close via agent-browser first (needs session in _active_sessions)
     try:
         _run_browser_command(task_id, "close", [], timeout=10)
@@ -1776,6 +1929,13 @@ registry.register(
     name="browser_vision",
     toolset="browser",
     schema=_BROWSER_SCHEMA_MAP["browser_vision"],
-    handler=lambda args, **kw: browser_vision(question=args.get("question", ""), task_id=kw.get("task_id")),
+    handler=lambda args, **kw: browser_vision(question=args.get("question", ""), annotate=args.get("annotate", False), task_id=kw.get("task_id")),
+    check_fn=check_browser_requirements,
+)
+registry.register(
+    name="browser_console",
+    toolset="browser",
+    schema=_BROWSER_SCHEMA_MAP["browser_console"],
+    handler=lambda args, **kw: browser_console(clear=args.get("clear", False), task_id=kw.get("task_id")),
     check_fn=check_browser_requirements,
 )
@@ -620,6 +620,16 @@ code_execution:
   max_tool_calls: 50  # Max tool calls within code execution
 ```
 
+## Browser
+
+Configure browser automation behavior:
+
+```yaml
+browser:
+  inactivity_timeout: 120  # Seconds before auto-closing idle sessions
+  record_sessions: false   # Auto-record browser sessions as WebM videos to ~/.hermes/browser_recordings/
+```
+
 ## Delegation
 
 Configure subagent behavior for the delegate tool:
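The `record_sessions` lookup that `_maybe_start_recording` performs reduces to a chained `.get()` with a disabled default; a standalone sketch using a plain dict in place of the parsed YAML (values are illustrative):

```python
# What yaml.safe_load would produce for the config snippet above
cfg = {"browser": {"inactivity_timeout": 120, "record_sessions": False}}

# Same lookup _maybe_start_recording uses: missing section or key
# falls back to disabled, so recording is strictly opt-in
record_enabled = cfg.get("browser", {}).get("record_sessions", False)
print(record_enabled)  # False

# An empty config behaves the same way
print({}.get("browser", {}).get("record_sessions", False))  # False
```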
@@ -142,6 +142,16 @@ What does the chart on this page show?
 
 Screenshots are stored in `~/.hermes/browser_screenshots/` and automatically cleaned up after 24 hours.
 
+### `browser_console`
+
+Get browser console output (log/warn/error messages) and uncaught JavaScript exceptions from the current page. Essential for detecting silent JS errors that don't appear in the accessibility tree.
+
+```
+Check the browser console for any JavaScript errors
+```
+
+Use `clear=True` to clear the console after reading, so subsequent calls only show new messages.
+
 ### `browser_close`
 
 Close the browser session and release resources. Call this when done to free up Browserbase session quota.
@@ -175,6 +185,17 @@ Agent workflow:
 4. browser_close()
 ```
 
+## Session Recording
+
+Automatically record browser sessions as WebM video files:
+
+```yaml
+browser:
+  record_sessions: true  # default: false
+```
+
+When enabled, recording starts automatically on the first `browser_navigate` and saves to `~/.hermes/browser_recordings/` when the session closes. Works in both local and cloud (Browserbase) modes. Recordings older than 72 hours are automatically cleaned up.
+
 ## Stealth Features
 
 Browserbase provides automatic stealth capabilities:
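Recording files follow the `session_<timestamp>_<task_id>.webm` scheme used by `_maybe_start_recording`; a standalone sketch pinned to a fixed time for reproducibility (the task id is hypothetical):

```python
import time

task_id = "task_abc123"  # hypothetical task identifier
# Same strftime format as _maybe_start_recording, but evaluated at
# the Unix epoch so the output is deterministic
timestamp = time.strftime("%Y%m%d_%H%M%S", time.gmtime(0))
filename = f"session_{timestamp}_{task_id[:16]}.webm"  # task id truncated to 16 chars
print(filename)  # session_19700101_000000_task_abc123.webm
```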
@@ -15,7 +15,7 @@ Tools are functions that extend the agent's capabilities. They're organized into
 | **Web** | `web_search`, `web_extract` | Search the web, extract page content |
 | **Terminal** | `terminal`, `process` | Execute commands (local/docker/singularity/modal/daytona/ssh backends), manage background processes |
 | **File** | `read_file`, `write_file`, `patch`, `search_files` | Read, write, edit, and search files |
-| **Browser** | `browser_navigate`, `browser_click`, `browser_type`, etc. | Full browser automation via Browserbase |
+| **Browser** | `browser_navigate`, `browser_click`, `browser_type`, `browser_console`, etc. | Full browser automation via Browserbase |
 | **Vision** | `vision_analyze` | Image analysis via multimodal models |
 | **Image Gen** | `image_generate` | Generate images (FLUX via FAL) |
 | **TTS** | `text_to_speech` | Text-to-speech (Edge TTS / ElevenLabs / OpenAI) |