hermes-agent/plugins/browser/firecrawl/provider.py
kshitijk4poor c74ff2c8ef fix(browser): self-review pass — dead-import, log levels, future-proofing
Addresses findings from two self-review passes pre-merge.

First pass (3-agent parallel review):

1. plugins/browser/browser_use/provider.py: drop the
   ``_ = managed_nous_tools_enabled`` dead-import-hider in
   _get_config_or_none(). The import was actively misleading — the
   helper IS used in _get_config() (separate method, separate import),
   not here. The "keep static analysis happy" comment was wrong about
   what the helper does in this scope.

2. agent/browser_provider.py: drop ``pragma: no cover`` from
   is_configured() / provider_name() backward-compat aliases. They ARE
   covered by ``TestLegacyAbcAliases`` — the pragma would have masked
   future regressions.

3. tools/browser_tool.py: refactor _is_legacy_provider_registry_overridden()
   to compare against a module-frozen _DEFAULT_PROVIDER_REGISTRY snapshot
   instead of hardcoded set of 3 keys. Future maintainers adding a 4th
   built-in provider now just extend _PROVIDER_REGISTRY; the override
   detection adapts automatically. Previously the hardcoded
   ``set(...) != {"browserbase", "browser-use", "firecrawl"}`` would flip
   True forever on any 4-key registry, silently routing every install
   onto the legacy fixture path.

4. tools/browser_tool.py: when explicit ``browser.cloud_provider`` is set
   but the registry has no matching plugin (typo, uninstalled plugin,
   discovery failure), emit a WARNING with actionable text instead of
   silently falling through to auto-detect. Legacy code surfaced a typed
   credentials error via direct class instantiation; this log restores
   the signal in the post-migration path.

5. agent/browser_registry.py: trim the triple-redundant _LEGACY_PREFERENCE
   documentation. Module docstring + 13-line block-comment + 5-line
   inline comment was repeating the same point. Kept the docstring and
   trimmed the block-comment to 5 lines.

6. agent/browser_registry.py: upgrade is_available()-raised logging from
   DEBUG to WARNING with exc_info=True. A provider's availability check
   throwing is unusual enough that users debugging "no cloud provider"
   need the traceback in logs.

7. tests/plugins/browser/check_parity_vs_main.py: drop dead top-level
   imports (os, shutil, tempfile — only referenced inside the
   SUBPROCESS_SCRIPT string literal that runs in a child process).

Second pass (architecture + claim-verification review):

8. tools/browser_tool.py: rewrite the inline comment in _get_cloud_provider
   auto-detect branch. Prior text claimed it "routes through the plugin
   registry's legacy preference walk so third-party plugins still get a
   chance to be selected when they're explicitly configured" — false on
   both counts. The branch uses module-level legacy class aliases
   (BrowserUseProvider / BrowserbaseProvider) directly; third-party
   plugins are intentionally reachable only via explicit
   ``browser.cloud_provider``. Corrected comment now matches behaviour
   and cross-references _LEGACY_PREFERENCE for the firecrawl gate
   rationale.

9. tools/browser_tool.py + tests/tools/test_managed_browserbase_and_modal.py:
   drop the unused ``get_active_browser_provider as
   _registry_get_active_browser_provider`` alias from the
   ``from agent.browser_registry import ...`` block. It was never
   referenced; matching test-stub line in the agent.browser_registry
   SimpleNamespace also dropped. ``get_provider`` is still imported (used
   by the explicit-config dispatch path at line 535).

10. plugins/browser/firecrawl/provider.py: align emergency_cleanup()
    with the early-guard pattern used in browserbase + browser_use
    plugins. Previously firecrawl tried the DELETE and relied on
    ``_headers()`` raising ValueError to trip a "missing credentials"
    warning; same final outcome but a different control flow that read
    like a bug to a maintainer skimming the three modules. Now: if
    is_available() is False, log+return early — identical shape to the
    other two providers.

Verification: 54/54 unit tests + 13/13 parity scenarios still pass.
2026-05-17 04:04:15 -07:00

163 lines
5.2 KiB
Python

"""Firecrawl cloud browser provider — plugin form.
Subclasses :class:`agent.browser_provider.BrowserProvider` (the plugin-facing
ABC introduced in PR #25214). The legacy in-tree module
``tools.browser_providers.firecrawl`` was removed in the same PR; this file
is now the canonical implementation.
This is the cloud-browser path — distinct from the firecrawl WEB plugin at
``plugins/web/firecrawl/`` which handles search/extract/crawl on
``/v2/search`` / ``/v2/scrape`` / ``/v2/crawl``. The two plugins share the
``FIRECRAWL_API_KEY`` env var but talk to different endpoints (this one
hits ``/v2/browser``).
Config keys this provider responds to::
browser:
cloud_provider: "firecrawl" # explicit selection only — not in the
# legacy auto-detect walk
Auth env vars::
FIRECRAWL_API_KEY=... # https://firecrawl.dev
FIRECRAWL_API_URL=... # optional override (default https://api.firecrawl.dev)
FIRECRAWL_BROWSER_TTL=... # optional, default 300 seconds
"""
from __future__ import annotations
import logging
import os
import uuid
from typing import Any, Dict
import requests
from agent.browser_provider import BrowserProvider
logger = logging.getLogger(__name__)
_BASE_URL = "https://api.firecrawl.dev"
class FirecrawlBrowserProvider(BrowserProvider):
"""Firecrawl (https://firecrawl.dev) cloud browser backend.
Cloud-browser path only — search/extract/crawl live in the separate
``plugins/web/firecrawl/`` plugin.
"""
@property
def name(self) -> str:
return "firecrawl"
@property
def display_name(self) -> str:
return "Firecrawl"
def is_available(self) -> bool:
return bool(os.environ.get("FIRECRAWL_API_KEY"))
# ------------------------------------------------------------------
# Session lifecycle
# ------------------------------------------------------------------
def _api_url(self) -> str:
return os.environ.get("FIRECRAWL_API_URL", _BASE_URL)
def _headers(self) -> Dict[str, str]:
api_key = os.environ.get("FIRECRAWL_API_KEY")
if not api_key:
raise ValueError(
"FIRECRAWL_API_KEY environment variable is required. "
"Get your key at https://firecrawl.dev"
)
return {
"Content-Type": "application/json",
"Authorization": f"Bearer {api_key}",
}
def create_session(self, task_id: str) -> Dict[str, object]:
ttl = int(os.environ.get("FIRECRAWL_BROWSER_TTL", "300"))
body: Dict[str, object] = {"ttl": ttl}
response = requests.post(
f"{self._api_url()}/v2/browser",
headers=self._headers(),
json=body,
timeout=30,
)
if not response.ok:
raise RuntimeError(
f"Failed to create Firecrawl browser session: "
f"{response.status_code} {response.text}"
)
data = response.json()
session_name = f"hermes_{task_id}_{uuid.uuid4().hex[:8]}"
logger.info("Created Firecrawl browser session %s", session_name)
return {
"session_name": session_name,
"bb_session_id": data["id"],
"cdp_url": data["cdpUrl"],
"features": {"firecrawl": True},
}
def close_session(self, session_id: str) -> bool:
try:
response = requests.delete(
f"{self._api_url()}/v2/browser/{session_id}",
headers=self._headers(),
timeout=10,
)
if response.status_code in {200, 201, 204}:
logger.debug("Successfully closed Firecrawl session %s", session_id)
return True
else:
logger.warning(
"Failed to close Firecrawl session %s: HTTP %s - %s",
session_id,
response.status_code,
response.text[:200],
)
return False
except Exception as e:
logger.error("Exception closing Firecrawl session %s: %s", session_id, e)
return False
def emergency_cleanup(self, session_id: str) -> None:
if not self.is_available():
logger.warning(
"Cannot emergency-cleanup Firecrawl session %s — missing credentials",
session_id,
)
return
try:
requests.delete(
f"{self._api_url()}/v2/browser/{session_id}",
headers=self._headers(),
timeout=5,
)
except Exception as e:
logger.debug(
"Emergency cleanup failed for Firecrawl session %s: %s", session_id, e
)
def get_setup_schema(self) -> Dict[str, Any]:
return {
"name": "Firecrawl",
"badge": "paid",
"tag": "Cloud browser with remote execution",
"env_vars": [
{
"key": "FIRECRAWL_API_KEY",
"prompt": "Firecrawl API key",
"url": "https://firecrawl.dev",
},
],
"post_setup": "agent_browser",
}