mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-18 04:41:56 +00:00
Migrates Firecrawl from inline code in tools/web_tools.py to a bundled
plugin at plugins/web/firecrawl/. By line count this is the largest of
the seven provider migrations: the firecrawl path captured most of the
file's vendor-specific complexity.
What moved into the plugin (all previously in tools/web_tools.py):
Lazy Firecrawl SDK proxy
- _load_firecrawl_cls() — caches the imported SDK class
- _FirecrawlProxy + Firecrawl singleton — defers ~200ms of SDK
imports until first construction or isinstance check.
Client construction (dual auth)
- _get_direct_firecrawl_config() — direct FIRECRAWL_API_KEY/URL path
- _get_firecrawl_gateway_url() — managed Nous tool-gateway URL
- _is_tool_gateway_ready() — gateway URL + Nous token check
- _has_direct_firecrawl_config() — direct config present?
- _get_firecrawl_client() — combined client construction
honoring web.use_gateway
- check_firecrawl_api_key() — top-level "is firecrawl usable"
- _firecrawl_backend_help_suffix() — managed-gateway help string
- _raise_web_backend_configuration_error() — typed misconfig error
Response shape normalization (vendor-specific)
- _to_plain_object(), _normalize_result_list() — SDK→dict helpers
- _extract_web_search_results() — handles SDK/direct/gateway shapes
- _extract_scrape_payload() — nested-data unwrap for scrape
Per-URL extract loop
- 60s asyncio.wait_for timeout per URL
- Pre-scrape website-policy gate
- Post-scrape redirect-aware SSRF re-check
- Format-aware content selection (markdown / html / auto)
- Per-URL errors returned as {"error": str} entries, no raises
Extract is declared `async def` — each URL is scraped in
asyncio.to_thread(...). This is the second async-extract plugin after
parallel.
The plugin re-exports `Firecrawl` (the lazy proxy) and
`check_firecrawl_api_key()` so existing tests doing
`patch("tools.web_tools.Firecrawl")` or
`monkeypatch.setattr(web_tools, "check_firecrawl_api_key", ...)` keep
working — tools/web_tools.py re-exports both names in the next
dispatcher-cutover commit.
Note: web_crawl_tool still has its own Firecrawl crawl path inline
(separate from extract); the Firecrawl SDK supports /crawl but we don't
expose supports_crawl=True on this plugin yet. Tavily handles crawl
today. Adding Firecrawl crawl is a clean follow-up.
Adds "firecrawl" to _WEB_PLUGIN_SKIPLIST.
E2E verified:
- All 7 providers register: brave-free, ddgs, exa, firecrawl,
parallel, searxng, tavily
- inspect.iscoroutinefunction(firecrawl.extract) -> True
- Firecrawl proxy is a callable lazy proxy at module level
- check_firecrawl_api_key reflects FIRECRAWL_API_KEY presence
|
||
|---|---|---|
| .. | ||
| brave_free | ||
| ddgs | ||
| exa | ||
| firecrawl | ||
| parallel | ||
| searxng | ||
| tavily | ||
| __init__.py | ||