mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-18 04:41:56 +00:00
Migrates Firecrawl from inline code in tools/web_tools.py to a bundled
plugin at plugins/web/firecrawl/. By line count this is the largest of
the seven provider migrations: the Firecrawl path accounted for most of
the file's vendor-specific complexity.
What moved into the plugin (all previously in tools/web_tools.py):
Lazy Firecrawl SDK proxy
- _load_firecrawl_cls() — caches the imported SDK class
- _FirecrawlProxy + Firecrawl singleton — defers ~200ms of SDK
imports until first construction or isinstance check.
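
The lazy-proxy idea can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: `LazyClassProxy` and `LazyFraction` are hypothetical names, and the stdlib `fractions.Fraction` stands in for the Firecrawl SDK class so the example runs without the SDK installed.

```python
import importlib
from typing import Any, Optional


class LazyClassProxy:
    """Module-level stand-in that imports the real class on first use.

    Hypothetical sketch: the real _FirecrawlProxy may differ in detail.
    """

    def __init__(self, module_name: str, attr: str) -> None:
        self._module_name = module_name
        self._attr = attr
        self._cls: Optional[type] = None
        self.load_count = 0  # only here to make the caching observable

    def _load(self) -> type:
        # Import once, then reuse the cached class object.
        if self._cls is None:
            module = importlib.import_module(self._module_name)
            self._cls = getattr(module, self._attr)
            self.load_count += 1
        return self._cls

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Construction triggers the deferred import.
        return self._load()(*args, **kwargs)

    def __instancecheck__(self, obj: Any) -> bool:
        # isinstance(obj, proxy) also triggers the deferred import.
        return isinstance(obj, self._load())


# Stand-in for `Firecrawl = _FirecrawlProxy()`; nothing is imported yet.
LazyFraction = LazyClassProxy("fractions", "Fraction")
```

Calling `LazyFraction(1, 2)` or evaluating `isinstance(x, LazyFraction)` performs the import; merely importing the module that defines the proxy does not.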
Client construction (dual auth)
- _get_direct_firecrawl_config() — direct FIRECRAWL_API_KEY/URL path
- _get_firecrawl_gateway_url() — managed Nous tool-gateway URL
- _is_tool_gateway_ready() — gateway URL + Nous token check
- _has_direct_firecrawl_config() — direct config present?
- _get_firecrawl_client() — combined client construction
honoring web.use_gateway
- check_firecrawl_api_key() — top-level "is firecrawl usable"
- _firecrawl_backend_help_suffix() — managed-gateway help string
- _raise_web_backend_configuration_error() — typed misconfig error
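
A rough sketch of how dual-auth selection could work is below. The precedence rule (gateway wins only when `use_gateway` is set or no direct config exists) is an assumption, as are the names `choose_backend`, `BackendChoice`, `WebBackendConfigurationError`, and the `EXAMPLE_GATEWAY_URL` / `EXAMPLE_NOUS_TOKEN` variables. `FIRECRAWL_API_URL` is assumed to be the full name behind "FIRECRAWL_API_KEY/URL".

```python
from dataclasses import dataclass
from typing import Mapping, Optional


class WebBackendConfigurationError(RuntimeError):
    """Hypothetical typed error for a misconfigured web backend."""


@dataclass(frozen=True)
class BackendChoice:
    mode: str  # "direct" or "gateway"
    api_url: Optional[str]
    credential: Optional[str]


def has_direct_config(env: Mapping[str, str]) -> bool:
    # Assumption: either a key or a self-hosted URL counts as direct config.
    return bool(env.get("FIRECRAWL_API_KEY") or env.get("FIRECRAWL_API_URL"))


def is_gateway_ready(env: Mapping[str, str]) -> bool:
    # Assumption: the gateway needs both a URL and a token (names invented).
    return bool(env.get("EXAMPLE_GATEWAY_URL") and env.get("EXAMPLE_NOUS_TOKEN"))


def choose_backend(env: Mapping[str, str], use_gateway: bool = False) -> BackendChoice:
    """Pick an auth path, or raise a typed error if neither is usable."""
    direct = has_direct_config(env)
    gateway = is_gateway_ready(env)
    if gateway and (use_gateway or not direct):
        return BackendChoice(
            "gateway", env["EXAMPLE_GATEWAY_URL"], env["EXAMPLE_NOUS_TOKEN"]
        )
    if direct:
        return BackendChoice(
            "direct", env.get("FIRECRAWL_API_URL"), env.get("FIRECRAWL_API_KEY")
        )
    raise WebBackendConfigurationError(
        "Firecrawl is not configured: set FIRECRAWL_API_KEY or enable the gateway"
    )
```

Raising a dedicated exception type lets callers distinguish "backend not configured" from a failed request.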
Response shape normalization (vendor-specific)
- _to_plain_object(), _normalize_result_list() — SDK→dict helpers
- _extract_web_search_results() — handles SDK/direct/gateway shapes
- _extract_scrape_payload() — nested-data unwrap for scrape
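
The normalization helpers might look something like the sketch below. The response shapes handled here (a bare list, `{"data": [...]}`, `{"data": {"web": [...]}}`, and an object with a `web` attribute) are assumptions chosen for illustration, and the function names are hypothetical variants of the ones listed above.

```python
from typing import Any, Dict, List


def to_plain_object(value: Any) -> Any:
    """Turn an SDK-style object into plain Python data where possible."""
    if value is None or isinstance(value, (dict, list, str, int, float, bool)):
        return value
    dump = getattr(value, "model_dump", None)  # pydantic-v2-style objects
    if callable(dump):
        return dump()
    if hasattr(value, "__dict__"):
        return dict(vars(value))
    return value


def normalize_result_list(value: Any) -> List[Dict[str, Any]]:
    """Return a list of dicts, dropping anything that cannot be converted."""
    plain = to_plain_object(value)
    if not isinstance(plain, list):
        return []
    items = [to_plain_object(item) for item in plain]
    return [item for item in items if isinstance(item, dict)]


def extract_search_results(response: Any) -> List[Dict[str, Any]]:
    """Find the result list inside several assumed response shapes."""
    plain = to_plain_object(response)
    if isinstance(plain, list):
        return normalize_result_list(plain)
    if not isinstance(plain, dict):
        return []
    data = to_plain_object(plain.get("data"))
    if isinstance(data, list):
        return normalize_result_list(data)
    if isinstance(data, dict):
        plain = data  # unwrap one level of nesting
    return normalize_result_list(plain.get("web"))
```

Returning an empty list for unrecognized shapes keeps the caller's code path uniform across SDK, direct, and gateway responses.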
Per-URL extract loop
- 60s asyncio.wait_for timeout per URL
- Pre-scrape website-policy gate
- Post-scrape redirect-aware SSRF re-check
- Format-aware content selection (markdown / html / auto)
- Per-URL errors returned as {"error": str} entries, no raises
Extract is declared `async def` — each URL is scraped in
asyncio.to_thread(...). This is the second async-extract plugin after
parallel.
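
The per-URL loop described above can be sketched as follows. This is an illustration under stated assumptions, not the plugin's implementation: `extract_urls`, `pick_content`, `scrape`, and `is_allowed` are hypothetical names, a single `is_allowed` predicate stands in for both the website-policy gate and the SSRF re-check, the `final_url` key and the meaning of `auto` are guesses, and URLs are processed one at a time for simplicity.

```python
import asyncio
from typing import Any, Callable, Dict, List


def pick_content(payload: Dict[str, Any], fmt: str = "auto") -> str:
    # Assumption: "auto" prefers markdown and falls back to html.
    if fmt in ("markdown", "html"):
        return str(payload.get(fmt) or "")
    return str(payload.get("markdown") or payload.get("html") or "")


async def extract_urls(
    urls: List[str],
    scrape: Callable[[str], Dict[str, Any]],  # blocking scrape call
    is_allowed: Callable[[str], bool],        # policy / SSRF predicate
    fmt: str = "auto",
    timeout_s: float = 60.0,
) -> List[Dict[str, Any]]:
    results: List[Dict[str, Any]] = []
    for url in urls:
        # Pre-scrape gate: never contact a disallowed URL.
        if not is_allowed(url):
            results.append({"url": url, "error": "blocked by policy"})
            continue
        try:
            # Run the blocking call in a worker thread, bounded by a timeout.
            payload = await asyncio.wait_for(
                asyncio.to_thread(scrape, url), timeout=timeout_s
            )
        except asyncio.TimeoutError:
            results.append({"url": url, "error": f"timed out after {timeout_s:g}s"})
            continue
        except Exception as exc:  # errors become entries, not raises
            results.append({"url": url, "error": str(exc)})
            continue
        # Post-scrape re-check: the request may have been redirected.
        if not is_allowed(str(payload.get("final_url", url))):
            results.append({"url": url, "error": "redirect target blocked by policy"})
            continue
        results.append({"url": url, "content": pick_content(payload, fmt)})
    return results
```

Note that `asyncio.wait_for` stops waiting on timeout but cannot interrupt the worker thread, so a timed-out scrape may keep running in the background until it returns.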
The plugin exports `Firecrawl` (the lazy proxy) and
`check_firecrawl_api_key()`, and tools/web_tools.py re-exports both names
in the next dispatcher-cutover commit, so existing tests doing
`patch("tools.web_tools.Firecrawl")` or
`monkeypatch.setattr(web_tools, "check_firecrawl_api_key", ...)` keep
working.
Note: web_crawl_tool still has its own Firecrawl crawl path inline
(separate from extract); the Firecrawl SDK supports /crawl but we don't
expose supports_crawl=True on this plugin yet. Tavily handles crawl
today. Adding Firecrawl crawl is a clean follow-up.
Adds "firecrawl" to _WEB_PLUGIN_SKIPLIST.
E2E verified:
- All 7 providers register: brave-free, ddgs, exa, firecrawl,
parallel, searxng, tavily
- inspect.iscoroutinefunction(firecrawl.extract) -> True
- Firecrawl proxy is a callable lazy proxy at module level
- check_firecrawl_api_key reflects FIRECRAWL_API_KEY presence
| Name | ||
|---|---|---|
| __init__.py | ||
| _parser.py | ||
| _subprocess_compat.py | ||
| auth.py | ||
| auth_commands.py | ||
| azure_detect.py | ||
| backup.py | ||
| banner.py | ||
| browser_connect.py | ||
| callbacks.py | ||
| checkpoints.py | ||
| claw.py | ||
| cli_output.py | ||
| clipboard.py | ||
| codex_models.py | ||
| codex_runtime_plugin_migration.py | ||
| codex_runtime_switch.py | ||
| colors.py | ||
| commands.py | ||
| completion.py | ||
| config.py | ||
| copilot_auth.py | ||
| cron.py | ||
| curator.py | ||
| curses_ui.py | ||
| debug.py | ||
| default_soul.py | ||
| dingtalk_auth.py | ||
| doctor.py | ||
| dump.py | ||
| env_loader.py | ||
| fallback_cmd.py | ||
| gateway.py | ||
| gateway_windows.py | ||
| goals.py | ||
| hooks.py | ||
| inventory.py | ||
| kanban.py | ||
| kanban_db.py | ||
| kanban_diagnostics.py | ||
| kanban_specify.py | ||
| logs.py | ||
| main.py | ||
| mcp_config.py | ||
| memory_setup.py | ||
| model_catalog.py | ||
| model_normalize.py | ||
| model_switch.py | ||
| models.py | ||
| nous_subscription.py | ||
| oneshot.py | ||
| pairing.py | ||
| platforms.py | ||
| plugins.py | ||
| plugins_cmd.py | ||
| profile_distribution.py | ||
| profiles.py | ||
| providers.py | ||
| pt_input_extras.py | ||
| pty_bridge.py | ||
| relaunch.py | ||
| runtime_provider.py | ||
| security_advisories.py | ||
| setup.py | ||
| skills_config.py | ||
| skills_hub.py | ||
| skin_engine.py | ||
| slack_cli.py | ||
| status.py | ||
| stdio.py | ||
| timeouts.py | ||
| tips.py | ||
| tools_config.py | ||
| uninstall.py | ||
| vercel_auth.py | ||
| voice.py | ||
| web_server.py | ||
| webhook.py | ||