fix(web): preserve firecrawl crawl + website-policy gate after migration

Two regressions discovered by running the full tests/tools/ suite after
the dispatcher cutover, both fixed in this commit:

1. web_crawl_tool incorrectly errored "search-only" for firecrawl
---------------------------------------------------------------------
The cutover treated any provider with supports_crawl()==False as a
search-only backend and returned the typed search-only error. But
firecrawl can crawl via the legacy multi-page-extract path inside
web_crawl_tool; it just doesn't expose supports_crawl on the plugin
(adding native firecrawl crawl is a clean follow-up).

Fix: only emit the search-only error when the provider supports NEITHER
crawl NOR extract (brave-free / ddgs / searxng). When the provider
supports extract but not crawl (firecrawl), fall through to the legacy
firecrawl-via-extract path below.

2. firecrawl plugin's check_website_access wasn't patchable
---------------------------------------------------------------------
The plugin imported `from tools.website_policy import check_website_access`
INSIDE the extract() function body, so monkeypatching the name on
plugins.web.firecrawl.provider had no effect: the inner import re-bound
the name on every call.

Fix: hoist the import to module level. Cheap (website_policy itself has
no heavy deps) and makes the standard
monkeypatch.setattr(firecrawl_provider, "check_website_access", ...)
pattern work.

Test updates (tests/tools/test_website_policy.py, 4 tests):
- test_web_extract_short_circuits_blocked_url
- test_web_extract_blocks_redirected_final_url
  Both: patch the gate at plugins.web.firecrawl.provider (where it runs
  after migration) and force the firecrawl plugin to be the active
  extract provider via FIRECRAWL_API_KEY.
- test_web_crawl_short_circuits_blocked_url
- test_web_crawl_blocks_redirected_final_url
  Both: unchanged. The dispatcher-level gate at tools.web_tools.py
  line 1651 still uses the imported `check_website_access` name and the
  firecrawl-fallthrough path is exercised as before.

Verified: 22/22 tests/tools/test_website_policy.py pass.
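
To illustrate why the function-body import in item 2 defeated monkeypatching, here is a minimal, self-contained sketch. The module names `demo_website_policy` and `demo_provider` and the two `extract_*` functions are hypothetical stand-ins, not the real tools.website_policy or plugins.web.firecrawl.provider code; the plain attribute assignment stands in for what monkeypatch.setattr does to a module attribute.

```python
import sys
import types

# Hypothetical stand-in for tools.website_policy.
policy = types.ModuleType("demo_website_policy")
policy.check_website_access = lambda url: "real gate"
sys.modules["demo_website_policy"] = policy

# Hypothetical stand-in for plugins.web.firecrawl.provider.
provider = types.ModuleType("demo_provider")
exec(
    '''
from demo_website_policy import check_website_access  # module-level import

def extract_hoisted(url):
    # Resolves the name in this module's globals at call time,
    # so a patched module attribute is picked up.
    return check_website_access(url)

def extract_inner_import(url):
    # Re-imports on every call; the resulting local name shadows
    # whatever was patched onto the module.
    from demo_website_policy import check_website_access
    return check_website_access(url)
''',
    provider.__dict__,
)

# Same effect as monkeypatch.setattr(provider, "check_website_access", fake).
provider.check_website_access = lambda url: "patched gate"

hoisted_result = provider.extract_hoisted("https://example.com")
inner_result = provider.extract_inner_import("https://example.com")
print(hoisted_result)  # patched gate
print(inner_result)    # real gate
```

Only the hoisted variant sees the patch, which is the behavior the updated extract tests rely on.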
2026-05-18 04:41:56 +00:00 · 2026-05-14 00:34:28 +05:30 · 2026-05-14 00:34:28 +05:30 · 5e54330e27
commit 5e54330e27
parent b05253ceed
3 changed files with 39 additions and 23 deletions
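
The crawl gate from item 1 of the commit message can be sketched in isolation as follows. This is an illustrative model only: `FakeProvider` and `crawl_gate` are hypothetical names, and the real web_crawl_tool goes on to call get_active_crawl_provider() and the legacy extract path, which this sketch does not reproduce.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FakeProvider:
    """Hypothetical provider exposing the two capability checks."""
    display_name: str
    can_crawl: bool
    can_extract: bool

    def supports_crawl(self) -> bool:
        return self.can_crawl

    def supports_extract(self) -> bool:
        return self.can_extract


def crawl_gate(
    provider: Optional[FakeProvider],
) -> Tuple[Optional[str], Optional[FakeProvider]]:
    """Return (error_json, provider_to_use) following the fixed logic."""
    if provider is not None and not provider.supports_crawl():
        if not provider.supports_extract():
            # Neither crawl nor extract: typed search-only error.
            error = json.dumps(
                {
                    "success": False,
                    "error": f"{provider.display_name} is a search-only "
                    "backend and cannot crawl URLs.",
                },
                ensure_ascii=False,
            )
            return error, None
        # Extract but no crawl: clear the provider so the caller
        # falls through to its legacy path.
        provider = None
    return None, provider


search_only_error, _ = crawl_gate(FakeProvider("ddgs", False, False))
fallthrough_error, fallthrough_provider = crawl_gate(
    FakeProvider("firecrawl", False, True)
)
native_error, native_provider = crawl_gate(FakeProvider("native", True, True))
```

In this model a search-only provider yields an error payload, an extract-only provider yields `(None, None)` so the caller can fall back, and a crawl-capable provider is returned unchanged.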
--- a/tools/web_tools.py
+++ b/tools/web_tools.py
@@ -1618,22 +1618,26 @@ async def web_crawl_tool(

        crawl_provider = _wsp_get_provider(backend) if backend else None
        if crawl_provider is not None and not crawl_provider.supports_crawl():
-            # Configured name IS registered but doesn't support crawl
-            # (search-only providers like brave-free / ddgs / searxng).
-            # Surface a typed error rather than silently switching to a
-            # different crawl backend.
-            return json.dumps(
-                {
-                    "success": False,
-                    "error": (
-                        f"{crawl_provider.display_name} is a search-only "
-                        "backend and cannot crawl URLs. "
-                        "Set FIRECRAWL_API_KEY for crawling, or use "
-                        "web_search instead."
-                    ),
-                },
-                ensure_ascii=False,
-            )
+            # When the configured provider is search-only AND cannot
+            # extract URLs either (brave-free / ddgs / searxng), surface a
+            # typed "search-only" error rather than silently switching to
+            # a different crawl backend. When the provider supports extract
+            # but not crawl (e.g. firecrawl), fall through to the legacy
+            # firecrawl-via-extract path below.
+            if not crawl_provider.supports_extract():
+                return json.dumps(
+                    {
+                        "success": False,
+                        "error": (
+                            f"{crawl_provider.display_name} is a search-only "
+                            "backend and cannot crawl URLs. "
+                            "Set FIRECRAWL_API_KEY for crawling, or use "
+                            "web_search instead."
+                        ),
+                    },
+                    ensure_ascii=False,
+                )
+            crawl_provider = None  # let legacy firecrawl path handle it
        if crawl_provider is None:
            crawl_provider = get_active_crawl_provider()