refactor(web): dispatch all three tools through web_search_registry

Cuts over web_search_tool, web_extract_tool, and web_crawl_tool in
tools/web_tools.py to dispatch through agent.web_search_registry
instead of the legacy hardcoded if-elif backend chains.

Per-tool changes:

  web_search_tool (sync)
    Replace 5 backend branches (parallel, exa, registry-3-providers,
    tavily, firecrawl-fallthrough) with a single registry path:
      1. _get_search_backend() resolves the configured name
      2. _wsp_get_provider(name) for explicit-config-wins semantics
      3. get_active_search_provider() fallback for typo / unknown name
      4. provider.search(query, limit) — sync for all 7 providers
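A minimal sketch of the resolution order in steps 1-4, assuming the registry lookups behave like plain callables. `resolve_search_provider`, `dispatch_search`, `MAX_LIMIT`, and the error payload are hypothetical names chosen for illustration. The real dispatcher in tools/web_tools.py uses `_get_search_backend()`, `_wsp_get_provider()`, and `get_active_search_provider()`. The cap of 100 comes from the updated test, which expects limit=500 to reach the backend as 100. The lower bound of 1 is an assumption.

```python
from typing import Callable, Optional, Protocol


class SearchProvider(Protocol):
    # Structural interface assumed from the commit text: a name plus a
    # synchronous search(query, limit).
    name: str

    def search(self, query: str, limit: int) -> dict: ...


# Cap inferred from the test (limit=500 reaches the backend as 100);
# the lower bound of 1 used below is an assumption.
MAX_LIMIT = 100


def resolve_search_provider(
    configured: Optional[str],
    get_provider: Callable[[str], Optional[SearchProvider]],
    get_active_search_provider: Callable[[], Optional[SearchProvider]],
) -> Optional[SearchProvider]:
    # Step 1 happens upstream: `configured` is the resolved backend name.
    # Step 2: an explicitly configured, registered provider always wins.
    if configured:
        provider = get_provider(configured)
        if provider is not None:
            return provider
    # Step 3: typo or unknown name, so fall back to the active provider.
    return get_active_search_provider()


def dispatch_search(
    query: str,
    limit: int,
    configured: Optional[str],
    get_provider: Callable[[str], Optional[SearchProvider]],
    get_active_search_provider: Callable[[], Optional[SearchProvider]],
) -> dict:
    provider = resolve_search_provider(
        configured, get_provider, get_active_search_provider
    )
    if provider is None:
        # Hypothetical error shape; the real tool's payload may differ.
        return {"success": False, "error": "no web search provider available"}
    # Step 4: one sync call path for every provider, limit clamped first.
    return provider.search(query, max(1, min(limit, MAX_LIMIT)))
```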

  web_extract_tool (async)
    Replace 4 backend branches (parallel-async, exa-sync, tavily-sync,
    firecrawl-perurl-loop) plus the search-only-error block with:
      1. Same provider resolution as search.
      2. When the configured backend IS registered but doesn't support
         extract (search-only providers like brave-free), surface a
         typed "search-only" error matching the legacy text, since
         tests assert that wording.
      3. inspect.iscoroutinefunction(provider.extract) detects sync vs
         async: parallel + firecrawl are async; exa + tavily are sync.
         Sync extracts run in asyncio.to_thread() so we don't block.
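The sync-vs-async split in step 3 can be sketched as follows, assuming a provider object whose `extract` is either a plain function or an `async def`. `dispatch_extract`, `SearchOnlyBackendError`, and treating a missing `extract` attribute as "search-only" are assumptions for illustration. The commit does not show how extract support is advertised or the exact legacy error text.

```python
import asyncio
import inspect
from typing import Any, List


class SearchOnlyBackendError(Exception):
    """Hypothetical typed error; the legacy message text is not reproduced."""


async def dispatch_extract(provider: Any, urls: List[str]) -> Any:
    # Assumption: a search-only provider is one with no extract() method.
    extract = getattr(provider, "extract", None)
    if extract is None:
        raise SearchOnlyBackendError(f"{provider.name} only supports search")
    if inspect.iscoroutinefunction(extract):
        # Async providers (parallel, firecrawl) are awaited on the loop.
        return await extract(urls)
    # Sync providers (exa, tavily) run in a worker thread via
    # asyncio.to_thread (Python 3.9+) so they don't block the event loop.
    return await asyncio.to_thread(extract, urls)
```

Inspecting the function up front matters: calling `extract()` first and then checking `inspect.isawaitable()` on the result would already have run a sync extract on the event-loop thread.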

  web_crawl_tool (async)
    Replace tavily-specific branch + search-only-error block with:
      1. _wsp_get_provider(backend) — explicit config first
      2. Search-only typed error when the configured name doesn't
         support crawl (matches legacy phrasing)
      3. get_active_crawl_provider() fallback otherwise
      4. provider.crawl(url, **kwargs) — async-or-sync dispatch as above
      5. Response post-processing (LLM summarization, trimming) stays
         unchanged — it's not provider-specific.
    When no plugin advertises supports_crawl, the tool falls through to
    the existing Firecrawl-via-web-summarize path below (unchanged).
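Under the same assumptions, the crawl ordering might look like the following sketch: explicit config, then the search-only error, then the active-provider fallback, then the legacy path. `dispatch_crawl`, `resolve_crawl_provider`, `legacy_crawl`, and the flag-or-method handling of `supports_crawl` are hypothetical. Only the ordering is taken from the commit message.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable, Optional


class SearchOnlyBackendError(Exception):
    """Hypothetical typed error for a configured backend that cannot crawl."""


def _supports_crawl(provider: Any) -> bool:
    # Assumption: supports_crawl may be a plain flag or a zero-arg method.
    flag = getattr(provider, "supports_crawl", False)
    return bool(flag() if callable(flag) else flag)


def resolve_crawl_provider(
    configured: Optional[str],
    get_provider: Callable[[str], Any],
    get_active_crawl_provider: Callable[[], Any],
) -> Any:
    if configured:
        provider = get_provider(configured)  # 1. explicit config first
        if provider is not None:
            if not _supports_crawl(provider):  # 2. registered but search-only
                raise SearchOnlyBackendError(f"{configured} does not support crawl")
            return provider
    return get_active_crawl_provider()  # 3. fallback, may be None


async def dispatch_crawl(
    url: str,
    configured: Optional[str],
    get_provider: Callable[[str], Any],
    get_active_crawl_provider: Callable[[], Any],
    legacy_crawl: Callable[..., Awaitable[Any]],
    **kwargs: Any,
) -> Any:
    provider = resolve_crawl_provider(configured, get_provider, get_active_crawl_provider)
    if provider is None:
        # No plugin advertises crawl support: hand off to the existing path.
        return await legacy_crawl(url, **kwargs)
    # 4. Same sync-vs-async dispatch as extract.
    if inspect.iscoroutinefunction(provider.crawl):
        return await provider.crawl(url, **kwargs)
    return await asyncio.to_thread(provider.crawl, url, **kwargs)
```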

Test updates (2 tests in tests/tools/test_web_tools_config.py):
  - test_web_search_clamps_limit_before_backend_call:
      patch("tools.web_tools._get_backend") + patch("tools.web_tools._parallel_search")
      -> patch("tools.web_tools._get_search_backend") + patch the registry
      provider returned by agent.web_search_registry.get_provider
  - test_search_error_response_does_not_expose_diagnostics:
      patch("tools.web_tools._get_firecrawl_client") -> same pattern
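Both updated tests replace a patch on a tools.web_tools helper with a patch on the registry lookup. A self-contained sketch of that pattern follows. `registry`, the hardcoded "parallel" name, and the cap of 100 (taken from the test's expectation) are illustrative assumptions, not the real module layout.

```python
import json
import types
from unittest.mock import MagicMock, patch

# Hypothetical stand-in for agent.web_search_registry.
registry = types.SimpleNamespace(get_provider=lambda name: None)


def web_search_tool(query: str, limit: int = 5) -> str:
    # The lookup goes through the registry object at call time, so a test
    # that patches registry.get_provider changes what the dispatcher sees.
    provider = registry.get_provider("parallel")
    return json.dumps(provider.search(query, min(limit, 100)))


def test_clamps_limit_before_backend_call() -> None:
    fake_provider = MagicMock(name="ParallelWebSearchProvider")
    # name= only sets the mock's repr; assign .name separately, as the
    # real test does.
    fake_provider.name = "parallel"
    fake_provider.search.return_value = {"success": True, "data": {"web": []}}
    with patch.object(registry, "get_provider", return_value=fake_provider):
        result = json.loads(web_search_tool("docs", limit=500))
    assert result == {"success": True, "data": {"web": []}}
    fake_provider.search.assert_called_once_with("docs", 100)
```

Patching a registry attribute only works if the code under test looks `get_provider` up at call time. `unittest.mock.patch` replaces the name where it is looked up, so a module-level `from agent.web_search_registry import get_provider` in tools/web_tools.py would keep the original function and ignore the patch. Presumably the dispatcher binds `_wsp_get_provider` lazily for this reason.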

Tests unchanged (still pass):
  - All TestXBackendWiring classes (test _get_backend / _is_backend_available
    config-resolution, independent of dispatch)
  - All TestXSearchOnlyErrors classes (test the search-only error path
    via web_extract_tool / web_crawl_tool — error text preserved)
  - 141 passing web tests total, 0 regressions.

Dead-code cleanup deferred to a follow-up commit so this diff stays
focused on the cutover. After this commit:
  - tools.web_tools._exa_search / _exa_extract / _parallel_search /
    _parallel_extract / _tavily_request / _normalize_tavily_* /
    _get_firecrawl_client / _extract_web_search_results /
    _extract_scrape_payload / _to_plain_object / _normalize_result_list
    are no longer called by the dispatchers, but still exist.
  - The config-resolution layer (_get_backend, _is_backend_available,
    _is_tool_gateway_ready, _has_direct_firecrawl_config) IS still in
    use and must stay.
  - The Firecrawl proxy and check_firecrawl_api_key are still imported
    by integration tests and patched by unit tests — must stay (or be
    re-exported from the plugin).
kshitijk4poor 2026-05-14 00:26:42 +05:30 committed by Teknium
parent 143184e943
commit b05253ceed
2 changed files with 175 additions and 238 deletions


@@ -485,15 +485,28 @@ class TestWebSearchSchema:
     def test_web_search_clamps_limit_before_backend_call(self):
         import tools.web_tools
 
-        with patch("tools.web_tools._get_backend", return_value="parallel"), \
-             patch("tools.web_tools._parallel_search", return_value={"success": True, "data": {"web": []}}) as mock_search, \
+        # After the web-provider plugin migration, _parallel_search lives in
+        # plugins.web.parallel.provider.ParallelWebSearchProvider.search; the
+        # tool dispatcher resolves a provider from the registry and calls
+        # provider.search(query, limit). Mock the provider lookup so we can
+        # assert the limit is clamped before reaching the backend.
+        fake_search = MagicMock(return_value={"success": True, "data": {"web": []}})
+        fake_provider = MagicMock(
+            name="ParallelWebSearchProvider",
+            supports_search=MagicMock(return_value=True),
+        )
+        fake_provider.search = fake_search
+        fake_provider.name = "parallel"
+
+        with patch("tools.web_tools._get_search_backend", return_value="parallel"), \
+             patch("agent.web_search_registry.get_provider", return_value=fake_provider), \
              patch("tools.interrupt.is_interrupted", return_value=False), \
              patch.object(tools.web_tools._debug, "log_call"), \
              patch.object(tools.web_tools._debug, "save"):
             result = json.loads(tools.web_tools.web_search_tool("docs", limit=500))
 
         assert result == {"success": True, "data": {"web": []}}
-        mock_search.assert_called_once_with("docs", 100)
+        fake_search.assert_called_once_with("docs", 100)
 
 
 class TestWebSearchErrorHandling:
@@ -502,11 +515,19 @@ class TestWebSearchErrorHandling:
     def test_search_error_response_does_not_expose_diagnostics(self):
         import tools.web_tools
 
-        firecrawl_client = MagicMock()
-        firecrawl_client.search.side_effect = RuntimeError("boom")
+        # After the web-provider plugin migration, the firecrawl client lives
+        # at plugins.web.firecrawl.provider._get_firecrawl_client. We mock the
+        # registry's get_provider to return a fake provider whose .search()
+        # raises so we can verify error sanitization.
+        fake_provider = MagicMock(
+            name="FirecrawlWebSearchProvider",
+            supports_search=MagicMock(return_value=True),
+        )
+        fake_provider.search.side_effect = RuntimeError("boom")
+        fake_provider.name = "firecrawl"
 
-        with patch("tools.web_tools._get_backend", return_value="firecrawl"), \
-             patch("tools.web_tools._get_firecrawl_client", return_value=firecrawl_client), \
+        with patch("tools.web_tools._get_search_backend", return_value="firecrawl"), \
+             patch("agent.web_search_registry.get_provider", return_value=fake_provider), \
              patch("tools.interrupt.is_interrupted", return_value=False), \
              patch.object(tools.web_tools._debug, "log_call") as mock_log_call, \
              patch.object(tools.web_tools._debug, "save"):