fix(web): align _LEGACY_PREFERENCE with legacy 7-provider order + doc cleanup

Self-review of the plugin migration surfaced one warning and a handful of
doc/dead-code cleanups. None affect production behaviour through the main
dispatcher (which always calls `tools.web_tools._get_backend()` first and
preserves the full 7-provider walk), but direct callers of
`agent.web_search_registry.get_active_*_provider()` previously diverged
from the legacy order and could return `None` for users with credentials
but no explicit `web.backend` config key.
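
The divergence can be illustrated with a minimal sketch. Only the two
tuples are taken from this commit message; `first_available` is a
hypothetical stand-in for the registry's resolution step (not the real
`_resolve`), and it assumes "available" reduces to membership in a set of
provider names that have credentials.

```python
from typing import Optional, Sequence, Set

# Orderings quoted in this commit message.
OLD_PREFERENCE = ("brave-free", "firecrawl", "searxng", "ddgs")
NEW_PREFERENCE = (
    "firecrawl", "parallel", "tavily", "exa",
    "searxng", "brave-free", "ddgs",
)


def first_available(preference: Sequence[str], available: Set[str]) -> Optional[str]:
    """Return the first provider in `preference` that is available.

    Hypothetical helper: the real registry presumably checks credentials
    or config; here availability is plain set membership (an assumption).
    """
    for name in preference:
        if name in available:
            return name
    return None


# The three cases reported in the commit message.
tavily_exa = {"tavily", "exa"}
exa_parallel = {"exa", "parallel"}
brave_firecrawl = {"brave-free", "firecrawl"}

print(first_available(OLD_PREFERENCE, tavily_exa))       # None
print(first_available(NEW_PREFERENCE, tavily_exa))       # tavily
print(first_available(NEW_PREFERENCE, exa_parallel))     # parallel
print(first_available(OLD_PREFERENCE, brave_firecrawl))  # brave-free
print(first_available(NEW_PREFERENCE, brave_firecrawl))  # firecrawl
```

Under these assumptions, a provider missing from the preference tuple can
never be selected by the fallback walk, which is why the 4-tuple returned
`None` for users holding only Tavily, Exa, or Parallel credentials.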

Changes
-------
1. `_LEGACY_PREFERENCE` was shipped as a 4-tuple
   `("brave-free", "firecrawl", "searxng", "ddgs")` while the PR
   description and the legacy `_get_backend()` candidate order both
   call for the 7-tuple
   `(firecrawl, parallel, tavily, exa, searxng, brave-free, ddgs)`.
   Replaced with the 7-tuple. Verified empirically: with TAVILY+EXA keys
   and no config, `get_active_search_provider()` now returns tavily
   (was None); with EXA+PARALLEL it returns parallel (was None); with
   BRAVE+FIRECRAWL it returns firecrawl (was brave-free).

2. `agent/web_search_registry.py` — module docstring, `_resolve` step-3
   docstring, and inline comment all listed the old 4-tuple and claimed
   "brave-free first because it was the shipped default". The legacy
   default is `"firecrawl"`. Rewritten to match the new ordering and
   reference `tools.web_tools._get_backend()` as the source of truth.

3. `agent/web_search_registry.py` — `get_active_crawl_provider`
   docstring said "only Tavily implements it among built-in providers".
   Firecrawl also advertises `supports_crawl=True` after the previous
   commit. Updated to "Tavily and Firecrawl".

4. `plugins/web/tavily/provider.py` — module docstring said "Tavily is
   the only built-in backend that natively crawls". Updated.

5. `agent/web_search_provider.py` — ABC docstring mentioned only
   `search` / `extract` capabilities. Added `crawl` for accuracy.

6. `plugins/web/{firecrawl,parallel,exa}/provider.py` — dead plugin-level
   cache globals (`_firecrawl_client`, `_parallel_client`,
   `_async_parallel_client`, `_exa_client`) were declared but never read
   (all reads/writes go through `_wt.*` per the
   `extracting-inline-helpers-to-plugins` recipe). Removed the dead
   declarations; the
   reset-for-tests helpers in firecrawl + parallel now clear the
   canonical `_wt._<name>` slots, matching the pattern exa already used.

Tests
-----
218/218 web-targeted tests still pass (no test changes needed). 4910/4910
in `tests/tools/` still green.
kshitijk4poor 2026-05-14 02:02:01 +05:30 committed by Teknium
parent 21e3a863bb
commit 657e6d87cc
6 changed files with 82 additions and 48 deletions


@@ -112,9 +112,11 @@ Firecrawl = _FirecrawlProxy()
 # ---------------------------------------------------------------------------
 # Client construction (direct vs managed-gateway)
 # ---------------------------------------------------------------------------
-_firecrawl_client: Any = None
-_firecrawl_client_config: Any = None
+#
+# The canonical cache slots live on :mod:`tools.web_tools` so tests that do
+# ``tools.web_tools._firecrawl_client = None`` between cases see fresh
+# state. The plugin reads/writes through that public module — see
+# :func:`_get_firecrawl_client` below.
 def _get_direct_firecrawl_config() -> Optional[tuple]:
@@ -257,10 +259,15 @@ def _get_firecrawl_client() -> Any:
 def _reset_client_for_tests() -> None:
-    """Drop the cached Firecrawl client so tests can re-instantiate cleanly."""
-    global _firecrawl_client, _firecrawl_client_config
-    _firecrawl_client = None
-    _firecrawl_client_config = None
+    """Drop the cached Firecrawl client so tests can re-instantiate cleanly.
+    Clears the canonical slots on :mod:`tools.web_tools` (where
+    :func:`_get_firecrawl_client` reads/writes them).
+    """
+    import tools.web_tools as _wt
+    _wt._firecrawl_client = None
+    _wt._firecrawl_client_config = None
 # ---------------------------------------------------------------------------