fix(web): align _LEGACY_PREFERENCE with legacy 7-provider order + doc cleanup

Self-review of the plugin migration surfaced one warning and a handful of
doc/dead-code cleanups. None affect production behaviour through the main
dispatcher (which always calls `tools.web_tools._get_backend()` first and
preserves the full 7-provider walk), but direct callers of
`agent.web_search_registry.get_active_*_provider()` previously diverged
from the legacy order and could return `None` for users with credentials
but no explicit `web.backend` config key.

Changes
-------
1. `_LEGACY_PREFERENCE` was shipped as a 4-tuple
   `("brave-free", "firecrawl", "searxng", "ddgs")` while the PR
   description and the legacy `_get_backend()` candidate order both
   call for the 7-tuple
   `(firecrawl, parallel, tavily, exa, searxng, brave-free, ddgs)`.
   Replaced with the 7-tuple. Verified empirically: with TAVILY+EXA keys
   and no config, `get_active_search_provider()` now returns tavily
   (was None); with EXA+PARALLEL it returns parallel (was None); with
   BRAVE+FIRECRAWL it returns firecrawl (was brave-free).

2. `agent/web_search_registry.py` — module docstring, `_resolve` step-3
   docstring, and inline comment all listed the old 4-tuple and claimed
   "brave-free first because it was the shipped default". The legacy
   default is `"firecrawl"`. Rewritten to match the new ordering and
   reference `tools.web_tools._get_backend()` as the source of truth.

3. `agent/web_search_registry.py` — `get_active_crawl_provider`
   docstring said "only Tavily implements it among built-in providers".
   Firecrawl also advertises `supports_crawl=True` after the previous
   commit. Updated to "Tavily and Firecrawl".

4. `plugins/web/tavily/provider.py` — module docstring said "Tavily is
   the only built-in backend that natively crawls". Updated to note that
   Firecrawl also advertises `supports_crawl=True`.

5. `agent/web_search_provider.py` — ABC docstring mentioned only
   `search` / `extract` capabilities. Added `crawl` for accuracy.

6. `plugins/web/{firecrawl,parallel,exa}/provider.py` — dead plugin-level
   cache globals (`_firecrawl_client`, `_parallel_client`,
   `_async_parallel_client`, `_exa_client`) were declared but never read
   (all reads/writes go through `_wt.*` per the
   `extracting-inline-helpers-to-plugins` recipe). Removed the dead
   declarations; the
   reset-for-tests helpers in firecrawl + parallel now clear the
   canonical `_wt._<name>` slots, matching the pattern exa already used.
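
As an illustration of item 1, the preference walk can be sketched as a
first-match scan over an ordered tuple. This is a minimal sketch, not the
code in `agent/web_search_registry.py`: `pick_provider` and the `available`
set are hypothetical stand-ins for the real availability checks, and only
the two tuples are taken from this message.

```python
from typing import Iterable, Optional, Set

# Orders quoted from the commit message above.
OLD_PREFERENCE = ("brave-free", "firecrawl", "searxng", "ddgs")
NEW_PREFERENCE = (
    "firecrawl", "parallel", "tavily", "exa", "searxng", "brave-free", "ddgs",
)


def pick_provider(preference: Iterable[str], available: Set[str]) -> Optional[str]:
    """Return the first name in ``preference`` that is in ``available``.

    ``available`` is a hypothetical stand-in for "providers whose
    credentials are configured"; the real registry checks each provider.
    """
    for name in preference:
        if name in available:
            return name
    return None


# The three cases reported under "Verified empirically".
print(pick_provider(NEW_PREFERENCE, {"tavily", "exa"}))            # tavily
print(pick_provider(OLD_PREFERENCE, {"tavily", "exa"}))            # None
print(pick_provider(NEW_PREFERENCE, {"exa", "parallel"}))          # parallel
print(pick_provider(NEW_PREFERENCE, {"brave-free", "firecrawl"}))  # firecrawl
print(pick_provider(OLD_PREFERENCE, {"brave-free", "firecrawl"}))  # brave-free
```

Under this model the old 4-tuple returns `None` whenever the only
credentialed providers are ones it never lists, which matches the
divergence described above.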

Tests
-----
218/218 web-targeted tests still pass (no test changes needed). 4910/4910
in `tests/tools/` still green.
kshitijk4poor 2026-05-14 02:02:01 +05:30 committed by Teknium
parent 21e3a863bb
commit 657e6d87cc
6 changed files with 82 additions and 48 deletions


@@ -32,9 +32,10 @@ from agent.web_search_provider import WebSearchProvider
 logger = logging.getLogger(__name__)
-# Module-level cache for the Exa client so we don't reconstruct it per
-# call. Matches the legacy `_exa_client` pattern in tools/web_tools.py.
-_exa_client: Any = None
+# Module-level note: the canonical ``_exa_client`` cache slot lives on
+# :mod:`tools.web_tools` so tests that do ``tools.web_tools._exa_client =
+# None`` between cases see fresh state. The plugin reads/writes through
+# that public module (see :func:`_get_exa_client`).
 def _get_exa_client() -> Any:


@@ -112,9 +112,11 @@ Firecrawl = _FirecrawlProxy()
 # ---------------------------------------------------------------------------
 # Client construction (direct vs managed-gateway)
 # ---------------------------------------------------------------------------
-_firecrawl_client: Any = None
-_firecrawl_client_config: Any = None
+#
+# The canonical cache slots live on :mod:`tools.web_tools` so tests that do
+# ``tools.web_tools._firecrawl_client = None`` between cases see fresh
+# state. The plugin reads/writes through that public module — see
+# :func:`_get_firecrawl_client` below.
 def _get_direct_firecrawl_config() -> Optional[tuple]:
@@ -257,10 +259,15 @@ def _get_firecrawl_client() -> Any:
 def _reset_client_for_tests() -> None:
-    """Drop the cached Firecrawl client so tests can re-instantiate cleanly."""
-    global _firecrawl_client, _firecrawl_client_config
-    _firecrawl_client = None
-    _firecrawl_client_config = None
+    """Drop the cached Firecrawl client so tests can re-instantiate cleanly.
+    Clears the canonical slots on :mod:`tools.web_tools` (where
+    :func:`_get_firecrawl_client` reads/writes them).
+    """
+    import tools.web_tools as _wt
+    _wt._firecrawl_client = None
+    _wt._firecrawl_client_config = None
 # ---------------------------------------------------------------------------


@@ -36,13 +36,11 @@ from agent.web_search_provider import WebSearchProvider
 logger = logging.getLogger(__name__)
-# Module-level client caches mirroring the legacy `tools.web_tools._parallel_client`
-# / `_async_parallel_client` pattern. For tests, the canonical cache lives on
-# tools.web_tools so existing setup_method() handlers that reset
-# ``tools.web_tools._parallel_client = None`` keep working — we read/write
-# the cache via that module rather than these module-level globals.
-_parallel_client: Any = None
-_async_parallel_client: Any = None
+# Module-level note: the canonical cache slots ``_parallel_client`` and
+# ``_async_parallel_client`` live on :mod:`tools.web_tools` so tests that do
+# ``tools.web_tools._parallel_client = None`` between cases see fresh state.
+# The plugin reads/writes through that public module (see
+# :func:`_get_sync_client` / :func:`_get_async_client`).
 def _ensure_parallel_sdk_installed() -> None:
@@ -117,10 +115,15 @@ def _get_async_client() -> Any:
 def _reset_clients_for_tests() -> None:
-    """Drop both cached clients so tests can re-instantiate cleanly."""
-    global _parallel_client, _async_parallel_client
-    _parallel_client = None
-    _async_parallel_client = None
+    """Drop both cached clients so tests can re-instantiate cleanly.
+    Clears the canonical slots on :mod:`tools.web_tools` (where
+    :func:`_get_sync_client` / :func:`_get_async_client` read/write them).
+    """
+    import tools.web_tools as _wt
+    _wt._parallel_client = None
+    _wt._async_parallel_client = None
 # Backward-compatible aliases for the names that lived in tools.web_tools


@@ -5,8 +5,8 @@ capabilities advertised:
 - ``supports_search()`` -> True (Tavily ``/search``)
 - ``supports_extract()`` -> True (Tavily ``/extract``)
-- ``supports_crawl()`` -> True (Tavily ``/crawl``) Tavily is the only
-built-in backend that natively crawls
+- ``supports_crawl()`` -> True (Tavily ``/crawl``) sync HTTP crawl;
+Firecrawl also advertises ``supports_crawl=True`` (async)
 All three are sync the underlying call is ``httpx.post(...)``. The
 dispatcher in :func:`tools.web_tools.web_crawl_tool` (which is itself