mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-18 04:41:56 +00:00
fix(web): align _LEGACY_PREFERENCE with legacy 7-provider order + doc cleanup
Self-review of the plugin migration surfaced one warning and a handful of
doc/dead-code cleanups. None affect production behaviour through the main
dispatcher (which always calls `tools.web_tools._get_backend()` first and
preserves the full 7-provider walk), but direct callers of
`agent.web_search_registry.get_active_*_provider()` previously diverged
from the legacy order and could return `None` for users with credentials
but no explicit `web.backend` config key.
Changes
-------
1. `_LEGACY_PREFERENCE` was shipped as a 4-tuple
`("brave-free", "firecrawl", "searxng", "ddgs")` while the PR
description and the legacy `_get_backend()` candidate order both
call for the 7-tuple
`(firecrawl, parallel, tavily, exa, searxng, brave-free, ddgs)`.
Replaced with the 7-tuple. Verified empirically: with TAVILY+EXA keys
and no config, `get_active_search_provider()` now returns tavily
(was None); with EXA+PARALLEL it returns parallel (was None); with
BRAVE+FIRECRAWL it returns firecrawl (was brave-free).
2. `agent/web_search_registry.py` — module docstring, `_resolve` step-3
docstring, and inline comment all listed the old 4-tuple and claimed
"brave-free first because it was the shipped default". The legacy
default is `"firecrawl"`. Rewritten to match the new ordering and
reference `tools.web_tools._get_backend()` as the source of truth.
3. `agent/web_search_registry.py` — `get_active_crawl_provider`
docstring said "only Tavily implements it among built-in providers".
Firecrawl also advertises `supports_crawl=True` after the previous
commit. Updated to "Tavily and Firecrawl".
4. `plugins/web/tavily/provider.py` — module docstring said "Tavily is
the only built-in backend that natively crawls". Updated.
5. `agent/web_search_provider.py` — ABC docstring mentioned only
`search` / `extract` capabilities. Added `crawl` for accuracy.
6. `plugins/web/{firecrawl,parallel,exa}/provider.py` — dead plugin-level
cache globals (`_firecrawl_client`, `_parallel_client`,
`_async_parallel_client`, `_exa_client`) were declared but never read
(all reads/writes go through `_wt.*` per the `extracting-inline-
helpers-to-plugins` recipe). Removed the dead declarations; the
reset-for-tests helpers in firecrawl + parallel now clear the
canonical `_wt._<name>` slots, matching the pattern exa already used.
Tests
-----
218/218 web-targeted tests still pass (no test changes needed). 4910/4910
in `tests/tools/` still green.
This commit is contained in:
parent
21e3a863bb
commit
657e6d87cc
6 changed files with 82 additions and 48 deletions
|
|
@ -61,13 +61,14 @@ from typing import Any, Dict, List
|
|||
|
||||
|
||||
class WebSearchProvider(abc.ABC):
|
||||
"""Abstract base class for a web search/extract backend.
|
||||
"""Abstract base class for a web search/extract/crawl backend.
|
||||
|
||||
Subclasses must implement :meth:`is_available` and at least one of
|
||||
:meth:`search` / :meth:`extract`. The :meth:`supports_search` and
|
||||
:meth:`supports_extract` capability flags let the registry route each
|
||||
tool call to the right provider, and let multi-capability providers
|
||||
(SearXNG, Firecrawl, Tavily, …) advertise both.
|
||||
:meth:`search` / :meth:`extract` / :meth:`crawl`. The
|
||||
:meth:`supports_search` / :meth:`supports_extract` / :meth:`supports_crawl`
|
||||
capability flags let the registry route each tool call to the right
|
||||
provider, and let multi-capability providers (Firecrawl, Tavily, Exa,
|
||||
…) advertise multiple capabilities from a single class.
|
||||
"""
|
||||
|
||||
@property
|
||||
|
|
|
|||
|
|
@ -11,17 +11,23 @@ Active selection
|
|||
----------------
|
||||
The active provider is chosen by configuration with this precedence:
|
||||
|
||||
1. ``web.search_backend`` (for search) or ``web.extract_backend`` (for extract)
|
||||
2. ``web.backend`` (shared fallback)
|
||||
3. If exactly one capability-eligible provider is registered, use it.
|
||||
4. Legacy preference order (``brave-free`` → ``firecrawl`` → ``searxng`` → ``ddgs``)
|
||||
so installs that omitted the config key keep working.
|
||||
1. ``web.search_backend`` / ``web.extract_backend`` / ``web.crawl_backend``
|
||||
(per-capability override).
|
||||
2. ``web.backend`` (shared fallback).
|
||||
3. If exactly one capability-eligible provider is registered AND available,
|
||||
use it.
|
||||
4. Legacy preference order — ``firecrawl`` → ``parallel`` → ``tavily`` →
|
||||
``exa`` → ``searxng`` → ``brave-free`` → ``ddgs`` — filtered by
|
||||
availability. Matches the historic ``tools.web_tools._get_backend()``
|
||||
candidate order so installs that never set a config key keep landing
|
||||
on the same provider they did before the plugin migration.
|
||||
5. Otherwise ``None`` — the tool surfaces a helpful error pointing at
|
||||
``hermes tools``.
|
||||
|
||||
The capability filter (``supports_search`` vs ``supports_extract``) is applied
|
||||
at every step so a search-only provider (``brave-free``) configured as
|
||||
``web.extract_backend`` correctly falls through.
|
||||
The capability filter (``supports_search`` / ``supports_extract`` /
|
||||
``supports_crawl``) is applied at every step so a search-only provider
|
||||
(``brave-free``) configured as ``web.extract_backend`` correctly falls
|
||||
through to an extract-capable backend.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
|
@ -107,10 +113,21 @@ def _read_config_key(*path: str) -> Optional[str]:
|
|||
return None
|
||||
|
||||
|
||||
# Legacy preference order — preserves behaviour for users who set no config
|
||||
# at all. brave-free first because it was the shipped default after the
|
||||
# Brave migration; firecrawl second for back-compat with older configs.
|
||||
_LEGACY_PREFERENCE = ("brave-free", "firecrawl", "searxng", "ddgs")
|
||||
# Legacy preference order — preserves behaviour for users who set no
|
||||
# ``web.backend`` / ``web.<capability>_backend`` config key at all. Matches
|
||||
# the historic candidate order in :func:`tools.web_tools._get_backend`
|
||||
# (paid providers first so existing paid setups don't get downgraded to
|
||||
# a free tier on upgrade). Filtered by ``is_available()`` at walk time so
|
||||
# we don't surface a provider the user has no credentials for.
|
||||
_LEGACY_PREFERENCE = (
|
||||
"firecrawl",
|
||||
"parallel",
|
||||
"tavily",
|
||||
"exa",
|
||||
"searxng",
|
||||
"brave-free",
|
||||
"ddgs",
|
||||
)
|
||||
|
||||
|
||||
def _resolve(configured: Optional[str], *, capability: str) -> Optional[WebSearchProvider]:
|
||||
|
|
@ -130,10 +147,14 @@ def _resolve(configured: Optional[str], *, capability: str) -> Optional[WebSearc
|
|||
supports *capability* AND ``is_available()`` reports True, return it.
|
||||
|
||||
3. **Legacy preference walk, filtered by availability.** Walk the
|
||||
:data:`_LEGACY_PREFERENCE` order looking for a provider whose
|
||||
:data:`_LEGACY_PREFERENCE` order (firecrawl → parallel → tavily →
|
||||
exa → searxng → brave-free → ddgs) looking for a provider whose
|
||||
``supports_<capability>()`` is True AND whose ``is_available()`` is
|
||||
True. This is the path that fires when no config key is set — pick
|
||||
the highest-priority backend the user actually has credentials for.
|
||||
True. Matches the historic ``tools.web_tools._get_backend()``
|
||||
candidate order so users with credentials but no explicit config
|
||||
key keep landing on the same provider as pre-migration. This is
|
||||
the path that fires when no config key is set — pick the
|
||||
highest-priority backend the user actually has credentials for.
|
||||
|
||||
Returns None when no provider is configured AND no available provider
|
||||
matches the legacy preference; the dispatcher then returns a "set up a
|
||||
|
|
@ -179,8 +200,8 @@ def _resolve(configured: Optional[str], *, capability: str) -> Optional[WebSearc
|
|||
|
||||
# 2. + 3. Fallback path — filter by availability so we don't surface
|
||||
# a provider the user has no credentials for. Without this filter,
|
||||
# brave-free's slot in the legacy preference order would make it
|
||||
# the "active" provider on a fresh install with no API keys at all.
|
||||
# a registered-but-unconfigured provider could end up "active" on
|
||||
# a fresh install with no API keys at all.
|
||||
eligible = [
|
||||
p for p in snapshot.values()
|
||||
if _capable(p) and _is_available_safe(p)
|
||||
|
|
@ -226,9 +247,10 @@ def get_active_crawl_provider() -> Optional[WebSearchProvider]:
|
|||
Reads ``web.crawl_backend`` (preferred) or ``web.backend`` (shared
|
||||
fallback) from config.yaml; falls back per the module docstring.
|
||||
|
||||
Crawl is a niche capability — only Tavily implements it among built-in
|
||||
providers. Most callers should expect ``None`` and fall back to a
|
||||
different strategy (e.g. summarize-via-LLM).
|
||||
Crawl is a niche capability — among built-in providers only Tavily and
|
||||
Firecrawl implement it. Callers should expect ``None`` and fall back to
|
||||
a different strategy (e.g. summarize-via-LLM) when neither is
|
||||
configured.
|
||||
"""
|
||||
explicit = _read_config_key("web", "crawl_backend") or _read_config_key("web", "backend")
|
||||
return _resolve(explicit, capability="crawl")
|
||||
|
|
|
|||
|
|
@ -32,9 +32,10 @@ from agent.web_search_provider import WebSearchProvider
|
|||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Module-level cache for the Exa client so we don't reconstruct it per
|
||||
# call. Matches the legacy `_exa_client` pattern in tools/web_tools.py.
|
||||
_exa_client: Any = None
|
||||
# Module-level note: the canonical ``_exa_client`` cache slot lives on
|
||||
# :mod:`tools.web_tools` so tests that do ``tools.web_tools._exa_client =
|
||||
# None`` between cases see fresh state. The plugin reads/writes through
|
||||
# that public module (see :func:`_get_exa_client`).
|
||||
|
||||
|
||||
def _get_exa_client() -> Any:
|
||||
|
|
|
|||
|
|
@ -112,9 +112,11 @@ Firecrawl = _FirecrawlProxy()
|
|||
# ---------------------------------------------------------------------------
|
||||
# Client construction (direct vs managed-gateway)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_firecrawl_client: Any = None
|
||||
_firecrawl_client_config: Any = None
|
||||
#
|
||||
# The canonical cache slots live on :mod:`tools.web_tools` so tests that do
|
||||
# ``tools.web_tools._firecrawl_client = None`` between cases see fresh
|
||||
# state. The plugin reads/writes through that public module — see
|
||||
# :func:`_get_firecrawl_client` below.
|
||||
|
||||
|
||||
def _get_direct_firecrawl_config() -> Optional[tuple]:
|
||||
|
|
@ -257,10 +259,15 @@ def _get_firecrawl_client() -> Any:
|
|||
|
||||
|
||||
def _reset_client_for_tests() -> None:
|
||||
"""Drop the cached Firecrawl client so tests can re-instantiate cleanly."""
|
||||
global _firecrawl_client, _firecrawl_client_config
|
||||
_firecrawl_client = None
|
||||
_firecrawl_client_config = None
|
||||
"""Drop the cached Firecrawl client so tests can re-instantiate cleanly.
|
||||
|
||||
Clears the canonical slots on :mod:`tools.web_tools` (where
|
||||
:func:`_get_firecrawl_client` reads/writes them).
|
||||
"""
|
||||
import tools.web_tools as _wt
|
||||
|
||||
_wt._firecrawl_client = None
|
||||
_wt._firecrawl_client_config = None
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
|
|
|
|||
|
|
@ -36,13 +36,11 @@ from agent.web_search_provider import WebSearchProvider
|
|||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Module-level client caches mirroring the legacy `tools.web_tools._parallel_client`
|
||||
# / `_async_parallel_client` pattern. For tests, the canonical cache lives on
|
||||
# tools.web_tools so existing setup_method() handlers that reset
|
||||
# ``tools.web_tools._parallel_client = None`` keep working — we read/write
|
||||
# the cache via that module rather than these module-level globals.
|
||||
_parallel_client: Any = None
|
||||
_async_parallel_client: Any = None
|
||||
# Module-level note: the canonical cache slots ``_parallel_client`` and
|
||||
# ``_async_parallel_client`` live on :mod:`tools.web_tools` so tests that do
|
||||
# ``tools.web_tools._parallel_client = None`` between cases see fresh state.
|
||||
# The plugin reads/writes through that public module (see
|
||||
# :func:`_get_sync_client` / :func:`_get_async_client`).
|
||||
|
||||
|
||||
def _ensure_parallel_sdk_installed() -> None:
|
||||
|
|
@ -117,10 +115,15 @@ def _get_async_client() -> Any:
|
|||
|
||||
|
||||
def _reset_clients_for_tests() -> None:
|
||||
"""Drop both cached clients so tests can re-instantiate cleanly."""
|
||||
global _parallel_client, _async_parallel_client
|
||||
_parallel_client = None
|
||||
_async_parallel_client = None
|
||||
"""Drop both cached clients so tests can re-instantiate cleanly.
|
||||
|
||||
Clears the canonical slots on :mod:`tools.web_tools` (where
|
||||
:func:`_get_sync_client` / :func:`_get_async_client` read/write them).
|
||||
"""
|
||||
import tools.web_tools as _wt
|
||||
|
||||
_wt._parallel_client = None
|
||||
_wt._async_parallel_client = None
|
||||
|
||||
|
||||
# Backward-compatible aliases for the names that lived in tools.web_tools
|
||||
|
|
|
|||
|
|
@ -5,8 +5,8 @@ capabilities advertised:
|
|||
|
||||
- ``supports_search()`` -> True (Tavily ``/search``)
|
||||
- ``supports_extract()`` -> True (Tavily ``/extract``)
|
||||
- ``supports_crawl()`` -> True (Tavily ``/crawl``) — Tavily is the only
|
||||
built-in backend that natively crawls
|
||||
- ``supports_crawl()`` -> True (Tavily ``/crawl``) — sync HTTP crawl;
|
||||
Firecrawl also advertises ``supports_crawl=True`` (async)
|
||||
|
||||
All three are sync — the underlying call is ``httpx.post(...)``. The
|
||||
dispatcher in :func:`tools.web_tools.web_crawl_tool` (which is itself
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue