hermes-agent/plugins/web/exa/provider.py
kshitijk4poor 748f3e016b refactor(web): delete inline vendor helpers, re-export from plugins
Removes ~580 lines of dead code from tools/web_tools.py that were
superseded by the plugin migration but kept around in the cutover commit
to keep the diff focused. Replaces them with thin re-export shims so
existing tests and external callers that reach for the legacy
``tools.web_tools.<name>`` paths continue to work transparently.

Deleted from tools/web_tools.py
--------------------------------
- Lazy Firecrawl SDK proxy (_load_firecrawl_cls, _FirecrawlProxy,
  _FIRECRAWL_CLS_CACHE, the Firecrawl singleton)
- Firecrawl client section (_get_direct_firecrawl_config,
  _get_firecrawl_gateway_url, _is_tool_gateway_ready,
  _has_direct_firecrawl_config, _raise_web_backend_configuration_error,
  _firecrawl_backend_help_suffix, _get_firecrawl_client)
- Parallel client section (_get_parallel_client,
  _get_async_parallel_client, _parallel_client, _async_parallel_client)
- Tavily client section (_TAVILY_BASE_URL, _tavily_request,
  _normalize_tavily_search_results, _normalize_tavily_documents)
- Generic SDK normalizers (_to_plain_object, _normalize_result_list,
  _extract_web_search_results, _extract_scrape_payload)
- Exa client section (_get_exa_client, _exa_client, _exa_search,
  _exa_extract)
- Parallel helpers (_parallel_search, _parallel_extract)
- Duplicate inline check_firecrawl_api_key

Net: tools/web_tools.py drops from 2227 → 1613 lines (-614 lines).

Re-exports added at top of tools/web_tools.py
---------------------------------------------
- From plugins.web.firecrawl.provider:
  Firecrawl, _FirecrawlProxy, _FIRECRAWL_CLS_CACHE, _load_firecrawl_cls,
  _get_direct_firecrawl_config, _get_firecrawl_gateway_url,
  _is_tool_gateway_ready, _has_direct_firecrawl_config,
  _firecrawl_backend_help_suffix, _raise_web_backend_configuration_error,
  _get_firecrawl_client, _to_plain_object, _normalize_result_list,
  _extract_web_search_results, _extract_scrape_payload,
  check_firecrawl_api_key
- From plugins.web.tavily.provider:
  _tavily_request, _normalize_tavily_search_results,
  _normalize_tavily_documents
- From plugins.web.parallel.provider:
  _get_parallel_client, _get_async_parallel_client
- From plugins.web.exa.provider:
  _get_exa_client

Plus retained module-level imports for backward-compat with tests:
- httpx (tests patch tools.web_tools.httpx for tavily request mocking)
- build_vendor_gateway_url, _read_nous_access_token,
  resolve_managed_tool_gateway, managed_nous_tools_enabled,
  prefers_gateway (tests patch tools.web_tools.<name>)
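
As a rough illustration of what a re-export shim amounts to, the sketch below builds two in-memory modules and re-exports one helper from the other. The module names ``vendor_provider`` and ``legacy_shim`` are hypothetical stand-ins, not the real ``plugins.web.*`` or ``tools.web_tools`` modules.

```python
import sys
import types

# Hypothetical stand-in for a plugin provider module that owns the helper.
provider = types.ModuleType("vendor_provider")
exec("def _get_client():\n    return 'client'", provider.__dict__)
sys.modules["vendor_provider"] = provider

# Hypothetical stand-in for the legacy module: it only re-exports the name.
shim = types.ModuleType("legacy_shim")
exec("from vendor_provider import _get_client  # noqa: F401", shim.__dict__)

# Callers reaching for the legacy path get the very same function object.
same_object = shim._get_client is provider._get_client
result = shim._get_client()
```

Because the shim holds only a second binding to the same object, rebinding the shim's name (as a test patch does) leaves the provider's own binding untouched, which is the gap the indirection pattern below is meant to close.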

Plugin indirection pattern (key technique)
------------------------------------------
For functions inside the firecrawl/parallel/exa plugins to honor
unit-test patches that target ``tools.web_tools.<name>``, the plugin
implementations now do ``import tools.web_tools as _wt`` at call time
and read helper names through that module (``_wt._read_nous_access_token``,
``_wt.Firecrawl``, ``_wt.prefers_gateway``, etc.). This makes the
existing test patches transparently reach the plugin code without any
test changes.
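
A minimal sketch of the call-time lookup described above, assuming an in-memory stand-in module (``legacy_web_tools`` and ``read_token`` are hypothetical names, not the real helpers):

```python
import sys
import types
from unittest import mock

# Hypothetical stand-in for the module that tests patch.
legacy = types.ModuleType("legacy_web_tools")
legacy.read_token = lambda: "real-token"
sys.modules["legacy_web_tools"] = legacy


def plugin_fetch_token() -> str:
    """Stand-in for a plugin helper that resolves its dependency late."""
    import legacy_web_tools as _wt  # looked up at call time, not import time

    # Attribute access happens on every call, so a patched name is seen.
    return _wt.read_token()


before = plugin_fetch_token()
with mock.patch.object(legacy, "read_token", return_value="patched-token"):
    during = plugin_fetch_token()
after = plugin_fetch_token()
```

Had the plugin instead bound the helper once with ``from legacy_web_tools import read_token``, the patch on the module attribute would not have reached it.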

The cached client globals (_firecrawl_client, _firecrawl_client_config,
_parallel_client, _async_parallel_client, _exa_client) also now live on
tools.web_tools so existing test setup_method handlers that reset
``tools.web_tools._<vendor>_client = None`` between cases keep working.
The plugins read/write the cache via getattr/setattr on the web_tools
module.
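
The cache arrangement can be sketched the same way. The host module and the dict-based "client" here are hypothetical placeholders; only the getattr/setattr shape follows the description above.

```python
import types

# Hypothetical stand-in for the module that owns the cache slot.
host = types.ModuleType("host_module")
host._client = None

build_count = 0


def get_client() -> dict:
    """Return the cached client, building it once per cache lifetime."""
    global build_count
    cached = getattr(host, "_client", None)
    if cached is not None:
        return cached
    build_count += 1
    client = {"id": build_count}  # placeholder for a real SDK client
    setattr(host, "_client", client)
    return client


first = get_client()
second = get_client()   # served from the cache on the host module
host._client = None     # what a test's setup_method reset would do
third = get_client()    # rebuilt after the reset
```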

Verified
--------
- 173/173 targeted web tests pass:
  test_web_providers.py, test_web_providers_brave_free.py,
  test_web_providers_ddgs.py, test_web_providers_searxng.py,
  test_web_tools_config.py, test_web_tools_tavily.py,
  test_website_policy.py, test_config_null_guard.py
- Compile-clean (py_compile.compile passes)
- All inline implementations now exist in exactly one place
  (plugins.web.<vendor>.provider)

Follow-up clean-up
------------------
- Drop _WEB_PLUGIN_SKIPLIST + hardcoded TOOL_CATEGORIES["web"] rows
  (next commit)
- Delete tools/web_providers/ directory entirely
- Add tests/plugins/web/ coverage
- Full tests/tools/ + tests/gateway/ regression sweep before promoting PR
2026-05-13 22:31:28 -07:00

211 lines
7 KiB
Python

"""Exa web search + content extraction — plugin form.

Subclasses :class:`agent.web_search_provider.WebSearchProvider`. Uses the
official Exa SDK (``exa-py``) which is lazy-loaded via
:func:`tools.lazy_deps.ensure` so that cold-start CLI users don't pay the
SDK import cost when Exa isn't configured.

Config keys this provider responds to::

    web:
      search_backend: "exa"     # explicit per-capability
      extract_backend: "exa"    # explicit per-capability
      backend: "exa"            # shared fallback for both

Env var::

    EXA_API_KEY=...             # https://exa.ai (paid tier; free trial available)

The previous in-tree implementation lived at
``tools.web_tools._exa_search`` / ``_exa_extract``; this file is the
canonical replacement. Behavior is bit-for-bit identical aside from the
ABC method-name change.
"""

from __future__ import annotations

import logging
import os
from typing import Any, Dict, List

from agent.web_search_provider import WebSearchProvider

logger = logging.getLogger(__name__)

# Module-level cache slot for the Exa client, kept to match the legacy
# `_exa_client` pattern in tools/web_tools.py. Note that _get_exa_client()
# reads and writes the cache on tools.web_tools, not on this name.
_exa_client: Any = None


def _get_exa_client() -> Any:
    """Lazy-import and cache an Exa SDK client.

    Cache lives on :mod:`tools.web_tools` (as ``_exa_client``) so unit
    tests that reset that name between cases keep working. Raises
    ``ValueError`` when ``EXA_API_KEY`` is unset.
    """
    import tools.web_tools as _wt

    cached = getattr(_wt, "_exa_client", None)
    if cached is not None:
        return cached

    api_key = os.getenv("EXA_API_KEY")
    if not api_key:
        raise ValueError(
            "EXA_API_KEY environment variable not set. "
            "Get your API key at https://exa.ai"
        )

    try:
        from tools.lazy_deps import ensure as _lazy_ensure

        _lazy_ensure("search.exa", prompt=False)
    except ImportError:
        # lazy_deps itself is unavailable; fall through to the direct import.
        pass
    except Exception as exc:  # noqa: BLE001 - lazy_deps surfaces install hints
        # Chain the original exception so the install hint keeps its cause.
        raise ImportError(str(exc)) from exc

    from exa_py import Exa  # noqa: WPS433 - deliberately lazy

    client = Exa(api_key=api_key)
    client.headers["x-exa-integration"] = "hermes-agent"
    _wt._exa_client = client
    return client


def _reset_client_for_tests() -> None:
    """Drop the cached Exa client so tests can re-instantiate cleanly."""
    import tools.web_tools as _wt

    _wt._exa_client = None


class ExaWebSearchProvider(WebSearchProvider):
    """Exa search + extract provider.

    Both methods are sync; Exa's SDK is sync-only. The web_extract_tool
    dispatcher wraps sync extracts via ``asyncio.to_thread`` when it
    needs to keep the event loop responsive.
    """

    @property
    def name(self) -> str:
        return "exa"

    @property
    def display_name(self) -> str:
        return "Exa"

    def is_available(self) -> bool:
        """Return True when ``EXA_API_KEY`` is set to a non-empty value."""
        return bool(os.getenv("EXA_API_KEY", "").strip())

    def supports_search(self) -> bool:
        return True

    def supports_extract(self) -> bool:
        return True

    def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
        """Execute an Exa search.

        Returns ``{"success": True, "data": {"web": [{...}, ...]}}`` on
        success, ``{"success": False, "error": str}`` on failure (incl.
        missing API key and SDK install errors).
        """
        try:
            from tools.interrupt import is_interrupted

            if is_interrupted():
                return {"success": False, "error": "Interrupted"}

            logger.info("Exa search: '%s' (limit=%d)", query, limit)
            response = _get_exa_client().search(
                query,
                num_results=limit,
                contents={"highlights": True},
            )
            web_results = []
            for i, result in enumerate(response.results or []):
                highlights = result.highlights or []
                web_results.append(
                    {
                        "url": result.url or "",
                        "title": result.title or "",
                        "description": " ".join(highlights) if highlights else "",
                        "position": i + 1,
                    }
                )
            return {"success": True, "data": {"web": web_results}}
        except ValueError as exc:
            # Raised by _get_exa_client when EXA_API_KEY missing
            return {"success": False, "error": str(exc)}
        except ImportError as exc:
            return {"success": False, "error": f"Exa SDK not installed: {exc}"}
        except Exception as exc:  # noqa: BLE001 - surface as failure
            logger.warning("Exa search error: %s", exc)
            return {"success": False, "error": f"Exa search failed: {exc}"}

    def extract(self, urls: List[str], **kwargs: Any) -> List[Dict[str, Any]]:
        """Extract content from one or more URLs via Exa.

        Returns a list of result dicts shaped for the legacy LLM
        post-processing pipeline. On per-URL or whole-batch failure,
        results carry an ``error`` field rather than raising.
        """
        try:
            from tools.interrupt import is_interrupted

            if is_interrupted():
                return [
                    {"url": u, "error": "Interrupted", "title": ""} for u in urls
                ]

            logger.info("Exa extract: %d URL(s)", len(urls))
            response = _get_exa_client().get_contents(urls, text=True)
            results: List[Dict[str, Any]] = []
            for result in response.results or []:
                content = result.text or ""
                url = result.url or ""
                title = result.title or ""
                results.append(
                    {
                        "url": url,
                        "title": title,
                        "content": content,
                        "raw_content": content,
                        "metadata": {"sourceURL": url, "title": title},
                    }
                )
            return results
        except ValueError as exc:
            return [{"url": u, "title": "", "content": "", "error": str(exc)} for u in urls]
        except ImportError as exc:
            return [
                {"url": u, "title": "", "content": "", "error": f"Exa SDK not installed: {exc}"}
                for u in urls
            ]
        except Exception as exc:  # noqa: BLE001
            logger.warning("Exa extract error: %s", exc)
            return [
                {"url": u, "title": "", "content": "", "error": f"Exa extract failed: {exc}"}
                for u in urls
            ]

    def get_setup_schema(self) -> Dict[str, Any]:
        return {
            "name": "Exa",
            "badge": "paid",
            "tag": "Semantic + neural web search with content extraction.",
            "env_vars": [
                {
                    "key": "EXA_API_KEY",
                    "prompt": "Exa API key",
                    "url": "https://exa.ai",
                },
            ],
        }
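
The class docstring mentions that the dispatcher wraps sync extracts via ``asyncio.to_thread``. A minimal sketch of that idea follows, using a stand-in blocking function rather than the real provider; ``blocking_extract`` and ``dispatch_extract`` are hypothetical names, and the actual dispatcher may differ.

```python
import asyncio
import time
from typing import Any, Dict, List


def blocking_extract(urls: List[str]) -> List[Dict[str, Any]]:
    """Stand-in for a sync provider.extract() call (hypothetical)."""
    time.sleep(0.05)  # simulate blocking network I/O
    return [{"url": u, "title": "", "content": f"body of {u}"} for u in urls]


async def dispatch_extract(urls: List[str]) -> List[Dict[str, Any]]:
    """Run the blocking call in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(blocking_extract, urls)


results = asyncio.run(dispatch_extract(["https://example.com"]))
```

``asyncio.to_thread`` requires Python 3.9 or newer; on older interpreters ``loop.run_in_executor`` plays the same role.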