mirror of https://github.com/NousResearch/hermes-agent.git
synced 2026-05-08 03:01:47 +00:00
refactor(web): per-capability backend selection for search/extract split
Introduce the foundation for independently selecting web search and extract backends, enabling future combinations like SearXNG for search + Firecrawl for extract.

Architecture:
- tools/web_providers/base.py: WebSearchProvider and WebExtractProvider ABCs with normalized result contracts (mirrors CloudBrowserProvider)
- tools/web_tools.py: _get_search_backend() and _get_extract_backend() read per-capability config keys, fall through to shared web.backend
- hermes_cli/config.py: web.search_backend and web.extract_backend in DEFAULT_CONFIG (empty = inherit from web.backend)

Behavioral change:
- web_search_tool() now dispatches via _get_search_backend()
- web_extract_tool() now dispatches via _get_extract_backend()
- When per-capability keys are empty (default), behavior is identical to before: _get_search_backend() falls through to _get_backend()

This is purely structural; no new backends are added. SearXNG and other search-only/extract-only providers can now be added as simple drop-in modules in follow-up PRs.

12 new tests, 49 existing tests pass with zero regressions.

Ref: #19198
parent 6388aafbd6
commit cd2cbc73b7
6 changed files with 411 additions and 5 deletions
tools/web_providers/ARCHITECTURE.md (normal file, 73 lines added)
@@ -0,0 +1,73 @@
# Web Tools Provider Architecture

## Overview

Web tools (`web_search`, `web_extract`) use a **per-capability backend selection** system that allows different providers for search and extract independently.

## Config Keys

```yaml
web:
  backend: "firecrawl"     # Shared fallback; applies to both if specific keys not set
  search_backend: ""       # Per-capability override for web_search
  extract_backend: ""      # Per-capability override for web_extract
```

**Selection priority (per capability):**
1. `web.search_backend` / `web.extract_backend` (explicit per-capability)
2. `web.backend` (shared fallback)
3. Auto-detect from environment variables

When per-capability keys are empty (default), behavior is identical to the legacy single-backend selection.
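
The selection priority above can be sketched as a small resolver. This is a minimal illustration, not the code in `tools/web_tools.py`: the function name `resolve_backend`, the `is_available` callback, and the `ENV_CANDIDATES` table (including its environment-variable names and order) are assumptions made for the example.

```python
from typing import Callable, Mapping, Optional, Sequence, Tuple

# Hypothetical auto-detect table: (backend name, env var that signals it).
# The real project may use different names and a different order.
ENV_CANDIDATES: Sequence[Tuple[str, str]] = (
    ("firecrawl", "FIRECRAWL_API_KEY"),
    ("tavily", "TAVILY_API_KEY"),
)


def resolve_backend(
    capability: str,
    web_config: Mapping[str, str],
    env: Mapping[str, str],
    is_available: Callable[[str], bool] = lambda name: True,
) -> Optional[str]:
    """Return the backend name for "search" or "extract", or None."""
    if capability not in ("search", "extract"):
        raise ValueError(f"unknown capability: {capability!r}")

    # 1. Explicit per-capability key, used only if set and available.
    specific = (web_config.get(f"{capability}_backend") or "").strip()
    if specific and is_available(specific):
        return specific

    # 2. Shared fallback.
    shared = (web_config.get("backend") or "").strip()
    if shared:
        return shared

    # 3. Auto-detect from environment variables.
    for name, env_var in ENV_CANDIDATES:
        if env.get(env_var):
            return name
    return None


config = {"backend": "firecrawl", "search_backend": "", "extract_backend": ""}
print(resolve_backend("search", config, env={}))  # firecrawl
```

With both per-capability keys empty, each capability resolves to `web.backend`, which matches the legacy behavior described above.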

## Architecture

```
web_search_tool()
  └─ _get_search_backend()
       ├─ web.search_backend (if set + available)
       └─ _get_backend() fallback

web_extract_tool()
  └─ _get_extract_backend()
       ├─ web.extract_backend (if set + available)
       └─ _get_backend() fallback
```

## Provider ABCs

New providers implement these interfaces in `tools/web_providers/`:

```python
from typing import Any, Dict, List

from tools.web_providers.base import WebSearchProvider, WebExtractProvider

class MySearchProvider(WebSearchProvider):
    def provider_name(self) -> str: ...
    def is_configured(self) -> bool: ...
    def search(self, query: str, limit: int = 5) -> Dict[str, Any]: ...

class MyExtractProvider(WebExtractProvider):
    def provider_name(self) -> str: ...
    def is_configured(self) -> bool: ...
    def extract(self, urls: List[str], **kwargs) -> Dict[str, Any]: ...
```
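
A concrete provider might look like the sketch below. To keep it runnable on its own, it declares a local stand-in for `WebSearchProvider` with the same three methods. `StaticSearchProvider`, its in-memory `pages` list, and the 1-based `position` numbering are hypothetical and exist only to show the normalized result shape documented in `base.py`.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple


class WebSearchProvider(ABC):
    """Local stand-in mirroring the interface in tools/web_providers/base.py."""

    @abstractmethod
    def provider_name(self) -> str: ...

    @abstractmethod
    def is_configured(self) -> bool: ...

    @abstractmethod
    def search(self, query: str, limit: int = 5) -> Dict[str, Any]: ...


class StaticSearchProvider(WebSearchProvider):
    """Hypothetical provider that searches a fixed in-memory list of pages."""

    def __init__(self, pages: List[Tuple[str, str, str]]) -> None:
        self._pages = pages  # each entry is (title, url, description)

    def provider_name(self) -> str:
        return "static"

    def is_configured(self) -> bool:
        # Cheap check only: no network calls.
        return bool(self._pages)

    def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
        if not query.strip():
            return {"success": False, "error": "empty query"}
        needle = query.lower()
        hits = [
            p for p in self._pages
            if needle in p[0].lower() or needle in p[2].lower()
        ]
        # Assumption: positions are 1-based ranks within the result list.
        web = [
            {"title": title, "url": url, "description": desc, "position": pos}
            for pos, (title, url, desc) in enumerate(hits[:limit], start=1)
        ]
        return {"success": True, "data": {"web": web}}


provider = StaticSearchProvider([
    ("Hermes docs", "https://example.com/docs", "Agent documentation"),
    ("Changelog", "https://example.com/changes", "Release notes"),
])
print(provider.search("docs"))
```

A search-only provider like this one never needs to subclass `WebExtractProvider`.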

## Adding a New Search Provider

1. Create `tools/web_providers/your_provider.py` implementing `WebSearchProvider`
2. Add availability check to `_is_backend_available()` in `web_tools.py`
3. Add dispatch branch in `web_search_tool()`
4. Add provider to `hermes tools` picker in `tools_config.py`
5. Add env var to `OPTIONAL_ENV_VARS` in `config.py` (if needed)
6. Write tests in `tests/tools/`

Search-only providers (like SearXNG) don't need to implement `WebExtractProvider`.
Extract-only providers don't need to implement `WebSearchProvider`.
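
Steps 2 and 3 above amount to one availability entry and one dispatch entry per provider. The sketch below is self-contained and hypothetical: `is_backend_available` and `dispatch_search` are stand-ins (the real `_is_backend_available()` and `web_search_tool()` in `web_tools.py` may have different signatures), the environment-variable names are illustrative, `_searxng_search` is a placeholder, and a dictionary lookup is used here in place of an `if`/`elif` branch.

```python
import os
from typing import Any, Callable, Dict, Mapping

# Hypothetical: env var each backend needs. Names are illustrative only.
REQUIRED_ENV: Dict[str, str] = {
    "firecrawl": "FIRECRAWL_API_KEY",
    "searxng": "SEARXNG_URL",  # entry added for the new provider (step 2)
}


def is_backend_available(name: str, env: Mapping[str, str] = os.environ) -> bool:
    """Stand-in availability check: True if the backend's env var is set."""
    env_var = REQUIRED_ENV.get(name)
    return env_var is not None and bool(env.get(env_var))


def _searxng_search(query: str, limit: int) -> Dict[str, Any]:
    """Placeholder for the new provider's search(); returns no results."""
    return {"success": True, "data": {"web": []}}


# Step 3: one dispatch entry per search-capable backend.
SEARCH_DISPATCH: Dict[str, Callable[[str, int], Dict[str, Any]]] = {
    "searxng": _searxng_search,
}


def dispatch_search(backend: str, query: str, limit: int = 5) -> Dict[str, Any]:
    """Route a query to the selected backend, or return a normalized error."""
    handler = SEARCH_DISPATCH.get(backend)
    if handler is None:
        return {"success": False, "error": f"no search support in backend {backend!r}"}
    return handler(query, limit)


print(dispatch_search("searxng", "hermes agent"))
```

An extract-only provider would get an entry in an analogous extract dispatch table and none in `SEARCH_DISPATCH`.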

## hermes tools UX

The provider picker uses **progressive disclosure**:
- **Default path** (90% of users): Pick one provider → sets `web.backend` for both. One selection, done.
- **Advanced path**: "Configure separately" option at bottom → two-step sub-picker for search + extract independently.

See `.hermes/plans/2026-05-03-web-tools-provider-architecture.md` for the full UX flow diagram.
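
The two picker paths differ only in which config keys they write. A minimal sketch, assuming a hypothetical `apply_picker_choice` helper and assuming the default path clears the per-capability overrides so that both capabilities inherit `web.backend` (the actual `tools_config.py` may behave differently):

```python
from typing import Dict, Optional


def apply_picker_choice(
    web_config: Dict[str, str],
    backend: Optional[str] = None,
    search_backend: Optional[str] = None,
    extract_backend: Optional[str] = None,
) -> Dict[str, str]:
    """Return a new web config reflecting the picker selection."""
    updated = dict(web_config)
    if backend is not None:
        # Default path: one provider for both capabilities.
        updated["backend"] = backend
        updated["search_backend"] = ""   # assumption: overrides are reset
        updated["extract_backend"] = ""
        return updated
    # Advanced path: set each capability independently.
    if search_backend is not None:
        updated["search_backend"] = search_backend
    if extract_backend is not None:
        updated["extract_backend"] = extract_backend
    return updated


base = {"backend": "firecrawl", "search_backend": "", "extract_backend": ""}
print(apply_picker_choice(base, search_backend="searxng", extract_backend="firecrawl"))
```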

tools/web_providers/__init__.py (normal file, 6 lines added)
@@ -0,0 +1,6 @@
"""Web capability providers: search, extract, crawl.

Each capability has an ABC in ``base.py`` and vendor implementations in
sibling modules. Provider registries in ``web_tools.py`` map config names
to provider classes.
"""
tools/web_providers/base.py (normal file, 89 lines added)
@@ -0,0 +1,89 @@
"""Abstract base classes for web capability providers."""

from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any, Dict, List


class WebSearchProvider(ABC):
    """Interface for web search backends (Firecrawl, Tavily, Exa, etc.).

    Implementations live in sibling modules. The user selects a provider
    via ``hermes tools``; the choice is persisted as
    ``config["web"]["search_backend"]`` (falling back to
    ``config["web"]["backend"]``).

    Search providers return results in a normalized format::

        {
            "success": True,
            "data": {
                "web": [
                    {"title": str, "url": str, "description": str, "position": int},
                    ...
                ]
            }
        }

    On failure::

        {"success": False, "error": str}
    """

    @abstractmethod
    def provider_name(self) -> str:
        """Short, human-readable name shown in logs and diagnostics."""

    @abstractmethod
    def is_configured(self) -> bool:
        """Return True when all required env vars / credentials are present.

        Called at tool-registration time to gate availability.
        Must be cheap: no network calls.
        """

    @abstractmethod
    def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
        """Execute a web search and return normalized results."""


class WebExtractProvider(ABC):
    """Interface for web content extraction backends.

    Implementations live in sibling modules. The user selects a provider
    via ``hermes tools``; the choice is persisted as
    ``config["web"]["extract_backend"]`` (falling back to
    ``config["web"]["backend"]``).

    Extract providers return results in a normalized format::

        {
            "success": True,
            "data": [
                {"url": str, "title": str, "content": str,
                 "raw_content": str, "metadata": dict},
                ...
            ]
        }

    On failure::

        {"success": False, "error": str}
    """

    @abstractmethod
    def provider_name(self) -> str:
        """Short, human-readable name shown in logs and diagnostics."""

    @abstractmethod
    def is_configured(self) -> bool:
        """Return True when all required env vars / credentials are present.

        Called at tool-registration time to gate availability.
        Must be cheap: no network calls.
        """

    @abstractmethod
    def extract(self, urls: List[str], **kwargs) -> Dict[str, Any]:
        """Extract content from the given URLs and return normalized results."""
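
The result contracts in the two docstrings above can be checked mechanically. The sketch below is not part of the commit: `validate_search_result` and `validate_extract_result` are hypothetical helpers that return a list of problems, under the assumption that every key shown in the docstrings is required.

```python
from typing import Any, Dict, List

# Field types copied from the docstring contracts.
SEARCH_ITEM_TYPES = {"title": str, "url": str, "description": str, "position": int}
EXTRACT_ITEM_TYPES = {"url": str, "title": str, "content": str,
                      "raw_content": str, "metadata": dict}


def _check_items(items: Any, types: Dict[str, type]) -> List[str]:
    """Check that items is a list of dicts with correctly typed fields."""
    if not isinstance(items, list):
        return ["results must be a list"]
    problems = []
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {i}: not a dict")
            continue
        for key, expected in types.items():
            if not isinstance(item.get(key), expected):
                problems.append(f"item {i}: {key!r} must be {expected.__name__}")
    return problems


def _check_envelope(result: Any) -> List[str]:
    """Check the shared success/error envelope."""
    if not isinstance(result, dict) or not isinstance(result.get("success"), bool):
        return ["'success' must be a bool"]
    if not result["success"] and not isinstance(result.get("error"), str):
        return ["failed result needs an 'error' string"]
    return []


def validate_search_result(result: Any) -> List[str]:
    problems = _check_envelope(result)
    if problems or not result["success"]:
        return problems
    data = result.get("data")
    if not isinstance(data, dict):
        return ["'data' must be a dict"]
    return _check_items(data.get("web"), SEARCH_ITEM_TYPES)


def validate_extract_result(result: Any) -> List[str]:
    problems = _check_envelope(result)
    if problems or not result["success"]:
        return problems
    return _check_items(result.get("data"), EXTRACT_ITEM_TYPES)


print(validate_search_result({"success": False, "error": "timeout"}))  # []
```

A check like this could run in provider tests so that each new backend is held to the same normalized shape.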