hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-19 04:52:06 +00:00

Author	SHA1	Message	Date
kshitijk4poor	5e54330e27	fix(web): preserve firecrawl crawl + website-policy gate after migration

Two regressions discovered by running the full tests/tools/ suite after the dispatcher cutover, both fixed in this commit:

1. web_crawl_tool incorrectly errored "search-only" for firecrawl
---------------------------------------------------------------------
The cutover treated any provider with supports_crawl()==False as a search-only backend and returned the typed search-only error. But firecrawl can crawl via the legacy multi-page-extract path inside web_crawl_tool — it just doesn't expose supports_crawl on the plugin (adding native firecrawl crawl is a clean follow-up).

Fix: only emit the search-only error when the provider supports NEITHER crawl NOR extract (brave-free / ddgs / searxng). When the provider supports extract but not crawl (firecrawl), fall through to the legacy firecrawl-via-extract path below.

2. firecrawl plugin's check_website_access wasn't patchable
---------------------------------------------------------------------
The plugin imported `from tools.website_policy import check_website_access` INSIDE the extract() function body, so monkeypatching the name on plugins.web.firecrawl.provider had no effect — the inner import re-bound the name on every call.

Fix: hoist the import to module level. This is cheap (website_policy itself has no heavy deps) and makes the standard monkeypatch.setattr(firecrawl_provider, "check_website_access", ...) pattern work.

Test updates (tests/tools/test_website_policy.py — 4 tests):

- test_web_extract_short_circuits_blocked_url
- test_web_extract_blocks_redirected_final_url
  Both: patch the gate at plugins.web.firecrawl.provider (where it runs after migration) and force the firecrawl plugin to be the active extract provider via FIRECRAWL_API_KEY.
- test_web_crawl_short_circuits_blocked_url
- test_web_crawl_blocks_redirected_final_url
  Both: unchanged — the dispatcher-level gate in tools/web_tools.py (line 1651) still uses the imported `check_website_access` name, and the firecrawl-fallthrough path is exercised as before.

Verified: 22/22 tests in tests/tools/test_website_policy.py pass.	2026-05-13 22:31:28 -07:00
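The second fix comes down to Python name resolution: `monkeypatch.setattr(module, "name", fake)` replaces the attribute in the module's global namespace, but an `import` statement inside a function creates a fresh local binding on every call and never consults that global. A minimal, self-contained sketch of the difference, using throwaway in-memory modules; `fake_policy`, `provider_inner`, and `provider_hoisted` are hypothetical stand-ins for tools.website_policy and the firecrawl provider, not the real modules:

```python
import sys
import types

# Hypothetical stand-in for tools.website_policy: a gate that allows every URL.
policy = types.ModuleType("fake_policy")
policy.check_website_access = lambda url: None  # None means "allowed"
sys.modules["fake_policy"] = policy


def make_module(name: str, source: str) -> types.ModuleType:
    """Build an in-memory module by executing `source` in its namespace."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


# Pre-fix pattern: the gate is imported inside the function body.
provider_inner = make_module(
    "provider_inner",
    "def extract(url):\n"
    "    from fake_policy import check_website_access  # local re-bind per call\n"
    "    return check_website_access(url)\n",
)

# Post-fix pattern: the gate is imported once, at module level.
provider_hoisted = make_module(
    "provider_hoisted",
    "from fake_policy import check_website_access\n"
    "def extract(url):\n"
    "    return check_website_access(url)  # looked up in module globals\n",
)


def blocked(url: str) -> dict:
    return {"blocked": True, "url": url}


# Equivalent of monkeypatch.setattr(provider, "check_website_access", blocked).
setattr(provider_inner, "check_website_access", blocked)
setattr(provider_hoisted, "check_website_access", blocked)

print(provider_inner.extract("https://example.com"))    # None: patch bypassed
print(provider_hoisted.extract("https://example.com"))  # patched gate runs
```

Patching `fake_policy.check_website_access` itself would have reached the inner-import version but not the hoisted one, which already holds its own reference; the general rule is to patch the name where it is looked up.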
kshitijk4poor	143184e943	feat(web): firecrawl plugin — largest migration (search + async extract + dual auth)

Migrates Firecrawl from inline code in tools/web_tools.py to a bundled plugin at plugins/web/firecrawl/. By line count this is the largest of the seven provider migrations: the firecrawl path captured most of the file's vendor-specific complexity.

What moved into the plugin (all previously in tools/web_tools.py):

Lazy Firecrawl SDK proxy
- _load_firecrawl_cls() — caches the imported SDK class
- _FirecrawlProxy + Firecrawl singleton — defers ~200ms of SDK imports until first construction or isinstance check

Client construction (dual auth)
- _get_direct_firecrawl_config() — direct FIRECRAWL_API_KEY/URL path
- _get_firecrawl_gateway_url() — managed Nous tool-gateway URL
- _is_tool_gateway_ready() — gateway URL + Nous token check
- _has_direct_firecrawl_config() — direct config present?
- _get_firecrawl_client() — combined client construction honoring web.use_gateway
- check_firecrawl_api_key() — top-level "is firecrawl usable"
- _firecrawl_backend_help_suffix() — managed-gateway help string
- _raise_web_backend_configuration_error() — typed misconfig error

Response shape normalization (vendor-specific)
- _to_plain_object(), _normalize_result_list() — SDK→dict helpers
- _extract_web_search_results() — handles SDK/direct/gateway shapes
- _extract_scrape_payload() — nested-data unwrap for scrape

Per-URL extract loop
- 60s asyncio.wait_for timeout per URL
- Pre-scrape website-policy gate
- Post-scrape redirect-aware SSRF re-check
- Format-aware content selection (markdown / html / auto)
- Per-URL errors returned as {"error": str} entries, no raises

Extract is declared `async def` — each URL is scraped in asyncio.to_thread(...). This is the second async-extract plugin after parallel.

The plugin re-exports `Firecrawl` (the lazy proxy) and `check_firecrawl_api_key()` so existing tests doing `patch("tools.web_tools.Firecrawl")` or `monkeypatch.setattr(web_tools, "check_firecrawl_api_key", ...)` keep working — tools/web_tools.py re-exports both names in the next dispatcher-cutover commit.

Note: web_crawl_tool still has its own Firecrawl crawl path inline (separate from extract); the Firecrawl SDK supports /crawl but we don't expose supports_crawl=True on this plugin yet. Tavily handles crawl today. Adding Firecrawl crawl is a clean follow-up.

Adds "firecrawl" to _WEB_PLUGIN_SKIPLIST.

E2E verified:
- All 7 providers register: brave-free, ddgs, exa, firecrawl, parallel, searxng, tavily
- inspect.iscoroutinefunction(firecrawl.extract) -> True
- Firecrawl proxy is a callable lazy proxy at module level
- check_firecrawl_api_key reflects FIRECRAWL_API_KEY presence	2026-05-13 22:31:28 -07:00
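The per-URL extract loop listed above can be sketched as follows. This is an illustrative reconstruction, not the plugin's code: `extract_urls`, the `scrape` and `check_access` callables, and the `final_url` / `markdown` payload keys are hypothetical. Only the 60s per-URL `asyncio.wait_for` budget, the `asyncio.to_thread` offload, the pre- and post-scrape policy checks, and the `{"error": str}` entries come from the commit message; format-aware content selection is omitted and the sketch always returns markdown.

```python
import asyncio
from typing import Any, Callable, Dict, List, Optional

PER_URL_TIMEOUT_S = 60.0  # per-URL budget named in the commit message


async def extract_urls(
    urls: List[str],
    scrape: Callable[[str], Dict[str, Any]],       # hypothetical blocking SDK call
    check_access: Callable[[str], Optional[str]],  # block reason, or None if allowed
    timeout: float = PER_URL_TIMEOUT_S,
) -> List[Dict[str, Any]]:
    """Scrape URLs one by one; scrape failures become {"error": ...} entries."""
    results: List[Dict[str, Any]] = []
    for url in urls:
        # Pre-scrape policy gate: never fetch a URL the policy blocks.
        reason = check_access(url)
        if reason:
            results.append({"url": url, "error": reason})
            continue
        try:
            # Run the blocking scrape in a worker thread, bounded by the timeout.
            page = await asyncio.wait_for(asyncio.to_thread(scrape, url), timeout)
        except asyncio.TimeoutError:
            results.append({"url": url, "error": f"timed out after {timeout}s"})
            continue
        except Exception as exc:  # any other scrape failure stays per-URL
            results.append({"url": url, "error": str(exc)})
            continue
        # Post-scrape re-check: a redirect may have landed on a blocked host.
        final_url = page.get("final_url", url)
        reason = check_access(final_url)
        if reason:
            results.append({"url": url, "error": reason})
            continue
        results.append({"url": final_url, "content": page.get("markdown", "")})
    return results
```

Note that a timeout only stops waiting: `asyncio.wait_for` cancels the awaiting task, but the worker thread running the blocking scrape keeps going until the call returns.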
kshitijk4poor	31fcde876c	feat(web): tavily plugin — first three-capability plugin (search + extract + crawl)

Migrates Tavily from inline _tavily_request() / _normalize_tavily_* helpers in tools/web_tools.py to a bundled plugin at plugins/web/tavily/.

First plugin in the codebase to advertise supports_crawl=True. Tavily is unique among built-in backends in offering a native /crawl endpoint that walks linked pages from a seed URL with optional natural-language instructions and depth ("basic" or "advanced").

Capabilities:
- supports_search() -> True (Tavily /search)
- supports_extract() -> True (Tavily /extract)
- supports_crawl() -> True (Tavily /crawl)

All sync (httpx.post under the hood). The crawl method accepts forward-compat kwargs (instructions, depth, limit) and is gated against unsafe URLs/policy by the dispatcher in web_crawl_tool — exactly as before.

Behavior preserved:
- TAVILY_API_KEY required (ValueError → typed error response)
- TAVILY_BASE_URL env override honored
- /crawl requires both body auth AND Bearer header — preserved
- failed_results[] and failed_urls[] response keys mapped to per-URL items with error fields rather than raising
- max_results capped at 20 server-side

Adds "tavily" to _WEB_PLUGIN_SKIPLIST.

The legacy inline _tavily_request / _normalize_tavily_search_results / _normalize_tavily_documents / _TAVILY_BASE_URL in tools/web_tools.py are NOT deleted yet — search/extract dispatch and the entire web_crawl_tool function still reference them. They go away when those dispatchers are cut over to the registry.

E2E verified:
- Tavily registers with all 3 capabilities
- Provider list now: brave-free, ddgs, exa, parallel, searxng, tavily	2026-05-13 22:31:28 -07:00
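With Tavily advertising all three capabilities, web_crawl_tool has three possible outcomes for a provider. A sketch of that routing, which also folds in the search-only rule from the later firecrawl fix (5e54330e27) in this log; `choose_crawl_path`, its returned labels, and `FakeProvider` are hypothetical, and only the `supports_crawl()` / `supports_extract()` method names come from the commits:

```python
from dataclasses import dataclass


@dataclass
class FakeProvider:
    """Hypothetical provider exposing only the capability flags used below."""
    name: str
    crawl: bool
    extract: bool

    def supports_crawl(self) -> bool:
        return self.crawl

    def supports_extract(self) -> bool:
        return self.extract


def choose_crawl_path(provider) -> str:
    if provider.supports_crawl():
        return "native-crawl"       # e.g. tavily's /crawl endpoint
    if provider.supports_extract():
        return "extract-fallback"   # e.g. firecrawl's legacy multi-page-extract path
    return "search-only-error"      # e.g. brave-free / ddgs / searxng


providers = [
    FakeProvider("tavily", crawl=True, extract=True),
    FakeProvider("firecrawl", crawl=False, extract=True),
    FakeProvider("ddgs", crawl=False, extract=False),
]
for p in providers:
    print(p.name, "->", choose_crawl_path(p))
```

The second check is the one the regression was missing: testing `supports_crawl()` alone sent firecrawl straight to the search-only branch.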
kshitijk4poor	4816646109	feat(web): parallel plugin — first async-extract plugin

Migrates Parallel.ai from inline `_parallel_search()` / `_parallel_extract()` in tools/web_tools.py to a bundled plugin at plugins/web/parallel/.

First plugin in the codebase to expose an async :meth:`extract`:
- search() is sync — Parallel.beta.search
- extract() is async def — AsyncParallel.beta.extract

The ABC's docstring on supports_extract() already permits sync-or-async; this commit is the first to exercise the async path. The web_extract_tool dispatcher (next commit) detects coroutines via inspect.iscoroutinefunction and awaits accordingly.

Behavior preserved:
- PARALLEL_API_KEY required (raises ValueError if missing → surfaced as {"success": False, "error": "..."} instead)
- PARALLEL_SEARCH_MODE env var honored (agentic|fast|one-shot, default agentic), validated via _resolve_search_mode()
- Limit capped at 20 server-side via min(limit, 20)
- Per-URL failure mode preserved: each entry in response.errors[] becomes a result dict with an "error" field rather than raising
- Module-level _parallel_client / _async_parallel_client caches kept (mirrors legacy singleton pattern)

Adds "parallel" to _WEB_PLUGIN_SKIPLIST in hermes_cli/tools_config.py so the picker doesn't double-list.

The legacy inline _parallel_search, _parallel_extract, _get_parallel_client, _get_async_parallel_client in tools/web_tools.py are NOT deleted yet — the dispatcher still calls them. They go away when the dispatcher cuts over.

E2E verified:
- inspect.iscoroutinefunction(p.search) -> False
- inspect.iscoroutinefunction(p.extract) -> True
- extract() returns a coroutine (not a list)
- 5 providers register correctly (brave-free, ddgs, exa, parallel, searxng)	2026-05-13 22:31:28 -07:00
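The dispatcher behaviour described above (detect a coroutine function with `inspect.iscoroutinefunction` and await it, otherwise call it directly) can be sketched like this. `dispatch_extract`, `SyncProvider`, and `AsyncProvider` are hypothetical; the real web_extract_tool may differ, for example by running sync providers in a worker thread.

```python
import asyncio
import inspect
from typing import Any, Dict, List


class SyncProvider:
    """Hypothetical sync-only provider (the exa/tavily shape)."""

    def extract(self, urls: List[str]) -> List[Dict[str, Any]]:
        return [{"url": u, "content": "sync"} for u in urls]


class AsyncProvider:
    """Hypothetical async-extract provider (the parallel/firecrawl shape)."""

    async def extract(self, urls: List[str]) -> List[Dict[str, Any]]:
        await asyncio.sleep(0)  # stand-in for an awaitable SDK call
        return [{"url": u, "content": "async"} for u in urls]


async def dispatch_extract(provider: Any, urls: List[str]) -> List[Dict[str, Any]]:
    extract = provider.extract
    # iscoroutinefunction unwraps bound methods, so instance methods work here.
    if inspect.iscoroutinefunction(extract):
        return await extract(urls)
    return extract(urls)


print(asyncio.run(dispatch_extract(SyncProvider(), ["https://a.example"])))
print(asyncio.run(dispatch_extract(AsyncProvider(), ["https://a.example"])))
```

An alternative is to call first and check `inspect.isawaitable(result)`, which also covers sync callables that return awaitables; the commit describes the `iscoroutinefunction` approach.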
kshitijk4poor	ec8449e9c6	feat(web): exa plugin — first multi-capability migration (search + extract)

Migrates Exa from the inline `_exa_search()` / `_exa_extract()` helpers in tools/web_tools.py to a bundled plugin at plugins/web/exa/. This is the first plugin in this PR to advertise supports_extract=True, exercising the multi-capability ABC path that the initial three migrations (brave_free, ddgs, searxng — all search-only) did not cover.

Both Exa methods are sync — the SDK is sync-only. The web_extract_tool dispatcher in tools/web_tools.py will continue to call them inline until Task "dispatch-extract-all" cuts it over to the registry.

Behaviour preserved bit-for-bit aside from the ABC method-name change:
- is_configured() -> is_available()
- provider_name() -> name (property)
- "exa" stays as the registered name
- Module-level `_exa_client` cache + lazy `from exa_py import Exa` preserved at the new location.
- Errors (ValueError for missing API key, ImportError for missing SDK, generic Exception) caught and surfaced as {"success": False, "error": ...} instead of raising.

Adds "exa" to _WEB_PLUGIN_SKIPLIST in hermes_cli/tools_config.py so the hardcoded TOOL_CATEGORIES["web"] row and the plugin-injected row don't duplicate during the spike. The skip-list goes away in the cleanup phase along with the hardcoded row.

The legacy inline `_exa_search` / `_exa_extract` / `_get_exa_client` / `_exa_client` in tools/web_tools.py are NOT deleted yet — the dispatcher still references them. They go away in the next dispatcher-cutover commit.

E2E verified:
- Plugin discovers + registers
- .supports_search/.supports_extract/.supports_crawl = (True, True, False)
- .get_setup_schema() returns the picker row shape
- resolve(): explicit exa + EXA_API_KEY -> exa; without key -> exa (registered but unavailable, dispatcher surfaces "EXA_API_KEY not set" error)	2026-05-13 22:31:28 -07:00
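A sketch of the provider shape these migrations target, reconstructed only from the method names the commit messages mention (`name` as a property, `is_available()`, and the three `supports_*()` flags). `WebProviderSketch` and `ExaLikeProvider` are hypothetical; the real ABC in agent/web_search_provider.py may declare different defaults, signatures, or additional methods such as `get_setup_schema()`.

```python
import os
from abc import ABC, abstractmethod


class WebProviderSketch(ABC):
    """Hypothetical approximation of the plugin-facing provider ABC."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Registered provider name, e.g. "exa" or "brave-free"."""

    @abstractmethod
    def is_available(self) -> bool:
        """True when credentials and dependencies are present."""

    # Capability flags; these defaults assume a search-only provider.
    def supports_search(self) -> bool:
        return True

    def supports_extract(self) -> bool:
        return False

    def supports_crawl(self) -> bool:
        return False


class ExaLikeProvider(WebProviderSketch):
    """Search + extract, no crawl: the (True, True, False) shape from the commit."""

    @property
    def name(self) -> str:
        return "exa"

    def is_available(self) -> bool:
        # Replaces the legacy is_configured(); keyed on EXA_API_KEY here.
        return bool(os.environ.get("EXA_API_KEY"))

    def supports_extract(self) -> bool:
        return True


p = ExaLikeProvider()
print(p.name, (p.supports_search(), p.supports_extract(), p.supports_crawl()))
# exa (True, True, False)
```

Because `name` is a property rather than a method, call sites change from `.provider_name()` to `.name`, matching the rename listed above.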
kshitijk4poor	6b219f5af6	refactor(web): remove legacy in-tree provider modules

Deletes tools/web_providers/{brave_free,ddgs,searxng}.py — the three providers that moved to plugins/web/ in prior commits. tools/web_tools.py no longer imports them (registry dispatch as of `d8735963f`), so removing them is purely a cleanup pass.

Also migrates the existing tests to the new import paths:

tests/tools/test_web_providers_brave_free.py
tests/tools/test_web_providers_ddgs.py
tests/tools/test_web_providers_searxng.py

Mechanical rewrites:
- `from tools.web_providers.X import YSearchProvider` -> `from plugins.web.X.provider import YWebSearchProvider`
- `.is_configured()` -> `.is_available()` (legacy method -> new method)
- `.provider_name()` -> `.name` (legacy method -> new property)
- `from tools.web_providers.base import WebSearchProvider` -> `from agent.web_search_provider import WebSearchProvider` (the subclass-check asserts membership in the new plugin-facing ABC)
- `sys.modules.delitem("tools.web_providers.ddgs")` updated to point at `plugins.web.ddgs.provider` (cache-busting for lazy ddgs imports)

The TestXBackendWiring / TestXSearchOnlyErrors classes (covering _is_backend_available, _get_backend, check_web_api_key, and the "search-only" error paths in web_extract/web_crawl) are untouched — those still test web_tools.py's backend-selection logic, which continues to recognize the names "brave-free" / "ddgs" / "searxng" even after the modules behind them moved to plugins.

tools/web_providers/base.py is intentionally NOT deleted by this commit — it's the parent ABC of the legacy modules and shares its name with agent/web_search_provider.py::WebSearchProvider. Removing it surfaces the naming collision (see PR description Finding 0); the real migration PR deletes it in the same commit that drops the _WEB_PLUGIN_SKIPLIST guards in hermes_cli/tools_config.py.

Test results:

bash scripts/run_tests.sh tests/tools/test_web_providers_*.py
-> 65 passed in 3.41s (all rewritten unit tests + unchanged integration tests)

bash scripts/run_tests.sh tests/tools/test_web_*.py
-> 141 passed in 4.70s (full web test set, post-deletion)	2026-05-13 22:31:28 -07:00
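The mechanical test rewrites listed above are regular enough to script. A hypothetical sketch with `re.sub` (`REWRITES` and `migrate` are illustrative names, not part of the repo); the rules mirror the commit's list, except that the class-rename rule assumes every legacy class follows the `YSearchProvider` naming and renames all occurrences, not just the import line. The base-class rule runs first so that `tools.web_providers.base` is not sent to `plugins.web.base.provider` by the generic module-path rule.

```python
import re

# Ordered (pattern, replacement) pairs; the base-ABC rule must come first.
REWRITES = [
    # Legacy ABC import -> new plugin-facing ABC.
    (r"from tools\.web_providers\.base import WebSearchProvider",
     "from agent.web_search_provider import WebSearchProvider"),
    # Legacy module path -> plugin provider module.
    (r"from tools\.web_providers\.(\w+) import",
     r"from plugins.web.\1.provider import"),
    # YSearchProvider -> YWebSearchProvider, leaving WebSearchProvider itself alone.
    (r"\b([A-Z]\w*?)(?<!Web)SearchProvider\b", r"\1WebSearchProvider"),
    # Legacy methods -> new method / property.
    (r"\.is_configured\(\)", ".is_available()"),
    (r"\.provider_name\(\)", ".name"),
    # sys.modules cache-busting keys for lazy imports.
    (r"\"tools\.web_providers\.(\w+)\"", r'"plugins.web.\1.provider"'),
]


def migrate(source: str) -> str:
    """Apply every rewrite rule, in order, to one test file's source text."""
    for pattern, replacement in REWRITES:
        source = re.sub(pattern, replacement, source)
    return source
```

The negative lookbehind keeps the class rule idempotent: an already-migrated `BraveFreeWebSearchProvider` is left unchanged on a second pass.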
kshitijk4poor	0d085d9454	feat(web): searxng plugin (search-only, third migration)

Adds plugins/web/searxng/. SearXNG aggregates results from upstream engines via its JSON API (/search?format=json) — search-only, no extract capability (supports_extract() returns False).

E2E verified — registry now has ['brave-free', 'ddgs', 'searxng'].	2026-05-13 22:31:28 -07:00
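SearXNG's JSON API typically returns a `results` list whose entries carry `url`, `title`, and `content` fields. A minimal sketch of building the request URL and normalizing that payload with only the standard library; `searxng_search_url`, `normalize_results`, and the output keys (`title`, `url`, `description`) are hypothetical, and the commit does not specify the plugin's normalized shape or where its base URL comes from.

```python
from typing import Any, Dict, List
from urllib.parse import urlencode


def searxng_search_url(base_url: str, query: str) -> str:
    """Build the JSON search URL for a SearXNG instance."""
    params = urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"


def normalize_results(payload: Dict[str, Any], limit: int = 5) -> List[Dict[str, str]]:
    """Map SearXNG result entries to a flat title/url/description shape."""
    out: List[Dict[str, str]] = []
    for item in payload.get("results", []):
        if len(out) >= limit:
            break
        url = item.get("url")
        if not url:
            continue  # skip entries without a link
        out.append({
            "title": item.get("title", ""),
            "url": url,
            "description": item.get("content", ""),
        })
    return out
```

A real request would then GET that URL; note that a SearXNG instance may refuse JSON output unless `json` is enabled in its `search.formats` setting.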
kshitijk4poor	5c7d098bee	feat(web): ddgs plugin (second migration)

Adds plugins/web/ddgs/ following the same plugins/image_gen/ pattern as brave_free. DuckDuckGo search via the community ddgs package; no API key is needed, and the package is an optional dep gated by is_available().

E2E verified — registry now has ['brave-free', 'ddgs'].	2026-05-13 22:31:28 -07:00
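Gating an optional dependency in `is_available()` usually means checking whether the package can be imported without paying for an eager import. A sketch using `importlib.util.find_spec`; the `module_name` parameter is hypothetical and exists only so the example can be exercised without ddgs installed, and the real plugin may simply attempt the import instead.

```python
import importlib.util


def is_available(module_name: str = "ddgs") -> bool:
    """True if the optional package is importable; no API key is involved."""
    # find_spec locates a top-level package without executing it.
    return importlib.util.find_spec(module_name) is not None


print("ddgs installed:", is_available())
```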
kshitijk4poor	d403cf018c	feat(web): brave_free plugin (first migration from tools/web_providers/)

Adds plugins/web/brave_free/ as the first plugin built against the new WebSearchProvider ABC. Mirrors the plugins/image_gen/openai/ layout exactly:

plugins/web/brave_free/
  plugin.yaml   kind: backend, provides_web_providers: [brave-free]
  __init__.py   register(ctx) -> ctx.register_web_search_provider(...)
  provider.py   BraveFreeWebSearchProvider(WebSearchProvider)

Behavior preserved: same name ("brave-free" with hyphen), same env var (BRAVE_SEARCH_API_KEY), same HTTP request shape, same response normalization.

The legacy tools/web_providers/brave_free.py is left in place — the dispatcher in tools/web_tools.py still references it. Task 7 cuts over the dispatcher to the new registry; Task 10 deletes the legacy file.

E2E verified:

HERMES_PLUGINS_DEBUG=1 python -c "
from hermes_cli.plugins import _ensure_plugins_discovered
_ensure_plugins_discovered()
from agent.web_search_registry import list_providers
print([p.name for p in list_providers()])
"
# -> ['brave-free']	2026-05-13 22:31:28 -07:00
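The `register(ctx)` entry point described above can be modelled with a toy context and registry. Everything here is hypothetical (`StubBraveFreeProvider`, `ToyPluginContext`, `_REGISTRY`, and this `list_providers`); it only mirrors the flow the commit describes, where a plugin's `__init__.py` hands a provider instance to `ctx.register_web_search_provider(...)` and discovery later lists providers by `name`. Sorting by name is an assumption of the sketch.

```python
from typing import Dict, List


class StubBraveFreeProvider:
    """Stub provider; only the registered name matters for this sketch."""

    @property
    def name(self) -> str:
        return "brave-free"  # hyphenated, as in the legacy backend


_REGISTRY: Dict[str, object] = {}


class ToyPluginContext:
    """Hypothetical stand-in for the context object passed to register(ctx)."""

    def register_web_search_provider(self, provider) -> None:
        _REGISTRY[provider.name] = provider  # re-registering replaces by name


def register(ctx: ToyPluginContext) -> None:
    """Plays the role of plugins/web/brave_free/__init__.py's register(ctx)."""
    ctx.register_web_search_provider(StubBraveFreeProvider())


def list_providers() -> List[object]:
    return [_REGISTRY[key] for key in sorted(_REGISTRY)]


register(ToyPluginContext())
print([p.name for p in list_providers()])  # ['brave-free']
```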

9 commits