hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-03 12:23:08 +00:00

Author	SHA1	Message	Date
kshitijk4poor	4ca5e72444	fix(web): preserve top-level error envelope on unconfigured systems Surfaced by local E2E behavior-parity testing of PR vs origin/main: the plugin-migrated dispatchers were quietly changing the error envelope shape returned to function-calling models on unconfigured systems. Two findings, both from per-result error wrapping bleeding into the pre-flight configuration error path: 1. search: ``firecrawl.search()`` caught the ``ValueError("Web tools are not configured...")`` from ``_get_firecrawl_client()`` and returned it as ``{"success": False, "error": ...}``, losing the legacy ``{"error": "Error searching web: ..."}`` envelope that ``tool_error()`` emits on main. Models that special-case the ``error`` key still detect the failure, but the prefix is part of the legacy contract some users rely on. 2. crawl: ``firecrawl.crawl()`` caught the same pre-flight ``ValueError`` and wrapped it as a per-page error inside ``results[0]``. Main short-circuits on ``check_firecrawl_api_key()`` BEFORE dispatching, so its unconfigured response is ``{"success": False, "error": "web_crawl requires Firecrawl..."}`` at the top level. The PR's per-page burying hid the failure inside ``results[]`` where models that check ``result.get("error")`` would miss it. Fix: - ``plugins/web/firecrawl/provider.py``: pull ``_get_firecrawl_client()`` outside the broad ``try`` in ``search()``. Pre-flight ``ValueError`` / ``ImportError`` propagate to the dispatcher's top-level exception handler. In-flight SDK errors still get wrapped as ``{"success": False, ...}``. - ``tools/web_tools.py``: mirror main's upstream availability gate in ``web_crawl_tool``. When the resolved crawl provider is ``is_available()==False``, short-circuit BEFORE dispatching with the same top-level error shape main emits. - ``tests/tools/test_web_providers.py``: 2 regression tests (``TestUnconfiguredErrorEnvelopeParity``) lock in the behavior so future plugin work can't undo this. 
Verified via local subprocess-based parity test (14/14 scenarios match origin/main shape exactly) and full 210/210 web test suite green.	2026-05-13 22:31:28 -07:00
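A minimal sketch of the pre-flight versus in-flight split described in the commit above, assuming simplified stand-ins: `get_client`, `provider_search`, and `dispatch_search` are hypothetical names for illustration, not the project's actual functions. The point is that the client lookup sits outside the broad `try`, so a configuration error reaches the dispatcher's top-level handler and keeps the legacy envelope.

```python
import json


def get_client(api_key):
    """Hypothetical pre-flight check: raise when nothing is configured."""
    if not api_key:
        raise ValueError("Web tools are not configured")
    # Stand-in client: a callable that returns one fake result.
    return lambda query: [{"title": "example", "url": "https://example.com", "query": query}]


def provider_search(query, api_key=None):
    # Pre-flight: deliberately outside the try, so configuration errors propagate.
    client = get_client(api_key)
    try:
        # In-flight: failures during the actual call are wrapped per call.
        return {"success": True, "results": client(query)}
    except Exception as exc:
        return {"success": False, "error": str(exc)}


def dispatch_search(query, api_key=None):
    # Top-level handler: emits the legacy-style envelope with its prefix.
    try:
        return json.dumps(provider_search(query, api_key))
    except Exception as exc:
        return json.dumps({"error": f"Error searching web: {exc}"})


unconfigured = json.loads(dispatch_search("hermes"))
configured = json.loads(dispatch_search("hermes", api_key="dummy-key"))
```

Here `unconfigured` carries only a top-level `error` key, while `configured` keeps the `success`/`results` shape.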
kshitijk4poor	657e6d87cc	fix(web): align _LEGACY_PREFERENCE with legacy 7-provider order + doc cleanup Self-review of the plugin migration surfaced one warning and a handful of doc/dead-code cleanups. None affect production behaviour through the main dispatcher (which always calls `tools.web_tools._get_backend()` first and preserves the full 7-provider walk), but direct callers of `agent.web_search_registry.get_active_*_provider()` previously diverged from the legacy order and could return `None` for users with credentials but no explicit `web.backend` config key. Changes ------- 1. `_LEGACY_PREFERENCE` was shipped as a 4-tuple `("brave-free", "firecrawl", "searxng", "ddgs")` while the PR description and the legacy `_get_backend()` candidate order both call for the 7-tuple `(firecrawl, parallel, tavily, exa, searxng, brave-free, ddgs)`. Replaced with the 7-tuple. Verified empirically: with TAVILY+EXA keys and no config, `get_active_search_provider()` now returns tavily (was None); with EXA+PARALLEL it returns parallel (was None); with BRAVE+FIRECRAWL it returns firecrawl (was brave-free). 2. `agent/web_search_registry.py` — module docstring, `_resolve` step-3 docstring, and inline comment all listed the old 4-tuple and claimed "brave-free first because it was the shipped default". The legacy default is `"firecrawl"`. Rewritten to match the new ordering and reference `tools.web_tools._get_backend()` as the source of truth. 3. `agent/web_search_registry.py` — `get_active_crawl_provider` docstring said "only Tavily implements it among built-in providers". Firecrawl also advertises `supports_crawl=True` after the previous commit. Updated to "Tavily and Firecrawl". 4. `plugins/web/tavily/provider.py` — module docstring said "Tavily is the only built-in backend that natively crawls". Updated. 5. `agent/web_search_provider.py` — ABC docstring mentioned only `search` / `extract` capabilities. Added `crawl` for accuracy. 6. 
`plugins/web/{firecrawl,parallel,exa}/provider.py` — dead plugin-level cache globals (`_firecrawl_client`, `_parallel_client`, `_async_parallel_client`, `_exa_client`) were declared but never read (all reads/writes go through `_wt.*` per the `extracting-inline-helpers-to-plugins` recipe). Removed the dead declarations; the reset-for-tests helpers in firecrawl + parallel now clear the canonical `_wt._<name>` slots, matching the pattern exa already used. Tests ----- 218/218 web-targeted tests still pass (no test changes needed). 4910/4910 in `tests/tools/` still green.	2026-05-13 22:31:28 -07:00
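The preference walk from the commit above can be sketched as follows. This is an illustration under assumptions: `pick_provider` is a hypothetical helper, and availability is reduced to a plain set of provider names rather than the registry's real credential checks. Only the 7-tuple order and the three verified outcomes come from the commit message.

```python
# Order taken from the commit message's 7-tuple.
LEGACY_PREFERENCE = (
    "firecrawl", "parallel", "tavily", "exa", "searxng", "brave-free", "ddgs",
)


def pick_provider(available):
    """Return the first available provider in legacy order, or None."""
    for name in LEGACY_PREFERENCE:
        if name in available:
            return name
    return None


# The three scenarios the commit reports as verified empirically.
tavily_exa = pick_provider({"tavily", "exa"})
exa_parallel = pick_provider({"exa", "parallel"})
brave_firecrawl = pick_provider({"brave-free", "firecrawl"})
nothing = pick_provider(set())
```

With the old 4-tuple, the first two scenarios had no matching entry at all, which is why they resolved to `None`.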
kshitijk4poor	21e3a863bb	feat(web): firecrawl plugin natively supports crawl; delete legacy inline path The web-provider migration originally left firecrawl crawl as the only provider-specific code remaining inline in tools/web_tools.py (~250 lines of Firecrawl-specific crawl orchestration that didn't fit the plugin's existing surface). This commit closes that gap. What this adds -------------- 1. plugins/web/firecrawl/provider.py: implement async ``crawl(url, **kwargs)`` - Accepts the same kwargs as the dispatcher passes to any crawl provider (``instructions``, ``depth``, ``limit``); Firecrawl's /crawl endpoint ignores ``instructions`` and ``depth`` so we log and drop with a clear info message. - Wraps the sync SDK ``crawl()`` call in asyncio.to_thread so the gateway event loop isn't blocked on a multi-page crawl. - Preserves the response-shape normalization across pydantic / typed-object / dict variants that the legacy inline code did. - Preserves per-page website-policy re-check (catches blocked redirects after the SDK returns). - Returns the same {"results": [...]} shape so the dispatcher's shared LLM-summarization post-processing path works unchanged. - Sets supports_crawl() to True so the dispatcher routes through the plugin instead of the legacy fallthrough. 2. tools/web_tools.py: delete the entire legacy firecrawl crawl block that used to run after "No registered provider supports crawl" — ~270 lines including: - check_firecrawl_api_key gate + typed error - inline SSRF + website-policy seed-URL gate (dispatcher already does this) - Firecrawl client setup with crawl_params - 100+ lines of pydantic/dict/typed-object normalization - Per-page LLM-processing loop (kept in the dispatcher's shared post-processing path; that's where it always belonged) - trimming + base64 image cleanup (still done in the dispatcher's shared path) Replaced with a single typed-error branch when no crawl-capable provider is available: "web_crawl has no available backend. 
Set FIRECRAWL_API_KEY (or FIRECRAWL_API_URL for self-hosted), or set TAVILY_API_KEY for Tavily." Test updates ------------ - tests/tools/test_website_policy.py: - test_web_crawl_short_circuits_blocked_url: dispatcher seed-URL gate still runs on web_tools.check_website_access (no change to that patch), but the firecrawl client lockdown moved to the plugin module — patch firecrawl_provider._get_firecrawl_client instead of web_tools._get_firecrawl_client. The dispatcher short-circuits before the plugin runs, so the test still passes. - test_web_crawl_blocks_redirected_final_url: patch the per-page policy gate at plugins.web.firecrawl.provider.check_website_access (where it now runs) AND on web_tools (where the seed-URL gate still runs). Patch firecrawl_provider._get_firecrawl_client for the FakeCrawlClient injection. Both checks flow through the same fake_check function. - tests/plugins/web/test_web_search_provider_plugins.py: - Update parametrized capability-flag spec: firecrawl supports_crawl is now True. - Add test_firecrawl_crawl_returns_error_dict_when_unconfigured — verifies inspect.iscoroutinefunction(p.crawl) is True and that the async crawl returns a per-page error dict (not a raise) when FIRECRAWL_API_KEY is missing. Verified -------- - 218/218 web tests pass (was 173, +44 plugin tests + 1 new firecrawl crawl test from this commit = 218 with the test deduplication). - Compile-clean (py_compile passes on both files). - Provider capabilities matrix confirmed end-to-end:
name       search  extract  crawl  async-extract?  async-crawl?
firecrawl  True    True     True   True            True
tavily     True    True     True   False           False
Both crawl-capable providers exercise the dispatcher's inspect.iscoroutinefunction async-or-sync detection. 
Net diff -------- - tools/web_tools.py: -254 lines (legacy inline crawl gone) - plugins/web/firecrawl/provider.py: +185 lines (crawl method) - test_website_policy.py: +14/-9 lines (patch locations) - test_web_search_provider_plugins.py: +22/-1 lines (capability flag + new firecrawl crawl test) - Total: -32 net LoC; tools/web_tools.py is now 1509 lines (was 1763 before this commit, 2227 before the migration started).	2026-05-13 22:31:28 -07:00
kshitijk4poor	24fe60faa2	refactor(tools): drop hardcoded web picker rows + skiplist; plugins are sole source Removes the seven hardcoded TOOL_CATEGORIES["web"] provider rows that duplicated the plugin-registered providers, and deletes the _WEB_PLUGIN_SKIPLIST that existed to prevent duplicate picker rows during the migration. The Web Search & Extract category now derives its provider rows entirely from agent.web_search_registry via _plugin_web_search_providers(), matching how Spotify, Google Meet, and the image_gen plugins are surfaced. Removed (deduplicated against plugin schemas): - Firecrawl Cloud → plugins.web.firecrawl - Exa → plugins.web.exa - Parallel → plugins.web.parallel - Tavily → plugins.web.tavily - SearXNG → plugins.web.searxng - Brave Search (Free Tier) → plugins.web.brave_free - DuckDuckGo (ddgs) → plugins.web.ddgs (post_setup hook preserved) Retained in TOOL_CATEGORIES["web"]: - Nous Subscription — requires requires_nous_auth + managed_nous_feature + override_env_vars to drive the managed-gateway UX. Not a provider — a different setup flow for the firecrawl backend. - Firecrawl Self-Hosted — points firecrawl at a private Docker URL via FIRECRAWL_API_URL only. Same reason: UX setup-flow row, not a provider. These two rows describe alternative auth/billing paths for the firecrawl backend; they intentionally share web_backend="firecrawl" with the plugin row but light up different env-var prompts. Plugin schema extensions ------------------------ - ddgs plugin's get_setup_schema() now emits `post_setup: "ddgs"` so selection still triggers the pip-install hook in _run_post_setup(). - _plugin_web_search_providers() passes `post_setup` through verbatim when present in the schema (other future plugins like camofox / a hypothetical playwright-web plugin can opt in the same way). 
- Picker rows now carry both `web_backend` (legacy field consumed by setup + selection helpers) and `web_search_plugin_name` (informational marker), so behavior is identical between hardcoded and plugin-registered rows. Net diff -------- - hermes_cli/tools_config.py: -141/+50 lines (~91 lines net) - plugins/web/ddgs/provider.py: +7/-4 (post_setup field + badge polish) Verified -------- - Compile-clean for both files - Picker shows: 2 hardcoded rows (Nous Subscription, Firecrawl Self-Hosted) + 7 plugin rows (alphabetically: Brave Search, DuckDuckGo, Exa, Firecrawl, Parallel, SearXNG, Tavily). DuckDuckGo row carries post_setup="ddgs" for first-time install. - 173 web-specific tests still pass.	2026-05-13 22:31:28 -07:00
kshitijk4poor	748f3e016b	refactor(web): delete inline vendor helpers, re-export from plugins Removes ~580 lines of dead code from tools/web_tools.py that were superseded by the plugin migration but kept around in the cutover commit to keep the diff focused. Replaces them with thin re-export shims so existing tests and external callers that reach for the legacy ``tools.web_tools.<name>`` paths continue to work transparently. Deleted from tools/web_tools.py -------------------------------- - Lazy Firecrawl SDK proxy (_load_firecrawl_cls, _FirecrawlProxy, _FIRECRAWL_CLS_CACHE, the Firecrawl singleton) - Firecrawl client section (_get_direct_firecrawl_config, _get_firecrawl_gateway_url, _is_tool_gateway_ready, _has_direct_firecrawl_config, _raise_web_backend_configuration_error, _firecrawl_backend_help_suffix, _get_firecrawl_client) - Parallel client section (_get_parallel_client, _get_async_parallel_client, _parallel_client, _async_parallel_client) - Tavily client section (_TAVILY_BASE_URL, _tavily_request, _normalize_tavily_search_results, _normalize_tavily_documents) - Generic SDK normalizers (_to_plain_object, _normalize_result_list, _extract_web_search_results, _extract_scrape_payload) - Exa client section (_get_exa_client, _exa_client, _exa_search, _exa_extract) - Parallel helpers (_parallel_search, _parallel_extract) - Duplicate inline check_firecrawl_api_key Net: tools/web_tools.py drops from 2227 → 1613 lines (-614 lines). 
Re-exports added at top of tools/web_tools.py --------------------------------------------- - From plugins.web.firecrawl.provider: Firecrawl, _FirecrawlProxy, _FIRECRAWL_CLS_CACHE, _load_firecrawl_cls, _get_direct_firecrawl_config, _get_firecrawl_gateway_url, _is_tool_gateway_ready, _has_direct_firecrawl_config, _firecrawl_backend_help_suffix, _raise_web_backend_configuration_error, _get_firecrawl_client, _to_plain_object, _normalize_result_list, _extract_web_search_results, _extract_scrape_payload, check_firecrawl_api_key - From plugins.web.tavily.provider: _tavily_request, _normalize_tavily_search_results, _normalize_tavily_documents - From plugins.web.parallel.provider: _get_parallel_client, _get_async_parallel_client - From plugins.web.exa.provider: _get_exa_client Plus retained module-level imports for backward-compat with tests: - httpx (tests patch tools.web_tools.httpx for tavily request mocking) - build_vendor_gateway_url, _read_nous_access_token, resolve_managed_tool_gateway, managed_nous_tools_enabled, prefers_gateway (tests patch tools.web_tools.<name>) Plugin indirection pattern (key technique) ------------------------------------------ For functions inside the firecrawl/parallel/exa plugins to honor unit-test patches that target ``tools.web_tools.<name>``, the plugin implementations now do ``import tools.web_tools as _wt`` at call time and read helper names through that module (``_wt._read_nous_access_token``, ``_wt.Firecrawl``, ``_wt.prefers_gateway``, etc.). This makes the existing test patches transparently reach the plugin code without any test changes. The cached client globals (_firecrawl_client, _firecrawl_client_config, _parallel_client, _async_parallel_client, _exa_client) also now live on tools.web_tools so existing test setup_method handlers that reset ``tools.web_tools._<vendor>_client = None`` between cases keep working. The plugins read/write the cache via getattr/setattr on the web_tools module. 
Verified -------- - 173/173 targeted web tests pass: test_web_providers.py, test_web_providers_brave_free.py, test_web_providers_ddgs.py, test_web_providers_searxng.py, test_web_tools_config.py, test_web_tools_tavily.py, test_website_policy.py, test_config_null_guard.py - Compile-clean (py_compile.compile passes) - All inline implementations now exist in exactly one place (plugins.web.<vendor>.provider) Follow-up clean-up ------------------ - Drop _WEB_PLUGIN_SKIPLIST + hardcoded TOOL_CATEGORIES["web"] rows (next commit) - Delete tools/web_providers/ directory entirely - Add tests/plugins/web/ coverage - Full tests/tools/ + tests/gateway/ regression sweep before promoting PR	2026-05-13 22:31:28 -07:00
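The plugin indirection pattern from the commit above can be illustrated with a self-contained sketch. Everything here is a stand-in: `legacy_web_tools`, `read_token`, and `plugin_get_client` are hypothetical names, and the module is built in memory purely for the demonstration. It shows why resolving helpers through the module object at call time lets a patch on the legacy path reach plugin code, and why keeping the cache slot on that module keeps test resets working.

```python
import sys
import types
from unittest.mock import patch

# Stand-in for the legacy module that existing tests patch (hypothetical name).
legacy = types.ModuleType("legacy_web_tools")
legacy.read_token = lambda: "real-token"
legacy._client = None  # the cache slot lives on the legacy module
sys.modules["legacy_web_tools"] = legacy


def plugin_get_client():
    # Import at call time and resolve names through the module object,
    # so patches applied to the legacy module are visible here.
    import legacy_web_tools as _wt

    cached = getattr(_wt, "_client", None)
    if cached is None:
        cached = {"token": _wt.read_token()}
        setattr(_wt, "_client", cached)
    return cached


# A test-style patch against the legacy path reaches the plugin function.
with patch("legacy_web_tools.read_token", return_value="patched-token"):
    patched = plugin_get_client()

# Resetting the slot on the legacy module clears the plugin's cache too.
legacy._client = None
unpatched = plugin_get_client()
```

Had the plugin bound `read_token` with a module-level `from ... import`, the patch would have replaced the name on the legacy module only, and the plugin would have kept calling the original.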
kshitijk4poor	5e54330e27	fix(web): preserve firecrawl crawl + website-policy gate after migration Two regressions discovered by running the full tests/tools/ suite after the dispatcher cutover, both fixed in this commit: 1. web_crawl_tool incorrectly errored "search-only" for firecrawl --------------------------------------------------------------------- The cutover treated any provider with supports_crawl()==False as a search-only backend and returned the typed search-only error. But firecrawl can crawl via the legacy multi-page-extract path inside web_crawl_tool — it just doesn't expose supports_crawl on the plugin (adding native firecrawl crawl is a clean follow-up). Fix: only emit the search-only error when the provider supports NEITHER crawl NOR extract (brave-free / ddgs / searxng). When the provider supports extract but not crawl (firecrawl), fall through to the legacy firecrawl-via-extract path below. 2. firecrawl plugin's check_website_access wasn't patchable --------------------------------------------------------------------- The plugin imported `from tools.website_policy import check_website_access` INSIDE the extract() function body, so monkeypatching the name on plugins.web.firecrawl.provider had no effect — the inner import re-bound the name on every call. Fix: hoist the import to module level. Cheap (website_policy itself has no heavy deps) and makes the standard monkeypatch.setattr(firecrawl_provider, "check_website_access", ...) pattern work. Test updates (tests/tools/test_website_policy.py — 4 tests): - test_web_extract_short_circuits_blocked_url - test_web_extract_blocks_redirected_final_url Both: patch the gate at plugins.web.firecrawl.provider (where it runs after migration) and force the firecrawl plugin to be the active extract provider via FIRECRAWL_API_KEY. 
- test_web_crawl_short_circuits_blocked_url - test_web_crawl_blocks_redirected_final_url Both: unchanged — the dispatcher-level gate at tools/web_tools.py line 1651 still uses the imported `check_website_access` name and the firecrawl-fallthrough path is exercised as before. Verified: 22/22 tests/tools/test_website_policy.py pass.	2026-05-13 22:31:28 -07:00
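A short sketch of the patchability problem fixed in the commit above, using hypothetical stand-ins (`demo_policy`, `check_access`, and both `extract_*` functions are invented for illustration). An import inside the function body re-binds the name on every call, so replacing the module-level name has no effect; a module-level import is looked up at call time and honors the replacement.

```python
import sys
import types

# Hypothetical stand-in for the policy module (not the real tools.website_policy).
policy = types.ModuleType("demo_policy")
policy.check_access = lambda url: None  # None means "allowed" in this sketch
sys.modules["demo_policy"] = policy

from demo_policy import check_access  # module-level binding, patchable by name


def extract_with_inner_import(url):
    # The import inside the body re-binds the name locally on every call,
    # so replacing the module-level name has no effect here.
    from demo_policy import check_access
    return check_access(url)


def extract_with_module_import(url):
    # Resolved from the enclosing module scope at call time,
    # so a replaced name is honored.
    return check_access(url)


def fake_check(url):
    return {"blocked": True}


# Same effect as monkeypatch.setattr(provider_module, "check_access", fake_check).
check_access = fake_check

inner_result = extract_with_inner_import("https://example.com")
module_result = extract_with_module_import("https://example.com")
```

`inner_result` still comes from the original policy function, while `module_result` comes from `fake_check`.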
kshitijk4poor	143184e943	feat(web): firecrawl plugin — largest migration (search + async extract + dual auth) Migrates Firecrawl from inline code in tools/web_tools.py to a bundled plugin at plugins/web/firecrawl/. By line count this is the largest of the seven provider migrations: the firecrawl path captured most of the file's vendor-specific complexity. What moved into the plugin (all previously in tools/web_tools.py): Lazy Firecrawl SDK proxy - _load_firecrawl_cls() — caches the imported SDK class - _FirecrawlProxy + Firecrawl singleton — defers ~200ms of SDK imports until first construction or isinstance check. Client construction (dual auth) - _get_direct_firecrawl_config() — direct FIRECRAWL_API_KEY/URL path - _get_firecrawl_gateway_url() — managed Nous tool-gateway URL - _is_tool_gateway_ready() — gateway URL + Nous token check - _has_direct_firecrawl_config() — direct config present? - _get_firecrawl_client() — combined client construction honoring web.use_gateway - check_firecrawl_api_key() — top-level "is firecrawl usable" - _firecrawl_backend_help_suffix() — managed-gateway help string - _raise_web_backend_configuration_error() — typed misconfig error Response shape normalization (vendor-specific) - _to_plain_object(), _normalize_result_list() — SDK→dict helpers - _extract_web_search_results() — handles SDK/direct/gateway shapes - _extract_scrape_payload() — nested-data unwrap for scrape Per-URL extract loop - 60s asyncio.wait_for timeout per URL - Pre-scrape website-policy gate - Post-scrape redirect-aware SSRF re-check - Format-aware content selection (markdown / html / auto) - Per-URL errors returned as {"error": str} entries, no raises Extract is declared `async def` — each URL is scraped in asyncio.to_thread(...). This is the second async-extract plugin after parallel. 
The plugin re-exports `Firecrawl` (the lazy proxy) and `check_firecrawl_api_key()` so existing tests doing `patch("tools.web_tools.Firecrawl")` or `monkeypatch.setattr(web_tools, "check_firecrawl_api_key", ...)` keep working — tools/web_tools.py re-exports both names in the next dispatcher-cutover commit. Note: web_crawl_tool still has its own Firecrawl crawl path inline (separate from extract); the Firecrawl SDK supports /crawl but we don't expose supports_crawl=True on this plugin yet. Tavily handles crawl today. Adding Firecrawl crawl is a clean follow-up. Adds "firecrawl" to _WEB_PLUGIN_SKIPLIST. E2E verified: - All 7 providers register: brave-free, ddgs, exa, firecrawl, parallel, searxng, tavily - inspect.iscoroutinefunction(firecrawl.extract) -> True - Firecrawl proxy is a callable lazy proxy at module level - check_firecrawl_api_key reflects FIRECRAWL_API_KEY presence	2026-05-13 22:31:28 -07:00
kshitijk4poor	31fcde876c	feat(web): tavily plugin — first three-capability plugin (search + extract + crawl) Migrates Tavily from inline _tavily_request() / _normalize_tavily_* helpers in tools/web_tools.py to a bundled plugin at plugins/web/tavily/. First plugin in the codebase to advertise supports_crawl=True. Tavily is unique among built-in backends in offering a native /crawl endpoint that walks linked pages from a seed URL with optional natural-language instructions and depth ("basic" or "advanced"). Capabilities: - supports_search() -> True (Tavily /search) - supports_extract() -> True (Tavily /extract) - supports_crawl() -> True (Tavily /crawl) All sync (httpx.post under the hood). The crawl method accepts forward-compat kwargs (instructions, depth, limit) and is gated against unsafe URLs/policy by the dispatcher in web_crawl_tool — exactly as before. Behavior preserved: - TAVILY_API_KEY required (ValueError → typed error response) - TAVILY_BASE_URL env override honored - /crawl requires both body auth AND Bearer header — preserved - failed_results[] and failed_urls[] response keys mapped to per-URL items with error fields rather than raising - max_results capped at 20 server-side Adds "tavily" to _WEB_PLUGIN_SKIPLIST. The legacy inline _tavily_request / _normalize_tavily_search_results / _normalize_tavily_documents / _TAVILY_BASE_URL in tools/web_tools.py are NOT deleted yet — search/extract dispatch and the entire web_crawl_tool function still reference them. They go away when those dispatchers are cut over to the registry. E2E verified: - Tavily registers with all 3 capabilities - Provider list now: brave-free, ddgs, exa, parallel, searxng, tavily	2026-05-13 22:31:28 -07:00
kshitijk4poor	4816646109	feat(web): parallel plugin — first async-extract plugin Migrates Parallel.ai from inline `_parallel_search()` / `_parallel_extract()` in tools/web_tools.py to a bundled plugin at plugins/web/parallel/. First plugin in the codebase to expose an async :meth:`extract`: - search() is sync — Parallel.beta.search - extract() is async def — AsyncParallel.beta.extract The ABC's docstring on supports_extract() already permits sync-or-async; this commit is the first to exercise the async path. The web_extract_tool dispatcher (next commit) detects coroutines via inspect.iscoroutinefunction and awaits accordingly. Behavior preserved: - PARALLEL_API_KEY required (raises ValueError if missing → surfaced as {"success": False, "error": "..."} instead) - PARALLEL_SEARCH_MODE env var honored (agentic|fast|one-shot, default agentic), validated via _resolve_search_mode() - Limit capped at 20 server-side via min(limit, 20) - Per-URL failure mode preserved: response.errors[] each become a result dict with an "error" field rather than raising - Module-level _parallel_client / _async_parallel_client caches kept (mirrors legacy singleton pattern) Adds "parallel" to _WEB_PLUGIN_SKIPLIST in hermes_cli/tools_config.py so the picker doesn't double-list. The legacy inline _parallel_search, _parallel_extract, _get_parallel_client, _get_async_parallel_client in tools/web_tools.py are NOT deleted yet — the dispatcher still calls them. They go away when the dispatcher cuts over. E2E verified: - inspect.iscoroutinefunction(p.search) -> False - inspect.iscoroutinefunction(p.extract) -> True - extract() returns a coroutine (not a list) - 5 providers register correctly (brave-free, ddgs, exa, parallel, searxng)	2026-05-13 22:31:28 -07:00
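The sync-or-async detection mentioned in the commit above might look roughly like this. It is a sketch, not the project's dispatcher: `SyncProvider`, `AsyncProvider`, and `dispatch_extract` are hypothetical names, and the only detail taken from the commit is the use of `inspect.iscoroutinefunction` to decide whether to await.

```python
import asyncio
import inspect


class SyncProvider:
    """Hypothetical provider whose extract() is a plain function."""

    def extract(self, urls):
        return [{"url": u, "content": "sync"} for u in urls]


class AsyncProvider:
    """Hypothetical provider whose extract() is declared async def."""

    async def extract(self, urls):
        await asyncio.sleep(0)  # stand-in for real network I/O
        return [{"url": u, "content": "async"} for u in urls]


async def dispatch_extract(provider, urls):
    # Await coroutine functions; call plain functions directly.
    if inspect.iscoroutinefunction(provider.extract):
        return await provider.extract(urls)
    return provider.extract(urls)


urls = ["https://example.com"]
sync_result = asyncio.run(dispatch_extract(SyncProvider(), urls))
async_result = asyncio.run(dispatch_extract(AsyncProvider(), urls))
```

Both calls return the same list-of-dicts shape, so downstream post-processing does not need to know which kind of provider ran.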
kshitijk4poor	ec8449e9c6	feat(web): exa plugin — first multi-capability migration (search + extract) Migrates Exa from the inline `_exa_search()` / `_exa_extract()` helpers in tools/web_tools.py to a bundled plugin at plugins/web/exa/. This is the first plugin in this PR to advertise supports_extract=True, exercising the multi-capability ABC path that the initial three migrations (brave_free, ddgs, searxng — all search-only) did not cover. Both Exa methods are sync — the SDK is sync-only. The web_extract_tool dispatcher in tools/web_tools.py will continue to call them inline until Task "dispatch-extract-all" cuts it over to the registry. Behaviour preserved bit-for-bit aside from the ABC method-name change: - is_configured() -> is_available() - provider_name() -> name (property) - "exa" stays as the registered name - Module-level `_exa_client` cache + lazy `from exa_py import Exa` preserved at the new location. - Errors (ValueError for missing API key, ImportError for missing SDK, generic Exception) caught and surfaced as {"success": False, "error": ...} instead of raising. Adds "exa" to _WEB_PLUGIN_SKIPLIST in hermes_cli/tools_config.py so the hardcoded TOOL_CATEGORIES["web"] row and the plugin-injected row don't duplicate during the spike. The skip-list goes away in the cleanup phase along with the hardcoded row. The legacy inline `_exa_search` / `_exa_extract` / `_get_exa_client` / `_exa_client` in tools/web_tools.py are NOT deleted yet — the dispatcher still references them. They go away in the next dispatcher-cutover commit. E2E verified: - Plugin discovers + registers - .supports_search/.supports_extract/.supports_crawl = (True, True, False) - .get_setup_schema() returns the picker row shape - resolve(): explicit exa + EXA_API_KEY -> exa; without key -> exa (registered but unavailable, dispatcher surfaces "EXA_API_KEY not set" error)	2026-05-13 22:31:28 -07:00
kshitijk4poor	6b219f5af6	refactor(web): remove legacy in-tree provider modules Deletes tools/web_providers/{brave_free,ddgs,searxng}.py — the three providers that moved to plugins/web/ in prior commits. tools/web_tools.py no longer imports them (registry dispatch as of `d8735963f`), so removing them is purely a cleanup pass. Also migrates the existing tests to the new import paths: tests/tools/test_web_providers_brave_free.py tests/tools/test_web_providers_ddgs.py tests/tools/test_web_providers_searxng.py Mechanical rewrites: - `from tools.web_providers.X import YSearchProvider` -> `from plugins.web.X.provider import YWebSearchProvider` - `.is_configured()` -> `.is_available()` (legacy method -> new method) - `.provider_name()` -> `.name` (legacy method -> new property) - `from tools.web_providers.base import WebSearchProvider` -> `from agent.web_search_provider import WebSearchProvider` (the subclass-check asserts membership in the new plugin-facing ABC) - `sys.modules.delitem("tools.web_providers.ddgs")` updated to point at `plugins.web.ddgs.provider` (cache-busting for lazy ddgs imports) The TestXBackendWiring / TestXSearchOnlyErrors classes (covering _is_backend_available, _get_backend, check_web_api_key, and the "search-only" error paths in web_extract/web_crawl) are untouched — those still test web_tools.py's backend-selection logic, which continues to recognize the names "brave-free" / "ddgs" / "searxng" even after the modules behind them moved to plugins. tools/web_providers/base.py is intentionally NOT deleted by this commit — it's the parent ABC of the legacy modules and shares its name with agent/web_search_provider.py::WebSearchProvider. Removing it surfaces the naming collision (see PR description Finding 0); the real migration PR deletes it in the same commit that drops the _WEB_PLUGIN_SKIPLIST guards in hermes_cli/tools_config.py. 
Test results: bash scripts/run_tests.sh tests/tools/test_web_providers_*.py -> 65 passed in 3.41s (all rewritten unit tests + unchanged integration tests) bash scripts/run_tests.sh tests/tools/test_web_*.py -> 141 passed in 4.70s (full web test set, post-deletion)	2026-05-13 22:31:28 -07:00
kshitijk4poor	0d085d9454	feat(web): searxng plugin (search-only, third migration) Adds plugins/web/searxng/. SearXNG aggregates results from upstream engines via its JSON API (/search?format=json) — search-only, no extract capability (supports_extract() returns False). E2E verified — registry now has ['brave-free', 'ddgs', 'searxng'].	2026-05-13 22:31:28 -07:00
kshitijk4poor	5c7d098bee	feat(web): ddgs plugin (second migration) Adds plugins/web/ddgs/ following the same plugins/image_gen/ pattern as brave_free. DuckDuckGo search via the community ddgs package; no API key, package is an optional dep gated by is_available(). E2E verified — registry now has ['brave-free', 'ddgs'].	2026-05-13 22:31:28 -07:00
kshitijk4poor	d403cf018c	feat(web): brave_free plugin (first migration from tools/web_providers/) Adds plugins/web/brave_free/ as the first plugin built against the new WebSearchProvider ABC. Mirrors the plugins/image_gen/openai/ layout exactly: plugins/web/brave_free/ plugin.yaml kind: backend, provides_web_providers: [brave-free] __init__.py register(ctx) -> ctx.register_web_search_provider(...) provider.py BraveFreeWebSearchProvider(WebSearchProvider) Behavior preserved: same name ("brave-free" with hyphen), same env var (BRAVE_SEARCH_API_KEY), same HTTP request shape, same response normalization. The legacy tools/web_providers/brave_free.py is left in place — the dispatcher in tools/web_tools.py still references it. Task 7 cuts over the dispatcher to the new registry; Task 10 deletes the legacy file. E2E verified: HERMES_PLUGINS_DEBUG=1 python -c " from hermes_cli.plugins import _ensure_plugins_discovered _ensure_plugins_discovered() from agent.web_search_registry import list_providers print([p.name for p in list_providers()]) " # -> ['brave-free']	2026-05-13 22:31:28 -07:00

14 commits