hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-27 17:58:07 +00:00

Author	SHA1	Message	Date
Teknium	0a5762c78d	fix(web): genericize free-MCP client identity per telemetry policy Replace the hermes-identifying clientInfo/User-Agent/session-id prefix on the keyless Parallel Search MCP path with a neutral 'mcp-web-client' identity. Project policy forbids third-party usage attribution without an explicit user opt-in (see telemetry PR policy); MCP requires a clientInfo, so a generic one satisfies the spec without attributing traffic. Also adds the contributor AUTHOR_MAP entry and refreshes uv.lock against current main (parallel-web 0.6.0).	2026-06-10 19:54:38 -07:00
Matt Harris	e0e2571711	feat(web): Parallel-backed web search & extract — free Search MCP when keyless, v1 REST when keyed Make Parallel the web search/extract backend with a zero-setup free tier: - Keyless (no PARALLEL_API_KEY): web_search/web_extract work out of the box via Parallel's free hosted Search MCP (search.parallel.ai/mcp), and parallel becomes the default backend when no other web credentials are configured (ahead of ddgs, which is search-only). A small hand-rolled Streamable-HTTP JSON-RPC client speaks the MCP's web_search/web_fetch tools; the existing web_search/web_extract tools are the only tools registered. - Keyed (PARALLEL_API_KEY set): uses the Parallel v1 REST endpoints (client.search / client.extract with advanced_settings.full_content) — no beta. Bumps parallel-web 0.4.2 -> 0.6.0. - Attribution: on the free path only, results carry provider/attribution and the CLI tool line reads "Parallel search" / "Parallel fetch"; the paid path is unbranded. - Selection/registration: web tools register unconditionally (free MCP backstop) while check_web_api_key remains a real usability probe; explicit per-capability backends are honored (so misconfig surfaces) rather than masked by the fallback. Tested: live web_search/web_extract against search.parallel.ai in keyless and keyed modes; unit suites for the MCP client, backend selection, and display labeling; full agent run shows the "Parallel search" label on the free path.	2026-06-10 19:54:38 -07:00
kshitijk4poor	657e6d87cc	fix(web): align _LEGACY_PREFERENCE with legacy 7-provider order + doc cleanup Self-review of the plugin migration surfaced one warning and a handful of doc/dead-code cleanups. None affect production behaviour through the main dispatcher (which always calls `tools.web_tools._get_backend()` first and preserves the full 7-provider walk), but direct callers of `agent.web_search_registry.get_active__provider()` previously diverged from the legacy order and could return `None` for users with credentials but no explicit `web.backend` config key. Changes ------- 1. `_LEGACY_PREFERENCE` was shipped as a 4-tuple `("brave-free", "firecrawl", "searxng", "ddgs")` while the PR description and the legacy `_get_backend()` candidate order both call for the 7-tuple `(firecrawl, parallel, tavily, exa, searxng, brave-free, ddgs)`. Replaced with the 7-tuple. Verified empirically: with TAVILY+EXA keys and no config, `get_active_search_provider()` now returns tavily (was None); with EXA+PARALLEL it returns parallel (was None); with BRAVE+FIRECRAWL it returns firecrawl (was brave-free). 2. `agent/web_search_registry.py` — module docstring, `_resolve` step-3 docstring, and inline comment all listed the old 4-tuple and claimed "brave-free first because it was the shipped default". The legacy default is `"firecrawl"`. Rewritten to match the new ordering and reference `tools.web_tools._get_backend()` as the source of truth. 3. `agent/web_search_registry.py` — `get_active_crawl_provider` docstring said "only Tavily implements it among built-in providers". Firecrawl also advertises `supports_crawl=True` after the previous commit. Updated to "Tavily and Firecrawl". 4. `plugins/web/tavily/provider.py` — module docstring said "Tavily is the only built-in backend that natively crawls". Updated. 5. `agent/web_search_provider.py` — ABC docstring mentioned only `search` / `extract` capabilities. Added `crawl` for accuracy. 6. `plugins/web/{firecrawl,parallel,exa}/provider.py` — dead plugin-level cache globals (`_firecrawl_client`, `_parallel_client`, `_async_parallel_client`, `_exa_client`) were declared but never read (all reads/writes go through `_wt.` per the `extracting-inline- helpers-to-plugins` recipe). Removed the dead declarations; the reset-for-tests helpers in firecrawl + parallel now clear the canonical `_wt._<name>` slots, matching the pattern exa already used. Tests ----- 218/218 web-targeted tests still pass (no test changes needed). 4910/4910 in `tests/tools/` still green.	2026-05-13 22:31:28 -07:00
kshitijk4poor	748f3e016b	refactor(web): delete inline vendor helpers, re-export from plugins Removes ~580 lines of dead code from tools/web_tools.py that were superseded by the plugin migration but kept around in the cutover commit to keep the diff focused. Replaces them with thin re-export shims so existing tests and external callers that reach for the legacy ``tools.web_tools.<name>`` paths continue to work transparently. Deleted from tools/web_tools.py -------------------------------- - Lazy Firecrawl SDK proxy (_load_firecrawl_cls, _FirecrawlProxy, _FIRECRAWL_CLS_CACHE, the Firecrawl singleton) - Firecrawl client section (_get_direct_firecrawl_config, _get_firecrawl_gateway_url, _is_tool_gateway_ready, _has_direct_firecrawl_config, _raise_web_backend_configuration_error, _firecrawl_backend_help_suffix, _get_firecrawl_client) - Parallel client section (_get_parallel_client, _get_async_parallel_client, _parallel_client, _async_parallel_client) - Tavily client section (_TAVILY_BASE_URL, _tavily_request, _normalize_tavily_search_results, _normalize_tavily_documents) - Generic SDK normalizers (_to_plain_object, _normalize_result_list, _extract_web_search_results, _extract_scrape_payload) - Exa client section (_get_exa_client, _exa_client, _exa_search, _exa_extract) - Parallel helpers (_parallel_search, _parallel_extract) - Duplicate inline check_firecrawl_api_key Net: tools/web_tools.py drops from 2227 → 1613 lines (-614 lines). Re-exports added at top of tools/web_tools.py --------------------------------------------- - From plugins.web.firecrawl.provider: Firecrawl, _FirecrawlProxy, _FIRECRAWL_CLS_CACHE, _load_firecrawl_cls, _get_direct_firecrawl_config, _get_firecrawl_gateway_url, _is_tool_gateway_ready, _has_direct_firecrawl_config, _firecrawl_backend_help_suffix, _raise_web_backend_configuration_error, _get_firecrawl_client, _to_plain_object, _normalize_result_list, _extract_web_search_results, _extract_scrape_payload, check_firecrawl_api_key - From plugins.web.tavily.provider: _tavily_request, _normalize_tavily_search_results, _normalize_tavily_documents - From plugins.web.parallel.provider: _get_parallel_client, _get_async_parallel_client - From plugins.web.exa.provider: _get_exa_client Plus retained module-level imports for backward-compat with tests: - httpx (tests patch tools.web_tools.httpx for tavily request mocking) - build_vendor_gateway_url, _read_nous_access_token, resolve_managed_tool_gateway, managed_nous_tools_enabled, prefers_gateway (tests patch tools.web_tools.<name>) Plugin indirection pattern (key technique) ------------------------------------------ For functions inside the firecrawl/parallel/exa plugins to honor unit-test patches that target ``tools.web_tools.<name>``, the plugin implementations now do ``import tools.web_tools as _wt`` at call time and read helper names through that module (``_wt._read_nous_access_token``, ``_wt.Firecrawl``, ``_wt.prefers_gateway``, etc.). This makes the existing test patches transparently reach the plugin code without any test changes. The cached client globals (_firecrawl_client, _firecrawl_client_config, _parallel_client, _async_parallel_client, _exa_client) also now live on tools.web_tools so existing test setup_method handlers that reset ``tools.web_tools._<vendor>_client = None`` between cases keep working. The plugins read/write the cache via getattr/setattr on the web_tools module. Verified -------- - 173/173 targeted web tests pass: test_web_providers.py, test_web_providers_brave_free.py, test_web_providers_ddgs.py, test_web_providers_searxng.py, test_web_tools_config.py, test_web_tools_tavily.py, test_website_policy.py, test_config_null_guard.py - Compile-clean (py_compile.compile passes) - All inline implementations now exist in exactly one place (plugins.web.<vendor>.provider) Follow-up clean-up ------------------ - Drop _WEB_PLUGIN_SKIPLIST + hardcoded TOOL_CATEGORIES["web"] rows (next commit) - Delete tools/web_providers/ directory entirely - Add tests/plugins/web/ coverage - Full tests/tools/ + tests/gateway/ regression sweep before promoting PR	2026-05-13 22:31:28 -07:00
kshitijk4poor	4816646109	feat(web): parallel plugin — first async-extract plugin Migrates Parallel.ai from inline `_parallel_search()` / `_parallel_extract()` in tools/web_tools.py to a bundled plugin at plugins/web/parallel/. First plugin in the codebase to expose an async :meth:`extract`: - search() is sync — Parallel.beta.search - extract() is async def — AsyncParallel.beta.extract The ABC's docstring on supports_extract() already permits sync-or-async; this commit is the first to exercise the async path. The web_extract_tool dispatcher (next commit) detects coroutines via inspect.iscoroutinefunction and awaits accordingly. Behavior preserved: - PARALLEL_API_KEY required (raises ValueError if missing → surfaced as {"success": False, "error": "..."} instead) - PARALLEL_SEARCH_MODE env var honored (agentic\|fast\|one-shot, default agentic), validated via _resolve_search_mode() - Limit capped at 20 server-side via min(limit, 20) - Per-URL failure mode preserved: response.errors[] each become a result dict with an "error" field rather than raising - Module-level _parallel_client / _async_parallel_client caches kept (mirrors legacy singleton pattern) Adds "parallel" to _WEB_PLUGIN_SKIPLIST in hermes_cli/tools_config.py so the picker doesn't double-list. The legacy inline _parallel_search, _parallel_extract, _get_parallel_client, _get_async_parallel_client in tools/web_tools.py are NOT deleted yet — the dispatcher still calls them. They go away when the dispatcher cuts over. E2E verified: - inspect.iscoroutinefunction(p.search) -> False - inspect.iscoroutinefunction(p.extract) -> True - extract() returns a coroutine (not a list) - 5 providers register correctly (brave-free, ddgs, exa, parallel, searxng)	2026-05-13 22:31:28 -07:00

5 commits