hermes-agent

mirrors/hermes-agent

Fork 0

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-20 05:01:30 +00:00

Commit graph

Author	SHA1	Message	Date
kshitijk4poor	5e54330e27	fix(web): preserve firecrawl crawl + website-policy gate after migration Two regressions discovered by running the full tests/tools/ suite after the dispatcher cutover, both fixed in this commit: 1. web_crawl_tool incorrectly errored "search-only" for firecrawl --------------------------------------------------------------------- The cutover treated any provider with supports_crawl()==False as a search-only backend and returned the typed search-only error. But firecrawl can crawl via the legacy multi-page-extract path inside web_crawl_tool — it just doesn't expose supports_crawl on the plugin (adding native firecrawl crawl is a clean follow-up). Fix: only emit the search-only error when the provider supports NEITHER crawl NOR extract (brave-free / ddgs / searxng). When the provider supports extract but not crawl (firecrawl), fall through to the legacy firecrawl-via-extract path below. 2. firecrawl plugin's check_website_access wasn't patchable --------------------------------------------------------------------- The plugin imported `from tools.website_policy import check_website_access` INSIDE the extract() function body, so monkeypatching the name on plugins.web.firecrawl.provider had no effect — the inner import re-bound the name on every call. Fix: hoist the import to module level. Cheap (website_policy itself has no heavy deps) and makes the standard monkeypatch.setattr(firecrawl_provider, "check_website_access", ...) pattern work. Test updates (tests/tools/test_website_policy.py — 4 tests): - test_web_extract_short_circuits_blocked_url - test_web_extract_blocks_redirected_final_url Both: patch the gate at plugins.web.firecrawl.provider (where it runs after migration) and force the firecrawl plugin to be the active extract provider via FIRECRAWL_API_KEY. - test_web_crawl_short_circuits_blocked_url - test_web_crawl_blocks_redirected_final_url Both: unchanged — the dispatcher-level gate at tools.web_tools.py line 1651 still uses the imported `check_website_access` name and the firecrawl-fallthrough path is exercised as before. Verified: 22/22 tests/tools/test_website_policy.py pass.	2026-05-13 22:31:28 -07:00
kshitijk4poor	143184e943	feat(web): firecrawl plugin — largest migration (search + async extract + dual auth) Migrates Firecrawl from inline code in tools/web_tools.py to a bundled plugin at plugins/web/firecrawl/. By line count this is the largest of the seven provider migrations: the firecrawl path captured most of the file's vendor-specific complexity. What moved into the plugin (all previously in tools/web_tools.py): Lazy Firecrawl SDK proxy - _load_firecrawl_cls() — caches the imported SDK class - _FirecrawlProxy + Firecrawl singleton — defers ~200ms of SDK imports until first construction or isinstance check. Client construction (dual auth) - _get_direct_firecrawl_config() — direct FIRECRAWL_API_KEY/URL path - _get_firecrawl_gateway_url() — managed Nous tool-gateway URL - _is_tool_gateway_ready() — gateway URL + Nous token check - _has_direct_firecrawl_config() — direct config present? - _get_firecrawl_client() — combined client construction honoring web.use_gateway - check_firecrawl_api_key() — top-level "is firecrawl usable" - _firecrawl_backend_help_suffix() — managed-gateway help string - _raise_web_backend_configuration_error() — typed misconfig error Response shape normalization (vendor-specific) - _to_plain_object(), _normalize_result_list() — SDK→dict helpers - _extract_web_search_results() — handles SDK/direct/gateway shapes - _extract_scrape_payload() — nested-data unwrap for scrape Per-URL extract loop - 60s asyncio.wait_for timeout per URL - Pre-scrape website-policy gate - Post-scrape redirect-aware SSRF re-check - Format-aware content selection (markdown / html / auto) - Per-URL errors returned as {"error": str} entries, no raises Extract is declared `async def` — each URL is scraped in asyncio.to_thread(...). This is the second async-extract plugin after parallel. The plugin re-exports `Firecrawl` (the lazy proxy) and `check_firecrawl_api_key()` so existing tests doing `patch("tools.web_tools.Firecrawl")` or `monkeypatch.setattr(web_tools, "check_firecrawl_api_key", ...)` keep working — tools/web_tools.py re-exports both names in the next dispatcher-cutover commit. Note: web_crawl_tool still has its own Firecrawl crawl path inline (separate from extract); the Firecrawl SDK supports /crawl but we don't expose supports_crawl=True on this plugin yet. Tavily handles crawl today. Adding Firecrawl crawl is a clean follow-up. Adds "firecrawl" to _WEB_PLUGIN_SKIPLIST. E2E verified: - All 7 providers register: brave-free, ddgs, exa, firecrawl, parallel, searxng, tavily - inspect.iscoroutinefunction(firecrawl.extract) -> True - Firecrawl proxy is a callable lazy proxy at module level - check_firecrawl_api_key reflects FIRECRAWL_API_KEY presence	2026-05-13 22:31:28 -07:00

Author

SHA1

Message

Date

kshitijk4poor

5e54330e27

fix(web): preserve firecrawl crawl + website-policy gate after migration

Two regressions discovered by running the full tests/tools/ suite after
the dispatcher cutover, both fixed in this commit:

1. web_crawl_tool incorrectly errored "search-only" for firecrawl
---------------------------------------------------------------------
The cutover treated any provider with supports_crawl()==False as a
search-only backend and returned the typed search-only error. But
firecrawl can crawl via the legacy multi-page-extract path inside
web_crawl_tool — it just doesn't expose supports_crawl on the plugin
(adding native firecrawl crawl is a clean follow-up).

Fix: only emit the search-only error when the provider supports
NEITHER crawl NOR extract (brave-free / ddgs / searxng). When the
provider supports extract but not crawl (firecrawl), fall through to
the legacy firecrawl-via-extract path below.

2. firecrawl plugin's check_website_access wasn't patchable
---------------------------------------------------------------------
The plugin imported `from tools.website_policy import check_website_access`
INSIDE the extract() function body, so monkeypatching the name on
plugins.web.firecrawl.provider had no effect — the inner import re-bound
the name on every call.

Fix: hoist the import to module level. Cheap (website_policy itself
has no heavy deps) and makes the standard
monkeypatch.setattr(firecrawl_provider, "check_website_access", ...)
pattern work.

Test updates (tests/tools/test_website_policy.py — 4 tests):
  - test_web_extract_short_circuits_blocked_url
  - test_web_extract_blocks_redirected_final_url
    Both: patch the gate at plugins.web.firecrawl.provider (where it
    runs after migration) and force the firecrawl plugin to be the
    active extract provider via FIRECRAWL_API_KEY.
  - test_web_crawl_short_circuits_blocked_url
  - test_web_crawl_blocks_redirected_final_url
    Both: unchanged — the dispatcher-level gate at tools.web_tools.py
    line 1651 still uses the imported `check_website_access` name and
    the firecrawl-fallthrough path is exercised as before.

Verified: 22/22 tests/tools/test_website_policy.py pass.

2026-05-13 22:31:28 -07:00

kshitijk4poor

143184e943

feat(web): firecrawl plugin — largest migration (search + async extract + dual auth)

Migrates Firecrawl from inline code in tools/web_tools.py to a bundled
plugin at plugins/web/firecrawl/. By line count this is the largest of
the seven provider migrations: the firecrawl path captured most of the
file's vendor-specific complexity.

What moved into the plugin (all previously in tools/web_tools.py):

  Lazy Firecrawl SDK proxy
    - _load_firecrawl_cls() — caches the imported SDK class
    - _FirecrawlProxy + Firecrawl singleton — defers ~200ms of SDK
      imports until first construction or isinstance check.

  Client construction (dual auth)
    - _get_direct_firecrawl_config()  — direct FIRECRAWL_API_KEY/URL path
    - _get_firecrawl_gateway_url()    — managed Nous tool-gateway URL
    - _is_tool_gateway_ready()        — gateway URL + Nous token check
    - _has_direct_firecrawl_config()  — direct config present?
    - _get_firecrawl_client()         — combined client construction
                                        honoring web.use_gateway
    - check_firecrawl_api_key()       — top-level "is firecrawl usable"
    - _firecrawl_backend_help_suffix() — managed-gateway help string
    - _raise_web_backend_configuration_error() — typed misconfig error

  Response shape normalization (vendor-specific)
    - _to_plain_object(), _normalize_result_list() — SDK→dict helpers
    - _extract_web_search_results() — handles SDK/direct/gateway shapes
    - _extract_scrape_payload()     — nested-data unwrap for scrape

  Per-URL extract loop
    - 60s asyncio.wait_for timeout per URL
    - Pre-scrape website-policy gate
    - Post-scrape redirect-aware SSRF re-check
    - Format-aware content selection (markdown / html / auto)
    - Per-URL errors returned as {"error": str} entries, no raises

Extract is declared `async def` — each URL is scraped in
asyncio.to_thread(...). This is the second async-extract plugin after
parallel.

The plugin re-exports `Firecrawl` (the lazy proxy) and
`check_firecrawl_api_key()` so existing tests doing
`patch("tools.web_tools.Firecrawl")` or
`monkeypatch.setattr(web_tools, "check_firecrawl_api_key", ...)` keep
working — tools/web_tools.py re-exports both names in the next
dispatcher-cutover commit.

Note: web_crawl_tool still has its own Firecrawl crawl path inline
(separate from extract); the Firecrawl SDK supports /crawl but we don't
expose supports_crawl=True on this plugin yet. Tavily handles crawl
today. Adding Firecrawl crawl is a clean follow-up.

Adds "firecrawl" to _WEB_PLUGIN_SKIPLIST.

E2E verified:
  - All 7 providers register: brave-free, ddgs, exa, firecrawl,
    parallel, searxng, tavily
  - inspect.iscoroutinefunction(firecrawl.extract) -> True
  - Firecrawl proxy is a callable lazy proxy at module level
  - check_firecrawl_api_key reflects FIRECRAWL_API_KEY presence

2026-05-13 22:31:28 -07:00

2 commits