hermes-agent/tests
kshitijk4poor 21e3a863bb feat(web): firecrawl plugin natively supports crawl; delete legacy inline path
The web-provider migration originally left firecrawl crawl as the only
provider-specific code remaining inline in tools/web_tools.py (~250
lines of Firecrawl-specific crawl orchestration that didn't fit the
plugin's existing surface). This commit closes that gap.

What this adds
--------------
1. plugins/web/firecrawl/provider.py: implement async ``crawl(url, **kwargs)``
   - Accepts the same kwargs as the dispatcher passes to any crawl
     provider (``instructions``, ``depth``, ``limit``); Firecrawl's
     /crawl endpoint ignores ``instructions`` and ``depth`` so we log
     and drop with a clear info message.
   - Wraps the sync SDK ``crawl()`` call in asyncio.to_thread so the
     gateway event loop isn't blocked on a multi-page crawl.
   - Preserves the response-shape normalization across pydantic /
     typed-object / dict variants that the legacy inline code did.
   - Preserves per-page website-policy re-check (catches blocked
     redirects after the SDK returns).
   - Returns the same {"results": [...]} shape so the dispatcher's
     shared LLM-summarization post-processing path works unchanged.
   - Sets supports_crawl() to True so the dispatcher routes through
     the plugin instead of the legacy fallthrough.

2. tools/web_tools.py: delete the entire legacy firecrawl crawl block
   that used to run after "No registered provider supports crawl" —
   ~270 lines including:
   - check_firecrawl_api_key gate + typed error
   - inline SSRF + website-policy seed-URL gate (dispatcher already
     does this)
   - Firecrawl client setup with crawl_params
   - 100+ lines of pydantic/dict/typed-object normalization
   - Per-page LLM-processing loop (kept in the dispatcher's shared
     post-processing path; that's where it always belonged)
   - trimming + base64 image cleanup (still done in the dispatcher's
     shared path)

   Replaced with a single typed-error branch when no crawl-capable
   provider is available: "web_crawl has no available backend. Set
   FIRECRAWL_API_KEY (or FIRECRAWL_API_URL for self-hosted), or set
   TAVILY_API_KEY for Tavily."

Test updates
------------
- tests/tools/test_website_policy.py:
  - test_web_crawl_short_circuits_blocked_url: dispatcher seed-URL
    gate still runs on web_tools.check_website_access (no change to
    that patch), but the firecrawl client lockdown moved to the
    plugin module — patch firecrawl_provider._get_firecrawl_client
    instead of web_tools._get_firecrawl_client. The dispatcher
    short-circuits before the plugin runs, so the test still passes.
  - test_web_crawl_blocks_redirected_final_url: patch the per-page
    policy gate at plugins.web.firecrawl.provider.check_website_access
    (where it now runs) AND on web_tools (where the seed-URL gate
    still runs). Patch firecrawl_provider._get_firecrawl_client for
    the FakeCrawlClient injection. Both checks flow through the same
    fake_check function.
- tests/plugins/web/test_web_search_provider_plugins.py:
  - Update parametrized capability-flag spec: firecrawl supports_crawl
    is now True.
  - Add test_firecrawl_crawl_returns_error_dict_when_unconfigured —
    verifies inspect.iscoroutinefunction(p.crawl) is True and that
    the async crawl returns a per-page error dict (not a raise) when
    FIRECRAWL_API_KEY is missing.

Verified
--------
- 218/218 web tests pass (was 173, +44 plugin tests + 1 new firecrawl
  crawl test from this commit = 218 with the test deduplication).
- Compile-clean (py_compile passes on both files).
- Provider capabilities matrix confirmed end-to-end:
    name        search  extract  crawl   async-extract?  async-crawl?
    firecrawl   True    True     True    True            True
    tavily      True    True     True    False           False
  Both crawl-capable providers exercise the dispatcher's
  inspect.iscoroutinefunction async-or-sync detection.

Net diff
--------
- tools/web_tools.py: -254 lines (legacy inline crawl gone)
- plugins/web/firecrawl/provider.py: +185 lines (crawl method)
- test_website_policy.py: +14/-9 lines (patch locations)
- test_web_search_provider_plugins.py: +22/-1 lines (capability flag
  + new firecrawl crawl test)
- Total: -32 net LoC; tools/web_tools.py is now 1509 lines (was 1763
  before this commit, 2227 before the migration started).
2026-05-13 22:31:28 -07:00
..
acp fix(acp): preserve assistant reasoning metadata in session persistence 2026-05-05 10:18:28 -07:00
acp_adapter fix: make session search initialize session db 2026-05-09 14:36:58 -07:00
agent fix(compression): keep default protect_first_n at 3 + align ABC 2026-05-13 22:25:16 -07:00
cli fix(cli): preserve startup banner on terminal resize 2026-05-13 13:36:31 -07:00
cron fix(cron): avoid github skill false positives in scanner 2026-05-09 11:11:45 -07:00
e2e fix(dashboard): UI polish — modals, layout, consistency, test fixes 2026-05-12 13:59:22 -04:00
environments/benchmarks
fakes
gateway feat(discord): add thread_require_mention for multi-bot threads 2026-05-13 22:21:43 -07:00
hermes_cli refactor(inventory): extract shared ConfigContext + build_models_payload 2026-05-13 22:31:11 -07:00
hermes_state
honcho_plugin
integration
openviking_plugin
plugins feat(web): firecrawl plugin natively supports crawl; delete legacy inline path 2026-05-13 22:31:28 -07:00
providers feat(nous): unified client=hermes-client-v<version> tag on every Portal request (#24779) 2026-05-12 20:49:20 -07:00
run_agent test(memory): cover cache-parity + runtime whitelist on background review fork 2026-05-13 22:12:47 -07:00
skills Add unit tests for hyperliquid skill functionality 2026-05-10 22:15:04 -07:00
stress fix(kanban): gate claim + unblock on parent completion 2026-05-09 11:07:37 -07:00
tools feat(web): firecrawl plugin natively supports crawl; delete legacy inline path 2026-05-13 22:31:28 -07:00
tui_gateway fix(tui): close slash parity gaps with CLI (#20339) 2026-05-05 15:42:39 -05:00
website docs(skills): explain restoring bundled skills 2026-05-05 13:46:20 -07:00
__init__.py
conftest.py test(conftest): plug every gateway-kill leak path (#23486) 2026-05-10 18:55:28 -07:00
run_interrupt_test.py
test_account_usage.py
test_atomic_replace_symlinks.py
test_base_url_hostname.py
test_batch_runner_checkpoint.py
test_cli_file_drop.py
test_cli_manual_compress.py
test_cli_skin_integration.py
test_ctx_halving_fix.py fix(cache): kill long-lived prefix layout — system prompt is now byte-static within a session (#24778) 2026-05-12 20:46:04 -07:00
test_empty_model_fallback.py
test_evidence_store.py
test_get_tool_definitions_cache_isolation.py fix(tools): isolate get_tool_definitions quiet_mode cache + dedup LCM injection (#17335) 2026-04-30 04:32:06 -07:00
test_hermes_bootstrap.py fix(entry-points): guard hermes_bootstrap import so partial updates don't brick hermes (#22091) 2026-05-08 14:43:13 -07:00
test_hermes_constants.py test(hermes_constants): cover parse_reasoning_effort() 2026-05-07 09:59:07 -07:00
test_hermes_home_profile_warning.py fix(constants): warn once when get_hermes_home() falls back under an active profile (#18746) 2026-05-02 01:49:55 -07:00
test_hermes_logging.py
test_hermes_state.py fix(session): route OR-combined short CJK tokens to LIKE fallback (#20494) 2026-05-09 17:53:02 -07:00
test_hermes_state_wal_fallback.py fix(sqlite): fall back to journal_mode=DELETE on NFS/SMB/FUSE (#22043) 2026-05-09 02:09:35 -07:00
test_honcho_client_config.py
test_install_sh_browser_install.py fix(install): skip browser download when system chromium exists 2026-05-13 22:07:02 -07:00
test_install_sh_pythonpath_sanitization.py fix: harden install.sh against inherited Python env leakage 2026-05-06 04:02:02 -07:00
test_install_sh_setup_wizard_tty_probe.py
test_install_sh_termux_network_prereqs.py fix: strengthen termux install network prerequisites 2026-05-07 13:04:08 -07:00
test_ipv4_preference.py
test_lazy_session_regressions.py fix: resolve lazy session creation regressions (#18370 fallout) (#20363) 2026-05-06 01:11:49 +05:30
test_lint_config.py lint: enable PLW1514 as a blocking ruff rule 2026-05-08 14:27:40 -07:00
test_live_system_guard_self_test.py test(conftest): plug every gateway-kill leak path (#23486) 2026-05-10 18:55:28 -07:00
test_mcp_serve.py fix(mcp): unwrap platforms key in channels_list 2026-05-07 13:41:16 -07:00
test_mini_swe_runner.py
test_minimax_model_validation.py
test_minimax_oauth.py fix(minimax): harden OAuth dashboard and runtime 2026-05-11 22:15:16 -07:00
test_minisweagent_path.py
test_model_picker_scroll.py
test_model_tools.py
test_model_tools_async_bridge.py
test_ollama_num_ctx.py
test_packaging_metadata.py
test_plugin_skills.py fix(skills): support category-qualified local skill names 2026-05-05 10:15:31 -07:00
test_process_loop_event_loop_warning.py fix(cli): replace get_event_loop() with get_running_loop() to silence RuntimeWarning in process_loop thread (#19285) 2026-05-07 06:35:54 -07:00
test_project_metadata.py fix(install): use --extra all not --all-extras; drop lazy-covered extras from [all] (#24515) 2026-05-12 15:06:25 -07:00
test_retry_utils.py
test_sql_injection.py
test_subprocess_home_isolation.py
test_termux_all_extra_compat.py fix: add termux-all install profile and safe fallbacks 2026-05-07 13:04:08 -07:00
test_timezone.py
test_toolset_distributions.py
test_toolsets.py fix: merge plugin tools into builtin toolsets 2026-05-05 10:14:17 -07:00
test_trajectory_compressor.py
test_trajectory_compressor_async.py
test_transform_llm_output_hook.py test+docs: cover transform_llm_output hook + release author map 2026-05-07 05:46:05 -07:00
test_transform_tool_result_hook.py
test_tui_gateway_server.py Merge pull request #20942 from NousResearch/austin/fix/personality 2026-05-07 18:54:29 -04:00
test_utils_truthy_values.py
test_yuanbao_integration.py
test_yuanbao_markdown.py
test_yuanbao_pipeline.py
test_yuanbao_proto.py