Commit graph

2 commits

Author SHA1 Message Date
Teknium
5e1f793430
chore(web): remove web_crawl tool + provider crawl plumbing (#33824)
The web_crawl_tool() function was an orphan — no model schema registered
it, no skill or CLI command called it, and the agent had no way to invoke
it. PR #32608 proposed wiring it up as a model-callable tool; we've
decided not to expose crawl as a separate capability since web_search +
web_extract cover the use cases we want models to have.

Removed:
- tools/web_tools.py: web_crawl_tool() (~230 LOC)
- plugins/web/firecrawl/provider.py: supports_crawl() + crawl()
- plugins/web/tavily/provider.py: supports_crawl() + crawl()
- plugins/web/xai/provider.py: supports_crawl() override
- agent/web_search_provider.py: supports_crawl() + crawl() ABC methods
- agent/web_search_registry.py: get_active_crawl_provider() +
  the 'crawl' branch in _resolve()
- agent/display.py: web_crawl tool-progress rendering
- hermes_cli/config.py: 'web_crawl' from TAVILY_API_KEY.tools
- tools/website_policy.py: stale comment reference
- Tests: removed TestWebCrawlTavily class, the two website-policy
  web_crawl tests, the searxng/ddgs/brave-free crawl-error tests,
  the integration test_web_crawl method, and the
  test_unconfigured_crawl_emits_top_level_error test. Trimmed the
  capability-flag parametrize list and the WebSearchProvider ABC
  conformance tests.
- Docs: trimmed the Crawl column from capability tables in both EN
  and zh-Hans, updated the developer-guide ABC table.

Net: 25 files, +115/-1067.

Closes #33762 (the schema-text bug only existed if #32608 landed).
Supersedes #32608.
2026-05-28 04:52:42 -07:00
Jaaneek
a0c031299b feat(web): add xAI Web Search provider plugin
Adds a new bundled web search provider plugin backed by xAI's agentic
Web Search tool (server-side `web_search` on the Responses API). Slots
in alongside the existing Firecrawl / Tavily / Exa / Brave / SearXNG /
DDGS providers; opt in via `web.backend: xai` (or auto-selected by the
registry's single-provider shortcut when it's the only available web
provider, matching every other backend's behavior).

Reuses the existing xAI HTTP credential plumbing (`tools/xai_http.py`)
so it works with both `hermes auth login xai-oauth` (SuperGrok OAuth)
and `XAI_API_KEY` — no new credential paths, no new env vars, no new
setup-wizard prompts. The existing `xai_grok` post_setup hook handles
credential collection.

Reference: https://docs.x.ai/developers/tools/web-search

Provider behavior
-----------------
- Sends a structured prompt to Grok with `tools=[{"type": "web_search"}]`
  enabled and `include=["no_inline_citations"]`, then parses results
  from a `{"results": [...]}` JSON block (primary), falling back to
  `url_citation` annotations (secondary) and the top-level `citations`
  list (last-ditch). Annotation fallback falls through to citations
  when no rows are extractable, so future annotation types xAI may
  add don't silently mask real data.
- HTTP 200 + `{"error": {...}}` envelopes (model-overload, refusal)
  are surfaced as failures rather than masked as success-with-empty-
  results.
- HTTP 401 on the OAuth path triggers a single `force_refresh=True`
  retry — closes two gaps the resolver's proactive JWT-exp shortcut
  doesn't cover: opaque (non-JWT) access tokens and mid-window
  revocation. Env-var (`XAI_API_KEY`) credentials never retry; they
  can't be refreshed and an immediate retry would just burn quota.
- `is_available()` is a cheap probe (env var OR auth.json read), never
  invokes the OAuth resolver — required by the ABC contract because
  it runs on every `hermes tools` repaint and at tool-registration time.
- Class docstring documents the LLM-in-a-trench-coat trust model so
  callers piping untrusted input into `web_search` know returned URLs
  are model-generated and should be validated before fetching.

Config (`config.yaml`):

    web:
      backend: xai
      xai:
        model: grok-4.3         # optional, defaults to grok-4.3
        allowed_domains:        # optional, max 5 — mutex with excluded_domains
          - arxiv.org
        excluded_domains:       # optional, max 5
          - example-spam.com
        timeout: 90             # optional, seconds

Files
-----
- plugins/web/xai/plugin.yaml          (new) plugin manifest
- plugins/web/xai/__init__.py          (new) register(ctx) hook
- plugins/web/xai/provider.py          (new) XAIWebSearchProvider impl
- tools/xai_http.py                    (+47) has_xai_credentials()
                                            cheap-probe helper +
                                            keyword-only force_refresh
                                            arg on resolve_xai_http_
                                            credentials() (backwards
                                            compatible; all 9 other
                                            call sites unaffected)
- tools/web_tools.py                   (+11) "xai" added to configured-
                                            backend set + branch in
                                            _is_backend_available()
- tests/tools/test_web_providers_xai.py (new, 39 tests) covers
                                        identity, cheap-probe semantics,
                                        JSON / annotation / citations
                                        parse paths, request payload
                                        shape, error envelopes, OAuth
                                        force-refresh-on-401 retry,
                                        env-var-no-retry guard, 500-not-
                                        retried guard, refresh-returns-
                                        same-token guard, OAuth runtime
                                        resolution, and backend wiring.

Tests
-----
- 39 xai-suite passes
- 79 sibling web-provider tests (brave-free, ddgs, searxng, base) pass
- 119 cross-suite tests for other xai_http callers (transcription,
  x_search, tts) pass — verifies the new keyword-only arg is BC
- scripts/check-windows-footguns.py: clean on all 5 modified files

No edits to run_agent.py, cli.py, gateway/, toolsets, config schema,
plugin core, or auth core.
2026-05-19 19:27:34 -07:00