docs(web-search): explain auxiliary-model summarization for web_extract (#23211)

web_extract runs returned page content through the web_extract auxiliary
model when pages exceed 5 000 chars (single-pass up to 500k, chunked up
to 2M, refused above that). The user-guide page didn't mention this —
users were surprised that long-page extracts produced summaries instead
of raw markdown, and that those summaries cost main-model tokens by
default.

Adds:
- size-driven behavior table (under 5k / 5k–500k / 500k–2M / over 2M)
- which auxiliary task does the work (auxiliary.web_extract)
- how to route summaries to a cheap model regardless of main
- escape hatch: browser_navigate when you need raw content
- troubleshooting entry for summarization timeouts
This commit is contained in:
Teknium 2026-05-10 06:40:23 -07:00 committed by GitHub
parent 3d4297a59a
commit 9cdcf31cae
No known key found for this signature in database
GPG key ID: B5690EEEBB952194

View file

@ -32,6 +32,44 @@ If you have a paid [Nous Portal](https://portal.nousresearch.com) subscription,
---
## How `web_extract` handles long pages
Backends return raw page markdown, which can be huge (forum threads, docs sites, news articles with embedded comments). To keep your context window usable and your costs down, `web_extract` runs returned content through the **`web_extract` auxiliary model** before handing it to the agent. Behavior is purely size-driven:
| Page size (characters) | What happens |
|------------------------|--------------|
| Under 5 000 | Returned as-is — no LLM call, full markdown reaches the agent |
| 5 000 500 000 | Single-pass summary via the `web_extract` auxiliary model, capped at ~5 000 chars of output |
| 500 000 2 000 000 | Chunked: split into 100 k-char chunks, summarize each in parallel, then synthesize a final summary (~5 000 chars) |
| Over 2 000 000 | Refused with a hint to use `web_crawl` with focused extraction instructions or a more specific source |
The summary keeps quotes, code blocks, and key facts in their original formatting — it's a content compressor, not a paraphraser. If summarization fails or times out, Hermes falls back to the first ~5 000 chars of raw content rather than a useless error.
### Which model does the summarizing?
The `web_extract` auxiliary task. By default (`auxiliary.web_extract.provider: "auto"`), this is your **main chat model** — same provider, same model as `hermes model`. That's fine for most setups, but on expensive reasoning models (Opus, MiniMax M2.7, etc.) every long-page extract adds meaningful cost.
To route extraction summaries to a cheap, fast model regardless of your main:
```yaml
# ~/.hermes/config.yaml
auxiliary:
web_extract:
provider: openrouter
model: google/gemini-3-flash-preview
timeout: 360 # seconds; raise if you hit summarization timeouts
```
Or pick interactively: `hermes model`**Configure auxiliary models**`web_extract`.
See [Auxiliary Models](/docs/user-guide/configuration#auxiliary-models) for the full reference and per-task override patterns.
### When summarization gets in the way
If you specifically need raw, unsummarized page content — for example, you're scraping a structured page where the LLM summary would drop important fields — use `browser_navigate` + `browser_snapshot` instead. The browser tool returns the live accessibility tree without auxiliary-model rewriting (subject to its own 8 000-char snapshot cap on huge pages).
---
## Setup
### Quick setup via `hermes tools`
@ -329,6 +367,14 @@ Some public instances disable certain search engines or categories. Try:
Switch to a self-hosted instance (see [Option A](#option-a--self-host-with-docker-recommended) above). With Docker, your own instance has no rate limits.
### `web_extract` returns truncated content with a "summarization timed out" note
The auxiliary model didn't finish summarizing within the configured timeout. Either:
- Raise `auxiliary.web_extract.timeout` in `config.yaml` (default 360s on fresh installs, 30s if the key is missing)
- Switch the `web_extract` auxiliary task to a faster model (e.g. `google/gemini-3-flash-preview`) — see [How `web_extract` handles long pages](#how-web_extract-handles-long-pages)
- For pages where summarization is the wrong tool, use `browser_navigate` instead
---
## Optional skill: `searxng-search`