mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-18 04:41:56 +00:00
docs(web-search): explain auxiliary-model summarization for web_extract (#23211)
web_extract runs returned page content through the web_extract auxiliary model when pages exceed 5 000 chars (single-pass up to 500k, chunked up to 2M, refused above that). The user-guide page didn't mention this, so users were surprised that long-page extracts produced summaries instead of raw markdown, and that those summaries cost main-model tokens by default.

Adds:
- size-driven behavior table (under 5k / 5k–500k / 500k–2M / over 2M)
- which auxiliary task does the work (auxiliary.web_extract)
- how to route summaries to a cheap model regardless of main
- escape hatch: browser_navigate when you need raw content
- troubleshooting entry for summarization timeouts
parent
3d4297a59a
commit
9cdcf31cae
1 changed files with 46 additions and 0 deletions
@@ -32,6 +32,44 @@ If you have a paid [Nous Portal](https://portal.nousresearch.com) subscription,

---
## How `web_extract` handles long pages
Backends return raw page markdown, which can be huge (forum threads, docs sites, news articles with embedded comments). To keep your context window usable and your costs down, `web_extract` runs returned content through the **`web_extract` auxiliary model** before handing it to the agent. Behavior is purely size-driven:

| Page size (characters) | What happens |
|------------------------|--------------|
| Under 5 000 | Returned as-is: no LLM call, full markdown reaches the agent |
| 5 000 – 500 000 | Single-pass summary via the `web_extract` auxiliary model, capped at ~5 000 chars of output |
| 500 000 – 2 000 000 | Chunked: split into 100k-char chunks, each summarized in parallel, then synthesized into a final summary (~5 000 chars) |
| Over 2 000 000 | Refused, with a hint to use `web_crawl` with focused extraction instructions or a more specific source |

The summary keeps quotes, code blocks, and key facts in their original formatting; it's a content compressor, not a paraphraser. If summarization fails or times out, Hermes falls back to the first ~5 000 chars of raw content instead of returning an error.
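The size tiers in the table can be sketched as a small dispatcher. This is a minimal illustration, not Hermes's actual implementation: the function names (`classify_page`, `summarize_chunked`), the constants, the worker count, and the treatment of pages that land exactly on a boundary are assumptions, and `summarize` stands in for a call to the `web_extract` auxiliary model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Thresholds taken from the table; exact boundary handling is an assumption.
RAW_LIMIT = 5_000
SINGLE_PASS_LIMIT = 500_000
CHUNKED_LIMIT = 2_000_000
CHUNK_SIZE = 100_000


def classify_page(n_chars: int) -> str:
    """Map a page size in characters to one of the four tiers."""
    if n_chars < RAW_LIMIT:
        return "raw"
    if n_chars <= SINGLE_PASS_LIMIT:
        return "single_pass"
    if n_chars <= CHUNKED_LIMIT:
        return "chunked"
    return "refused"


def summarize_chunked(content: str, summarize: Callable[[str], str]) -> str:
    """Summarize fixed-size chunks in parallel, then synthesize one summary."""
    chunks = [content[i:i + CHUNK_SIZE] for i in range(0, len(content), CHUNK_SIZE)]
    with ThreadPoolExecutor(max_workers=8) as pool:  # hypothetical worker count
        partials = list(pool.map(summarize, chunks))
    # Final pass: condense the per-chunk summaries into one result.
    return summarize("\n\n".join(partials))
```

For example, a 750 000-character page falls in the chunked tier and splits into eight chunks (seven full chunks plus a 50 000-character remainder), so `summarize` runs nine times in total: eight chunk summaries plus the final synthesis.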
### Which model does the summarizing?
The `web_extract` auxiliary task does the summarizing. By default (`auxiliary.web_extract.provider: "auto"`), it uses your **main chat model**: the same provider and model that `hermes model` selects. That's fine for most setups, but on expensive reasoning models (Opus, MiniMax M2.7, etc.) every long-page extract adds meaningful cost.

To route extraction summaries to a cheap, fast model regardless of your main model:
```yaml
# ~/.hermes/config.yaml
auxiliary:
  web_extract:
    provider: openrouter
    model: google/gemini-3-flash-preview
    timeout: 360  # seconds; raise if you hit summarization timeouts
```
Or pick interactively: `hermes model` → **Configure auxiliary models** → `web_extract`.

See [Auxiliary Models](/docs/user-guide/configuration#auxiliary-models) for the full reference and per-task override patterns.
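The `"auto"` default can be pictured as a small resolution step. The sketch below is a guess at the logic, not the actual Hermes code: `resolve_web_extract_model`, its arguments, and the rule that a missing `provider` or `model` key falls back to the main model are all hypothetical.

```python
from typing import Tuple


def resolve_web_extract_model(
    config: dict, main_provider: str, main_model: str
) -> Tuple[str, str]:
    """Return the (provider, model) pair used for web_extract summaries."""
    task = (config.get("auxiliary") or {}).get("web_extract") or {}
    provider = task.get("provider", "auto")
    if provider == "auto":
        # "auto" means: reuse whatever the main chat model is.
        return main_provider, main_model
    # An explicit provider overrides the main model for this task only.
    return provider, task.get("model", main_model)
```

With the YAML above loaded into `config`, this sketch would return `("openrouter", "google/gemini-3-flash-preview")` no matter which main model is active.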
### When summarization gets in the way
If you specifically need raw, unsummarized page content (for example, you're scraping a structured page where the LLM summary would drop important fields), use `browser_navigate` + `browser_snapshot` instead. The browser tool returns the live accessibility tree without auxiliary-model rewriting, subject to its own 8 000-char snapshot cap on huge pages.

---
## Setup
### Quick setup via `hermes tools`
@@ -329,6 +367,14 @@ Some public instances disable certain search engines or categories. Try:

Switch to a self-hosted instance (see [Option A](#option-a--self-host-with-docker-recommended) above). With Docker, your own instance has no rate limits.
### `web_extract` returns truncated content with a "summarization timed out" note
The auxiliary model didn't finish summarizing within the configured timeout. Try one of the following:

- Raise `auxiliary.web_extract.timeout` in `config.yaml` (default 360s on fresh installs, 30s if the key is missing)
- Switch the `web_extract` auxiliary task to a faster model (e.g. `google/gemini-3-flash-preview`); see [How `web_extract` handles long pages](#how-web_extract-handles-long-pages)
- Use `browser_navigate` instead for pages where summarization is the wrong tool
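The timeout lookup and the truncation fallback can be sketched as follows. This is an illustration under stated assumptions, not the Hermes source: `web_extract_timeout`, `summarize_or_truncate`, and the `summarize` callable are hypothetical names, and the 30-second default mirrors the documented behavior when the `timeout` key is missing.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

MISSING_KEY_TIMEOUT_S = 30  # documented default when the key is absent
FALLBACK_CHARS = 5_000      # approximate size of the raw-content fallback


def web_extract_timeout(config: dict) -> float:
    """Read auxiliary.web_extract.timeout, defaulting to 30 seconds."""
    task = (config.get("auxiliary") or {}).get("web_extract") or {}
    return float(task.get("timeout", MISSING_KEY_TIMEOUT_S))


def summarize_or_truncate(
    content: str, summarize: Callable[[str], str], timeout_s: float
) -> str:
    """Return the summary, or the first ~5 000 raw chars on failure or timeout."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(summarize, content)
    try:
        return future.result(timeout=timeout_s)
    except Exception:  # covers both a timeout and an error raised by summarize
        return content[:FALLBACK_CHARS]
    finally:
        # Do not block on a summarizer that is still running.
        pool.shutdown(wait=False)
```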
---
## Optional skill: `searxng-search`