docs: document streaming timeout auto-detection for local LLMs (#6990)

Add streaming timeout documentation to three pages:

- guides/local-llm-on-mac.md: New 'Timeouts' section with table of all
  three timeouts, their defaults, local auto-adjustments, and env var
  overrides
- reference/faq.md: Tip box in the local models FAQ section
- user-guide/configuration.md: 'Streaming Timeouts' subsection under
  the agent config section

Follow-up to #6967.
This commit is contained in:
Teknium 2026-04-09 23:28:25 -07:00 committed by GitHub
parent 0602ff8f58
commit d5023d36d8
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
3 changed files with 39 additions and 0 deletions

View file

@ -84,6 +84,10 @@ This works with Ollama, vLLM, llama.cpp server, SGLang, LocalAI, and others. See
If you set a custom `num_ctx` in Ollama (e.g., `ollama run --num_ctx 16384`), make sure to set the matching context length in Hermes — Ollama's `/api/show` reports the model's *maximum* context, not the effective `num_ctx` you configured.
:::
:::tip Timeouts with local models
Hermes auto-detects local endpoints and relaxes streaming timeouts (read timeout raised from 120s to 1800s, stale stream detection disabled). If you still hit timeouts on very large contexts, set `HERMES_STREAM_READ_TIMEOUT=1800` in your `.env`. See the [Local LLM guide](../guides/local-llm-on-mac.md#timeouts) for details.
:::
### How much does it cost?
Hermes Agent itself is **free and open-source** (MIT license). You pay only for the LLM API usage from your chosen provider. Local models are completely free to run.