docs: document streaming timeout auto-detection for local LLMs (#6990)

Add streaming timeout documentation to three pages: - guides/local-llm-on-mac.md: New 'Timeouts' section with table of all three timeouts, their defaults, local auto-adjustments, and env var overrides - reference/faq.md: Tip box in the local models FAQ section - user-guide/configuration.md: 'Streaming Timeouts' subsection under the agent config section Follow-up to #6967.
2026-04-25 00:51:20 +00:00 · 2026-04-09 23:28:25 -07:00 · 2026-04-09 23:28:25 -07:00 · d5023d36d8
commit d5023d36d8
parent 0602ff8f58
3 changed files with 39 additions and 0 deletions
--- a/website/docs/reference/faq.md
+++ b/website/docs/reference/faq.md
@ -84,6 +84,10 @@ This works with Ollama, vLLM, llama.cpp server, SGLang, LocalAI, and others. See
 If you set a custom `num_ctx` in Ollama (e.g., `ollama run --num_ctx 16384`), make sure to set the matching context length in Hermes — Ollama's `/api/show` reports the model's *maximum* context, not the effective `num_ctx` you configured.
 :::

+:::tip Timeouts with local models
+Hermes auto-detects local endpoints and relaxes streaming timeouts (read timeout raised from 120s to 1800s, stale stream detection disabled). If you still hit timeouts on very large contexts, set `HERMES_STREAM_READ_TIMEOUT=1800` in your `.env`. See the [Local LLM guide](../guides/local-llm-on-mac.md#timeouts) for details.
+:::
+
 ### How much does it cost?

 Hermes Agent itself is **free and open-source** (MIT license). You pay only for the LLM API usage from your chosen provider. Local models are completely free to run.