docs: document streaming timeout auto-detection for local LLMs (#6990)

Add streaming timeout documentation to three pages:

- guides/local-llm-on-mac.md: New 'Timeouts' section with table of all
  three timeouts, their defaults, local auto-adjustments, and env var
  overrides
- reference/faq.md: Tip box in the local models FAQ section
- user-guide/configuration.md: 'Streaming Timeouts' subsection under
  the agent config section

Follow-up to #6967.
Teknium authored 2026-04-09 23:28:25 -07:00 · committed by GitHub
parent 0602ff8f58
commit d5023d36d8
3 changed files with 39 additions and 0 deletions


@@ -217,3 +217,24 @@ hermes model
```
Select **Custom endpoint** and follow the prompts. It will ask for the base URL and model name — use the values from whichever backend you set up above.
---
## Timeouts
Hermes automatically detects local endpoints (localhost, LAN IPs) and relaxes its streaming timeouts, so most setups need no configuration.
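For intuition, here is a minimal sketch of what such a check could look like. The helper name is hypothetical and this is not Hermes's actual detection code:
```python
from ipaddress import ip_address
from urllib.parse import urlparse

# Illustrative only -- is_local_endpoint is a hypothetical helper,
# not Hermes's real implementation.
def is_local_endpoint(base_url: str) -> bool:
    host = urlparse(base_url).hostname or ""
    if host == "localhost":
        return True
    try:
        addr = ip_address(host)
    except ValueError:
        return False  # a DNS name other than localhost: treat as remote
    # Loopback (127.0.0.1, ::1) and private LAN ranges
    # (10/8, 172.16/12, 192.168/16) count as local.
    return addr.is_loopback or addr.is_private

is_local_endpoint("http://localhost:8080/v1")     # True
is_local_endpoint("http://192.168.1.42:1234/v1")  # True
is_local_endpoint("https://api.example.com/v1")   # False
```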
If you still hit timeout errors (e.g. very large contexts on slow hardware), you can override the streaming read timeout:
```bash
# In your .env — raise from the 120s default to 30 minutes
HERMES_STREAM_READ_TIMEOUT=1800
```
| Timeout | Default | Local auto-adjustment | Env var override |
|---------|---------|----------------------|------------------|
| Stream read (socket-level) | 120s | Raised to 1800s | `HERMES_STREAM_READ_TIMEOUT` |
| Stale stream detection | 180s | Disabled entirely | `HERMES_STREAM_STALE_TIMEOUT` |
| API call (non-streaming) | 1800s | No change needed | `HERMES_API_TIMEOUT` |
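Putting the table together, a rough sketch of how the defaults, local adjustments, and overrides could combine. This is illustrative Python, not Hermes's code, and it assumes an explicit env var takes precedence over the auto-adjustment:
```python
import os

# Hypothetical resolver mirroring the table above; the env var names
# are the documented ones, everything else is an assumption.
def resolve_timeouts(endpoint_is_local: bool) -> dict:
    def from_env(var: str, default: float | None) -> float | None:
        raw = os.environ.get(var)
        return float(raw) if raw else default

    # Documented defaults, adjusted per the table when local.
    stream_read = 1800.0 if endpoint_is_local else 120.0
    stale = None if endpoint_is_local else 180.0  # None = disabled
    return {
        "stream_read": from_env("HERMES_STREAM_READ_TIMEOUT", stream_read),
        "stream_stale": from_env("HERMES_STREAM_STALE_TIMEOUT", stale),
        "api_call": from_env("HERMES_API_TIMEOUT", 1800.0),
    }

print(resolve_timeouts(endpoint_is_local=True))
```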
The stream read timeout is the one most likely to cause issues — it's the socket-level deadline for receiving the next chunk of data. During prefill on large contexts, local models may produce no output for minutes while processing the prompt. The auto-detection handles this transparently.
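To make the chunk-level semantics concrete, here is a small streaming client sketch using httpx (chosen for illustration; Hermes's HTTP stack may differ). The `read` value bounds the wait for *each* chunk of the response, not the response as a whole:
```python
import httpx

# Illustration of a socket-level read timeout on a streaming response.
# The endpoint and payload are example values, not Hermes internals.
timeout = httpx.Timeout(connect=10.0, read=1800.0, write=10.0, pool=10.0)

with httpx.Client(timeout=timeout) as client:
    with client.stream(
        "POST",
        "http://localhost:8080/v1/chat/completions",  # example local endpoint
        json={
            "model": "hermes",
            "stream": True,
            "messages": [{"role": "user", "content": "hi"}],
        },
    ) as resp:
        # If the server stays silent longer than `read` between chunks
        # (e.g. during a long prefill), httpx raises ReadTimeout.
        for line in resp.iter_lines():
            print(line)
```
This is why a silent multi-minute prefill trips the 120s default against a remote API but is safe under the relaxed local settings.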