docs: document streaming timeout auto-detection for local LLMs (#6990)

Add streaming timeout documentation to three pages:

- guides/local-llm-on-mac.md: New 'Timeouts' section with table of all
  three timeouts, their defaults, local auto-adjustments, and env var
  overrides
- reference/faq.md: Tip box in the local models FAQ section
- user-guide/configuration.md: 'Streaming Timeouts' subsection under
  the agent config section

Follow-up to #6967.
Teknium authored 2026-04-09 23:28:25 -07:00 · committed by GitHub
parent 0602ff8f58
commit d5023d36d8
3 changed files with 39 additions and 0 deletions


@@ -217,3 +217,24 @@ hermes model
```
Select **Custom endpoint** and follow the prompts. It will ask for the base URL and model name — use the values from whichever backend you set up above.
---
## Timeouts
Hermes automatically detects local endpoints (localhost, LAN IPs) and relaxes its streaming timeouts, so most setups need no configuration.
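For intuition, here is a minimal sketch of what such a check could look like. The helper name is hypothetical and this is not Hermes's actual detection code:
```python
from ipaddress import ip_address
from urllib.parse import urlparse

# Illustrative only -- is_local_endpoint is a hypothetical helper,
# not Hermes's real implementation.
def is_local_endpoint(base_url: str) -> bool:
    host = urlparse(base_url).hostname or ""
    if host == "localhost":
        return True
    try:
        addr = ip_address(host)
    except ValueError:
        return False  # a DNS name other than localhost: treat as remote
    # Loopback (127.0.0.1, ::1) and private LAN ranges
    # (10/8, 172.16/12, 192.168/16) count as local.
    return addr.is_loopback or addr.is_private

is_local_endpoint("http://localhost:8080/v1")     # True
is_local_endpoint("http://192.168.1.42:1234/v1")  # True
is_local_endpoint("https://api.example.com/v1")   # False
```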
If you still hit timeout errors (e.g. very large contexts on slow hardware), you can override the streaming read timeout:
```bash
# In your .env — raise from the 120s default to 30 minutes
HERMES_STREAM_READ_TIMEOUT=1800
```
| Timeout | Default | Local auto-adjustment | Env var override |
|---------|---------|----------------------|------------------|
| Stream read (socket-level) | 120s | Raised to 1800s | `HERMES_STREAM_READ_TIMEOUT` |
| Stale stream detection | 180s | Disabled entirely | `HERMES_STREAM_STALE_TIMEOUT` |
| API call (non-streaming) | 1800s | No change needed | `HERMES_API_TIMEOUT` |
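Putting the table together, a rough sketch of how the defaults, local adjustments, and overrides could combine. This is illustrative Python, not Hermes's code, and it assumes an explicit env var takes precedence over the auto-adjustment:
```python
import os

# Hypothetical resolver mirroring the table above; the env var names
# are the documented ones, everything else is an assumption.
def resolve_timeouts(endpoint_is_local: bool) -> dict:
    def from_env(var: str, default: float | None) -> float | None:
        raw = os.environ.get(var)
        return float(raw) if raw else default

    # Documented defaults, adjusted per the table when local.
    stream_read = 1800.0 if endpoint_is_local else 120.0
    stale = None if endpoint_is_local else 180.0  # None = disabled
    return {
        "stream_read": from_env("HERMES_STREAM_READ_TIMEOUT", stream_read),
        "stream_stale": from_env("HERMES_STREAM_STALE_TIMEOUT", stale),
        "api_call": from_env("HERMES_API_TIMEOUT", 1800.0),
    }

print(resolve_timeouts(endpoint_is_local=True))
```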
The stream read timeout is the one most likely to cause issues — it's the socket-level deadline for receiving the next chunk of data. During prefill on large contexts, local models may produce no output for minutes while processing the prompt. The auto-detection handles this transparently.
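To make the chunk-level semantics concrete, here is a small streaming client sketch using httpx (chosen for illustration; Hermes's HTTP stack may differ). The `read` value bounds the wait for *each* chunk of the response, not the response as a whole:
```python
import httpx

# Illustration of a socket-level read timeout on a streaming response.
# The endpoint and payload are example values, not Hermes internals.
timeout = httpx.Timeout(connect=10.0, read=1800.0, write=10.0, pool=10.0)

with httpx.Client(timeout=timeout) as client:
    with client.stream(
        "POST",
        "http://localhost:8080/v1/chat/completions",  # example local endpoint
        json={
            "model": "hermes",
            "stream": True,
            "messages": [{"role": "user", "content": "hi"}],
        },
    ) as resp:
        # If the server stays silent longer than `read` between chunks
        # (e.g. during a long prefill), httpx raises ReadTimeout.
        for line in resp.iter_lines():
            print(line)
```
This is why a silent multi-minute prefill trips the 120s default against a remote API but is safe under the relaxed local settings.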