fix(context): align guidance with 64k minimum

helix4u 2026-05-24 20:31:27 -06:00 committed by Teknium
parent 1d5deac346
commit 3b839f4369
6 changed files with 41 additions and 35 deletions

@@ -82,7 +82,7 @@ hermes model
# API base URL: http://localhost:11434/v1
# API key: ollama
# Model name: qwen3.5:27b
-# Context length: 32768 ← set this to match your server's actual context window
+# Context length: 64000 ← Hermes minimum; set this to match your server's actual context window
```
Or configure it directly in `config.yaml`:
@@ -99,7 +99,7 @@ Hermes persists the endpoint, provider, and base URL in `config.yaml` so it surv
This works with Ollama, vLLM, llama.cpp server, SGLang, LocalAI, and others. See the [Configuration guide](../user-guide/configuration.md) for details.
:::tip Ollama users
-If you set a custom `num_ctx` in Ollama (e.g., `ollama run --num_ctx 16384`), make sure to set the matching context length in Hermes — Ollama's `/api/show` reports the model's *maximum* context, not the effective `num_ctx` you configured.
+If you set a custom `num_ctx` in Ollama (e.g., `ollama run --num_ctx 64000`), make sure to set the matching context length in Hermes — Ollama's `/api/show` reports the model's *maximum* context, not the effective `num_ctx` you configured.
:::
:::tip Timeouts with local models
@@ -340,7 +340,7 @@ custom_providers:
base_url: "http://localhost:11434/v1"
models:
  qwen3.5:27b:
-    context_length: 32768
+    context_length: 64000
```
See [Context Length Detection](../integrations/providers.md#context-length-detection) for how auto-detection works and all override options.
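
For reviewers, the rule this commit documents can be sketched as a small check. This is an illustration only, not Hermes code: the helper name `check_context_length` and its parameters are hypothetical, and the 64000 floor is taken from the updated guidance above. It assumes you already know both the value written to `config.yaml` and the effective `num_ctx` your server is running with.

```python
# Hypothetical helper for illustration; it is not part of Hermes.
# The 64000 floor comes from the guidance updated in this commit.
MIN_CONTEXT_LENGTH = 64_000


def check_context_length(configured, server_effective, minimum=MIN_CONTEXT_LENGTH):
    """Return a list of warnings for a configured context length.

    configured:       the context_length set in the Hermes config
    server_effective: the context window the server is actually using
                      (for Ollama, the num_ctx you set, not the model maximum)
    """
    warnings = []
    if configured < minimum:
        warnings.append(
            f"context_length {configured} is below the {minimum} minimum"
        )
    if configured != server_effective:
        warnings.append(
            f"context_length {configured} does not match the server's "
            f"effective window of {server_effective}"
        )
    return warnings


if __name__ == "__main__":
    # Old documented value: too small, even though it matches the server.
    print(check_context_length(32768, 32768))
    # New documented value, matching the server: no warnings.
    print(check_context_length(64000, 64000))
```

Returning a list of warnings rather than raising keeps the sketch usable both as a pre-flight check and in a test; a real implementation might treat the minimum as a hard error instead.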