mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-25 00:51:20 +00:00
docs: add context length detection references to FAQ and quickstart (#2179)
- quickstart.md: mention context length prompt for custom endpoints, link to configuration docs, add Ollama to provider table
- faq.md: rewrite local models section with hermes model flow and context length prompt example, add Ollama num_ctx tip, expand context-length-exceeded troubleshooting with detection override options and config.yaml examples

Co-authored-by: Test <test@test.com>
This commit is contained in:
parent c52353cf8a
commit 80e578d3e3

2 changed files with 44 additions and 8 deletions
@@ -54,10 +54,10 @@ hermes setup # Or configure everything at once
| **OpenCode Zen** | Pay-as-you-go access to curated models | Set `OPENCODE_ZEN_API_KEY` |
| **OpenCode Go** | $10/month subscription for open models | Set `OPENCODE_GO_API_KEY` |
| **Vercel AI Gateway** | Vercel AI Gateway routing | Set `AI_GATEWAY_API_KEY` |
| **Custom Endpoint** | VLLM, SGLang, Ollama, or any OpenAI-compatible API | Set base URL + API key |

:::tip
You can switch providers at any time with `hermes model` — no code changes, no lock-in. When configuring a custom endpoint, Hermes will prompt for the context window size and auto-detect it when possible. See [Context Length Detection](../user-guide/configuration.md#context-length-detection) for details.
:::

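For reference, a saved custom endpoint lands in `config.yaml`. A rough sketch of the saved shape, using the same fields shown in the FAQ's troubleshooting examples (verify the exact schema against your installed version):

```yaml
# Sketch of a saved custom endpoint in ~/.hermes/config.yaml
custom_providers:
  - name: "My Server"
    base_url: "http://localhost:11434/v1"
    models:
      "qwen3.5:27b":
        context_length: 32768
```
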
## 3. Start Chatting

@@ -42,18 +42,25 @@ API calls go **only to the LLM provider you configure** (e.g., OpenRouter, your

### Can I use it offline / with local models?

Yes. Run `hermes model`, select **Custom endpoint**, and enter your server's URL:

```bash
hermes model
# Select: Custom endpoint (enter URL manually)
# API base URL: http://localhost:11434/v1
# API key: ollama
# Model name: qwen3.5:27b
# Context length: 32768 ← set this to match your server's actual context window
```

Hermes persists the endpoint in `config.yaml` and prompts for the context window size so compression triggers at the right time. If you leave context length blank, Hermes auto-detects it from the server's `/models` endpoint or [models.dev](https://models.dev).

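Auto-detection depends on what the server exposes. vLLM, for example, includes a `max_model_len` field in its `/v1/models` response; a sketch of pulling the value out (the sample JSON below is illustrative, not from a live server):

```shell
# Illustrative /v1/models-style response; vLLM reports max_model_len per model.
# A live query would be: curl -s http://localhost:8000/v1/models
response='{"data":[{"id":"qwen3.5:27b","max_model_len":32768}]}'
ctx=$(echo "$response" | python3 -c 'import sys,json; print(json.load(sys.stdin)["data"][0]["max_model_len"])')
echo "detected context length: $ctx"
```

If the server reports nothing usable, Hermes falls back to models.dev or the value you entered at the prompt.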
This works with Ollama, vLLM, llama.cpp server, SGLang, LocalAI, and others. See the [Configuration guide](../user-guide/configuration.md) for details.

:::tip Ollama users
If you set a custom `num_ctx` in Ollama (e.g., `/set parameter num_ctx 16384` inside an `ollama run` session), make sure to set the matching context length in Hermes — Ollama's `/api/show` reports the model's *maximum* context, not the effective `num_ctx` you configured.
:::

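The `/api/show` behavior mentioned above can be checked directly: Ollama carries the maximum context under `model_info`, keyed by architecture (e.g. `llama.context_length`). A sketch with a made-up sample payload in place of a live response:

```shell
# A live query would be:
#   curl -s http://localhost:11434/api/show -d '{"name":"qwen3.5:27b"}'
# Sample (illustrative) slice of the response:
show='{"model_info":{"qwen2.context_length":32768}}'
max_ctx=$(echo "$show" | python3 -c 'import sys,json; info=json.load(sys.stdin)["model_info"]; print(next(v for k,v in info.items() if k.endswith("context_length")))')
echo "maximum context reported: $max_ctx"  # not necessarily your effective num_ctx
```
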
### How much does it cost?

Hermes Agent itself is **free and open-source** (MIT license). You pay only for the LLM API usage from your chosen provider. Local models are completely free to run.

@@ -200,7 +207,7 @@ hermes chat --model openrouter/meta-llama/llama-3.1-70b-instruct

#### Context length exceeded

**Cause:** The conversation has grown too long for the model's context window, or Hermes detected the wrong context length for your model.

**Solution:**

@@ -214,6 +221,35 @@ hermes chat

```bash
hermes chat --model openrouter/google/gemini-2.0-flash-001
```

If this happens on the first long conversation, Hermes may have the wrong context length for your model. Check what it detected:

```bash
# Look at the status bar — it shows the detected context length
/context
```

To fix context detection, set it explicitly:

```yaml
# In ~/.hermes/config.yaml
model:
  default: your-model-name
  context_length: 131072  # your model's actual context window
```

Or for custom endpoints, add it per-model:

```yaml
custom_providers:
  - name: "My Server"
    base_url: "http://localhost:11434/v1"
    models:
      "qwen3.5:27b":
        context_length: 32768
```

See [Context Length Detection](../user-guide/configuration.md#context-length-detection) for how auto-detection works and all override options.

---

### Terminal Issues