diff --git a/website/docs/getting-started/quickstart.md b/website/docs/getting-started/quickstart.md
index 675d2711e..3a5479a28 100644
--- a/website/docs/getting-started/quickstart.md
+++ b/website/docs/getting-started/quickstart.md
@@ -54,10 +54,10 @@ hermes setup # Or configure everything at once
 | **OpenCode Zen** | Pay-as-you-go access to curated models | Set `OPENCODE_ZEN_API_KEY` |
 | **OpenCode Go** | $10/month subscription for open models | Set `OPENCODE_GO_API_KEY` |
 | **Vercel AI Gateway** | Vercel AI Gateway routing | Set `AI_GATEWAY_API_KEY` |
-| **Custom Endpoint** | VLLM, SGLang, or any OpenAI-compatible API | Set base URL + API key |
+| **Custom Endpoint** | vLLM, SGLang, Ollama, or any OpenAI-compatible API | Set base URL + API key |
 
 :::tip
-You can switch providers at any time with `hermes model` — no code changes, no lock-in.
+You can switch providers at any time with `hermes model` — no code changes, no lock-in. When configuring a custom endpoint, Hermes will prompt for the context window size and auto-detect it when possible. See [Context Length Detection](../user-guide/configuration.md#context-length-detection) for details.
 :::
 
 ## 3. Start Chatting
diff --git a/website/docs/reference/faq.md b/website/docs/reference/faq.md
index eaa92a064..97051fcee 100644
--- a/website/docs/reference/faq.md
+++ b/website/docs/reference/faq.md
@@ -42,18 +42,25 @@ API calls go **only to the LLM provider you configure** (e.g., OpenRouter, your
 
 ### Can I use it offline / with local models?
 
-Yes. Point Hermes at any local OpenAI-compatible server:
+Yes. Run `hermes model`, select **Custom endpoint**, and enter your server's URL:
 
 ```bash
-hermes config set OPENAI_BASE_URL http://localhost:11434/v1  # Ollama
-hermes config set OPENAI_API_KEY ollama  # Any non-empty value
-hermes config set HERMES_MODEL llama3.1
+hermes model
+# Select: Custom endpoint (enter URL manually)
+# API base URL: http://localhost:11434/v1
+# API key: ollama
+# Model name: qwen3.5:27b
+# Context length: 32768  ← set this to match your server's actual context window
 ```
 
-You can also save the endpoint interactively with `hermes model`. Hermes persists that custom endpoint in `config.yaml`, and auxiliary tasks configured with provider `main` follow the same saved endpoint.
+Hermes persists the endpoint in `config.yaml` and prompts for the context window size so compression triggers at the right time. If you leave context length blank, Hermes auto-detects it from the server's `/models` endpoint or [models.dev](https://models.dev). This works with Ollama, vLLM, llama.cpp server, SGLang, LocalAI, and others. See the [Configuration guide](../user-guide/configuration.md) for details.
+
+:::tip Ollama users
+If you set a custom `num_ctx` in Ollama (e.g., `PARAMETER num_ctx 16384` in a Modelfile), make sure to set the matching context length in Hermes — Ollama's `/api/show` reports the model's *maximum* context, not the effective `num_ctx` you configured.
+:::
+
 ### How much does it cost?
 
 Hermes Agent itself is **free and open-source** (MIT license). You pay only for the LLM API usage from your chosen provider. Local models are completely free to run.
@@ -200,7 +207,7 @@ hermes chat --model openrouter/meta-llama/llama-3.1-70b-instruct
 
 #### Context length exceeded
 
-**Cause:** The conversation has grown too long for the model's context window.
+**Cause:** The conversation has grown too long for the model's context window, or Hermes detected the wrong context length for your model.
 
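+As a quick diagnostic, you can check what the server itself advertises. This is a minimal sketch, assuming an OpenAI-compatible server on Ollama's default port (a hypothetical URL; adjust host and port for your setup). It queries the same `/models` endpoint Hermes reads during auto-detection:
+
+```bash
+# Hypothetical local URL; change it to match your server
+curl -s http://localhost:11434/v1/models || echo "server not reachable"
+```
+
+If the response lists no context metadata, setting `context_length` explicitly in `config.yaml` is the reliable fix.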
 **Solution:**
 ```bash
@@ -214,6 +221,35 @@ hermes chat
 hermes chat --model openrouter/google/gemini-2.0-flash-001
 ```
 
+If this happens on the first long conversation, Hermes may have the wrong context length for your model. Check what it detected:
+
+```bash
+# Look at the status bar — it shows the detected context length
+/context
+```
+
+To fix context detection, set it explicitly:
+
+```yaml
+# In ~/.hermes/config.yaml
+model:
+  default: your-model-name
+  context_length: 131072  # your model's actual context window
+```
+
+Or for custom endpoints, add it per-model:
+
+```yaml
+custom_providers:
+  - name: "My Server"
+    base_url: "http://localhost:11434/v1"
+    models:
+      qwen3.5:27b:
+        context_length: 32768
+```
+
+See [Context Length Detection](../user-guide/configuration.md#context-length-detection) for how auto-detection works and all override options.
+
 ---
 
 ### Terminal Issues