diff --git a/website/docs/developer-guide/provider-runtime.md b/website/docs/developer-guide/provider-runtime.md index 08f950509..77832fc92 100644 --- a/website/docs/developer-guide/provider-runtime.md +++ b/website/docs/developer-guide/provider-runtime.md @@ -27,10 +27,12 @@ If you are trying to add a new first-class inference provider, read [Adding Prov At a high level, provider resolution uses: 1. explicit CLI/runtime request -2. environment variables -3. `config.yaml` model/provider config +2. `config.yaml` model/provider config +3. environment variables 4. provider-specific defaults or auto resolution +That ordering matters because Hermes treats the saved model/provider choice as the source of truth for normal runs. This prevents a stale shell export from silently overriding the endpoint a user last selected in `hermes model`. + ## Providers Current provider families include: @@ -70,11 +72,17 @@ This resolver is the main reason Hermes can share auth/runtime logic between: Hermes contains logic to avoid leaking the wrong API key to a custom endpoint when both `OPENROUTER_API_KEY` and `OPENAI_API_KEY` exist. +It also distinguishes between: + +- a real custom endpoint selected by the user +- the OpenRouter fallback path used when no custom endpoint is configured + That distinction is especially important for: - local model servers - non-OpenRouter OpenAI-compatible APIs - switching providers without re-running setup +- config-saved custom endpoints that should keep working even when `OPENAI_BASE_URL` is not exported in the current shell ## Native Anthropic path @@ -114,6 +122,12 @@ Auxiliary tasks such as: can use their own provider/model routing rather than the main conversational model. +When an auxiliary task is configured with provider `main`, Hermes resolves that through the same shared runtime path as normal chat. 
In practice that means: + +- env-driven custom endpoints still work +- custom endpoints saved via `hermes model` / `config.yaml` also work +- auxiliary routing can tell the difference between a real saved custom endpoint and the OpenRouter fallback + ## Fallback models Hermes also supports a configured fallback model/provider, allowing runtime failover in supported error paths. diff --git a/website/docs/reference/faq.md b/website/docs/reference/faq.md index 02a82dce7..4d7be7aa0 100644 --- a/website/docs/reference/faq.md +++ b/website/docs/reference/faq.md @@ -50,6 +50,8 @@ hermes config set OPENAI_API_KEY ollama # Any non-empty va hermes config set HERMES_MODEL llama3.1 ``` +You can also save the endpoint interactively with `hermes model`. Hermes persists that custom endpoint in `config.yaml`, and auxiliary tasks configured with provider `main` follow the same saved endpoint. + This works with Ollama, vLLM, llama.cpp server, SGLang, LocalAI, and others. See the [Configuration guide](../user-guide/configuration.md) for details. ### How much does it cost? diff --git a/website/docs/user-guide/configuration.md b/website/docs/user-guide/configuration.md index 71525764e..eb1a84036 100644 --- a/website/docs/user-guide/configuration.md +++ b/website/docs/user-guide/configuration.md @@ -69,7 +69,7 @@ You need at least one way to connect to an LLM. Use `hermes model` to switch pro | **Kimi / Moonshot** | `KIMI_API_KEY` in `~/.hermes/.env` (provider: `kimi-coding`) | | **MiniMax** | `MINIMAX_API_KEY` in `~/.hermes/.env` (provider: `minimax`) | | **MiniMax China** | `MINIMAX_CN_API_KEY` in `~/.hermes/.env` (provider: `minimax-cn`) | -| **Custom Endpoint** | `OPENAI_BASE_URL` + `OPENAI_API_KEY` in `~/.hermes/.env` | +| **Custom Endpoint** | `hermes model` (saved in `config.yaml`) or `OPENAI_BASE_URL` + `OPENAI_API_KEY` in `~/.hermes/.env` | :::info Codex Note The OpenAI Codex provider authenticates via device code (open a URL, enter a code). 
Hermes stores the resulting credentials in its own auth store under `~/.hermes/auth.json` and can import existing Codex CLI credentials from `~/.codex/auth.json` when present. No Codex CLI installation is required.
@@ -163,10 +163,12 @@ hermes model

```bash
# Add to ~/.hermes/.env
OPENAI_BASE_URL=http://localhost:8000/v1
-OPENAI_API_KEY=your-key-or-dummy
+OPENAI_API_KEY=any-non-empty-value
LLM_MODEL=your-model-name
```

+`hermes model` and the manual `.env` approach converge on the same runtime path. If you save a custom endpoint through `hermes model`, Hermes persists the provider + base URL in `config.yaml`, so later sessions keep using that endpoint even if `OPENAI_BASE_URL` is not exported in your current shell.
+
Everything below follows this same pattern — just change the URL, key, and model name.

---

@@ -600,7 +602,7 @@ AUXILIARY_VISION_MODEL=openai/gpt-4o
| `"openrouter"` | Force OpenRouter — routes to any model (Gemini, GPT-4o, Claude, etc.) | `OPENROUTER_API_KEY` |
| `"nous"` | Force Nous Portal | `hermes login` |
| `"codex"` | Force Codex OAuth (ChatGPT account). Supports vision (gpt-5.3-codex). | `hermes model` → Codex |
-| `"main"` | Use your custom endpoint (`OPENAI_BASE_URL` + `OPENAI_API_KEY`). Works with OpenAI, local models, or any OpenAI-compatible API. | `OPENAI_BASE_URL` + `OPENAI_API_KEY` |
+| `"main"` | Use your active main (custom) endpoint. It can come from `OPENAI_BASE_URL` + `OPENAI_API_KEY` or from a custom endpoint saved via `hermes model` / `config.yaml`. Works with OpenAI, local models, or any OpenAI-compatible API. | Custom endpoint credentials + base URL |

### Common Setups

@@ -636,10 +638,12 @@ auxiliary:
```yaml
auxiliary:
  vision:
-    provider: "main"  # uses your OPENAI_BASE_URL endpoint
+    provider: "main"  # uses your active custom endpoint
    model: "my-local-model"
```

+
+`provider: "main"` follows the same custom endpoint Hermes uses for normal chat.
That endpoint can be set directly with `OPENAI_BASE_URL`, or saved once through `hermes model` and persisted in `config.yaml`. + :::tip If you use Codex OAuth as your main model provider, vision works automatically — no extra configuration needed. Codex is included in the auto-detection chain for vision. :::
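+
+As a minimal sketch of how the saved endpoint and auxiliary routing could fit together in `config.yaml` (the top-level `model:` keys below are illustrative assumptions, not the documented schema; only the `auxiliary:` block matches the examples above):
+
+```yaml
+# Hypothetical config.yaml fragment. Treat the key names under `model:`
+# as illustrative; check your own config.yaml for the actual schema.
+model:
+  provider: "main"
+  base_url: "http://localhost:8000/v1"  # saved for you by `hermes model`
+auxiliary:
+  vision:
+    provider: "main"         # follows the saved endpoint above
+    model: "my-local-model"
+```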