Mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-04-26 01:01:40 +00:00)
docs: backfill coverage for recently-merged features (#11942)
Fills documentation gaps that accumulated as features merged ahead of their docs updates. All additions are verified against code and the originating PRs.

Providers:
- Ollama Cloud (#10782) — new provider section, env vars, quickstart/fallback rows
- xAI Grok Responses API + TTS (#10783) — provider note, TTS table + config
- Google Gemini CLI OAuth (#11270) — quickstart/fallback/cli-commands entries
- NVIDIA NIM (#11774) — NVIDIA_API_KEY / NVIDIA_BASE_URL in env-vars reference
- HERMES_INFERENCE_PROVIDER enum updated

Messaging:
- DISCORD_ALLOWED_ROLES (#11608) — env-vars, discord.md access control section
- DingTalk QR device-flow (#11574) — wizard path in Option A + openClaw disclosure
- Feishu document comment intelligent reply (#11898) — full section + 3-tier access control + CLI

Skills / commands:
- concept-diagrams skill (#11363) — optional-skills-catalog entry
- /gquota (#11270) — slash-commands reference

Build: docusaurus build passes, ascii-guard lint 0 errors.

This commit is contained in: parent 45acd9beb5, commit 11a89cc032
11 changed files with 132 additions and 7 deletions

@@ -289,12 +289,40 @@ Base URLs can be overridden with `GLM_BASE_URL`, `KIMI_BASE_URL`, `MINIMAX_BASE_URL`

When using the Z.AI / GLM provider, Hermes automatically probes multiple endpoints (global, China, coding variants) to find one that accepts your API key. You don't need to set `GLM_BASE_URL` manually — the working endpoint is detected and cached automatically.

:::
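
If you do want to pin an endpoint, it is just an env var in `~/.hermes/.env`. A minimal sketch; the URL below is a placeholder, not a documented value:

```bash
# Only set this to pin a specific endpoint; auto-probing normally handles it.
GLM_BASE_URL=https://your-glm-endpoint.example.com/v1
```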
### xAI (Grok) — Responses API + Prompt Caching
xAI is wired through the Responses API (`codex_responses` transport) for automatic reasoning support on Grok 4 models — no `reasoning_effort` parameter is needed; the server reasons by default. Set `XAI_API_KEY` in `~/.hermes/.env` and pick xAI in `hermes model`, or use the `grok` shortcut directly: `/model grok-4-1-fast-reasoning`.
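
A minimal setup sketch, assuming the standard dotenv format for `~/.hermes/.env` (the key value is a placeholder):

```bash
# ~/.hermes/.env (placeholder value; use your real key from the xAI console)
XAI_API_KEY=xai-your-key-here
```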
When using xAI as a provider (any base URL containing `x.ai`), Hermes automatically enables prompt caching by sending the `x-grok-conv-id` header with every API request. This routes requests to the same server within a conversation session, allowing xAI's infrastructure to reuse cached system prompts and conversation history.

No configuration is needed — caching activates automatically when an xAI endpoint is detected and a session ID is available. This reduces latency and cost for multi-turn conversations.
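
For intuition, here is what that header looks like on the wire. This is a hypothetical raw request: the endpoint path, body fields, and session ID are placeholders, and Hermes sends the header for you.

```bash
# Illustrative only; Hermes adds x-grok-conv-id automatically.
curl https://api.x.ai/v1/responses \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-grok-conv-id: 3f2a9c1e-0000-session-id" \
  -d '{"model": "grok-4-1-fast-reasoning", "input": "Hello again"}'
```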
xAI also ships a dedicated TTS endpoint (`/v1/tts`). Select **xAI TTS** in `hermes tools` → Voice & TTS, or see the [Voice & TTS](../user-guide/features/tts.md#text-to-speech) page for config.
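
As a sketch, a direct call to that endpoint might look like the following. The request fields are assumptions for illustration, not the documented schema; the Voice & TTS page has the real config.

```bash
# Hypothetical request; body field names are assumed, not confirmed by the docs.
curl https://api.x.ai/v1/tts \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello from Hermes", "voice": "default"}' \
  --output speech.mp3
```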
### Ollama Cloud — Managed Ollama Models, OAuth + API Key
[Ollama Cloud](https://ollama.com/cloud) hosts the same open-weight catalog as local Ollama but without the GPU requirement. Pick it in `hermes model` as **Ollama Cloud**, paste your API key from [ollama.com/settings/keys](https://ollama.com/settings/keys), and Hermes auto-discovers the available models.
```bash
hermes model
# → pick "Ollama Cloud"
# → paste your OLLAMA_API_KEY
# → select from discovered models (gpt-oss:120b, glm-4.6:cloud, qwen3-coder:480b-cloud, etc.)
```
Or set it in `config.yaml` directly:
```yaml
model:
  provider: "ollama-cloud"
  default: "gpt-oss:120b"
```
The model catalog is fetched dynamically from `ollama.com/v1/models` and cached for one hour. `model:tag` notation is preserved through normalization, so keep the colon: write `qwen3-coder:480b-cloud`, not `qwen3-coder-480b-cloud`.
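
To browse the catalog yourself, you can hit the same endpoint. This assumes the response follows the usual OpenAI-style list shape, which is an inference rather than something the docs state:

```bash
# Assumes an OpenAI-compatible list response: {"data": [{"id": "..."}, ...]}
curl -s https://ollama.com/v1/models \
  -H "Authorization: Bearer $OLLAMA_API_KEY" | jq -r '.data[].id'
```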
:::tip Ollama Cloud vs local Ollama
Both speak the same OpenAI-compatible API. Cloud is a first-class provider (`--provider ollama-cloud`, `OLLAMA_API_KEY`); local Ollama is reached via the Custom Endpoint flow (base URL `http://localhost:11434/v1`, no key). Use cloud for large models you can't run locally; use local for privacy or offline work.
:::
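
For the local side of that comparison, here is a sketch of the Custom Endpoint config, assuming it reuses the `model:` keys from the cloud example above (`base_url` and `provider: "custom"` are guesses, not confirmed key names):

```yaml
# Hypothetical local-Ollama config; key names are assumptions.
model:
  provider: "custom"
  base_url: "http://localhost:11434/v1"  # local Ollama, no API key
  default: "qwen3-coder:30b"             # any locally pulled model:tag
```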
### NVIDIA NIM
Nemotron and other open-source models via [build.nvidia.com](https://build.nvidia.com) (free API key) or a local NIM endpoint. Configure with `NVIDIA_API_KEY`, plus `NVIDIA_BASE_URL` for self-hosted NIM.
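
A minimal sketch of those two variables in `~/.hermes/.env`. The values are placeholders, and the local URL assumes NIM's usual port-8000 OpenAI-compatible endpoint:

```bash
# Placeholder values
NVIDIA_API_KEY=nvapi-your-key-here
# Optional: point at a self-hosted NIM instead of build.nvidia.com
NVIDIA_BASE_URL=http://localhost:8000/v1
```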