mirror of https://github.com/NousResearch/hermes-agent.git
synced 2026-04-25 00:51:20 +00:00

feat(compression): add summary_base_url + move compression config to YAML-only

- Add `summary_base_url` config option to the compression block for custom OpenAI-compatible endpoints (e.g. zai, DeepSeek, Ollama)
- Remove compression env var bridges from cli.py and gateway/run.py (`CONTEXT_COMPRESSION_*` env vars are no longer set from config)
- Switch run_agent.py to read compression config directly from config.yaml instead of env vars
- Fix the backwards-compat block in `_resolve_task_provider_model` to also fire when `auxiliary.compression.provider` is `'auto'` (DEFAULT_CONFIG sets this, which was silently preventing the compression section's `summary_*` keys from being read)
- Add a test for the summary_base_url config-to-client flow
- Update docs to show compression as config.yaml-only

Closes #1591
Based on PR #1702 by @uzaylisak

This commit is contained in:
parent 867a96c051
commit d1d17f4f0a
11 changed files with 194 additions and 98 deletions
## Context Compression

Hermes automatically compresses long conversations to stay within your model's context window. The compression summarizer is a separate LLM call — you can point it at any provider or endpoint.

All compression settings live in `config.yaml` (no environment variables).

### Full reference

```yaml
compression:
  enabled: true              # Toggle compression on/off
  threshold: 0.50            # Compress at this % of the context limit
  summary_model: "google/gemini-3-flash-preview"  # Model for summarization
  summary_provider: "auto"   # Provider: "auto", "openrouter", "nous", "codex", "main", etc.
  summary_base_url: null     # Custom OpenAI-compatible endpoint (overrides provider)
```
### Common setups

**Default (auto-detect) — no configuration needed:**

```yaml
compression:
  enabled: true
  threshold: 0.50            # Compress at 50% of the context limit by default
  summary_model: "google/gemini-3-flash-preview"
  # summary_provider: "auto" # "auto", "openrouter", "nous", "codex", "main", etc.
```

Uses the first available provider (OpenRouter → Nous → Codex) with Gemini Flash.
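This fallback order can be pictured with a short Python sketch; the function name and the availability flags are illustrative stand-ins for whatever credential checks Hermes actually performs, not its internals:

```python
# Illustrative sketch of the documented auto-detect order:
# OpenRouter -> Nous -> Codex.
PROVIDER_ORDER = ["openrouter", "nous", "codex"]

def detect_summary_provider(available):
    """Return the first provider in the documented order that is available."""
    for name in PROVIDER_ORDER:
        if available.get(name):
            return name
    raise RuntimeError("no summarizer provider available")
```

For instance, with only Nous credentials configured, `detect_summary_provider({"nous": True})` picks `"nous"`.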

**Force a specific provider** (OAuth or API-key based):

```yaml
compression:
  summary_provider: nous
  summary_model: gemini-3-flash
```

Works with any provider: `nous`, `openrouter`, `codex`, `anthropic`, `main`, etc.

**Custom endpoint** (self-hosted, Ollama, zai, DeepSeek, etc.):

```yaml
compression:
  summary_model: glm-4.7
  summary_base_url: https://api.z.ai/api/coding/paas/v4
```

Points at a custom OpenAI-compatible endpoint. Uses `OPENAI_API_KEY` for auth.

### How the three knobs interact

| `summary_provider` | `summary_base_url` | Result |
|---------------------|---------------------|--------|
| `auto` (default) | not set | Auto-detect best available provider |
| `nous` / `openrouter` / etc. | not set | Force that provider, use its auth |
| any | set | Use the custom endpoint directly (provider ignored) |

The `summary_model` must support a context length at least as large as your main model's, since it receives the full middle section of the conversation for compression.
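The precedence in the table can be summarized in a few lines of Python; this is a sketch of the documented behavior, not Hermes' actual resolution code:

```python
def resolve_summarizer(summary_provider="auto", summary_base_url=None):
    """Mirror the table above: an explicit base_url always wins,
    then a named provider, then auto-detection."""
    if summary_base_url:               # row 3: provider is ignored
        return ("custom-endpoint", summary_base_url)
    if summary_provider != "auto":     # row 2: force that provider
        return ("provider", summary_provider)
    return ("auto-detect", None)       # row 1: default behavior
```

So `resolve_summarizer("nous")` forces the Nous provider, while passing any `summary_base_url` routes to the custom endpoint even if a provider is also set.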
## Auxiliary Models

Hermes uses lightweight "auxiliary" models for side tasks like image analysis, web page summarization, and browser screenshot analysis. By default, these use **Gemini Flash** via auto-detection — you don't need to configure anything.

### The universal config pattern

Every model slot in Hermes — auxiliary tasks, compression, fallback — uses the same three knobs:

| Key | What it does | Default |
|-----|-------------|---------|
| `provider` | Which provider to use for auth and routing | `"auto"` |
| `model` | Which model to request | provider's default |
| `base_url` | Custom OpenAI-compatible endpoint (overrides provider) | not set |

When `base_url` is set, Hermes ignores the provider and calls that endpoint directly (using `api_key` or `OPENAI_API_KEY` for auth). When only `provider` is set, Hermes uses that provider's built-in auth and base URL.
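A minimal Python sketch of this pattern (the function and the returned keys are assumptions for illustration, not Hermes' API):

```python
import os

def client_settings(provider="auto", model="", base_url="", api_key=""):
    """Sketch of the universal pattern: base_url overrides provider,
    and api_key falls back to OPENAI_API_KEY."""
    if base_url:
        return {
            "base_url": base_url,
            "api_key": api_key or os.environ.get("OPENAI_API_KEY", ""),
            "model": model,
        }
    # No base_url: defer to the provider's built-in auth and endpoint.
    return {"provider": provider, "model": model}
```

For example, `client_settings(base_url="http://localhost:11434/v1")` calls that endpoint directly with the key from the environment, while `client_settings(provider="nous")` leaves auth and routing to the provider.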

Available providers: `auto`, `openrouter`, `nous`, `codex`, `anthropic`, `main`, `zai`, `kimi-coding`, `minimax`, and any provider registered in the [provider registry](/docs/reference/environment-variables).

### Full auxiliary config reference

```yaml
auxiliary:
  # Image analysis (vision_analyze tool + browser screenshots)
  vision:
    provider: "auto"   # "auto", "openrouter", "nous", "codex", "main", etc.
    model: ""          # e.g. "openai/gpt-4o", "google/gemini-2.5-flash"
    base_url: ""       # Custom OpenAI-compatible endpoint (overrides provider)
    api_key: ""        # API key for base_url (falls back to OPENAI_API_KEY)

  # Web page summarization + browser page text extraction
  web_extract:
    provider: "auto"
    model: ""          # e.g. "google/gemini-2.5-flash"
    base_url: ""
    api_key: ""

  # Dangerous command approval classifier
  approval:
    provider: "auto"
    model: ""
    base_url: ""
    api_key: ""
```

:::info
Context compression has its own top-level `compression:` block with `summary_provider`, `summary_model`, and `summary_base_url` — see [Context Compression](#context-compression) above. The fallback model uses a `fallback_model:` block — see [Fallback Model](#fallback-model) above. All three follow the same provider/model/base_url pattern.
:::

### Changing the Vision Model

To use GPT-4o instead of Gemini Flash for image analysis:
**Vision requires a multimodal model.** If you set `provider: "main"`, make sure your endpoint supports multimodal/vision — otherwise image analysis will fail.
:::

### Environment Variables (legacy)

Auxiliary models can also be configured via environment variables. However, `config.yaml` is the preferred method — it's easier to manage and supports all options including `base_url` and `api_key`.

| Setting | Environment Variable |
|---------|---------------------|
| Vision provider | `AUXILIARY_VISION_PROVIDER` |
| Vision model | `AUXILIARY_VISION_MODEL` |
| Vision endpoint | `AUXILIARY_VISION_BASE_URL` |
| Vision API key | `AUXILIARY_VISION_API_KEY` |
| Web extract provider | `AUXILIARY_WEB_EXTRACT_PROVIDER` |
| Web extract model | `AUXILIARY_WEB_EXTRACT_MODEL` |
| Web extract endpoint | `AUXILIARY_WEB_EXTRACT_BASE_URL` |
| Web extract API key | `AUXILIARY_WEB_EXTRACT_API_KEY` |

Compression and fallback model settings are config.yaml-only.
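For example, a legacy env-var setup might look like this (the specific values are illustrative):

```shell
# Legacy env-var configuration; config.yaml is preferred.
export AUXILIARY_VISION_PROVIDER=openrouter
export AUXILIARY_VISION_MODEL="openai/gpt-4o"
export AUXILIARY_WEB_EXTRACT_MODEL="google/gemini-2.5-flash"
```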

:::tip
Run `hermes config` to see your current auxiliary model settings. Overrides only show up when they differ from the defaults.
:::