Mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-04-26 01:01:40 +00:00)
feat: overhaul context length detection with models.dev and provider-aware resolution (#2158)
Replace the fragile hardcoded context length system with a multi-source resolution chain that correctly identifies context windows per provider. Key changes: - New agent/models_dev.py: Fetches and caches the models.dev registry (3800+ models across 100+ providers with per-provider context windows). In-memory cache (1hr TTL) + disk cache for cold starts. - Rewritten get_model_context_length() resolution chain: 0. Config override (model.context_length) 1. Custom providers per-model context_length 2. Persistent disk cache 3. Endpoint /models (local servers) 4. Anthropic /v1/models API (max_input_tokens, API-key only) 5. OpenRouter live API (existing, unchanged) 6. Nous suffix-match via OpenRouter (dot/dash normalization) 7. models.dev registry lookup (provider-aware) 8. Thin hardcoded defaults (broad family patterns) 9. 128K fallback (was 2M) - Provider-aware context: same model now correctly resolves to different context windows per provider (e.g. claude-opus-4.6: 1M on Anthropic, 128K on GitHub Copilot). Provider name flows through ContextCompressor. - DEFAULT_CONTEXT_LENGTHS shrunk from 80+ entries to ~16 broad patterns. models.dev replaces the per-model hardcoding. - CONTEXT_PROBE_TIERS changed from [2M, 1M, 512K, 200K, 128K, 64K, 32K] to [128K, 64K, 32K, 16K, 8K]. Unknown models no longer start at 2M. - hermes model: prompts for context_length when configuring custom endpoints. Supports shorthand (32k, 128K). Saved to custom_providers per-model config. - custom_providers schema extended with optional models dict for per-model context_length (backward compatible). - Nous Portal: suffix-matches bare IDs (claude-opus-4-6) against OpenRouter's prefixed IDs (anthropic/claude-opus-4.6) with dot/dash normalization. Handles all 15 current Nous models. - Anthropic direct: queries /v1/models for max_input_tokens. Only works with regular API keys (sk-ant-api*), not OAuth tokens. Falls through to models.dev for OAuth users. 
Tests: 5574 passed (18 new tests for models_dev + updated probe tiers)
Docs: updated configuration.md context length section and AGENTS.md

Co-authored-by: Test <test@test.com>
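The in-memory-plus-disk cache that the commit message describes for the models.dev registry could look like the following minimal sketch. The class name, cache path, and injectable `fetch` callable are illustrative assumptions, not the actual agent/models_dev.py API:

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional


class RegistryCache:
    """Two-tier cache: in-memory with a TTL, backed by a disk file for cold starts."""

    def __init__(self, fetch: Callable[[], dict], path: Path, ttl: float = 3600.0):
        self._fetch = fetch   # e.g. an HTTP GET of the models.dev registry JSON
        self._path = path     # disk cache location (hypothetical)
        self._ttl = ttl       # 3600 s matches the 1 hr in-memory TTL above
        self._data: Optional[dict] = None
        self._stamp = 0.0

    def get(self) -> dict:
        now = time.time()
        if self._data is not None and now - self._stamp < self._ttl:
            return self._data  # fresh in-memory copy, no I/O
        try:
            data = self._fetch()
            self._path.parent.mkdir(parents=True, exist_ok=True)
            self._path.write_text(json.dumps(data))  # refresh the disk copy
        except OSError:
            # Offline or registry unreachable: fall back to the disk cache.
            data = json.loads(self._path.read_text())
        self._data, self._stamp = data, now
        return data
```

Injecting the fetcher keeps the network dependency out of the cache logic, which is presumably why the cold-start path can serve stale-but-usable data when the registry is unreachable.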
This commit is contained in:
parent
b7b585656b
commit
88643a1ba9
13 changed files with 662 additions and 246 deletions
@@ -416,7 +416,19 @@ LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct-Turbo
### Context Length Detection
Hermes automatically detects your model's context length by querying the endpoint's `/v1/models` response. For most setups this works out of the box. If detection fails (the model name doesn't match, the endpoint doesn't expose `/v1/models`, etc.), Hermes falls back to a high default and probes downward on context-length errors.
Hermes uses a multi-source resolution chain to detect the correct context window for your model and provider:
1. **Config override** — `model.context_length` in config.yaml (highest priority)
2. **Custom provider per-model** — `custom_providers[].models.<id>.context_length`
3. **Persistent cache** — previously discovered values (survives restarts)
4. **Endpoint `/models`** — queries your server's API (local/custom endpoints)
5. **Anthropic `/v1/models`** — queries Anthropic's API for `max_input_tokens` (API-key users only)
6. **OpenRouter API** — live model metadata from OpenRouter
7. **Nous Portal** — suffix-matches Nous model IDs against OpenRouter metadata
8. **[models.dev](https://models.dev)** — community-maintained registry with provider-specific context lengths for 3800+ models across 100+ providers
9. **Fallback defaults** — broad model family patterns (128K default)
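The chain above amounts to an ordered list of lookups where the first source that returns a value wins. A minimal sketch of that idea — the function name, `Source` signature, and example sources are illustrative, not Hermes's actual API:

```python
from typing import Callable, Optional

# A source takes (model, provider) and returns a context length, or None to pass.
Source = Callable[[str, str], Optional[int]]

FALLBACK_CONTEXT = 131_072  # the 128K default from step 9 (assuming K = 1024)


def resolve_context_length(model: str, provider: str, sources: list[Source]) -> int:
    """Walk the sources in priority order; the first non-None answer wins."""
    for source in sources:
        value = source(model, provider)
        if value is not None:
            return value
    return FALLBACK_CONTEXT
```

Because every source receives the provider as well as the model ID, the same model can resolve to different limits depending on who serves it, which is the provider-aware behavior described below.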
For most setups this works out of the box. The system is provider-aware — the same model can have different context limits depending on who serves it (e.g., `claude-opus-4.6` is 1M on Anthropic direct but 128K on GitHub Copilot).
To set the context length explicitly, add `context_length` to your model config:
@@ -427,10 +439,23 @@ model:
context_length: 131072 # tokens
```
This takes highest priority — it overrides auto-detection, cached values, and hardcoded defaults.
For custom endpoints, you can also set context length per model:
```yaml
custom_providers:
  - name: "My Local LLM"
    base_url: "http://localhost:11434/v1"
    models:
      qwen3.5:27b:
        context_length: 32768
      deepseek-r1:70b:
        context_length: 65536
```
`hermes model` will prompt for context length when configuring a custom endpoint. Leave it blank for auto-detection.
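The shorthand that the prompt accepts could be parsed along these lines. This is a sketch that assumes the `k`/`M` suffixes are 1024-based (so `128K` means 131072, matching the example above); the function name is hypothetical:

```python
import re


def parse_context_shorthand(text: str) -> int:
    """Turn '32k', '128K', '1M', or a bare number like '131072' into a token count."""
    match = re.fullmatch(r"\s*(\d+)\s*([kKmM]?)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized context length: {text!r}")
    number, suffix = int(match.group(1)), match.group(2).lower()
    scale = {"": 1, "k": 1024, "m": 1024 * 1024}[suffix]
    return number * scale
```

For example, `parse_context_shorthand("32k")` yields 32768 and a bare `131072` passes through unchanged.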
:::tip When to set this manually
- Your model shows "2M context" in the status bar (detection failed)
- You're using Ollama with a custom `num_ctx` that's lower than the model's maximum
- You want to limit context below the model's maximum (e.g., 8k on a 128k model to save VRAM)
- You're running behind a proxy that doesn't expose `/v1/models`
:::