mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-18 04:41:56 +00:00
fix: correct context-length resolution for kimi-k2.6 on Ollama Cloud and Kimi Coding
Kimi-k2.6 (which supports 262K context) was incorrectly resolved as 32K,
tripping the 64K minimum-context guard and preventing use of the model on
Ollama Cloud and Kimi Coding / Moonshot providers.
Three fixes in the context-length resolution chain:
1. Ollama Cloud native /api/show query: new _query_ollama_api_show()
queries the Ollama native API for authoritative GGUF model_info
context_length. For hosted Ollama, prefers model_info over num_ctx
since users can't set their own num_ctx on Cloud. Added at step 5e
in get_model_context_length(), before the models.dev fallback.
2. models.dev :cloud/-cloud suffix fallback: lookup_models_dev_context()
now also tries appending :cloud and -cloud suffixes when the bare
model name doesn't match. models.dev stores 'kimi-k2.6:cloud' but
users and the live API use bare 'kimi-k2.6'.
3. Kimi-family 32K guard: after the OpenRouter metadata step, reject
exactly 32768 for Kimi-named models (kimi-*, moonshot*) and fall
through to hardcoded defaults ('kimi': 262144). OpenRouter reports
32768 for moonshotai/kimi-k2.6 but the model actually supports 262K.
Narrow filter — only 32768, only Kimi-family — becomes dead code
when OpenRouter updates its metadata.
---
This commit is contained in:
parent
271883447e
commit
91eef6255e
2 changed files with 145 additions and 7 deletions
|
|
@ -347,6 +347,28 @@ def lookup_models_dev_context(provider: str, model: str) -> Optional[int]:
|
|||
if ctx:
|
||||
return ctx
|
||||
|
||||
# Suffix-aware fallback: some providers (e.g. ollama-cloud) store
|
||||
# model IDs with :cloud / -cloud suffixes in models.dev while the
|
||||
# live API returns bare names. Without this, kimi-k2.6 misses the
|
||||
# kimi-k2.6:cloud entry and falls through to stale OpenRouter metadata
|
||||
# reporting 32768 — tripping the 64k minimum-context guard.
|
||||
# The suffix-stripping in fetch_ollama_cloud_models() handles the
|
||||
# model-picker UX; this handles the context-length lookup path.
|
||||
for suffix in (":cloud", "-cloud"):
|
||||
suffixed_key = model + suffix
|
||||
entry = models.get(suffixed_key)
|
||||
if entry:
|
||||
ctx = _extract_context(entry)
|
||||
if ctx:
|
||||
return ctx
|
||||
# Also try case-insensitive
|
||||
suffixed_lower = model_lower + suffix
|
||||
for mid, mdata in models.items():
|
||||
if mid.lower() == suffixed_lower:
|
||||
ctx = _extract_context(mdata)
|
||||
if ctx:
|
||||
return ctx
|
||||
|
||||
return None
|
||||
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue