Mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-07-04 12:33:08 +00:00)
fix(agent): scope Ollama/GLM stop-to-length heuristic to Ollama only
The _is_ollama_glm_backend() function was too broad: any local endpoint
running a GLM model was treated as Ollama, triggering the stop->length
misreport heuristic introduced in 8011aa3. This caused false truncation
detection on sglang, vLLM, LM Studio, and other non-Ollama servers that
correctly report finish_reason.
When a GLM model on sglang/vLLM returned finish_reason='stop', the agent
mistakenly reclassified it as 'length' if the response didn't end with
a whitelisted punctuation character (ASCII or CJK). This particularly
affected Chinese-language responses and Markdown-formatted text.
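
To make the failure mode concrete, here is a minimal sketch of a punctuation-based stop-to-length check. The names `looks_truncated`, `effective_finish_reason`, and `TERMINAL_CHARS`, as well as the exact whitelist, are assumptions for illustration only; the real logic lives in `_should_treat_stop_as_truncated` and is not shown in this commit.

```python
# Hypothetical sketch: names and whitelist are illustrative, not the project's code.
TERMINAL_CHARS = set(".!?\"')]}") | set("。！？」』）】")


def looks_truncated(text: str) -> bool:
    """Return True if the text does not end with a whitelisted terminal character."""
    stripped = text.rstrip()
    if not stripped:
        return False
    return stripped[-1] not in TERMINAL_CHARS


def effective_finish_reason(finish_reason: str, text: str, is_ollama_glm: bool) -> str:
    """Reclassify 'stop' as 'length' only for the backend known to misreport."""
    if finish_reason == "stop" and is_ollama_glm and looks_truncated(text):
        return "length"
    return finish_reason


# A complete Markdown table row ends in "|", which is not in the whitelist.
# With the backend check too broad, this valid response would be reclassified.
print(effective_finish_reason("stop", "| a | b |", is_ollama_glm=True))
print(effective_finish_reason("stop", "| a | b |", is_ollama_glm=False))
```

Under these assumptions, a punctuation check alone cannot tell a finished Markdown table from a cut-off sentence, so the backend gate is what keeps the heuristic from firing on servers that report `finish_reason` correctly.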
Root cause: the is_local_endpoint() fallback assumed any local GLM
endpoint = Ollama. But many non-Ollama servers also run on localhost.
Fix: remove the is_local_endpoint() catch-all. Only detect Ollama via
its distinctive signatures (port 11434, 'ollama' in URL). All other
local servers are assumed to report finish_reason correctly.
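
The narrowed detection can be sketched as a standalone function. This mirrors the post-fix logic in the diff, but the free-function form and its parameter names are assumptions made so the example runs on its own; the example URLs (including port 30000 for a non-Ollama server) are illustrative.

```python
def is_ollama_glm_backend(model: str, provider: str, base_url: str) -> bool:
    """Standalone sketch of the post-fix check (hypothetical free-function form)."""
    model_lower = (model or "").lower()
    provider_lower = (provider or "").lower()
    base_url_lower = (base_url or "").lower()
    # Only GLM models (or the "zai" provider) are candidates at all.
    if "glm" not in model_lower and provider_lower != "zai":
        return False
    # No is_local_endpoint() fallback: only explicit Ollama signatures count.
    return "ollama" in base_url_lower or ":11434" in base_url_lower


# Ollama on its default port is detected.
print(is_ollama_glm_backend("glm-4", "custom", "http://localhost:11434/v1"))
# A local non-Ollama server on another port is not.
print(is_ollama_glm_backend("glm-4", "custom", "http://localhost:30000/v1"))
```

Both checks are plain substring matches on the lowercased base URL, so an Ollama instance behind a custom port and a hostname without "ollama" in it will not be detected. That is the false-negative case discussed below.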
This is the correct tradeoff because:
- False negatives (Ollama on a custom port, so the heuristic is not triggered)
  only mean the user sees a truncated response, the same as having no heuristic.
- False positives (a non-Ollama server, with the heuristic wrongly triggered)
  inject spurious continuation messages into the conversation, which is
  strictly worse.
Adds two tests:
- sglang GLM response is NOT reclassified as truncated
- Ollama GLM on port 11434 still triggers the heuristic as before
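
The test file itself is not shown on this page, so the following is only a sketch of what the two cases might look like. `FakeAgent` is a hypothetical stand-in that exposes just the attributes the detection method reads in the diff (`model`, `provider`, `_base_url_lower`); the test names and URLs are likewise assumptions.

```python
class FakeAgent:
    """Hypothetical stand-in for AIAgent with only the fields the check reads."""

    def __init__(self, model: str, provider: str, base_url: str) -> None:
        self.model = model
        self.provider = provider
        self._base_url_lower = (base_url or "").lower()

    def _is_ollama_glm_backend(self) -> bool:
        # Same logic as the post-fix method in the diff.
        model_lower = (self.model or "").lower()
        provider_lower = (self.provider or "").lower()
        if "glm" not in model_lower and provider_lower != "zai":
            return False
        return "ollama" in self._base_url_lower or ":11434" in self._base_url_lower


def test_sglang_glm_is_not_treated_as_ollama() -> None:
    # Local server, GLM model, but no Ollama signature in the URL.
    agent = FakeAgent("glm-4", "custom", "http://127.0.0.1:30000/v1")
    assert agent._is_ollama_glm_backend() is False


def test_ollama_glm_on_default_port_is_detected() -> None:
    # Port 11434 is the explicit Ollama signature the fix keeps.
    agent = FakeAgent("glm-4", "custom", "http://127.0.0.1:11434/v1")
    assert agent._is_ollama_glm_backend() is True
```

These cover only the backend detection. The commit's actual tests are described as checking the full reclassification behaviour, which would also exercise `_should_treat_stop_as_truncated`.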
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
parent ab1f9b94c5
commit 00a8252b7d
2 changed files with 93 additions and 4 deletions
run_agent.py (20 lines changed)
@@ -1367,14 +1367,26 @@ class AIAgent:
         return False
 
     def _is_ollama_glm_backend(self) -> bool:
-        """Detect the narrow backend family affected by Ollama/GLM stop misreports."""
+        """Detect the narrow backend family affected by Ollama/GLM stop misreports.
+
+        Only returns True for backends that are known to be Ollama, which
+        can misreport truncated output as finish_reason='stop'. Other local
+        servers (sglang, vLLM, LM Studio, etc.) report finish_reason correctly
+        and must NOT be subjected to the stop->length heuristic.
+
+        Detection relies on explicit Ollama signatures:
+        - Port 11434 (Ollama default)
+        - "ollama" in the base URL (e.g. ollama.local, /ollama/ path)
+
+        The previous is_local_endpoint() fallback was too broad and caused
+        false truncation detection on non-Ollama local servers hosting GLM
+        models (sglang, vLLM, etc.).
+        """
         model_lower = (self.model or "").lower()
         provider_lower = (self.provider or "").lower()
         if "glm" not in model_lower and provider_lower != "zai":
             return False
-        if "ollama" in self._base_url_lower or ":11434" in self._base_url_lower:
-            return True
-        return bool(self.base_url and is_local_endpoint(self.base_url))
+        return "ollama" in self._base_url_lower or ":11434" in self._base_url_lower
 
     def _should_treat_stop_as_truncated(
         self,