fix(agent): scope Ollama/GLM stop-to-length heuristic to Ollama only

The _is_ollama_glm_backend() function was too broad: any local endpoint
running a GLM model was treated as Ollama, triggering the stop->length
misreport heuristic introduced in 8011aa3. This caused false truncation
detection on sglang, vLLM, LM Studio, and other non-Ollama servers that
correctly report finish_reason.

When a GLM model on sglang/vLLM returned finish_reason='stop', the agent
mistakenly reclassified it as 'length' if the response didn't end with a
whitelisted punctuation character (ASCII or CJK). This particularly
affected Chinese-language responses and Markdown-formatted text.

Root cause: the is_local_endpoint() fallback assumed any local GLM
endpoint = Ollama. But many non-Ollama servers also run on localhost.

Fix: remove the is_local_endpoint() catch-all. Only detect Ollama via its
distinctive signatures (port 11434, 'ollama' in URL). All other local
servers are assumed to report finish_reason correctly.

This is the correct tradeoff because:
- False negatives (Ollama at custom port, heuristic not triggered) only
  mean the user sees a truncated response, the same as having no
  heuristic
- False positives (non-Ollama server, heuristic wrongly triggered) inject
  spurious continuation messages into the conversation, which is
  strictly worse

Adds two tests:
- sglang GLM response is NOT reclassified as truncated
- Ollama GLM on port 11434 still triggers the heuristic as before

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
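
The narrowed detection described above can be sketched as a standalone function. This is a minimal illustration, not the code in run_agent.py: the free function is_ollama_glm_backend() and its parameters are hypothetical stand-ins for the AIAgent method and its attributes, and the example URLs and model names are made up for demonstration.

```python
from typing import Optional


def is_ollama_glm_backend(
    model: Optional[str],
    provider: Optional[str],
    base_url: Optional[str],
) -> bool:
    """Hypothetical standalone version of the narrowed check.

    Returns True only for a GLM model (or the "zai" provider) whose base
    URL carries an explicit Ollama signature. Being local is not enough.
    """
    model_lower = (model or "").lower()
    provider_lower = (provider or "").lower()
    # The heuristic only concerns the GLM model family.
    if "glm" not in model_lower and provider_lower != "zai":
        return False
    base_url_lower = (base_url or "").lower()
    # Explicit Ollama signatures: the name in the URL, or the default port.
    return "ollama" in base_url_lower or ":11434" in base_url_lower


# Illustrative endpoints (assumed values, not taken from the test suite).
print(is_ollama_glm_backend("glm-4", None, "http://localhost:11434/v1"))
print(is_ollama_glm_backend("glm-4", None, "http://localhost:30000/v1"))
```

Under these assumptions the first call reports an Ollama backend and the second does not, so a local non-Ollama server hosting a GLM model keeps its finish_reason='stop' untouched. An Ollama instance on a custom port with no "ollama" in its URL is also not detected, which is the false-negative case the message accepts.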
2026-07-04 12:33:08 +00:00 · 2026-04-21 00:56:28 +08:00 · 2026-04-21 00:56:28 +08:00 · 00a8252b7d
commit 00a8252b7d
parent ab1f9b94c5
2 changed files with 93 additions and 4 deletions
--- a/run_agent.py
+++ b/run_agent.py
@@ -1367,14 +1367,26 @@ class AIAgent:
        return False

    def _is_ollama_glm_backend(self) -> bool:
-        """Detect the narrow backend family affected by Ollama/GLM stop misreports."""
+        """Detect the narrow backend family affected by Ollama/GLM stop misreports.
+
+        Only returns True for backends that are known to be Ollama, which
+        can misreport truncated output as finish_reason='stop'.  Other local
+        servers (sglang, vLLM, LM Studio, etc.) report finish_reason correctly
+        and must NOT be subjected to the stop->length heuristic.
+
+        Detection relies on explicit Ollama signatures:
+        - Port 11434 (Ollama default)
+        - "ollama" in the base URL (e.g. ollama.local, /ollama/ path)
+
+        The previous is_local_endpoint() fallback was too broad and caused
+        false truncation detection on non-Ollama local servers hosting GLM
+        models (sglang, vLLM, etc.).
+        """
        model_lower = (self.model or "").lower()
        provider_lower = (self.provider or "").lower()
        if "glm" not in model_lower and provider_lower != "zai":
            return False
-        if "ollama" in self._base_url_lower or ":11434" in self._base_url_lower:
-            return True
-        return bool(self.base_url and is_local_endpoint(self.base_url))
+        return "ollama" in self._base_url_lower or ":11434" in self._base_url_lower

    def _should_treat_stop_as_truncated(
        self,