fix(streaming): route mid-tool-call partial-stream-stub through length continuation (#31998) (#32012)

* fix(streaming): route mid-tool-call partial-stream-stub through length continuation (#31998)

  When a stream stalls mid-tool-call (e.g. a large write_file), the partial-stream-stub recovery used finish_reason='stop' which caused the conversation loop to treat the turn as complete, returning only the warning text. When users said 'continue', the model retried the same large tool call, hit the same stale timeout, and looped indefinitely.

  Changes:
  - chat_completion_helpers.py: change _stub_finish_reason from 'stop' to 'length' for mid-tool-call partials. The stub still has tool_calls=None so no tool auto-executes; the model gets a fresh API call through the existing length-continuation machinery (bounded to 3 retries). Also attach _dropped_tool_names to the stub for downstream use.
  - conversation_loop.py: add a third continuation prompt branch for partial-stream-stubs with dropped tool calls. Instead of the generic 'continue where you left off' (which would retry the same large call), tell the model to break the output into smaller tool calls (~8K tokens each) to avoid stream timeouts.
  - test_partial_stream_finish_reason.py: update existing test from finish_reason='stop' to 'length', add _dropped_tool_names assertion, add new test_dropped_tool_call_uses_chunking_prompt for the 3-way prompt branching.

  Safety: tool_calls=None is preserved on the stub, so the conversation loop enters the text-continuation branch (line 1513), NOT the tool-call execution branch (line 3246). No tool auto-executes. The model simply gets another API call with targeted guidance.
* refactor: extract constants and continuation prompt helper

  - Move magic strings to hermes_constants.py (PARTIAL_STREAM_STUB_ID, FINISH_REASON_LENGTH)
  - Extract _get_continuation_prompt() in conversation_loop.py; this DRYs the 3-way prompt branching and lets tests import the real function
  - Trim verbose inline comments in chat_completion_helpers.py
  - Tests import constants + helper instead of duplicating logic

---------

Co-authored-by: alt-glitch <balyan.sid@gmail.com>
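The stub contract described in the message above can be sketched roughly as follows. This is a minimal illustration, not the code in chat_completion_helpers.py: build_partial_stub, classify_continuation, and the SimpleNamespace layout are hypothetical names and shapes chosen for the example. Only the 'partial-stream-stub' id, finish_reason='length', tool_calls=None, and the _dropped_tool_names attribute are taken from this commit.

```python
from types import SimpleNamespace
from typing import List, Optional

# Values as they appear in this commit; the real definitions are said to
# live in hermes_constants.py.
PARTIAL_STREAM_STUB_ID = "partial-stream-stub"
FINISH_REASON_LENGTH = "length"


def build_partial_stub(partial_text: str,
                       dropped_tool_names: Optional[List[str]] = None):
    """Hypothetical stand-in for the stub built after a stalled stream."""
    # tool_calls stays None so nothing can auto-execute from the stub.
    message = SimpleNamespace(
        role="assistant", content=partial_text, tool_calls=None
    )
    # 'length' (not 'stop') sends the turn through length continuation.
    choice = SimpleNamespace(message=message, finish_reason=FINISH_REASON_LENGTH)
    return SimpleNamespace(
        id=PARTIAL_STREAM_STUB_ID,
        choices=[choice],
        _dropped_tool_names=list(dropped_tool_names or []),
    )


def classify_continuation(response) -> str:
    """Mirror the three-way branch used to pick a continuation prompt."""
    is_stub = getattr(response, "id", "") == PARTIAL_STREAM_STUB_ID
    dropped = getattr(response, "_dropped_tool_names", None)
    if is_stub and dropped:
        return "chunked-retry"  # ask for smaller tool calls
    if is_stub:
        return "resume-after-network-error"
    return "resume-after-length-limit"


stub = build_partial_stub("warning: stream stalled", ["write_file"])
print(classify_continuation(stub))
```

Because the stub carries no tool calls, the only effect of the dropped tool names in this sketch is to select a different prompt; the retry bound itself is left to the existing continuation loop.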
2026-06-07 08:02:23 +00:00 · 2026-05-25 17:43:10 +05:30 · 2026-05-25 17:43:10 +05:30 · ac5359a3f3
commit ac5359a3f3
parent 46d8b5dadf
4 changed files with 116 additions and 88 deletions
--- a/agent/conversation_loop.py
+++ b/agent/conversation_loop.py
@@ -65,7 +65,7 @@ from agent.prompt_caching import apply_anthropic_cache_control
 from agent.retry_utils import jittered_backoff
 from agent.trajectory import has_incomplete_scratchpad
 from agent.usage_pricing import estimate_usage_cost, normalize_usage
-from hermes_constants import display_hermes_home as _dhh_fn
+from hermes_constants import display_hermes_home as _dhh_fn, PARTIAL_STREAM_STUB_ID
 from hermes_logging import set_session_context
 from tools.schema_sanitizer import strip_pattern_and_format
 from tools.skill_provenance import set_current_write_origin
@@ -229,6 +229,37 @@ def _restore_or_build_system_prompt(agent, system_message, conversation_history)
            )


+def _get_continuation_prompt(is_partial_stub: bool, dropped_tools: Optional[List[str]] = None) -> str:
+    if is_partial_stub and dropped_tools:
+        tool_list = ", ".join(dropped_tools[:3])
+        return (
+            "[System: Your previous tool call "
+            f"({tool_list}) was too large and "
+            "the stream timed out before it "
+            "could be delivered. Do NOT retry "
+            "the same tool call with the same "
+            "large content. Instead, break the "
+            "content into multiple smaller tool "
+            "calls (e.g. use multiple patch calls "
+            "or write smaller files). Each tool "
+            "call's arguments must be under ~8K "
+            "tokens to avoid stream timeouts.]"
+        )
+    elif is_partial_stub:
+        return (
+            "[System: The previous response was cut off by a "
+            "network error mid-stream. Continue exactly where "
+            "you left off. Do not restart or repeat prior text. "
+            "Finish the answer directly.]"
+        )
+    else:
+        return (
+            "[System: Your previous response was truncated by the output "
+            "length limit. Continue exactly where you left off. Do not "
+            "restart or repeat prior text. Finish the answer directly.]"
+        )
+
+
 def run_conversation(
    agent,
    user_message: str,
@@ -1414,7 +1445,7 @@ def run_conversation(
                        finish_reason = "length"

                if finish_reason == "length":
-                    if getattr(response, "id", "") == "partial-stream-stub":
+                    if getattr(response, "id", "") == PARTIAL_STREAM_STUB_ID:
                        agent._vprint(
                            f"{agent.log_prefix}⚠️  Stream interrupted by network error "
                            f"(finish_reason='length' on partial-stream-stub)",
@@ -1518,37 +1549,36 @@ def run_conversation(
                                truncated_response_parts.append(assistant_message.content)

                            if length_continue_retries < 3:
-                                # Distinguish a real output-token truncation
-                                # from a partial-stream-stub network error
-                                # (#30963).  Same continuation machinery,
-                                # but the prompt has to tell the truth or
-                                # the model goes off rails ("I wasn't
-                                # truncated, I'm done").
                                _is_partial_stream_stub = (
-                                    getattr(response, "id", "") == "partial-stream-stub"
+                                    getattr(response, "id", "") == PARTIAL_STREAM_STUB_ID
                                )
-                                if _is_partial_stream_stub:
+                                _dropped_tools = getattr(
+                                    response, "_dropped_tool_names", None
+                                )
+
+                                if _is_partial_stream_stub and _dropped_tools:
+                                    _tool_list = ", ".join(_dropped_tools[:3])
+                                    agent._vprint(
+                                        f"{agent.log_prefix}↻ Stream interrupted mid "
+                                        f"tool-call ({_tool_list}) — requesting "
+                                        f"chunked retry "
+                                        f"({length_continue_retries}/3)..."
+                                    )
+                                elif _is_partial_stream_stub:
                                    agent._vprint(
                                        f"{agent.log_prefix}↻ Stream interrupted — "
                                        f"requesting continuation "
                                        f"({length_continue_retries}/3)..."
                                    )
-                                    _continue_content = (
-                                        "[System: The previous response was cut off by a "
-                                        "network error mid-stream. Continue exactly where "
-                                        "you left off. Do not restart or repeat prior text. "
-                                        "Finish the answer directly.]"
-                                    )
                                else:
                                    agent._vprint(
                                        f"{agent.log_prefix}↻ Requesting continuation "
                                        f"({length_continue_retries}/3)..."
                                    )
-                                    _continue_content = (
-                                        "[System: Your previous response was truncated by the output "
-                                        "length limit. Continue exactly where you left off. Do not "
-                                        "restart or repeat prior text. Finish the answer directly.]"
-                                    )
+
+                                _continue_content = _get_continuation_prompt(
+                                    _is_partial_stream_stub, _dropped_tools
+                                )
                                continue_msg = {
                                    "role": "user",
                                    "content": _continue_content,