fix: prevent agent from stopping mid-task — compression floor, budget overhaul, activity tracking

Three root causes of the 'agent stops mid-task' gateway bug:

1. Compression threshold floor (64K tokens minimum)
   - The 50% threshold on a 100K-context model fired at 50K tokens,
     causing premature compression that made models lose track of
     multi-step plans.  Now threshold_tokens = max(50% * context, 64K).
   - Models with <64K context are rejected at startup with a clear error.
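The new floor can be sketched roughly as follows (constant and function names here are illustrative assumptions, not the repo's actual identifiers):

```python
# Illustrative sketch of the compression-threshold floor; the names
# COMPRESSION_FLOOR_TOKENS and compression_threshold are assumptions.
COMPRESSION_FLOOR_TOKENS = 64_000
COMPRESSION_RATIO = 0.50  # the existing 50% threshold

def compression_threshold(context_tokens: int) -> int:
    """Token count at which history compression triggers."""
    if context_tokens < COMPRESSION_FLOOR_TOKENS:
        # Mirrors the startup rejection for sub-64K-context models.
        raise ValueError(
            f"model context window ({context_tokens} tokens) is below the "
            f"{COMPRESSION_FLOOR_TOKENS}-token minimum"
        )
    # A 100K-context model now compresses at 64K, not 50K.
    return max(int(COMPRESSION_RATIO * context_tokens), COMPRESSION_FLOOR_TOKENS)
```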

2. Budget warning removal — grace call instead
   - Removed the 70%/90% iteration budget warnings entirely.  These
     injected '[BUDGET WARNING: Provide your final response NOW]' into
     tool results, causing models to abandon complex tasks prematurely.
   - Now: no warnings during normal execution.  When the budget is
     actually exhausted (90/90), inject a user message asking the model
     to summarise, allow one grace API call, and only then fall back
     to _handle_max_iterations.
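The exhausted-budget path looks roughly like this (only _handle_max_iterations is a name from this commit; the helper, message wording, and control flow are hypothetical stand-ins):

```python
# Rough sketch of the grace-call flow; _handle_max_iterations is real,
# everything else here is a hypothetical stand-in.
GRACE_PROMPT = (
    "The iteration budget is exhausted. Please summarise the work "
    "completed so far and provide your final response now."
)

def finish_exhausted_budget(messages, call_api, handle_max_iterations):
    """Inject a summarisation request, allow one grace call, then fall back."""
    messages.append({"role": "user", "content": GRACE_PROMPT})
    try:
        reply = call_api(messages)  # the single grace API call
    except Exception:
        reply = None
    if reply:
        return reply  # the model produced a final summary
    return handle_max_iterations()  # hard fallback, as before
```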

3. Activity touches during long terminal execution
   - _wait_for_process polls every 0.2s but never reported activity.
     The gateway's inactivity timeout (default 1800s) would fire during
     long-running commands that appeared 'idle.'
   - Now: thread-local activity callback fires every 10s during the
     poll loop, keeping the gateway's activity tracker alive.
   - Agent wires _touch_activity into the callback before each tool call.

Also: docs update noting 64K minimum context requirement.

Closes #7915 (root cause was agent-loop termination, not Weixin delivery limits).
Teknium 2026-04-11 16:18:57 -07:00 committed by GitHub
parent 08f35076c9
commit c8aff74632
GPG key ID: B5690EEEBB952194
7 changed files with 140 additions and 92 deletions


@@ -23,6 +23,19 @@ from tools.interrupt import is_interrupted
logger = logging.getLogger(__name__)

# Thread-local activity callback. The agent sets this before a tool call so
# long-running _wait_for_process loops can report liveness to the gateway.
_activity_callback_local = threading.local()

def set_activity_callback(cb: Callable[[str], None] | None) -> None:
    """Register a callback that _wait_for_process fires periodically."""
    _activity_callback_local.callback = cb

def _get_activity_callback() -> Callable[[str], None] | None:
    return getattr(_activity_callback_local, "callback", None)
def get_sandbox_dir() -> Path:
    """Return the host-side root for all sandbox storage (Docker workspaces,
@@ -370,6 +383,10 @@ class BaseEnvironment(ABC):
        """Poll-based wait with interrupt checking and stdout draining.

        Shared across all backends that do not override it.

        Fires the activity callback (if one is registered via
        ``set_activity_callback``) every 10s while the process is running so
        the gateway's inactivity timeout doesn't kill long-running commands.
        """
        output_chunks: list[str] = []
@@ -388,6 +405,8 @@ class BaseEnvironment(ABC):
        drain_thread = threading.Thread(target=_drain, daemon=True)
        drain_thread.start()

        deadline = time.monotonic() + timeout
        _last_activity_touch = time.monotonic()
        _ACTIVITY_INTERVAL = 10.0  # seconds between activity touches

        while proc.poll() is None:
            if is_interrupted():
@@ -408,6 +427,17 @@ class BaseEnvironment(ABC):
                    else timeout_msg.lstrip(),
                    "returncode": 124,
                }

            # Periodic activity touch so the gateway knows we're alive
            _now = time.monotonic()
            if _now - _last_activity_touch >= _ACTIVITY_INTERVAL:
                _last_activity_touch = _now
                _cb = _get_activity_callback()
                if _cb:
                    try:
                        _elapsed = int(_now - (deadline - timeout))
                        _cb(f"terminal command running ({_elapsed}s elapsed)")
                    except Exception:
                        pass

            time.sleep(0.2)

        drain_thread.join(timeout=5)