fix: prevent agent from stopping mid-task — compression floor, budget overhaul, activity tracking

Three root causes of the 'agent stops mid-task' gateway bug:

1. Compression threshold floor (64K tokens minimum)
   - The 50% threshold on a 100K-context model fired at 50K tokens,
     causing premature compression that made models lose track of
     multi-step plans.  Now threshold_tokens = max(50% * context, 64K).
   - Models with <64K context are rejected at startup with a clear error.
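The new floor can be sketched roughly as follows (constant and function names here are illustrative assumptions, not the repo's actual identifiers):

```python
# Illustrative sketch of the compression-threshold floor; the names
# COMPRESSION_FLOOR_TOKENS and compression_threshold are assumptions.
COMPRESSION_FLOOR_TOKENS = 64_000
COMPRESSION_RATIO = 0.50  # the existing 50% threshold

def compression_threshold(context_tokens: int) -> int:
    """Token count at which history compression triggers."""
    if context_tokens < COMPRESSION_FLOOR_TOKENS:
        # Mirrors the startup rejection for sub-64K-context models.
        raise ValueError(
            f"model context window ({context_tokens} tokens) is below the "
            f"{COMPRESSION_FLOOR_TOKENS}-token minimum"
        )
    # A 100K-context model now compresses at 64K, not 50K.
    return max(int(COMPRESSION_RATIO * context_tokens), COMPRESSION_FLOOR_TOKENS)
```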

2. Budget warning removal — grace call instead
   - Removed the 70%/90% iteration budget warnings entirely.  These
     injected '[BUDGET WARNING: Provide your final response NOW]' into
     tool results, causing models to abandon complex tasks prematurely.
   - Now: no warnings during normal execution.  When the budget is
     actually exhausted (90/90), inject a user message asking the model
     to summarise, allow one grace API call, and only then fall back
     to _handle_max_iterations.
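The exhausted-budget path looks roughly like this (only _handle_max_iterations is a name from this commit; the helper, message wording, and control flow are hypothetical stand-ins):

```python
# Rough sketch of the grace-call flow; _handle_max_iterations is real,
# everything else here is a hypothetical stand-in.
GRACE_PROMPT = (
    "The iteration budget is exhausted. Please summarise the work "
    "completed so far and provide your final response now."
)

def finish_exhausted_budget(messages, call_api, handle_max_iterations):
    """Inject a summarisation request, allow one grace call, then fall back."""
    messages.append({"role": "user", "content": GRACE_PROMPT})
    try:
        reply = call_api(messages)  # the single grace API call
    except Exception:
        reply = None
    if reply:
        return reply  # the model produced a final summary
    return handle_max_iterations()  # hard fallback, as before
```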

3. Activity touches during long terminal execution
   - _wait_for_process polls every 0.2s but never reported activity.
     The gateway's inactivity timeout (default 1800s) would fire during
     long-running commands that appeared 'idle.'
   - Now: thread-local activity callback fires every 10s during the
     poll loop, keeping the gateway's activity tracker alive.
   - Agent wires _touch_activity into the callback before each tool call.

Also: docs update noting 64K minimum context requirement.

Closes #7915 (root cause was agent-loop termination, not Weixin delivery limits).
Teknium 2026-04-11 16:18:57 -07:00 committed by GitHub
parent 08f35076c9
commit c8aff74632
GPG key ID: B5690EEEBB952194
7 changed files with 140 additions and 92 deletions


@@ -23,6 +23,19 @@ from tools.interrupt import is_interrupted
logger = logging.getLogger(__name__)

# Thread-local activity callback. The agent sets this before a tool call so
# long-running _wait_for_process loops can report liveness to the gateway.
_activity_callback_local = threading.local()

def set_activity_callback(cb: Callable[[str], None] | None) -> None:
    """Register a callback that _wait_for_process fires periodically."""
    _activity_callback_local.callback = cb

def _get_activity_callback() -> Callable[[str], None] | None:
    return getattr(_activity_callback_local, "callback", None)
def get_sandbox_dir() -> Path:
    """Return the host-side root for all sandbox storage (Docker workspaces,
@@ -370,6 +383,10 @@ class BaseEnvironment(ABC):
        """Poll-based wait with interrupt checking and stdout draining.

        Shared across all backends that do not override it.

        Fires the activity callback (if one is registered via
        ``set_activity_callback``) every 10s while the process is running so
        the gateway's inactivity timeout doesn't kill long-running commands.
        """
        output_chunks: list[str] = []
@@ -388,6 +405,8 @@ class BaseEnvironment(ABC):
        drain_thread = threading.Thread(target=_drain, daemon=True)
        drain_thread.start()

        deadline = time.monotonic() + timeout
        _last_activity_touch = time.monotonic()
        _ACTIVITY_INTERVAL = 10.0  # seconds between activity touches

        while proc.poll() is None:
            if is_interrupted():
@@ -408,6 +427,17 @@ class BaseEnvironment(ABC):
                    else timeout_msg.lstrip(),
                    "returncode": 124,
                }

            # Periodic activity touch so the gateway knows we're alive
            _now = time.monotonic()
            if _now - _last_activity_touch >= _ACTIVITY_INTERVAL:
                _last_activity_touch = _now
                _cb = _get_activity_callback()
                if _cb:
                    try:
                        _elapsed = int(_now - (deadline - timeout))
                        _cb(f"terminal command running ({_elapsed}s elapsed)")
                    except Exception:
                        pass

            time.sleep(0.2)

        drain_thread.join(timeout=5)