mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-26 01:01:40 +00:00
fix: prevent agent from stopping mid-task — compression floor, budget overhaul, activity tracking
Three root causes of the 'agent stops mid-task' gateway bug:
1. Compression threshold floor (64K tokens minimum)
- The 50% threshold on a 100K-context model fired at 50K tokens,
causing premature compression that made models lose track of
multi-step plans. Now threshold_tokens = max(50% * context, 64K).
- Models with <64K context are rejected at startup with a clear error.
2. Budget warning removal — grace call instead
- Removed the 70%/90% iteration budget warnings entirely. These
injected '[BUDGET WARNING: Provide your final response NOW]' into
tool results, causing models to abandon complex tasks prematurely.
- Now: no warnings during normal execution. When the budget is
actually exhausted (90/90), inject a user message asking the model
to summarise, allow one grace API call, and only then fall back
to _handle_max_iterations.
3. Activity touches during long terminal execution
- _wait_for_process polls every 0.2s but never reported activity.
The gateway's inactivity timeout (default 1800s) would fire during
long-running commands that appeared 'idle.'
- Now: thread-local activity callback fires every 10s during the
poll loop, keeping the gateway's activity tracker alive.
- Agent wires _touch_activity into the callback before each tool call.
Also: docs update noting 64K minimum context requirement.
Closes #7915 (root cause was agent-loop termination, not Weixin delivery limits).
This commit is contained in:
parent
08f35076c9
commit
c8aff74632
7 changed files with 140 additions and 92 deletions
|
|
@ -23,6 +23,19 @@ from tools.interrupt import is_interrupted
|
|||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Thread-local activity callback. The agent sets this before a tool call so
|
||||
# long-running _wait_for_process loops can report liveness to the gateway.
|
||||
_activity_callback_local = threading.local()
|
||||
|
||||
|
||||
def set_activity_callback(cb: Callable[[str], None] | None) -> None:
|
||||
"""Register a callback that _wait_for_process fires periodically."""
|
||||
_activity_callback_local.callback = cb
|
||||
|
||||
|
||||
def _get_activity_callback() -> Callable[[str], None] | None:
|
||||
return getattr(_activity_callback_local, "callback", None)
|
||||
|
||||
|
||||
def get_sandbox_dir() -> Path:
|
||||
"""Return the host-side root for all sandbox storage (Docker workspaces,
|
||||
|
|
@ -370,6 +383,10 @@ class BaseEnvironment(ABC):
|
|||
"""Poll-based wait with interrupt checking and stdout draining.
|
||||
|
||||
Shared across all backends — not overridden.
|
||||
|
||||
Fires the ``activity_callback`` (if set on this instance) every 10s
|
||||
while the process is running so the gateway's inactivity timeout
|
||||
doesn't kill long-running commands.
|
||||
"""
|
||||
output_chunks: list[str] = []
|
||||
|
||||
|
|
@ -388,6 +405,8 @@ class BaseEnvironment(ABC):
|
|||
drain_thread = threading.Thread(target=_drain, daemon=True)
|
||||
drain_thread.start()
|
||||
deadline = time.monotonic() + timeout
|
||||
_last_activity_touch = time.monotonic()
|
||||
_ACTIVITY_INTERVAL = 10.0 # seconds between activity touches
|
||||
|
||||
while proc.poll() is None:
|
||||
if is_interrupted():
|
||||
|
|
@ -408,6 +427,17 @@ class BaseEnvironment(ABC):
|
|||
else timeout_msg.lstrip(),
|
||||
"returncode": 124,
|
||||
}
|
||||
# Periodic activity touch so the gateway knows we're alive
|
||||
_now = time.monotonic()
|
||||
if _now - _last_activity_touch >= _ACTIVITY_INTERVAL:
|
||||
_last_activity_touch = _now
|
||||
_cb = _get_activity_callback()
|
||||
if _cb:
|
||||
try:
|
||||
_elapsed = int(_now - (deadline - timeout))
|
||||
_cb(f"terminal command running ({_elapsed}s elapsed)")
|
||||
except Exception:
|
||||
pass
|
||||
time.sleep(0.2)
|
||||
|
||||
drain_thread.join(timeout=5)
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue