mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-29 06:31:32 +00:00
perf(terminal): adaptive subprocess poll cuts ~195ms off every tool call (#29006)
`_wait_for_process()` was sleeping for a fixed 200ms between polls of the subprocess exit status. For commands that complete in <50ms (echo, pwd, date, cat of short files, write_file or read_file with small content), the agent was stuck waiting for the next 200ms tick to notice that the process had exited. That floor was the dominant component of per-tool latency for typical short commands.

Replace with adaptive backoff: start at 5ms and multiply by 1.5 each iteration, capped at 200ms. Fast commands (the common case) return in ~6ms; long-running commands (builds, tests, sleeps) reach the 200ms steady-state poll rate after 10 backoff steps (~570ms of cumulative sleep) and pay identical CPU after that.

Tool-call wall time (deterministic microbench of `echo first`):
  before: median 200ms  min 200ms  max 200ms
  after:  median 5ms    min 5ms    max 7ms
  saved:  ~195ms per terminal tool call

End-to-end chat -q with 3 sequential terminal tool calls (`echo first`, `echo second`, `echo third`):
  before: median 5.73s, min 5.61s
  after:  median 4.64s, min 4.60s
  saved:  ~1100ms wall per turn

Live tmux session: a typical 'write file, read it back' turn now displays each tool as 0.1s in the spinner (was 0.9s before).

The agent observes the subprocess exit ~200ms faster per call. For chat workflows that do 4-8 terminal/file calls per turn this saves 800ms-1.5s of pure wall-clock waiting.

Why it's safe:
- Interrupt and timeout checks still fire on every iteration (no longer rate-limited to 5/sec)
- Activity callback fires on the same 'due' schedule (`touch_activity_if_due`)
- DEBUG_INTERRUPT heartbeat is unchanged (30s)
- Steady-state poll rate for long-running commands matches the old 200ms within ~0.6s of startup

Tests:
- tests/tools/: 5246 passed, 22 skipped, 2 pre-existing xdist flakes (test_delegate.py::test_depth_limit, test_constants; both pass in isolation)
- Live tmux: 2-turn conversation + multiple tool calls, no errors
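The backoff schedule described above (5ms start, factor 1.5, 200ms cap) can be tabulated with a small sketch. The helper name `poll_schedule` and its parameters are hypothetical and exist only for illustration; they are not part of the patched code.

```python
def poll_schedule(initial=0.005, factor=1.5, cap=0.2):
    """Return the sleep durations used before the interval reaches `cap`.

    Mirrors the update rule in the patch: sleep, then multiply the
    interval by `factor` and clamp it to `cap`.
    """
    sleeps = []
    delay = initial
    while delay < cap:
        sleeps.append(delay)
        delay = min(delay * factor, cap)
    return sleeps


ramp = poll_schedule()
ramp_steps = len(ramp)    # number of sleeps shorter than the cap
ramp_seconds = sum(ramp)  # total time slept before the first capped sleep

for i, delay in enumerate(ramp, start=1):
    print(f"sleep {i:2d}: {delay * 1000:7.2f} ms")
print(f"{ramp_steps} sub-cap sleeps, {ramp_seconds * 1000:.0f} ms before the first 200 ms sleep")
```

Under these parameters the ramp is a geometric series, so a command that exits during the first few sleeps is noticed within a few milliseconds, while the total time spent below the cap stays bounded.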
parent a0c031299b
commit 6bd43111d1
1 changed file with 12 additions and 1 deletion
@@ -609,6 +609,7 @@ class BaseEnvironment(ABC):
         )
 
         try:
+            _poll_sleep = 0.005
             while proc.poll() is None:
                 _iter_count += 1
                 if is_interrupted():
@@ -662,7 +663,17 @@ class BaseEnvironment(ABC):
                     _last_heartbeat = time.monotonic()
                     _cb_was_none = _cb_now_none
 
-                time.sleep(0.2)
+                # Adaptive poll: start at 5ms so fast commands (echo, pwd,
+                # date, cat short files) return in ~6ms instead of being
+                # stuck waiting for the next 200ms tick. Back off
+                # exponentially toward 200ms so long-running commands
+                # (builds, tests, sleeps) don't pay measurable CPU in the
+                # poll loop. For an `echo` this saves ~195ms per tool call;
+                # for a 10s build the steady-state poll rate is identical
+                # to the old behavior.
+                time.sleep(_poll_sleep)
+                if _poll_sleep < 0.2:
+                    _poll_sleep = min(_poll_sleep * 1.5, 0.2)
         except (KeyboardInterrupt, SystemExit):
             # Signal arrived (SIGTERM/SIGHUP/SIGINT) or sys.exit() was called
             # while we were polling. The local backend spawns subprocesses
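The poll loop changed in the diff can be exercised in isolation with a minimal, self-contained sketch. The function `wait_adaptive`, its parameters, and the timeout handling are assumptions made for illustration; the real `_wait_for_process()` also handles interrupts, activity callbacks, and heartbeats, which are omitted here.

```python
import subprocess
import sys
import time


def wait_adaptive(proc, initial=0.005, factor=1.5, cap=0.2, timeout=None):
    """Poll `proc` with exponential backoff; return (returncode, poll_count).

    Hypothetical stand-in for the patched loop: the timeout check runs on
    every iteration, and the sleep interval grows from `initial` to `cap`.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    delay = initial
    polls = 0
    while proc.poll() is None:
        polls += 1
        if deadline is not None and time.monotonic() >= deadline:
            proc.kill()
            proc.wait()  # reap the child so it does not linger as a zombie
            raise TimeoutError(f"process exceeded {timeout}s")
        time.sleep(delay)
        if delay < cap:
            delay = min(delay * factor, cap)
    return proc.returncode, polls


# Usage: run a short-lived child and measure how long the wait took.
child = subprocess.Popen(
    [sys.executable, "-c", "print('first')"],
    stdout=subprocess.DEVNULL,
)
start = time.monotonic()
returncode, polls = wait_adaptive(child, timeout=30)
elapsed = time.monotonic() - start
print(f"exit={returncode} polls={polls} elapsed={elapsed * 1000:.1f} ms")
```

The measured time here includes interpreter startup for the child, so it will usually be larger than the figures quoted for `echo`; the point of the sketch is only that the wait ends shortly after the child exits rather than on the next 200ms tick.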