hermes-agent/scripts
teknium1 b0435cc164 fix(model_tools): cancel coroutine on timeout so worker thread exits + log full traceback
_run_async() bridges sync tool handlers to async code. When the handler
is invoked from inside a running event loop (gateway / nested async),
it spawns a worker thread and blocks on future.result(timeout=300).

Before this change, a coroutine that ran past 300s leaked its worker
thread:

  - future.cancel() is a no-op on a running ThreadPoolExecutor future
    (cancel only works on not-yet-started work).
  - pool.shutdown(wait=False, cancel_futures=True) let the caller
    proceed but the worker kept running the coroutine until it
    returned on its own.

Every tool timeout leaked one thread. In long-lived gateway / RL
sessions this is cumulative.

The fix replaces bare asyncio.run() with a worker wrapper that
creates its own event loop. On timeout, _run_async schedules
task.cancel() on that loop via call_soon_threadsafe, then shuts the
pool down with wait=False so the caller returns immediately. The
coroutine observes CancelledError at its next await and the worker
thread exits cleanly.

Also switches logger.error() to logger.exception() in the top-level
handle_function_call() except block so tool failures produce full
stack traces in errors.log instead of just the message.

Related: #17420 (contributor flagged the leak; the original fix used
pool.shutdown(wait=True) which would have converted the leak into a
hang — caller blocks forever on the same stuck coroutine). Credit
for identifying the leak goes to the contributor.

Co-authored-by: 0z! <162235745+0z1-ghb@users.noreply.github.com>
2026-04-29 05:00:40 -07:00
..
lib feat: lazy bootstrap node 2026-04-16 10:47:37 -05:00
whatsapp-bridge Update whatsapp-bridge package-lock.json 2026-04-22 18:16:08 -07:00
build_model_catalog.py feat(models): remote model catalog manifest for OpenRouter + Nous Portal (#16033) 2026-04-26 05:46:43 -07:00
build_skills_index.py feat(skills): centralized skills index — eliminate GitHub API calls for search/install 2026-04-12 16:39:04 -07:00
contributor_audit.py feat(ci): add contributor attribution check on PRs (#9376) 2026-04-13 21:13:08 -07:00
discord-voice-doctor.py fix(scripts): read gateway_voice_mode.json as UTF-8 2026-04-22 17:34:29 -07:00
hermes-gateway fix: prevent systemd restart storm on gateway connection failure 2026-03-21 09:26:39 -07:00
install.cmd feat: Windows native support via Git Bash 2026-03-02 22:03:29 -08:00
install.ps1 chore: defer WhatsApp bridge install to first use (#12992) 2026-04-20 04:55:33 -07:00
install.sh fix(install): widen /dev/tty open-probe to sibling gates (#16746) 2026-04-28 06:45:55 -07:00
kill_modal.sh refactor: replace swe-rex with native Modal SDK for Modal backend (#3538) 2026-03-28 11:21:44 -07:00
profile-tui.py chore(tui): remove dead branch cleanup code 2026-04-26 21:54:24 -05:00
release.py fix(model_tools): cancel coroutine on timeout so worker thread exits + log full traceback 2026-04-29 05:00:40 -07:00
run_tests.sh test: make test env hermetic; enforce CI parity via scripts/run_tests.sh (#11577) 2026-04-17 06:09:09 -07:00
sample_and_compress.py refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821) 2026-04-07 10:25:31 -07:00