mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-18 04:41:56 +00:00
Wraps every sync->async coroutine-scheduling site in the codebase with a new agent.async_utils.safe_schedule_threadsafe() helper that closes the coroutine on scheduling failure (closed loop, shutdown race, etc.) instead of leaking it as 'coroutine was never awaited' RuntimeWarnings plus reference leaks. 22 production call sites migrated across the codebase: - acp_adapter/events.py, acp_adapter/permissions.py - agent/lsp/manager.py - cron/scheduler.py (media + text delivery paths) - gateway/platforms/feishu.py (5 sites, via existing _submit_on_loop helper which now delegates to safe_schedule_threadsafe) - gateway/run.py (10 sites: telegram rename, agent:step hook, status callback, interim+bg-review, clarify send, exec-approval button+text, temp-bubble cleanup, channel-directory refresh) - plugins/memory/hindsight, plugins/platforms/google_chat - tools/browser_supervisor.py (3), browser_cdp_tool.py, computer_use/cua_backend.py, slash_confirm.py - tools/environments/modal.py (_AsyncWorker) - tools/mcp_tool.py (2 + 8 _run_on_mcp_loop callers converted to factory-style so the coroutine is never constructed on a dead loop) - tui_gateway/ws.py Tests: new tests/agent/test_async_utils.py covers helper behavior under live loop, dead loop, None loop, and scheduling exceptions. Regression tests added at three PR-original sites (acp events, acp permissions, mcp loop runner) mirroring contributor's intent. Live-tested end-to-end: - Helper stress test: 1500 schedules across live/dead/race scenarios, zero leaked coroutines - Race exercised: 5000 schedules with loop killed mid-flight, 100 ok / 4900 None returns, zero leaks - hermes chat -q with terminal tool call (exercises step_callback bridge) - MCP probe against failing subprocess servers + factory path - Real gateway daemon boot + SIGINT shutdown across multiple platform adapter inits - WSTransport 100 live + 50 dead-loop writes - Cron delivery path live + dead loop Salvages PR #2657 — adopts contributor's intent over a much wider site list and a single centralized helper instead of inline try/except at each site. 3 of the original PR's 6 sites no longer exist on main (environments/patches.py deleted, DingTalk refactored to native async); the equivalent fix lives in tools/environments/modal.py instead. Co-authored-by: JithendraNara <jithendranaidunara@gmail.com>
68 lines
2.5 KiB
Python
68 lines
2.5 KiB
Python
"""Async/sync bridging helpers.
|
|
|
|
The codebase has ~30 sites that schedule a coroutine onto an event loop from a
|
|
worker thread via :func:`asyncio.run_coroutine_threadsafe`. That function can
|
|
raise :class:`RuntimeError` (e.g. the loop was closed during a shutdown race),
|
|
and when it does the coroutine object is never awaited and never closed —
|
|
which triggers a ``"coroutine '<name>' was never awaited"`` RuntimeWarning and
|
|
leaks the coroutine's frame until GC.
|
|
|
|
:func:`safe_schedule_threadsafe` wraps the call, closes the coroutine on
|
|
scheduling failure, and returns ``None`` (instead of a half-formed future) so
|
|
callers can branch cleanly:
|
|
|
|
fut = safe_schedule_threadsafe(coro, loop)
|
|
if fut is None:
|
|
return # or fallback behavior
|
|
fut.result(timeout=5)
|
|
|
|
The helper deliberately does NOT also handle ``future.result()`` failures —
|
|
that is a separate concern. Once the loop has accepted the coroutine, its
|
|
lifecycle belongs to the loop, not the scheduling thread.
|
|
"""
|
|
from __future__ import annotations
|
|
|
|
import asyncio
|
|
import logging
|
|
from concurrent.futures import Future
|
|
from typing import Any, Coroutine, Optional
|
|
|
|
|
|
_DEFAULT_LOGGER = logging.getLogger(__name__)
|
|
|
|
|
|
def safe_schedule_threadsafe(
|
|
coro: Coroutine[Any, Any, Any],
|
|
loop: Optional[asyncio.AbstractEventLoop],
|
|
*,
|
|
logger: Optional[logging.Logger] = None,
|
|
log_message: str = "Failed to schedule coroutine on loop",
|
|
log_level: int = logging.DEBUG,
|
|
) -> Optional[Future]:
|
|
"""Schedule ``coro`` on ``loop`` from a sync context, leak-safe.
|
|
|
|
Returns the :class:`concurrent.futures.Future` on success, or ``None`` if
|
|
the loop is missing or :func:`asyncio.run_coroutine_threadsafe` raised
|
|
(e.g. the loop was closed during a shutdown race). In all failure paths
|
|
the coroutine is :meth:`close`-d so it does not trigger
|
|
``"coroutine was never awaited"`` warnings or leak its frame.
|
|
|
|
Callers retain full control over what to do with the returned future
|
|
(call ``.result(timeout=...)``, attach ``add_done_callback``, ignore it
|
|
fire-and-forget, etc.).
|
|
"""
|
|
log = logger if logger is not None else _DEFAULT_LOGGER
|
|
|
|
if loop is None:
|
|
if asyncio.iscoroutine(coro):
|
|
coro.close()
|
|
log.log(log_level, "%s: loop is None", log_message)
|
|
return None
|
|
|
|
try:
|
|
return asyncio.run_coroutine_threadsafe(coro, loop)
|
|
except Exception as exc:
|
|
if asyncio.iscoroutine(coro):
|
|
coro.close()
|
|
log.log(log_level, "%s: %s", log_message, exc)
|
|
return None
|