Two mitigations for the CLOSE_WAIT accumulation reported against QQ Bot
+ Feishu on macOS behind Cloudflare Warp.
1. Shared httpx.Limits helper (gateway/platforms/_http_client_limits.py).
Every long-lived platform adapter now constructs httpx.AsyncClient
with max_keepalive_connections=10 and keepalive_expiry=2.0, vs httpx's
default of unbounded keepalive pool and 5.0s expiry. On macOS/Warp the
default 5s window let idle keepalive sockets sit in CLOSE_WAIT long
enough for seven persistent adapters (QQ Bot, WeCom, DingTalk, Signal,
BlueBubbles, WeCom-callback, plus the transient Feishu helper) to
compound to the 256-fd ulimit. Tunable via
HERMES_GATEWAY_HTTPX_KEEPALIVE_EXPIRY and
HERMES_GATEWAY_HTTPX_MAX_KEEPALIVE env vars.
2. whatsapp.send_typing aiohttp leak. The call was
'await self._http_session.post(...)' with no 'async with' and no
variable capture — the ClientResponse went out of scope unclosed,
holding its TCP socket in CLOSE_WAIT until GC. Fixed by wrapping in
'async with'. This was the only bare-await aiohttp leak in the
gateway/tools/plugins tree per audit; all other aiohttp sites use
the context-manager pattern correctly.
The underlying reporter also saw Feishu SDK (lark-oapi) connections in
CLOSE_WAIT — those are inside the SDK and out of our direct control, but
tightening httpx keepalive across adapters reduces the aggregate pool
pressure regardless of which individual adapter leaks.
WecomCallbackAdapter declared a _seen_messages dict and
MESSAGE_DEDUP_TTL_SECONDS constant but never actually checked
them in _handle_callback(). WeCom retries callback deliveries
on timeout, and each retry with the same MsgId was treated as
a fresh message and queued for processing.
Fix: check _seen_messages before enqueuing. Uses the same TTL-
based pattern as MessageDeduplicator (fixed in #10306) — check
age before returning duplicate, prune on overflow.
Closes#10305
Add a second WeCom integration mode for regular enterprise self-built
applications. Unlike the existing bot/websocket adapter (wecom.py),
this handles WeCom's standard callback flow: WeCom POSTs encrypted XML
to an HTTP endpoint, the adapter decrypts, queues for the agent, and
immediately acknowledges. The agent's reply is delivered proactively
via the message/send API.
Key design choice: always acknowledge immediately and use proactive
send — agent sessions take 3-30 minutes, so the 5-second inline reply
window is never useful. The original PR's Future/pending-reply
machinery was removed in favour of this simpler architecture.
Features:
- AES-CBC encrypt/decrypt (BizMsgCrypt-compatible)
- Multi-app routing scoped by corp_id:user_id
- Legacy bare user_id fallback for backward compat
- Access-token management with auto-refresh
- WECOM_CALLBACK_* env var overrides
- Port-in-use pre-check before binding
- Health endpoint at /health
Salvaged from PR #7774 by @chqchshj. Simplified by removing the
inline reply Future system and fixing: secrets.choice for nonce
generation, immediate plain-text acknowledgment (not encrypted XML
containing 'success'), and initial token refresh error handling.