fix(force_close_tcp_sockets): shutdown only, do not release FD (#29507)

The helper used to call ``socket.shutdown(SHUT_RDWR)`` followed by
``socket.close()`` to drop CLOSE-WAIT entries immediately. On its own
``shutdown()`` is safe from any thread — it only sends FIN and breaks
pending ``recv``/``send`` — but ``close()`` releases the FD integer to
the kernel. When the helper runs on a foreign thread (the interrupt
loop, the stale-call detector), the FD release races the owning httpx
worker thread, which still has the same integer cached inside the SSL
BIO. The kernel then hands that integer to the next ``open()`` call
(in production, the kanban dispatcher's ``kanban.db``), and the worker's
delayed TLS flush writes a 24-byte TLS application-data record on top
of the SQLite header.
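
The FD recycling behind this race can be reproduced in isolation. The
sketch below is illustrative only and is not part of this change; it
assumes a POSIX system and a single-threaded process, and the function
name ``demonstrate_fd_reuse`` is hypothetical.

```python
import os
import socket


def demonstrate_fd_reuse():
    """Return (closed_fd, reopened_fd) to show an FD integer being recycled.

    Hypothetical helper for illustration; assumes POSIX and no other
    thread opening or closing descriptors at the same time.
    """
    a, b = socket.socketpair()
    try:
        # Pick the end with the lower FD so that, once closed, it is the
        # lowest free descriptor in the process.
        victim = a if a.fileno() < b.fileno() else b
        closed_fd = victim.fileno()
        victim.close()  # releases the integer back to the kernel

        # POSIX open() returns the lowest unused descriptor, so an
        # unrelated file now receives the integer the socket just had.
        reopened_fd = os.open(os.devnull, os.O_RDONLY)
        os.close(reopened_fd)
        return closed_fd, reopened_fd
    finally:
        # socket.close() is idempotent, so closing both ends is safe.
        a.close()
        b.close()
```

Any code that still holds the old integer (here, the SSL BIO in the
worker thread) would at that point be writing into the unrelated file.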

Restrict the helper to ``shutdown(SHUT_RDWR)`` only. The owning httpx
worker's own unwind will close the underlying socket via the same
Python ``socket.socket`` object, which atomically swaps ``_fd`` to -1
before issuing ``close(2)`` — no FD-aliasing window.
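
A minimal sketch of the shutdown-only pattern, under the assumption that
the caller holds a plain ``socket.socket``. The name
``shutdown_without_close`` is hypothetical and is not the helper's actual
code.

```python
import socket


def shutdown_without_close(sock):
    """Break pending I/O on ``sock`` but leave its FD with the owner.

    Hypothetical illustration: returns True if shutdown() succeeded,
    False if the socket was already disconnected.
    """
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except OSError:
        # Not connected, or already shut down: nothing left to do.
        return False
    # Deliberately no sock.close() here; the owning thread closes the
    # socket on its own unwind, so the FD integer is never released
    # while another thread may still be using it.
    return True
```

After the call the peer sees end-of-stream, while ``sock.fileno()`` still
returns the original descriptor until the owner closes it.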

The log field ``tcp_force_closed=N`` is kept (now counts shutdowns) so
existing dashboards / log parsers keep working.
xxxigm 2026-05-21 07:20:01 +07:00 committed by Teknium
parent 53cb6d32be
commit e2a7d73a66
2 changed files with 47 additions and 16 deletions

@@ -190,7 +190,13 @@ def test_replace_primary_openai_client_survives_repeated_rebuilds():
 def test_force_close_tcp_sockets_descends_httpcore_1_connection_wrapper():
-    """httpcore 1.x stores the real stream below conn._connection."""
+    """httpcore 1.x stores the real stream below conn._connection.
+
+    Post-#29507: the helper must shut sockets down but must NOT release the
+    FD via ``sock.close()``; that race recycled FDs into unrelated file
+    descriptors (kanban.db) and let TLS bytes overwrite SQLite headers. The
+    owning httpx thread is responsible for closing FDs on its own unwind.
+    """
     from agent.agent_runtime_helpers import force_close_tcp_sockets
     class FakeSocket:
@@ -215,4 +221,6 @@ def test_force_close_tcp_sockets_descends_httpcore_1_connection_wrapper():
     assert force_close_tcp_sockets(openai_client) == 1
     assert sock.shutdown_calls == 1
-    assert sock.close_calls == 1
+    # #29507: close() must NOT be called from this helper; the owning
+    # httpx worker thread releases the FD, not us.
+    assert sock.close_calls == 0