Commit graph

3 commits

Author SHA1 Message Date
Teknium
a06d0198cd
fix(dashboard): reap PTY bridge on child EOF, not only in writer finally (#54190)
The /api/pty handler only closed the PtyBridge in the writer loop's finally.
On child EOF the reader task closes the WebSocket, but if the handler task is
cancelled the instant the socket closes, the writer's finally can be skipped
and the PTY fds leak (#54028) — the FD-leak the regression test guards. Under
dashboard auto-reconnect this stacks orphaned PTYs until fds are exhausted.

Reap the bridge in the reader's EOF finally too (close() is idempotent), so
the PTY is reaped independently of the writer-loop cancellation race. Harden
the regression test to poll for teardown instead of asserting on the same
tick. Was flaky on main (2/20); now 25/25.
2026-06-28 03:58:18 -07:00
Teknium
6d879d486b
fix(dashboard): close PTY WebSocket on child EOF to stop FD leak (#54028) (#54123)
* fix(dashboard): close PTY WebSocket on child EOF to stop FD leak

The /api/pty handler's reader task returns on child EOF, but the writer
loop stayed blocked on ws.receive() until the browser sent a disconnect.
When the browser socket is half-open (no FIN delivered — common on
macOS/launchd), that disconnect never arrives, so the handler never
reaches its finally and the PTY master fd + child process leak. With
dashboard auto-reconnect (#52962), every dropped socket then spawns a
fresh PTY on top of the orphaned one, exhausting file descriptors within
hours (EMFILE / Errno 24).

Fix: the reader task now closes the WebSocket in a finally when the child
EOFs or the send side breaks, which unblocks ws.receive() so the existing
finally runs bridge.close(). The writer loop also guards ws.receive()
against the RuntimeError Starlette raises once the socket is closed.

Reported by @fifteenzhang.

Fixes #54028

* docs: add infographic for #54028 PTY FD leak fix
2026-06-28 02:42:21 -07:00
Shannon Sands
a0dc92450b Split dashboard PTY reconnect tests 2026-06-26 01:06:02 -07:00