test(ci): raise per-file timeout 140s → 300s to stop false timeouts (#54143)

* test(ci): raise per-file timeout 140s to 300s to stop false timeouts

The per-file parallel runner caps each test-file subprocess at a flat
wall-clock budget. Combined with per-test subprocess isolation (a fresh
Python process per test), a large-collection file pays N x (interpreter
startup + import) of overhead before any test logic runs. That overhead
dilates under load on shared CI runners, so a file that finishes in
~100s on a quiet box can blow the old 140s cap purely from scheduling
jitter, surfacing as a false 'no tests ran' timeout (rc=124) with zero
actual test failures.

Raise the default to 300s (5 min). The Docker build matrix jobs already
take 7-10 min, so this headroom costs nothing on total CI wall time
while still bounding a genuinely hung file.

* docs: add infographic for CI per-file timeout bump
This commit is contained in:
Teknium 2026-06-28 02:41:07 -07:00 committed by GitHub
parent dcc6cd1b42
commit b508d4296e
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
2 changed files with 10 additions and 1 deletions

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.2 MiB

View file

@ -74,7 +74,16 @@ _SKIP_PARTS = {"integration", "e2e", "docker"}
# Per-file wall-clock cap. Override
# via --file-timeout or HERMES_TEST_FILE_TIMEOUT.
_DEFAULT_FILE_TIMEOUT_SECONDS = 140.0 # set by observing the slowest file at commit time was ~100s in CI and adding some leeway
#
# Set to 300s (5 min) deliberately generous: the per-test subprocess
# isolation plugin spawns a fresh Python process per test, so a
# large-collection file pays N × (interpreter startup + import) of
# overhead before any test logic runs — and that overhead dilates under
# load on shared CI runners, producing false "no tests ran" timeouts on
# files that finish in ~100s on a quiet box. The Docker build matrix jobs
# take 7-10 min anyway, so this headroom costs nothing on total CI wall
# time while keeping a genuinely hung file bounded.
_DEFAULT_FILE_TIMEOUT_SECONDS = 300.0
# Duration cache: maps relative file paths to last-observed subprocess
# wall-clock seconds. Used by ``--slice`` to distribute files across