fix(codex): size and propagate timeouts for Responses-API requests; lower stale defaults

Codex / Responses-API requests had three latent timeout bugs that combined
into the long silent hangs reported on #21444:

1. The non-stream stale-call detector estimated context tokens from
   ``api_kwargs["messages"]`` only. Codex / Responses-API payloads carry
   their conversational load in ``input`` (with ``instructions`` and
   ``tools``), so every Codex turn logged ``context=~0 tokens`` and the
   detector never applied its >50k / >100k tier bumps.

2. ``providers.<id>.request_timeout_seconds`` was silently dropped on the
   main Codex path. The chat_completions path and the auxiliary Codex
   adapter both forwarded it; the main path skipped it through three
   places (``build_api_kwargs``, ``ResponsesApiTransport.build_kwargs``,
   ``_preflight_codex_api_kwargs``).

3. The streaming stale detector had the same payload-shape bug for
   ``codex_responses`` requests, which route through the non-streaming
   detector (it's the path that emits the user-facing
   "No response from provider for 300s (non-streaming, ...)" warning that
   reporters keep pasting).

This commit:

- Adds ``estimate_request_context_tokens`` in ``chat_completion_helpers``,
  used by both the non-stream and stream detectors. Handles ``messages``
  (Chat Completions), ``input + instructions + tools`` (Responses API),
  bare lists, and an unknown-dict fallback.
- Forwards ``timeout`` through ``ResponsesApiTransport.build_kwargs``
  and ``_preflight_codex_api_kwargs`` (with guards against
  zero/negative/inf/bool values), and wires
  ``_resolved_api_call_timeout()`` into the Codex branch of
  ``build_api_kwargs``.
- Lowers the implicit non-stream stale defaults so fallback providers
  kick in faster when upstream stalls:
    * base   300s -> 90s
    * >50k   450s -> 150s
    * >100k  600s -> 240s
  These only apply when the user has *not* set
  ``providers.<id>.stale_timeout_seconds`` or
  ``HERMES_API_CALL_STALE_TIMEOUT``. Explicit config still wins.
- Adds regression tests for the estimator shapes, the new defaults, the
  context-tier scaling, transport timeout pass-through, and preflight
  timeout pass-through / rejection of invalid values.

Closes #21444
Supersedes #21652 #24126 #31855

Co-authored-by: Hoang V. Pham <26063003+hehehe0803@users.noreply.github.com>
This commit is contained in:
Kasun Athaudahetti 2026-05-25 01:36:22 -07:00 committed by Teknium
parent 76135b329d
commit 2d422720b5
10 changed files with 383 additions and 17 deletions

View file

@ -1240,6 +1240,8 @@ AUTHOR_MAP = {
"165905879+davidcampbelldc@users.noreply.github.com": "davidcampbelldc",
"hoangv.pham0803@gmail.com": "hehehe0803", # PR #26212 salvage (codex kanban writable root)
"26063003+hehehe0803@users.noreply.github.com": "hehehe0803",
"kasunvinod@users.noreply.github.com": "kasunvinod", # PR #24126 salvage (codex timeout propagation)
"15059870+kasunvinod@users.noreply.github.com": "kasunvinod",
"38348871+vaddisrinivas@users.noreply.github.com": "vaddisrinivas", # PR #26394 salvage (Docker messaging extra)
# batch salvage (May 2026 LHF run, group 7)
"198679067+02356abc@users.noreply.github.com": "02356abc", # PR #28286 salvage (wecom CLOSING)