Mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-05-29 06:31:32 +00:00)
fix(codex): size and propagate timeouts for Responses-API requests; lower stale defaults

Codex / Responses-API requests had three latent timeout bugs that combined into the long silent hangs reported on #21444:

1. The non-stream stale-call detector estimated context tokens from ``api_kwargs["messages"]`` only. Codex / Responses-API payloads carry their conversational load in ``input`` (with ``instructions`` and ``tools``), so every Codex turn logged ``context=~0 tokens`` and the detector never applied its >50k / >100k tier bumps.
2. ``providers.<id>.request_timeout_seconds`` was silently dropped on the main Codex path. The chat_completions path and the auxiliary Codex adapter both forwarded it; the main path skipped it through three places (``build_api_kwargs``, ``ResponsesApiTransport.build_kwargs``, ``_preflight_codex_api_kwargs``).
3. The streaming stale detector had the same payload-shape bug for ``codex_responses`` requests, which route through the non-streaming detector (it's the path that emits the user-facing "No response from provider for 300s (non-streaming, ...)" warning that reporters keep pasting).

This commit:

- Adds ``estimate_request_context_tokens`` in ``chat_completion_helpers``, used by both the non-stream and stream detectors. Handles ``messages`` (Chat Completions), ``input + instructions + tools`` (Responses API), bare lists, and an unknown-dict fallback.
- Forwards ``timeout`` through ``ResponsesApiTransport.build_kwargs`` and ``_preflight_codex_api_kwargs`` (with guards against zero/negative/inf/bool values), and wires ``_resolved_api_call_timeout()`` into the Codex branch of ``build_api_kwargs``.
- Lowers the implicit non-stream stale defaults so fallback providers kick in faster when upstream stalls:
    * base   300s -> 90s
    * >50k   450s -> 150s
    * >100k  600s -> 240s
  These only apply when the user has *not* set ``providers.<id>.stale_timeout_seconds`` or ``HERMES_API_CALL_STALE_TIMEOUT``. Explicit config still wins.
- Adds regression tests for the estimator shapes, the new defaults, the context-tier scaling, transport timeout pass-through, and preflight timeout pass-through / rejection of invalid values.

Closes #21444
Supersedes #21652 #24126 #31855

Co-authored-by: Hoang V. Pham <26063003+hehehe0803@users.noreply.github.com>
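
The timeout guard described in the commit message (rejecting zero, negative, infinite, and bool values before forwarding ``timeout``) could look roughly like the sketch below. The names ``sanitize_timeout`` and ``forward_timeout`` are hypothetical and do not come from the commit. Whether the real code drops an invalid value silently or raises is not shown on this page; this sketch assumes it is dropped.

```python
import math
from typing import Any, Dict, Optional


def sanitize_timeout(value: Any) -> Optional[float]:
    """Return a usable timeout in seconds, or None if the value is invalid.

    Hypothetical helper, not part of the commit.
    """
    # bool is a subclass of int, so it must be rejected before the numeric check.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    try:
        timeout = float(value)
    except OverflowError:
        # An int too large for a float is not a meaningful timeout either.
        return None
    # isfinite() rejects inf and NaN; the second test rejects zero and negatives.
    if not math.isfinite(timeout) or timeout <= 0:
        return None
    return timeout


def forward_timeout(api_kwargs: Dict[str, Any], timeout: Any) -> Dict[str, Any]:
    """Copy api_kwargs and attach ``timeout`` only when it passes the guard."""
    out = dict(api_kwargs)
    clean = sanitize_timeout(timeout)
    if clean is not None:
        out["timeout"] = clean
    return out
```

The explicit ``bool`` check matters because ``True`` would otherwise be accepted as a one-second timeout.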
parent 76135b329d
commit 2d422720b5
10 changed files with 383 additions and 17 deletions
run_agent.py (25 changed lines)
@@ -885,7 +885,11 @@ class AIAgent:
         1. ``providers.<id>.models.<model>.stale_timeout_seconds``
         2. ``providers.<id>.stale_timeout_seconds``
         3. ``HERMES_API_CALL_STALE_TIMEOUT`` env var
-        4. 300.0s default
+        4. 90.0s default (time-to-first-byte for non-streaming / Codex
+           internal-streaming requests; lowered from 300s in May 2026 so
+           fallback providers kick in faster when upstream providers
+           stall). The detector still scales up for large contexts in
+           ``_compute_non_stream_stale_timeout``.
 
         Returns ``(timeout_seconds, uses_implicit_default)`` so the caller can
         preserve legacy behaviors that only apply when the user has *not*
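
The four-step resolution order in this docstring can be sketched as a standalone function. This is a minimal illustration, not the method from ``run_agent.py``: the nested-dict config layout, the function name ``resolve_stale_timeout_base``, and the ``_as_seconds`` helper are all assumptions made for the example. Only the precedence order, the 90.0s default, and the ``(timeout_seconds, uses_implicit_default)`` return shape come from the diff.

```python
from typing import Any, Mapping, Optional, Tuple

IMPLICIT_STALE_DEFAULT = 90.0  # seconds; the new default from the diff


def _as_seconds(value: Any) -> Optional[float]:
    """Hypothetical helper: coerce a config/env value to float, else None."""
    if value is None or isinstance(value, bool):
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def resolve_stale_timeout_base(
    config: Mapping[str, Any],
    provider_id: str,
    model: str,
    env: Mapping[str, str],
) -> Tuple[float, bool]:
    """Return (timeout_seconds, uses_implicit_default)."""
    # Assumed layout: config["providers"][<id>]["models"][<model>].
    provider = (config.get("providers") or {}).get(provider_id) or {}
    model_cfg = (provider.get("models") or {}).get(model) or {}
    candidates = (
        model_cfg.get("stale_timeout_seconds"),    # 1. per-model setting
        provider.get("stale_timeout_seconds"),     # 2. per-provider setting
        env.get("HERMES_API_CALL_STALE_TIMEOUT"),  # 3. environment variable
    )
    for raw in candidates:
        seconds = _as_seconds(raw)
        if seconds is not None:
            return seconds, False  # explicit value: not the implicit default
    return IMPLICIT_STALE_DEFAULT, True  # 4. implicit default
```

Returning the boolean alongside the value lets the caller apply behaviors, such as disabling the detector for local endpoints, only when the user has not chosen a timeout explicitly.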
@@ -900,20 +904,27 @@ class AIAgent:
         if env_timeout is not None:
             return float(env_timeout), False
 
-        return 300.0, True
+        return 90.0, True
 
-    def _compute_non_stream_stale_timeout(self, messages: list[dict[str, Any]]) -> float:
-        """Compute the effective non-stream stale timeout for this request."""
+    def _compute_non_stream_stale_timeout(self, api_payload: Any) -> float:
+        """Compute the effective non-stream stale timeout for this request.
+
+        Accepts either the full ``api_kwargs`` dict (Chat Completions or
+        Responses API) or a legacy ``messages`` list. Context-size scaling
+        applies the same way to both shapes via
+        :func:`agent.chat_completion_helpers.estimate_request_context_tokens`.
+        """
         stale_base, uses_implicit_default = self._resolved_api_call_stale_timeout_base()
         base_url = getattr(self, "_base_url", None) or self.base_url or ""
         if uses_implicit_default and base_url and is_local_endpoint(base_url):
             return float("inf")
 
-        est_tokens = sum(len(str(v)) for v in messages) // 4
+        from agent.chat_completion_helpers import estimate_request_context_tokens
+        est_tokens = estimate_request_context_tokens(api_payload)
         if est_tokens > 100_000:
-            return max(stale_base, 600.0)
+            return max(stale_base, 240.0)
         if est_tokens > 50_000:
-            return max(stale_base, 450.0)
+            return max(stale_base, 150.0)
         return stale_base
 
     def _is_openrouter_url(self) -> bool:
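
The body of ``estimate_request_context_tokens`` is not part of this file's diff, so the sketch below is a guess at how a payload-shape-aware estimator could feed the tier logic shown above. It reuses the chars/4 heuristic from the removed line and the shapes listed in the commit message. The function names ``estimate_context_tokens`` and ``non_stream_stale_timeout`` and the ``is_local`` flag are illustrative stand-ins, not the real helpers.

```python
from typing import Any


def estimate_context_tokens(payload: Any) -> int:
    """Rough chars/4 token estimate across request shapes (illustrative)."""
    if isinstance(payload, dict):
        if "messages" in payload:      # Chat Completions shape
            parts = [payload["messages"]]
        elif "input" in payload:       # Responses API shape
            parts = [payload.get("input"),
                     payload.get("instructions"),
                     payload.get("tools")]
        else:                          # unknown dict: count every value
            parts = list(payload.values())
    else:                              # bare messages list or anything else
        parts = [payload]

    chars = 0
    for part in parts:
        if part is None:
            continue
        if isinstance(part, list):
            chars += sum(len(str(item)) for item in part)
        else:
            chars += len(str(part))
    return chars // 4


def non_stream_stale_timeout(payload: Any, stale_base: float = 90.0,
                             uses_implicit_default: bool = True,
                             is_local: bool = False) -> float:
    """Apply the context tiers from the diff to a resolved base timeout."""
    if uses_implicit_default and is_local:
        return float("inf")  # local endpoints are never treated as stale
    est_tokens = estimate_context_tokens(payload)
    if est_tokens > 100_000:
        return max(stale_base, 240.0)
    if est_tokens > 50_000:
        return max(stale_base, 150.0)
    return stale_base
```

Because each tier uses ``max(stale_base, ...)``, an explicit base larger than the tier value (for example 300s) is never reduced by the scaling step.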