feat(providers): enforce request_timeout_seconds on OpenAI-wire primary calls

Live test with timeout_seconds: 0.5 on claude-sonnet-4.6 proved the initial wiring was insufficient: run_agent.py was overriding the client-level timeout on every call via hardcoded per-request kwargs. Root cause: run_agent.py had two sites that pass an explicit timeout= kwarg into chat.completions.create() — api_kwargs['timeout'] at line 7075 (HERMES_API_TIMEOUT=1800s default) and the streaming path's _httpx.Timeout(..., read=HERMES_STREAM_READ_TIMEOUT=120s, ...) at line 5760. Both override the per-provider config value the client was constructed with, so a 0.5s config timeout would silently not enforce. This commit: - Adds AIAgent._resolved_api_call_timeout() — config > HERMES_API_TIMEOUT env > 1800s default. - Uses it for the non-streaming api_kwargs['timeout'] field. - Uses it for the streaming path's httpx.Timeout(connect, read, write, pool) so both connect and read respect the configured value when set. Local-provider auto-bump (Ollama/vLLM cold-start) only applies when no explicit config value is set. - New test: test_resolved_api_call_timeout_priority covers all three precedence cases (config, env, default). Live verified: 0.5s config on claude-sonnet-4.6 now triggers APITimeoutError at ~3s per retry, exhausts 3 retries in ~15s total (was: 29-47s success with timeout ignored). Positive case (60s config + gpt-4o-mini) still succeeds at 1.3s.
2026-04-25 00:51:20 +00:00 · 2026-04-19 11:10:47 -07:00 · 2026-04-19 11:10:47 -07:00 · c11ab6f64d
commit c11ab6f64d
parent f1fe29d1c3
4 changed files with 115 additions and 16 deletions
--- a/cli-config.yaml.example
+++ b/cli-config.yaml.example
@ -69,8 +69,12 @@ model:
 # Use this for per-provider request timeouts and per-model exceptions.
 # Applies to the primary turn client on every api_mode (OpenAI-wire, native
 # Anthropic, and Anthropic-compatible providers), the fallback chain, and
-# client rebuilds during credential rotation.  Leaving these unset keeps the
-# SDK defaults (OpenAI ≈ 600s, native Anthropic 900s).
+# client rebuilds during credential rotation.  For OpenAI-wire chat
+# completions (streaming and non-streaming) the configured value is also
+# used as the per-request ``timeout=`` kwarg so it wins over the legacy
+# HERMES_API_TIMEOUT env var (which still applies when no config is set).
+# Leaving these unset keeps the legacy defaults (HERMES_API_TIMEOUT=1800s,
+# native Anthropic 900s).
 #
 # providers:
 #   ollama-local: