Layer-2 defense for the FD-recycling race: even with
``force_close_tcp_sockets`` reduced to shutdown-only, the follow-up
``client.close()`` in ``_close_openai_client`` still walks the httpx
pool and closes sockets — and if called from a stranger thread (the
interrupt-check loop, the stale-call detector) it has the same
FD-recycling exposure that wrote a TLS record on top of ``kanban.db``.
Stamp the request_client_holder with the owning thread's ident at
``_set_request_client`` time. In ``_close_request_client_once``:
* Owning thread (the worker's ``finally``) → pop + ``client.close()``
via ``_close_request_openai_client``, exactly as before.
* Stranger thread → ``_abort_request_openai_client`` (new): only
``shutdown(SHUT_RDWR)`` the pool sockets and log a deferred-close
marker. The holder stays populated so the worker's eventual
``finally`` performs the real close from its own thread context,
where the FD release races nothing.
Applied symmetrically to both the non-streaming
``interruptible_api_call`` and the streaming variant — both routinely
get hit by stranger-thread interrupts.
The log field ``tcp_force_closed=N`` keeps its existing shape; the new
abort path adds ``deferred_close=stranger_thread`` so production
triage can distinguish the two close kinds.
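The ownership check can be sketched as follows. This assumes a plain list of pooled sockets stands in for the httpx pool. `ClientHolder`, `set_client` and `close_client_once` are hypothetical names, not the real `request_client_holder` helpers. The sketch shows two things: the `threading.get_ident()` comparison, and the shutdown-only stranger path that leaves the FDs allocated for the owner to release.

```python
import socket
import threading
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClientHolder:
    """Hypothetical stand-in for request_client_holder: pooled sockets plus owner."""
    sockets: List[socket.socket] = field(default_factory=list)
    owner_ident: Optional[int] = None


def set_client(holder: ClientHolder, socks: List[socket.socket]) -> None:
    # Stamp the holder with the ident of the thread that installs the client.
    holder.sockets = list(socks)
    holder.owner_ident = threading.get_ident()


def close_client_once(holder: ClientHolder) -> str:
    """Close or abort the pool; return the log fields describing what happened."""
    if threading.get_ident() == holder.owner_ident:
        # Owning thread: pop the sockets and really close them (FDs released).
        socks, holder.sockets = holder.sockets, []
        for s in socks:
            s.close()
        return f"tcp_force_closed={len(socks)}"
    # Stranger thread: shutdown only. The FDs stay allocated, so the kernel
    # cannot hand those numbers to an unrelated open() while the owner is
    # still mid-write. The holder stays populated for the owner's finally.
    count = 0
    for s in holder.sockets:
        try:
            s.shutdown(socket.SHUT_RDWR)
            count += 1
        except OSError:
            pass  # already shut down or never connected
    return f"tcp_force_closed={count} deferred_close=stranger_thread"


if __name__ == "__main__":
    a, b = socket.socketpair()
    holder = ClientHolder()
    set_client(holder, [a])
    # Simulate the interrupt-check loop aborting from another thread.
    stranger = threading.Thread(target=lambda: print(close_client_once(holder)))
    stranger.start()
    stranger.join()
    # The owner's finally then performs the real close.
    print(close_client_once(holder))
    b.close()
```

Shutting down without closing keeps each descriptor number reserved. The number only returns to the kernel's free pool when the owning thread closes it, so an in-flight TLS write cannot land in a freshly opened file.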
xAI's /v1/responses and /v1/chat/completions endpoints reject any tool schema
whose enum values contain a forward slash. They return a generic HTTP 400
'Invalid arguments passed to the model.' before any token is emitted. The
schema compiler trips on the '/' character regardless of where it appears.
Most commonly hit by MCP-derived tools whose enum lists HuggingFace model
IDs ('Qwen/Qwen3.5-0.8B', 'openai/gpt-oss-20b') or owner/name environment
identifiers.
Mirrors the existing strip_pattern_and_format sanitizer (PR for #27197).
The new strip_slash_enum walks tool parameters and drops the entire enum
keyword when any value contains '/'. Keeping a partial enum would still
return a 400, since xAI's failure is all-or-nothing on the enum. The field description
still reaches the model so the prompting hint is preserved.
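The walk can be sketched as follows. This is an illustrative re-implementation over plain JSON-schema dicts, not the sanitizer that landed in the tree. It shares the name `strip_slash_enum` only for readability.

```python
from typing import Any


def strip_slash_enum(schema: Any) -> Any:
    """Return a copy of a JSON schema with every slash-bearing enum removed.

    Sketch only: recurses through dicts and lists, and drops the *whole*
    "enum" keyword when any string value in it contains '/'. Sibling keys
    such as "description" are kept, so the prompting hint survives.
    """
    if isinstance(schema, list):
        return [strip_slash_enum(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    out = {}
    for key, value in schema.items():
        # The isinstance(list) check avoids treating a property that happens
        # to be *named* "enum" (a dict under "properties") as the keyword.
        if key == "enum" and isinstance(value, list) and any(
            isinstance(v, str) and "/" in v for v in value
        ):
            continue  # all-or-nothing: drop the entire keyword
        out[key] = strip_slash_enum(value)
    return out
```

The function builds a new structure instead of mutating in place. This leaves the original tool definitions untouched for providers that accept slash-bearing enums.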
Wired in at both code paths for parity:
- agent/chat_completion_helpers.py (main agent xAI Responses path)
- agent/auxiliary_client.py (aux client xAI Responses path, matching
the same parity guarantee 2fae8fba9 established for pattern/format)
Salvaged from #28021 by @Slimydog21. The contributor's branch was severely
stale (merging it would have reverted ~5000 LOC across azure/kanban/i18n), so
the fix was re-applied surgically on current main with their sanitizer and
9 tests preserved verbatim. The author's noreply email is used because the
original address leaked a Mac hostname.
Port of the run_agent.py changes from #27219 to current main: the
_build_api_kwargs body was extracted into
agent/chat_completion_helpers.build_api_kwargs, so wire the xAI tool-schema
sanitization there
(provider in {'xai', 'xai-oauth'} or base_url=api.x.ai). Logs a warning
instead of silently swallowing exceptions, matching the contributor's
review-followup fix.
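The gating and the warn-instead-of-swallow behaviour can be sketched as follows. Several parts of this are assumptions:

- `is_xai_target` and `sanitize_tools_for_xai` are hypothetical names.
- The sketch assumes `base_url` includes a scheme, so `urlparse` can extract the host.
- On failure it assumes the tools are sent unchanged. The real fallback may differ.

```python
import logging
from typing import Any, Callable, Dict, List, Optional
from urllib.parse import urlparse

logger = logging.getLogger(__name__)

XAI_PROVIDERS = {"xai", "xai-oauth"}


def is_xai_target(provider: Optional[str], base_url: Optional[str]) -> bool:
    """True when the request is headed to xAI, by provider id or by host."""
    if provider in XAI_PROVIDERS:
        return True
    host = urlparse(base_url or "").hostname or ""
    return host == "api.x.ai"


def sanitize_tools_for_xai(
    tools: List[Dict[str, Any]],
    provider: Optional[str],
    base_url: Optional[str],
    sanitizer: Callable[[Dict[str, Any]], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    if not is_xai_target(provider, base_url):
        return tools
    try:
        return [sanitizer(tool) for tool in tools]
    except Exception as exc:
        # Surface the failure instead of silently swallowing it.
        logger.warning("xAI tool-schema sanitization failed: %s", exc)
        return tools
```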
Co-authored-by: zccyman <zccyman@163.com>
Closes #25249 (and supersedes PR #25260) in spirit.
Two bugs in the streaming chat-completions path caused provider timeout
configuration to be silently ignored:
1. Hardcoded connect/pool timeout. The httpx.Timeout for streaming
calls used hardcoded connect=30.0 and pool=30.0 regardless of the
user's providers.<id>.request_timeout_seconds config. If the custom
provider (e.g. Ollama) was unreachable, the call always waited
exactly 30s before failing, ignoring any configured timeout.
Fix: use min(_base_timeout, 60.0) for connect and pool when a
provider timeout is configured, falling back to 30.0 otherwise.
The 60s cap addresses review feedback (a TCP handshake shouldn't
wait out the full inference timeout; connect/pool cover the connection
layer, not model latency).
2. Streaming stale-stream detector ignored provider config. The
stale detector read only HERMES_STREAM_STALE_TIMEOUT (env default
180s). The providers.<id>.stale_timeout_seconds key (correctly
used in the non-streaming path) was never consulted.
Fix: check get_provider_stale_timeout(provider, model) first,
then fall back to the env var. Aligns the streaming path with
the non-streaming path's priority chain (config > env > default).
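Both fixes reduce to small resolution rules, sketched below.

- `connect_pool_timeouts` mirrors the `min(_base_timeout, 60.0)` / 30.0 rule.
- `stale_timeout` mirrors the config > env > default chain. The config value is passed in as a plain argument; it stands in for whatever `get_provider_stale_timeout(provider, model)` returns.
- Both function names are hypothetical.
- Ignoring an unparseable env value is an assumption.

```python
import os
from typing import Mapping, Optional, Tuple

DEFAULT_CONNECT_POOL = 30.0   # used when no provider timeout is configured
CONNECT_POOL_CAP = 60.0       # connection layer never waits longer than this
DEFAULT_STALE_TIMEOUT = 180.0  # HERMES_STREAM_STALE_TIMEOUT default


def connect_pool_timeouts(base_timeout: Optional[float]) -> Tuple[float, float]:
    """Return (connect, pool) for the streaming httpx.Timeout."""
    if base_timeout is None:
        return DEFAULT_CONNECT_POOL, DEFAULT_CONNECT_POOL
    capped = min(base_timeout, CONNECT_POOL_CAP)
    return capped, capped


def stale_timeout(
    config_value: Optional[float], env: Mapping[str, str] = os.environ
) -> float:
    """Priority chain: provider config > HERMES_STREAM_STALE_TIMEOUT > default."""
    if config_value is not None:
        return float(config_value)
    raw = env.get("HERMES_STREAM_STALE_TIMEOUT")
    if raw:
        try:
            return float(raw)
        except ValueError:
            pass  # assumption: a malformed env value falls through to the default
    return DEFAULT_STALE_TIMEOUT
```

The resulting pair would feed something like `httpx.Timeout(read_value, connect=connect, pool=pool)`. The read and write values are left to the existing code.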
Salvage shape diverged from PR #25260: the function moved to
agent/chat_completion_helpers.py and the contributor's two commits
(initial fix + 60s-cap review follow-up) are squashed into one final
commit applied at the new location.
Original diagnosis, fix shape, and the 60s-cap review response are
from @zccyman in PR #25260; credited via Co-authored-by.
Co-authored-by: zccyman <16263913+zccyman@users.noreply.github.com>
Original commit 75e5d0f6b by hueilau targeted _build_api_kwargs in
pre-refactor run_agent.py. The body now lives in
agent/chat_completion_helpers.build_api_kwargs — re-applied there.
Also: switch the custom_providers forward (from 21078ebce) to use
getattr() — tests build a bare AIAgent via __new__ and would otherwise
hit AttributeError on _custom_providers.
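The `getattr()` change guards against a specific failure: an object built with `__new__` skips `__init__`, so instance attributes that `__init__` would set are simply missing. The minimal illustration below uses a hypothetical `Agent` class, not `AIAgent` itself.

```python
class Agent:
    """Hypothetical stand-in; only the attribute handling matters here."""

    def __init__(self, custom_providers=None):
        self._custom_providers = custom_providers or []

    def fallback_kwargs(self):
        # getattr with a default tolerates instances created via
        # Agent.__new__(Agent), which never ran __init__.
        return {"custom_providers": getattr(self, "_custom_providers", None)}


bare = Agent.__new__(Agent)  # what tests do to get a bare instance
print(bare.fallback_kwargs())
```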
Co-authored-by: hueilau <33933019+hueilau@users.noreply.github.com>
Original commit 21078ebce by PaTTeeL targeted _try_activate_fallback in
pre-refactor run_agent.py. The body now lives in
agent/chat_completion_helpers.try_activate_fallback — re-applied there.
Co-authored-by: PaTTeeL <9150277+PaTTeeL@users.noreply.github.com>
Original commit 9c304a7f5 by helix4u targeted _flatten_exception_chain,
_summarize_api_error, and the _call streaming retry loop in pre-refactor
run_agent.py. Re-applied to:
- New _is_provider_stream_parse_error helper → run_agent.py (next
to _flatten_exception_chain in the AIAgent class)
- _summarize_api_error early-return for the malformed-streaming
ValueError → run_agent.py (kept method body)
- _call streaming retry: _is_stream_parse_err flag wired into
_is_transient AND the post-exhaustion branch + dedicated
malformed-streaming user-status string → agent/chat_completion_helpers.py
(the _call body now lives there)
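The shape of the parse-error detection and its effect on the retry decision can be sketched as follows. All names here are hypothetical, and the real `_is_provider_stream_parse_error` criteria are not described in this message. The sketch simplifies by assuming a malformed stream surfaces as a `ValueError` somewhere in the exception chain.

```python
from typing import List, Optional, Set


def flatten_exception_chain(exc: BaseException) -> List[BaseException]:
    """Follow __cause__ / __context__ links outward-in, guarding against cycles."""
    chain: List[BaseException] = []
    seen: Set[int] = set()
    cur: Optional[BaseException] = exc
    while cur is not None and id(cur) not in seen:
        seen.add(id(cur))
        chain.append(cur)
        cur = cur.__cause__ or cur.__context__
    return chain


def is_stream_parse_error(exc: BaseException) -> bool:
    # Assumption: a malformed streaming chunk shows up as a ValueError.
    return any(isinstance(e, ValueError) for e in flatten_exception_chain(exc))


def should_retry(exc: BaseException, is_network_error: bool) -> bool:
    # The parse-error flag is OR-ed into the transient decision, so a
    # malformed stream gets retried like a dropped connection.
    return is_network_error or is_stream_parse_error(exc)
```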
Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
Original commit b62c99797 by Jaaneek targeted six locations in
pre-refactor run_agent.py. Re-applied to the extracted post-PR locations:
- api_mode dispatch → agent/agent_init.py
- is_xai_responses build_api_kwargs → agent/chat_completion_helpers.py
- codex_auth_retry block + 401 hint → agent/conversation_loop.py
- _try_refresh_codex_client_credentials body → run_agent.py (kept)
The non-run_agent.py portions of the commit (auxiliary_client, codex
transport, hermes_cli/auth, tools/xai_http, tests, docs) merged cleanly
from main via the prior merge commit.
Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>
Move _interruptible_streaming_api_call, the largest single method in
run_agent.py, out of that file. The body now lives next to
interruptible_api_call in agent/chat_completion_helpers.py so the streaming
and non-streaming code share one home.
Nested closures (_call_chat_completions, _call_anthropic, the codex
stream branch) all come along with the body and still capture the
parent function's locals as expected.
AIAgent keeps a thin forwarder method. is_local_endpoint added to
the import block (used by the stream stale-timeout disable logic).
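The forwarder pattern can be sketched as follows. The body becomes a module-level function that takes the agent explicitly. Nested closures inside it still capture that function's locals, and the class keeps a one-line method so existing call sites are unchanged. `Agent`, `name` and the `pieces` payload are hypothetical stand-ins, not the real streaming logic.

```python
from typing import List


def interruptible_streaming_api_call(agent: "Agent", payload: dict) -> List[str]:
    """Module-level body: receives the agent as a parameter instead of `self`."""
    chunks: List[str] = []

    def _on_chunk(text: str) -> None:
        # Nested closure: captures `agent` and `chunks` from the enclosing call.
        chunks.append(f"{agent.name}:{text}")

    for piece in payload.get("pieces", []):
        _on_chunk(piece)
    return chunks


class Agent:
    def __init__(self, name: str) -> None:
        self.name = name

    def _interruptible_streaming_api_call(self, payload: dict) -> List[str]:
        # Thin forwarder: the method name survives, the body lives elsewhere.
        return interruptible_streaming_api_call(self, payload)
```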
One source-introspection test in TestAnthropicInterruptHandler is
updated to scan agent.chat_completion_helpers.interruptible_streaming_api_call
instead of AIAgent._interruptible_streaming_api_call.
tests/run_agent/ + tests/agent/: 4312 passed (same pre-existing
test_auxiliary_client failure).
run_agent.py: 12277 -> 11385 lines (-892).