Two-part fix:
Part 1 (classifier override at agent/error_classifier.py:720-738):
A transport disconnect on a reasoning model, even on a large session,
now routes to FailoverReason.timeout instead of context_overflow. Without
this override, a large-session reasoning-model disconnect is routed to the
compression branch, which silently deletes conversation history in
response to a phantom context-length error. The override is strictly
targeted: non-reasoning models (gpt-4o, claude-3-5-sonnet, llama-3.3-70b,
etc.) still route to context_overflow on large sessions. That is the
existing, intentional behavior for chat models whose proxy doesn't
idle-kill during prefill/generation.
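The routing order described above can be sketched as follows. This is a minimal illustration, not the code at agent/error_classifier.py:720-738. The transport markers, the abridged reasoning-model prefix list, the `classify()` function and the `LARGE_SESSION_TOKENS` threshold are all hypothetical; only the FailoverReason member names come from the PR.

```python
from enum import Enum


class FailoverReason(Enum):
    # Only the members needed for this sketch; the real enum has more.
    timeout = "timeout"
    context_overflow = "context_overflow"
    unknown = "unknown"


# Hypothetical, abridged allowlist (the real one lives in agent/reasoning_timeouts.py).
REASONING_MODEL_PREFIXES = ("o1", "o3", "deepseek-r1", "qwq")
# Hypothetical substrings that indicate the upstream dropped the connection.
TRANSPORT_MARKERS = ("server disconnected", "connection reset", "incomplete chunked read")
LARGE_SESSION_TOKENS = 100_000  # hypothetical "large session" threshold


def is_reasoning_model(model: str) -> bool:
    # Compare against the last path segment, e.g. "deepseek/deepseek-r1" -> "deepseek-r1".
    return model.lower().rsplit("/", 1)[-1].startswith(REASONING_MODEL_PREFIXES)


def classify(message: str, model: str, session_tokens: int) -> FailoverReason:
    text = message.lower()
    if not any(marker in text for marker in TRANSPORT_MARKERS):
        return FailoverReason.unknown
    if is_reasoning_model(model):
        # The override: a reasoning model going quiet while thinking is a timeout,
        # regardless of session size, so the compression branch is never reached.
        return FailoverReason.timeout
    if session_tokens >= LARGE_SESSION_TOKENS:
        # Unchanged behavior for chat models on large sessions.
        return FailoverReason.context_overflow
    return FailoverReason.timeout
```

Checking the reasoning-model branch before the session-size branch is what keeps the override narrow: non-reasoning models fall through to the pre-existing size check untouched.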
Part 2 (new agent/thinking_timeout_guidance.py + integration at
agent/conversation_loop.py:3488-3567):
New is_thinking_timeout() and build_thinking_timeout_guidance() helpers.
When a known reasoning model (NVIDIA Nemotron 3 Ultra, OpenAI o1/o3,
Anthropic Opus 4.x thinking, DeepSeek R1, Qwen QwQ, xAI Grok reasoning)
hits a transport kill, either on a small session (where the classifier
already returns timeout) or on a large session (now routed correctly by
Part 1), the user sees reasoning-specific guidance with three actionable
workarounds, in priority order:
1. Set providers.<provider>.models.<model>.stale_timeout_seconds: 900
in ~/.hermes/config.yaml (Hermes's built-in floor is already 600s
for known reasoning models; raise further if upstream is even
tighter).
2. Lower reasoning_budget or set reasoning_effort: medium on this
model if the provider supports it.
3. Use a smaller / faster reasoning model if the task doesn't
require deep thinking.
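A minimal sketch of the two helpers follows. The function names come from the PR, but their signatures, the transport markers, the abridged prefix list and the exact message wording are assumptions made for illustration; the real module is agent/thinking_timeout_guidance.py.

```python
# Hypothetical, abridged inputs; the real detection logic may differ.
REASONING_PREFIXES = ("o1", "o3", "deepseek-r1", "qwq")
TRANSPORT_MARKERS = ("server disconnected", "connection reset", "incomplete chunked read")


def is_thinking_timeout(reason: str, model: str, message: str) -> bool:
    """True only for a timeout reason, on a reasoning model, with a transport-kill message."""
    name = model.lower().rsplit("/", 1)[-1]
    text = message.lower()
    return (
        reason == "timeout"
        and name.startswith(REASONING_PREFIXES)
        and any(marker in text for marker in TRANSPORT_MARKERS)
    )


def build_thinking_timeout_guidance(provider: str, model: str) -> str:
    """Return the three workarounds, in priority order, as a numbered message."""
    key = f"providers.{provider}.models.{model}.stale_timeout_seconds"
    steps = [
        f"Set {key}: 900 in ~/.hermes/config.yaml "
        "(the built-in floor for known reasoning models is already 600s).",
        "Lower reasoning_budget or set reasoning_effort: medium if the provider supports it.",
        "Use a smaller or faster reasoning model if the task does not need deep thinking.",
    ]
    lines = [f"{model} was disconnected while reasoning; the upstream likely idle-killed the stream."]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join(lines)
```

Keeping detection and message-building as separate pure functions makes both easy to test with parametrized cases (non-timeout reasons, non-reasoning models, non-transport messages).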
The new guidance takes precedence over the existing _is_stream_drop
block through an if/elif chain, so a reasoning-model user who hits a
transport-kill message sees actionable advice instead of the misleading
"try execute_code with Python's open() for large files" hint. That hint
is correct for the unrelated large-file-write stream-drop case but
actively wrong for the thinking-timeout case.
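The precedence rule can be illustrated in isolation. `pick_guidance()` and both message strings are hypothetical stand-ins; the real branches live in agent/conversation_loop.py and use `_is_stream_drop` and the new helpers.

```python
from typing import Optional

# Hypothetical stand-ins for the two user-facing messages.
THINKING_GUIDANCE = (
    "Reasoning model timed out while thinking: raise stale_timeout_seconds, "
    "lower reasoning effort, or use a faster model."
)
STREAM_DROP_GUIDANCE = "Stream dropped: try execute_code with Python's open() for large files."


def pick_guidance(thinking_timeout: bool, stream_drop: bool) -> Optional[str]:
    # if/elif: when both conditions hold, only the thinking-timeout branch runs,
    # so the large-file advice is replaced rather than appended.
    if thinking_timeout:
        return THINKING_GUIDANCE
    elif stream_drop:
        return STREAM_DROP_GUIDANCE
    return None
```

Using `elif` rather than two independent `if` blocks is what guarantees the two messages can never be shown together.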
Verified:
- 478 tests passing across 9 directly relevant files (49 new + 429
existing, zero regressions).
- Ruff lint clean on all 4 modified/new files.
- Negative tests: 6 parametrized regression guards confirm non-reasoning
models still route to context_overflow on large sessions; 4
parametrized gates confirm non-timeout classifier reasons never
trigger the guidance; 5 parametrized cases confirm non-transport
messages never trigger it.
- Regression guard: the new guidance message does NOT contain
"execute_code" or "open()", so the misleading advice is fully
replaced rather than appended alongside it.
- Cross-vendor dual review via agy -p:
  - Gemini 3.5 Flash (Medium), passed: true, zero blockers, one
    SHOULD-FIX (vprint block duplication; fixed by extracting
    detection into a helper module).
  - GPT-OSS 120B (Medium), passed: true, zero blockers, two nits
    (test placement: adopted at tests/agent/test_thinking_timeout_guidance.py;
    primary-model capture: accepted as a non-issue per Flash's nit).
Dependency note for maintainers:
This PR includes agent/reasoning_timeouts.py (the reasoning-model
allowlist module from PR #52238) because the Part 1 classifier override
depends on get_reasoning_stale_timeout_floor(). After PR #52238 lands on
main, this PR's duplicate agent/reasoning_timeouts.py should be rebased
away. Either PR can land first; rebasing the other is mechanical.
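A rough sketch of how a floor function can double as the reasoning-model check the override needs. Only the name get_reasoning_stale_timeout_floor() and the 600s floor come from this PR; the signature, the `None` return for non-reasoning models, the abridged allowlist, the 300s default and `effective_stale_timeout()` are assumptions for illustration.

```python
from typing import Optional

REASONING_STALE_TIMEOUT_FLOOR = 600.0  # seconds, per the PR text
REASONING_PREFIXES = ("o1", "o3", "deepseek-r1", "qwq")  # hypothetical, abridged allowlist
DEFAULT_STALE_TIMEOUT = 300.0  # hypothetical default for ordinary chat models


def get_reasoning_stale_timeout_floor(model: str) -> Optional[float]:
    # Assumed contract: a floor for allowlisted reasoning models, None otherwise.
    name = model.lower().rsplit("/", 1)[-1]
    return REASONING_STALE_TIMEOUT_FLOOR if name.startswith(REASONING_PREFIXES) else None


def effective_stale_timeout(model: str, configured: Optional[float]) -> float:
    # A user value (providers.<provider>.models.<model>.stale_timeout_seconds) may raise
    # the timeout above the floor; in this sketch it can never lower it below the floor.
    base = configured if configured is not None else DEFAULT_STALE_TIMEOUT
    floor = get_reasoning_stale_timeout_floor(model)
    return max(base, floor) if floor is not None else base
```

Under this assumed contract, `get_reasoning_stale_timeout_floor(model) is not None` is the whole reasoning-model test, which is why the classifier override cannot work without the allowlist module.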
Fixes #52271.