Based on PR #7285 by @kshitijk4poor.
Two bugs affecting Qwen OAuth users:
1. Wrong context window — qwen3-coder-plus showed 128K instead of 1M.
Added specific entries before the generic qwen catch-all:
- qwen3-coder-plus: 1,000,000 (corrected from PR's 1,048,576 per
official Alibaba Cloud docs and OpenRouter)
- qwen3-coder: 262,144
2. Random stopping — max_tokens was suppressed for Qwen Portal, so the
server applied its own low default. Reasoning models exhaust that on
thinking tokens. Now: honor explicit max_tokens, default to 65536
when unset.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
When models return empty responses (no content, no tool calls, no
reasoning), Hermes previously retried 3 times silently then fell through
to '(empty)' — without ever trying the fallback provider chain. Users on
GLM-4.5-Air and similar models experienced what appeared to be a
complete hang, especially in gateway (Telegram/Discord) contexts where
the silent retries produced zero feedback.
Changes:
- After exhausting 3 empty retries, attempt _try_activate_fallback()
before giving up with '(empty)'. If fallback succeeds, reset retry
counter and continue the conversation loop with the new provider.
- Replace all _vprint() calls in recovery paths with _emit_status(),
which surfaces messages through both CLI (_vprint with force=True)
and gateway (status_callback -> adapter.send). Users now see:
* '⚠️ Empty response from model — retrying (N/3)' during retries
* '⚠️ Model returning empty responses — switching to fallback...'
* '↻ Switched to fallback: <model> (<provider>)' on success
* '❌ Model returned no content after all retries [and fallback]'
- Add logger.warning() throughout empty response paths for log file
visibility (model name, provider, retry counts).
- Upgrade _last_content_with_tools fallback from logger.debug to
logger.info + _emit_status so recovery is visible.
- Upgrade thinking-only prefill continuation to use _emit_status.
Tests:
- test_empty_response_triggers_fallback_provider: verifies fallback
activation after 3 empty retries produces content from fallback model
- test_empty_response_fallback_also_empty_returns_empty: verifies
graceful degradation when fallback also returns empty
- test_empty_response_emits_status_for_gateway: verifies _emit_status
is called during retries so gateway users see feedback
Addresses #7180.
When _build_api_kwargs() throws an exception, the except handler in
the retry loop referenced api_kwargs before it was assigned. This
caused an UnboundLocalError that masked the real error, making
debugging impossible for the user.
Two _dump_api_request_debug() calls in the except block (non-retryable
client error path and max-retries-exhausted path) both accessed
api_kwargs without checking if it was assigned.
Fix: initialize api_kwargs = None before the retry loop and guard both
dump calls. Now the real error surfaces instead of the masking
UnboundLocalError.
Reported by Discord user gruman0.
When a streaming response is cut mid-tool-call (connection drop, timeout),
the accumulated function.arguments is invalid JSON. The mock response
builder defaulted finish_reason to 'stop', so the agent loop treated it
as a valid completed turn and tried to execute tools with broken args.
Fix: validate tool call arguments with json.loads() during mock response
reconstruction. If any are invalid JSON, override finish_reason to
'length'. In the main loop's length handler, if tool calls are present,
refuse to execute and return partial=True with a clear error instead of
silently failing or wasting retries.
Also fixes _thinking_exhausted to not short-circuit when tool calls are
present — truncated tool calls are not thinking exhaustion.
Original cherry-picked from PR #6776 by AIandI0x1.
Closes#6638.
The test was mocking _vprint entirely, bypassing the suppress guard.
Switch to capturing _print_fn output so the real _vprint runs and
the guard suppresses retry noise as intended.
When a model returns no content, no structured reasoning, and no tool
calls (common with open models), the agent now silently retries up to
3 times before falling through to (empty).
Silent retry (no synthetic messages) keeps the conversation history
clean, preserves prompt caching, and respects the no-synthetic-user-
injection invariant. Most empty responses from open models are
transient (provider hiccups, rate limits, sampling flukes) so a
simple retry is sufficient.
This fills the last gap in the empty-response recovery chain:
1. _last_content_with_tools fallback (prior tool turn had content)
2. Thinking-only prefill continuation (#5931 — structured reasoning)
3. Empty response silent retry (NEW — truly empty, no reasoning)
4. (empty) terminal (last resort after all retries exhausted)
Inline <think> blocks are excluded — the model chose to reason, it
just produced no visible text. That differs from truly empty.
Tests:
- Updated test_truly_empty to expect 4 API calls (1 + 3 retries)
- Added test_truly_empty_response_succeeds_on_nudge
Based on #6079 by @tunamitom with critical fixes and comprehensive tests.
Changes from #6079:
- Fix: sanitization overwrite bug — Qwen message prep now runs AFTER codex
field sanitization, not before (was silently discarding Qwen transforms)
- Fix: missing try/except AuthError in runtime_provider.py — stale Qwen
credentials now fall through to next provider on auto-detect
- Fix: 'qwen' alias conflict — bare 'qwen' stays mapped to 'alibaba'
(DashScope); use 'qwen-portal' or 'qwen-cli' for the OAuth provider
- Fix: hardcoded ['coder-model'] replaced with live API fetch + curated
fallback list (qwen3-coder-plus, qwen3-coder)
- Fix: extract _is_qwen_portal() helper + _qwen_portal_headers() to replace
5 inline 'portal.qwen.ai' string checks and share headers between init
and credential swap
- Fix: add Qwen branch to _apply_client_headers_for_base_url for mid-session
credential swaps
- Fix: remove suspicious TypeError catch blocks around _prompt_provider_choice
- Fix: handle bare string items in content lists (were silently dropped)
- Fix: remove redundant dict() copies after deepcopy in message prep
- Revert: unrelated ai-gateway test mock removal and model_switch.py comment deletion
New tests (30 test functions):
- _qwen_cli_auth_path, _read_qwen_cli_tokens (success + 3 error paths)
- _save_qwen_cli_tokens (roundtrip, parent creation, permissions)
- _qwen_access_token_is_expiring (5 edge cases: fresh, expired, within skew,
None, non-numeric)
- _refresh_qwen_cli_tokens (success, preserve old refresh, 4 error paths,
default expires_in, disk persistence)
- resolve_qwen_runtime_credentials (fresh, auto-refresh, force-refresh,
missing token, env override)
- get_qwen_auth_status (logged in, not logged in)
- Runtime provider resolution (direct, pool entry, alias)
- _build_api_kwargs (metadata, vl_high_resolution_images, message formatting,
max_tokens suppression)
* refactor: re-architect tests to mirror the codebase
* Update tests.yml
* fix: add missing tool_error imports after registry refactor
* fix(tests): replace patch.dict with monkeypatch to prevent env var leaks under xdist
patch.dict(os.environ) can leak TERMINAL_ENV across xdist workers,
causing test_code_execution tests to hit the Modal remote path.
* fix(tests): fix update_check and telegram xdist failures
- test_update_check: replace patch("hermes_cli.banner.os.getenv") with
monkeypatch.setenv("HERMES_HOME") — banner.py no longer imports os
directly, it uses get_hermes_home() from hermes_constants.
- test_telegram_conflict/approval_buttons: provide real exception classes
for telegram.error mock (NetworkError, TimedOut, BadRequest) so the
except clause in connect() doesn't fail with "catching classes that do
not inherit from BaseException" when xdist pollutes sys.modules.
* fix(tests): accept unavailable_models kwarg in _prompt_model_selection mock