docs(providers): clarify vllm qwen reasoning output

Signed-off-by: HwangJohn <angelic805@gmail.com>

Co-authored-by: OpenAI Codex <codex@openai.com>
This commit is contained in:
HwangJohn 2026-06-17 18:34:40 +09:00 committed by Teknium
parent fe5c8d2316
commit 242962e1f5
2 changed files with 14 additions and 0 deletions

View file

@ -483,6 +483,10 @@ prompt_caching:
# # reasoning controls:
# # extra_body:
# # enable_thinking: false
# # Some vLLM/Qwen deployments expect this nested:
# # extra_body:
# # chat_template_kwargs:
# # enable_thinking: false
# =============================================================================
# Persistent Memory

View file

@ -792,6 +792,8 @@ hermes model
Supported parsers: `hermes` (Qwen 2.5, Hermes 2/3), `llama3_json` (Llama 3.x), `mistral`, `deepseek_v3`, `deepseek_v31`, `xlam`, `pythonic`. Without these flags, tool calls won't work — the model will output tool calls as text.
**Qwen reasoning parsers:** Hermes preserves structured reasoning metadata such as `reasoning`, `reasoning_content`, and streamed reasoning deltas when OpenAI-compatible servers return them. That metadata is treated as reasoning/thinking trace data, not as a replacement for the assistant's visible answer. For Qwen reasoning models served by vLLM, make sure the final user-visible response still appears in `content`. If `--reasoning-parser qwen3` leaves `content` empty in your deployment, either disable that parser or pass a server-supported request option such as `chat_template_kwargs.enable_thinking: false` through `extra_body`.
:::tip
vLLM supports human-readable sizes: `--max-model-len 64k` (lowercase k = 1000, uppercase K = 1024).
:::
@ -1272,6 +1274,14 @@ extra_body:
enable_thinking: true
```
For Qwen reasoning models served by vLLM, this same shape can be used to disable thinking when a reasoning parser separates all generated text into reasoning fields and leaves the assistant `content` empty:
```yaml
extra_body:
chat_template_kwargs:
enable_thinking: false
```
The `hermes model` → Custom Endpoint wizard now prompts for `api_mode` explicitly and persists your answer to `config.yaml`. URL-based auto-detection (e.g. `/anthropic` paths → `anthropic_messages`) still happens as a fallback when the field is left blank.
**Native vision for custom-provider models.** If your custom endpoint serves a vision-capable model that isn't in models.dev, set `model.supports_vision: true` so Hermes routes attached images natively (as `image_url` parts) instead of pre-processing them through `vision_analyze`. Single knob — no need to also set `agent.image_input_mode: native`.