mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-23 10:42:00 +00:00
docs(providers): clarify vllm qwen reasoning output
Signed-off-by: HwangJohn <angelic805@gmail.com> Co-authored-by: OpenAI Codex <codex@openai.com>
This commit is contained in:
parent
fe5c8d2316
commit
242962e1f5
2 changed files with 14 additions and 0 deletions
|
|
@ -483,6 +483,10 @@ prompt_caching:
|
|||
# # reasoning controls:
|
||||
# # extra_body:
|
||||
# # enable_thinking: false
|
||||
# # Some vLLM/Qwen deployments expect this nested:
|
||||
# # extra_body:
|
||||
# # chat_template_kwargs:
|
||||
# # enable_thinking: false
|
||||
|
||||
# =============================================================================
|
||||
# Persistent Memory
|
||||
|
|
|
|||
|
|
@ -792,6 +792,8 @@ hermes model
|
|||
|
||||
Supported parsers: `hermes` (Qwen 2.5, Hermes 2/3), `llama3_json` (Llama 3.x), `mistral`, `deepseek_v3`, `deepseek_v31`, `xlam`, `pythonic`. Without these flags, tool calls won't work — the model will output tool calls as text.
|
||||
|
||||
**Qwen reasoning parsers:** Hermes preserves structured reasoning metadata such as `reasoning`, `reasoning_content`, and streamed reasoning deltas when OpenAI-compatible servers return them. That metadata is treated as reasoning/thinking trace data, not as a replacement for the assistant's visible answer. For Qwen reasoning models served by vLLM, make sure the final user-visible response still appears in `content`. If `--reasoning-parser qwen3` leaves `content` empty in your deployment, either disable that parser or pass a server-supported request option such as `chat_template_kwargs.enable_thinking: false` through `extra_body`.
|
||||
|
||||
:::tip
|
||||
vLLM supports human-readable sizes: `--max-model-len 64k` (lowercase k = 1000, uppercase K = 1024).
|
||||
:::
|
||||
|
|
@ -1272,6 +1274,14 @@ extra_body:
|
|||
enable_thinking: true
|
||||
```
|
||||
|
||||
For Qwen reasoning models served by vLLM, this same shape can be used to disable thinking when a reasoning parser separates all generated text into reasoning fields and leaves the assistant `content` empty:
|
||||
|
||||
```yaml
|
||||
extra_body:
|
||||
chat_template_kwargs:
|
||||
enable_thinking: false
|
||||
```
|
||||
|
||||
The `hermes model` → Custom Endpoint wizard now prompts for `api_mode` explicitly and persists your answer to `config.yaml`. URL-based auto-detection (e.g. `/anthropic` paths → `anthropic_messages`) still happens as a fallback when the field is left blank.
|
||||
|
||||
**Native vision for custom-provider models.** If your custom endpoint serves a vision-capable model that isn't in models.dev, set `model.supports_vision: true` so Hermes routes attached images natively (as `image_url` parts) instead of pre-processing them through `vision_analyze`. Single knob — no need to also set `agent.image_input_mode: native`.
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue