mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-25 00:51:20 +00:00
fix: remove misleading model.max_tokens suggestion from thinking-exhausted error (#11626)
The 'Thinking Budget Exhausted' user-facing error message advised users to 'set model.max_tokens in config.yaml'. That config key is documented but intentionally not wired through to the API call in CLI/gateway paths — we omit max_tokens by default so the inference server uses its full output budget (llama-server treats -1 as infinity, vLLM allows max_model_len minus prompt length, etc.). Users followed the suggestion, saw no change, and kept filing bugs (see closed #4404, #10917, #6955 and PRs #5001/#6080/#6446/#6707/#7075/#8804/#10924/#11173/#11268 — all reporting the same misdirection). Replace the misleading suggestion with an actionable one: switch models via /model. Lowering reasoning effort remains the primary remediation.
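The behavior the message described can be sketched as follows. This is an illustrative assumption of how the CLI/gateway request builder omits max_tokens, not the repository's actual code; the function and config names here are hypothetical:

```python
# Sketch: why setting model.max_tokens in config.yaml has no effect on the
# CLI/gateway paths. The request builder deliberately leaves max_tokens out
# of the API payload so the inference server applies its own full output
# budget. All names below are illustrative, not hermes-agent's real API.

def build_request_payload(messages, model_cfg):
    payload = {
        "model": model_cfg.get("name", "default"),
        "messages": messages,
    }
    # Intentionally NOT copied from config: model.max_tokens.
    # Omitting the key lets the server decide the output budget.
    return payload

cfg = {"name": "hermes", "max_tokens": 4096}  # user sets this in config.yaml
payload = build_request_payload([{"role": "user", "content": "hi"}], cfg)
assert "max_tokens" not in payload  # the setting never reaches the API call
```

Because the key is dropped before the request is sent, raising it in config.yaml cannot change the output budget — hence the suggestion was misleading.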
parent d49126b987
commit 1229d8855c
1 changed file with 1 addition and 2 deletions
@@ -9303,8 +9303,7 @@ class AIAgent:
                 "and had none left for the actual response.\n\n"
                 "To fix this:\n"
                 "→ Lower reasoning effort: `/thinkon low` or `/thinkon minimal`\n"
-                "→ Increase the output token limit: "
-                "set `model.max_tokens` in config.yaml"
+                "→ Or switch to a larger/non-reasoning model with `/model`"
             )
             self._cleanup_task_resources(effective_task_id)
             self._persist_session(messages, conversation_history)