mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-09 08:21:50 +00:00
The custom/Ollama provider profile had no default_max_tokens, so no max_tokens was sent on requests and Ollama fell back to its internal num_predict=128 — truncating responses after a few tokens with finish_reason='length' (#39281, e.g. gemma4). max_tokens resolution is ephemeral > user model.max_tokens > profile default, so this is only a floor used when the user hasn't set their own cap. Set it to 65536 (matching the qwen-oauth tier) rather than a conservative value, since users can always override per-model. Fixes #39281 |
||
|---|---|---|
| .. | ||
| __init__.py | ||
| plugin.yaml | ||