mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-06 07:51:53 +00:00
fix(custom): pass custom provider extra body
Allow custom OpenAI-compatible providers declared under `custom_providers:`
to set provider-specific `extra_body` fields and have Hermes merge them into
chat-completions requests when the matching custom endpoint is active.
This is a manual per-provider override rather than a model-name heuristic.
OpenAI-compatible Gemma thinking support is real, but the on-wire payload
shape is backend-specific: some servers want top-level `enable_thinking`,
while vLLM Gemma and NIM-style endpoints expect `chat_template_kwargs`.
A per-provider override is safer than picking one assumed payload.
Example config:
```yaml
custom_providers:
- name: gemma-local
base_url: http://localhost:8080/v1
model: google/gemma-4-31b-it
extra_body:
enable_thinking: true
reasoning_effort: high
```
For vLLM Gemma or NIM-style endpoints, use the nested shape those servers
expect:
```yaml
extra_body:
chat_template_kwargs:
enable_thinking: true
```
Changes:
- `hermes_cli/config.py`: preserve `extra_body` in normalized
`custom_providers:` entries and allow it in the validated field set.
- `hermes_cli/runtime_provider.py`: propagate custom-provider `extra_body`
as `request_overrides.extra_body` for named custom runtime resolution,
including credential-pool paths.
- `agent/agent_init.py`: at agent init, locate the matching custom-provider
entry by `base_url` (+ optional model) and merge its `extra_body` into
`AIAgent.request_overrides`, with caller-provided overrides winning on
conflicting top-level keys.
- `plugins/model-providers/custom/__init__.py`: keep existing CustomProfile
behavior (Ollama `num_ctx`, `think=False` when reasoning disabled);
user-configured `extra_body` flows through `request_overrides`.
- `website/docs/integrations/providers.md`: document the explicit
`extra_body` override and the vLLM/Gemma `chat_template_kwargs` variant.
- Tests cover config normalization, runtime propagation, model matching,
trailing-slash equivalence, fallback when no `model` field is set, and
caller-override merging precedence.
Verified end-to-end against `CustomProfile` via `ChatCompletionsTransport`:
configured `extra_body` reaches `kwargs.extra_body` on the wire request,
and coexists with profile-generated entries (Ollama `num_ctx`, `think=False`)
without clobber.
Salvaged from #29022 onto current `main`. Cosmetic typing edit in
`plugins/model-providers/custom/__init__.py` and a stale-base docs revert
in `providers.md` were dropped during cherry-pick.
Closes #29022
This commit is contained in:
parent
2fdefca570
commit
ba9964ff0d
7 changed files with 286 additions and 3 deletions
|
|
@ -1228,6 +1228,26 @@ custom_providers:
|
|||
api_mode: anthropic_messages # for Anthropic-compatible proxies
|
||||
```
|
||||
|
||||
Some OpenAI-compatible endpoints need provider-specific request body fields. Add an `extra_body` map to the matching custom provider and Hermes will merge it into each chat-completions request for that endpoint:
|
||||
|
||||
```yaml
|
||||
custom_providers:
|
||||
- name: gemma-local
|
||||
base_url: http://localhost:8080/v1
|
||||
model: google/gemma-4-31b-it
|
||||
extra_body:
|
||||
enable_thinking: true
|
||||
reasoning_effort: high
|
||||
```
|
||||
|
||||
Use the shape your server documents. For example, vLLM Gemma deployments and some NVIDIA NIM endpoints expect `enable_thinking` under `chat_template_kwargs` instead of as a top-level `extra_body` field:
|
||||
|
||||
```yaml
|
||||
extra_body:
|
||||
chat_template_kwargs:
|
||||
enable_thinking: true
|
||||
```
|
||||
|
||||
The `hermes model` → Custom Endpoint wizard now prompts for `api_mode` explicitly and persists your answer to `config.yaml`. URL-based auto-detection (e.g. `/anthropic` paths → `anthropic_messages`) still happens as a fallback when the field is left blank.
|
||||
|
||||
Switch between them mid-session with the triple syntax:
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue