Mirror of https://github.com/NousResearch/hermes-agent.git, synced 2026-04-25 00:51:20 +00:00

docs(config): document session_search auxiliary controls

parent 6ab78401c9
commit afba54364e
3 changed files with 64 additions and 0 deletions
@@ -667,6 +667,8 @@ auxiliary:
    base_url: ""
    api_key: ""
    timeout: 30
    max_concurrency: 3  # Limit parallel summaries to reduce request-burst 429s
    extra_body: {}      # Provider-specific OpenAI-compatible request fields

  # Skills hub — skill matching and search
  skills_hub:
@@ -701,6 +703,34 @@ Each auxiliary task has a configurable `timeout` (in seconds). Defaults: vision

Context compression has its own `compression:` block for thresholds and an `auxiliary.compression:` block for model/provider settings — see [Context Compression](#context-compression) above. The fallback model uses a `fallback_model:` block — see [Fallback Model](/docs/integrations/providers#fallback-model). All three follow the same provider/model/base_url pattern.
:::
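
Putting that shared pattern side by side, a minimal sketch might look like the following (the values are placeholders, not documented defaults):

```yaml
# Illustrative sketch only: field values are placeholders.
# Both blocks follow the same provider/model/base_url pattern.
auxiliary:
  compression:
    provider: "auto"
    model: ""
    base_url: ""

fallback_model:
  provider: "auto"
  model: ""
  base_url: ""
```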

### Session Search Tuning

If you use a reasoning-heavy model for `auxiliary.session_search`, Hermes now gives you two built-in controls:

- `auxiliary.session_search.max_concurrency`: limits how many matched sessions Hermes summarizes at once
- `auxiliary.session_search.extra_body`: forwards provider-specific OpenAI-compatible request fields on the summarization calls

Example:

```yaml
auxiliary:
  session_search:
    provider: "main"
    model: "glm-4.5-air"
    timeout: 60
    max_concurrency: 2
    extra_body:
      enable_thinking: false
```

Use `max_concurrency` when your provider rate-limits request bursts and you want `session_search` to trade some parallelism for stability.

Use `extra_body` only when your provider documents OpenAI-compatible request-body fields you want Hermes to pass through for that task. Hermes forwards the object as-is.

:::warning
`extra_body` is only effective when your provider actually supports the field you send. If the provider does not expose a native OpenAI-compatible reasoning-off flag, Hermes cannot synthesize one on your behalf.
:::

### Changing the Vision Model

To use GPT-4o instead of Gemini Flash for image analysis:
@@ -215,6 +215,9 @@ auxiliary:
  session_search:
    provider: "auto"
    model: ""
    timeout: 30
    max_concurrency: 3
    extra_body: {}

  skills_hub:
    provider: "auto"
@@ -248,6 +251,25 @@ fallback_model:
  # base_url: http://localhost:8000/v1 # Optional custom endpoint
```

For `auxiliary.session_search`, Hermes also supports:

- `max_concurrency` to limit how many session summaries run at once
- `extra_body` to pass provider-specific OpenAI-compatible request fields through on the summarization calls

Example:

```yaml
auxiliary:
  session_search:
    provider: main
    model: glm-4.5-air
    max_concurrency: 2
    extra_body:
      enable_thinking: false
```

If your provider does not support a native OpenAI-compatible reasoning-control field, `extra_body` will not help there; in that case `max_concurrency` is still useful for reducing request-burst 429s.

All three — auxiliary, compression, fallback — work the same way: set `provider` to choose which provider handles the request, `model` to choose which model, and `base_url` to point at a custom endpoint (a set `base_url` overrides the provider's default).
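
As a sketch of that override behavior (the model name and localhost endpoint are placeholders, not defaults), pointing session search at a self-hosted OpenAI-compatible server might look like:

```yaml
# Hypothetical example: base_url sends session_search requests to a local
# OpenAI-compatible server instead of the provider's default endpoint.
auxiliary:
  session_search:
    provider: "main"
    model: "glm-4.5-air"
    base_url: "http://localhost:8000/v1"  # overrides the provider endpoint
```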

### Provider Options for Auxiliary Tasks