diff --git a/cli-config.yaml.example b/cli-config.yaml.example
index 8e4ef34263..1712077fe6 100644
--- a/cli-config.yaml.example
+++ b/cli-config.yaml.example
@@ -366,6 +366,18 @@ compression:
 # web_extract:
 #   provider: "auto"
 #   model: ""
+#
+# # Session search — summarizes matching past sessions
+# session_search:
+#   provider: "auto"
+#   model: ""
+#   timeout: 30
+#   max_concurrency: 3  # Limit parallel summaries to reduce request-burst 429s
+#   extra_body: {}      # Provider-specific OpenAI-compatible request fields
+# # Example for providers that support request-body
+# # reasoning controls:
+# #   extra_body:
+# #     enable_thinking: false

 # =============================================================================
 # Persistent Memory
diff --git a/website/docs/user-guide/configuration.md b/website/docs/user-guide/configuration.md
index 59ac078bfa..9bcde0cdba 100644
--- a/website/docs/user-guide/configuration.md
+++ b/website/docs/user-guide/configuration.md
@@ -667,6 +667,8 @@ auxiliary:
     base_url: ""
     api_key: ""
     timeout: 30
+    max_concurrency: 3  # Limit parallel summaries to reduce request-burst 429s
+    extra_body: {}      # Provider-specific OpenAI-compatible request fields

   # Skills hub — skill matching and search
   skills_hub:
@@ -701,6 +703,34 @@ Each auxiliary task has a configurable `timeout` (in seconds). Defaults: vision

 Context compression has its own `compression:` block for thresholds and an `auxiliary.compression:` block for model/provider settings — see [Context Compression](#context-compression) above. The fallback model uses a `fallback_model:` block — see [Fallback Model](/docs/integrations/providers#fallback-model). All three follow the same provider/model/base_url pattern.
 :::
+### Session Search Tuning
+
+If you use a reasoning-heavy model for `auxiliary.session_search`, Hermes now gives you two built-in controls:
+
+- `auxiliary.session_search.max_concurrency`: limits how many matched sessions Hermes summarizes at once
+- `auxiliary.session_search.extra_body`: forwards provider-specific OpenAI-compatible request fields on the summarization calls
+
+Example:
+
+```yaml
+auxiliary:
+  session_search:
+    provider: "main"
+    model: "glm-4.5-air"
+    timeout: 60
+    max_concurrency: 2
+    extra_body:
+      enable_thinking: false
+```
+
+Use `max_concurrency` when your provider rate-limits request bursts and you want `session_search` to trade some parallelism for stability.
+
+Use `extra_body` only when your provider documents OpenAI-compatible request-body fields you want Hermes to pass through for that task. Hermes forwards the object as-is.
+
+:::warning
+`extra_body` is only effective when your provider actually supports the field you send. If the provider does not expose a native OpenAI-compatible reasoning-off flag, Hermes cannot synthesize one on your behalf.
+:::
+
 ### Changing the Vision Model

 To use GPT-4o instead of Gemini Flash for image analysis:
diff --git a/website/docs/user-guide/features/fallback-providers.md b/website/docs/user-guide/features/fallback-providers.md
index 01e5524f6a..de89acc711 100644
--- a/website/docs/user-guide/features/fallback-providers.md
+++ b/website/docs/user-guide/features/fallback-providers.md
@@ -215,6 +215,9 @@ auxiliary:
   session_search:
     provider: "auto"
     model: ""
+    timeout: 30
+    max_concurrency: 3
+    extra_body: {}

   skills_hub:
     provider: "auto"
@@ -248,6 +251,25 @@ fallback_model:
 #   base_url: http://localhost:8000/v1  # Optional custom endpoint
 ```
+
+For `auxiliary.session_search`, Hermes also supports:
+
+- `max_concurrency` to limit how many session summaries run at once
+- `extra_body` to pass provider-specific OpenAI-compatible request fields through on the summarization calls
+
+Example:
+
+```yaml
+auxiliary:
+  session_search:
+    provider: main
+    model: glm-4.5-air
+    max_concurrency: 2
+    extra_body:
+      enable_thinking: false
+```
+
+If your provider does not support a native OpenAI-compatible reasoning-control field, `extra_body` will not help there; in that case `max_concurrency` is still useful for reducing request-burst 429s.
+
 All three — auxiliary, compression, fallback — work the same way: set `provider` to pick who handles the request, `model` to pick which model, and `base_url` to point at a custom endpoint (overrides provider).

 ### Provider Options for Auxiliary Tasks
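The `max_concurrency` setting documented above can be sketched as semaphore-bounded fan-out. This is an illustrative Python sketch, not Hermes internals: `summarize_sessions` is a hypothetical name, and the `asyncio.sleep` stands in for the per-session model call.

```python
import asyncio

async def summarize_sessions(sessions, max_concurrency=3):
    """Summarize matched sessions with at most max_concurrency in flight."""
    sem = asyncio.Semaphore(max_concurrency)
    active = 0  # requests currently in flight
    peak = 0    # highest concurrency observed

    async def summarize(session):
        nonlocal active, peak
        async with sem:  # blocks while max_concurrency tasks hold the semaphore
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for the summarization request
            active -= 1
            return f"summary of {session}"

    summaries = await asyncio.gather(*(summarize(s) for s in sessions))
    return summaries, peak

summaries, peak = asyncio.run(
    summarize_sessions(["s1", "s2", "s3", "s4"], max_concurrency=2)
)
# peak never exceeds 2: the other tasks queue on the semaphore instead of
# bursting all requests at once, which is what mitigates 429s
```

The trade-off is exactly the one the docs describe: lower `max_concurrency` means slower search over many matches, but a flatter request rate toward the provider.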
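"Hermes forwards the object as-is" amounts to merging the configured `extra_body` mapping into the OpenAI-compatible request body unchanged. A minimal sketch, assuming that merge semantics; `build_request_body` is a hypothetical helper, not the actual Hermes code:

```python
def build_request_body(model, messages, extra_body=None):
    """Compose an OpenAI-compatible chat request body.

    Keys from extra_body are merged in verbatim at the top level, so whether
    the provider honors a field like enable_thinking is entirely up to the
    provider -- the client cannot synthesize reasoning control on its own.
    """
    body = {"model": model, "messages": messages}
    body.update(extra_body or {})
    return body

body = build_request_body(
    model="glm-4.5-air",
    messages=[{"role": "user", "content": "Summarize this session."}],
    extra_body={"enable_thinking": False},
)
# body now carries enable_thinking alongside model and messages
```

This also explains the warning above: an unsupported key simply rides along in the request body and is ignored (or rejected) by the provider.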