Mirror of https://github.com/NousResearch/hermes-agent.git, synced 2026-04-25 00:51:20 +00:00

docs(config): document session_search auxiliary controls

parent 6ab78401c9
commit afba54364e
3 changed files with 64 additions and 0 deletions
@@ -667,6 +667,8 @@ auxiliary:
    base_url: ""
    api_key: ""
    timeout: 30
    max_concurrency: 3  # Limit parallel summaries to reduce request-burst 429s
    extra_body: {}      # Provider-specific OpenAI-compatible request fields

  # Skills hub — skill matching and search
  skills_hub:
@@ -701,6 +703,34 @@ Each auxiliary task has a configurable `timeout` (in seconds). Defaults: vision

Context compression has its own `compression:` block for thresholds and an `auxiliary.compression:` block for model/provider settings — see [Context Compression](#context-compression) above. The fallback model uses a `fallback_model:` block — see [Fallback Model](/docs/integrations/providers#fallback-model). All three follow the same provider/model/base_url pattern.
:::
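
Putting that shared pattern side by side, a minimal sketch might look like the following (the values are placeholders, not documented defaults):

```yaml
# Illustrative sketch only: field values are placeholders.
# Both blocks follow the same provider/model/base_url pattern.
auxiliary:
  compression:
    provider: "auto"
    model: ""
    base_url: ""

fallback_model:
  provider: "auto"
  model: ""
  base_url: ""
```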

### Session Search Tuning

If you use a reasoning-heavy model for `auxiliary.session_search`, Hermes now gives you two built-in controls:

- `auxiliary.session_search.max_concurrency`: limits how many matched sessions Hermes summarizes at once
- `auxiliary.session_search.extra_body`: forwards provider-specific OpenAI-compatible request fields on the summarization calls

Example:

```yaml
auxiliary:
  session_search:
    provider: "main"
    model: "glm-4.5-air"
    timeout: 60
    max_concurrency: 2
    extra_body:
      enable_thinking: false
```

Use `max_concurrency` when your provider rate-limits request bursts and you want `session_search` to trade some parallelism for stability.

Use `extra_body` only when your provider documents OpenAI-compatible request-body fields you want Hermes to pass through for that task. Hermes forwards the object as-is.

:::warning
`extra_body` is only effective when your provider actually supports the field you send. If the provider does not expose a native OpenAI-compatible reasoning-off flag, Hermes cannot synthesize one on your behalf.
:::

### Changing the Vision Model

To use GPT-4o instead of Gemini Flash for image analysis:
@@ -215,6 +215,9 @@ auxiliary:
  session_search:
    provider: "auto"
    model: ""
    timeout: 30
    max_concurrency: 3
    extra_body: {}

  skills_hub:
    provider: "auto"
@@ -248,6 +251,25 @@ fallback_model:
  # base_url: http://localhost:8000/v1 # Optional custom endpoint
```

For `auxiliary.session_search`, Hermes also supports:

- `max_concurrency` to limit how many session summaries run at once
- `extra_body` to pass provider-specific OpenAI-compatible request fields through on the summarization calls

Example:

```yaml
auxiliary:
  session_search:
    provider: main
    model: glm-4.5-air
    max_concurrency: 2
    extra_body:
      enable_thinking: false
```

If your provider does not support a native OpenAI-compatible reasoning-control field, `extra_body` will not help there; in that case `max_concurrency` is still useful for reducing request-burst 429s.

All three — auxiliary, compression, fallback — work the same way: set `provider` to choose which provider handles the request, `model` to choose which model, and `base_url` to point at a custom endpoint (a set `base_url` overrides the provider's default).
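
As a sketch of that override behavior (the model name and localhost endpoint are placeholders, not defaults), pointing session search at a self-hosted OpenAI-compatible server might look like:

```yaml
# Hypothetical example: base_url sends session_search requests to a local
# OpenAI-compatible server instead of the provider's default endpoint.
auxiliary:
  session_search:
    provider: "main"
    model: "glm-4.5-air"
    base_url: "http://localhost:8000/v1"  # overrides the provider endpoint
```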

### Provider Options for Auxiliary Tasks