docs(session_search): update all docs for the single-shape rewrite (#27840)

Companion PR to #27590. Sweeps remaining stale references to the
LLM-summary path that landed in main with #27590 but weren't fully
caught in the followup cleanup commit.

Real rewrites:
- user-guide/sessions.md: 'Session Search Tool' section rewritten to
  describe the three calling shapes (discovery / scroll / browse) with
  worked examples. Adds the 'Optional parameters' subsection covering
  sort and role_filter.
- user-guide/features/memory.md: 'Session Search' overview rewritten,
  comparison table updated (speed: ms instead of LLM summarization,
  added explicit free-cost row, link to sessions.md for details).

Stale-claim sweeps:
- user-guide/configuring-models.md: drop the 'Session Search' row from
  the aux-model override table (no aux model anymore), drop session
  search from the auxiliary-models list.
- user-guide/features/codex-app-server-runtime.md: drop session_search
  from the ChatGPT-subscription cost note, drop the session_search
  block from the per-task override config example.
- developer-guide/provider-runtime.md: drop 'session search
  summarization' from the auxiliary tasks list.
- developer-guide/agent-loop.md: drop session search from the
  auxiliary fallback chain list.
- user-guide/skills/.../autonomous-ai-agents-hermes-agent.md: drop
  session_search from the 'auxiliary models not working' debug step.

Untouched (still accurate as tool-name mentions, not behavioral claims):
- features/tools.md, features/honcho.md, features/acp.md
- cli.md, sessions.md (other sections)
- developer-guide/tools-runtime.md, agent-loop.md (line 157)
- acp-internals.md, adding-tools.md, prompt-assembly.md
- reference/toolsets-reference.md, reference/tools-reference.md
This commit is contained in:
Teknium 2026-05-18 00:36:17 -07:00 committed by GitHub
parent ff078738ea
commit 94c523f0c5
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
7 changed files with 59 additions and 19 deletions

View file

@ -194,7 +194,7 @@ When the primary model fails (429 rate limit, 5xx server error, 401/403 auth err
3. On success, continue the conversation with the new provider
4. On 401/403, attempt credential refresh before failing over
The fallback system also covers auxiliary tasks independently — vision, compression, web extraction, and session search each have their own fallback chain configurable via the `auxiliary.*` config section.
The fallback system also covers auxiliary tasks independently — vision, compression, and web extraction each have their own fallback chain configurable via the `auxiliary.*` config section.
## Compression and Persistence

View file

@ -150,7 +150,6 @@ Auxiliary tasks such as:
- vision
- web extraction summarization
- context compression summaries
- session search summarization
- skills hub operations
- MCP helper operations
- memory flushes

View file

@ -7,7 +7,7 @@ sidebar_position: 3
Hermes uses two kinds of model slots:
- **Main model** — what the agent thinks with. Every user message, every tool-call loop, every streamed response goes through this model.
- **Auxiliary models** — smaller side-jobs the agent offloads. Context compression, vision (image analysis), web-page summarization, session search, approval scoring, MCP tool routing, session-title generation, and skill search. Each has its own slot and can be overridden independently.
- **Auxiliary models** — smaller side-jobs the agent offloads. Context compression, vision (image analysis), web-page summarization, approval scoring, MCP tool routing, session-title generation, and skill search. Each has its own slot and can be overridden independently.
This page covers configuring both from the dashboard. If you prefer config files or the CLI, jump to [Alternative methods](#alternative-methods) at the bottom.
@ -52,7 +52,6 @@ Every auxiliary task defaults to `auto` — meaning Hermes uses your main model
| **Title Gen** | Almost always. A $0.10/M flash model writes session titles as well as Opus. Default config sets this to `google/gemini-3-flash-preview` on OpenRouter. |
| **Vision** | When your main model is a coding model without vision (e.g. Kimi, DeepSeek). Point it at `google/gemini-2.5-flash` or `gpt-4o-mini`. |
| **Compression** | When you're burning reasoning tokens on Opus/M2.7 just to summarize context. A fast chat model does the job at 1/50th the cost. |
| **Session Search** | When recall queries fan out — default max_concurrency is 3. A cheap model keeps the bill predictable. |
| **Approval** | For `approval_mode: smart` — a fast/cheap model (haiku, flash, gpt-5-mini) decides whether to auto-approve low-risk commands. Expensive models here are waste. |
| **Web Extract** | When you use `web_extract` heavily. Same logic as compression — summarization doesn't need reasoning. |
| **Skills Hub** | `hermes skills search` uses this. Usually fine at `auto`. |

View file

@ -242,7 +242,7 @@ default_permissions = ":read-only"
## Auxiliary tasks and ChatGPT subscription token cost
When this runtime is on with the `openai-codex` provider, **auxiliary tasks (title generation, context compression, vision auto-detect, session search summarization, the background self-improvement review fork) also flow through your ChatGPT subscription by default**, because Hermes' auxiliary client uses the main provider/model when no per-task override is set.
When this runtime is on with the `openai-codex` provider, **auxiliary tasks (title generation, context compression, vision auto-detect, the background self-improvement review fork) also flow through your ChatGPT subscription by default**, because Hermes' auxiliary client uses the main provider/model when no per-task override is set.
This isn't specific to `codex_app_server` — it's true for the existing `codex_responses` path too — but it's more visible here because you're explicitly opting in for the subscription billing.
@ -259,9 +259,6 @@ auxiliary:
vision_detect:
provider: openrouter
model: google/gemini-3-flash-preview
session_search:
provider: openrouter
model: google/gemini-3-flash-preview
goal_judge:
provider: openrouter
model: google/gemini-3-flash-preview

View file

@ -177,19 +177,23 @@ Memory entries are scanned for injection and exfiltration patterns before being
Beyond MEMORY.md and USER.md, the agent can search its past conversations using the `session_search` tool:
- All CLI and messaging sessions are stored in SQLite (`~/.hermes/state.db`) with FTS5 full-text search
- Search queries return relevant past conversations with Gemini Flash summarization
- Search queries return actual messages from the DB — no LLM summarization, no truncation
- The agent can find things it discussed weeks ago, even if they're not in its active memory
- The agent can also scroll forward/backward inside any session it finds
```bash
hermes sessions list # Browse past sessions
```
See [Session Search Tool](/docs/user-guide/sessions#session-search-tool) for the three calling shapes (discovery / scroll / browse) and the response format.
### session_search vs memory
| Feature | Persistent Memory | Session Search |
|---------|------------------|----------------|
| **Capacity** | ~1,300 tokens total | Unlimited (all sessions) |
| **Speed** | Instant (in system prompt) | Requires search + LLM summarization |
| **Speed** | Instant (in system prompt) | ~20ms FTS5 query, ~1ms scroll |
| **Cost** | Token cost in every prompt | Free — no LLM calls |
| **Use case** | Key facts always available | Finding specific past conversations |
| **Management** | Manually curated by agent | Automatic — all sessions stored |
| **Token cost** | Fixed per session (~1,300 tokens) | On-demand (searched when needed) |

View file

@ -366,25 +366,66 @@ For deeper analytics — token usage, cost estimates, tool breakdown, and activi
## Session Search Tool
The agent has a built-in `session_search` tool that performs full-text search across all past conversations using SQLite's FTS5 engine.
The agent has a built-in `session_search` tool that performs full-text search across all past conversations using SQLite's FTS5 engine — and lets the agent scroll through any session it finds. No LLM calls, no summarization, no truncation. Every shape returns actual messages from the DB.
### How It Works
### Three calling shapes
1. FTS5 searches matching messages ranked by relevance
2. Groups results by session, takes the top N unique sessions (default 3)
3. Loads each session's conversation, truncates to ~100K chars centered on matches
4. Sends to a fast summarization model for focused summaries
5. Returns per-session summaries with metadata and surrounding context
The tool infers what you want from which arguments you set. There's no `mode` parameter.
**1. Discovery — pass `query`:**
```python
session_search(query="auth refactor", limit=3)
```
Runs FTS5, dedupes hits by session lineage, returns the top N sessions. Each result carries:
- `session_id`, `title`, `when`, `source`
- `snippet` — FTS5-highlighted match excerpt
- `bookend_start` — first 3 user+assistant messages of the session (the goal/kickoff)
- `messages` — ±5 messages around the FTS5 match, with the anchor message flagged (the hit in context)
- `bookend_end` — last 3 user+assistant messages of the session (the resolution/decisions)
- `match_message_id`, `messages_before`, `messages_after`
Bookends + window together reconstruct goal → match → resolution without paying for the whole transcript. Typical wall time: 1550ms on a real session DB.
**2. Scroll — pass `session_id` + `around_message_id`:**
```python
session_search(session_id="20260510_174648_805cc2", around_message_id=590803, window=10)
```
Returns a window of ±`window` messages centered on the anchor. No FTS5, no bookends — just the slice. Use after a discovery call when you need more context than the ±5 default window.
- To scroll **forward**: pass `messages[-1].id` back as `around_message_id`
- To scroll **backward**: pass `messages[0].id` back as `around_message_id`
- The boundary message appears in both windows as an orientation marker
- When `messages_before` or `messages_after` is less than `window`, you're at the start or end of the session
Typical wall time: 12ms per scroll call.
**3. Browse — no args:**
```python
session_search()
```
Returns recent sessions chronologically (titles, previews, timestamps). Useful when the user asks "what was I working on" without naming a topic.
### FTS5 Query Syntax
The search supports standard FTS5 query syntax:
- Simple keywords: `docker deployment`
- Simple keywords: `docker deployment` (FTS5 defaults to AND)
- Phrases: `"exact phrase"`
- Boolean: `docker OR kubernetes`, `python NOT java`
- Prefix: `deploy*`
### Optional parameters
- `sort``newest` or `oldest`, on top of FTS5 ranking. Omit for relevance-only ordering (the default; suitable for exploratory recall). Use `newest` for "where did we leave X" questions, `oldest` for "how did X start" questions.
- `role_filter` — comma-separated roles to include. Discovery defaults to `user,assistant` (tool output is usually noise). Pass `user,assistant,tool` to include tool output (debugging tool behaviour) or `tool` to search tool output only.
### When It's Used
The agent is prompted to use session search automatically:

View file

@ -853,7 +853,7 @@ Common gateway problems:
- **Windows-specific issues** (`Alt+Enter` newline, WinError 10106, UTF-8 BOM config, test suite, line endings): see the dedicated **Windows-Specific Quirks** section above.
### Auxiliary models not working
If `auxiliary` tasks (vision, compression, session_search) fail silently, the `auto` provider can't find a backend. Either set `OPENROUTER_API_KEY` or `GOOGLE_API_KEY`, or explicitly configure each auxiliary task's provider:
If `auxiliary` tasks (vision, compression) fail silently, the `auto` provider can't find a backend. Either set `OPENROUTER_API_KEY` or `GOOGLE_API_KEY`, or explicitly configure each auxiliary task's provider:
```bash
hermes config set auxiliary.vision.provider <your_provider>
hermes config set auxiliary.vision.model <model_name>