Mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-06-04 07:31:58 +00:00)
docs(session_search): update all docs for the single-shape rewrite (#27840)
Companion PR to #27590. Sweeps remaining stale references to the LLM-summary path that landed in main with #27590 but weren't fully caught in the follow-up cleanup commit.

Real rewrites:

- user-guide/sessions.md: 'Session Search Tool' section rewritten to describe the three calling shapes (discovery / scroll / browse) with worked examples. Adds the 'Optional parameters' subsection covering sort and role_filter.
- user-guide/features/memory.md: 'Session Search' overview rewritten, comparison table updated (speed: ms instead of LLM summarization, added explicit free-cost row, link to sessions.md for details).

Stale-claim sweeps:

- user-guide/configuring-models.md: drop the 'Session Search' row from the aux-model override table (no aux model anymore), drop session search from the auxiliary-models list.
- user-guide/features/codex-app-server-runtime.md: drop session_search from the ChatGPT-subscription cost note, drop the session_search block from the per-task override config example.
- developer-guide/provider-runtime.md: drop 'session search summarization' from the auxiliary tasks list.
- developer-guide/agent-loop.md: drop session search from the auxiliary fallback chain list.
- user-guide/skills/.../autonomous-ai-agents-hermes-agent.md: drop session_search from the 'auxiliary models not working' debug step.

Untouched (still accurate as tool-name mentions, not behavioral claims):

- features/tools.md, features/honcho.md, features/acp.md
- cli.md, sessions.md (other sections)
- developer-guide/tools-runtime.md, agent-loop.md (line 157)
- acp-internals.md, adding-tools.md, prompt-assembly.md
- reference/toolsets-reference.md, reference/tools-reference.md
parent ff078738ea
commit 94c523f0c5
7 changed files with 59 additions and 19 deletions

@@ -194,7 +194,7 @@ When the primary model fails (429 rate limit, 5xx server error, 401/403 auth err
3. On success, continue the conversation with the new provider
4. On 401/403, attempt credential refresh before failing over

The fallback system also covers auxiliary tasks independently — vision, compression, web extraction, and session search each have their own fallback chain configurable via the `auxiliary.*` config section.
The fallback system also covers auxiliary tasks independently — vision, compression, and web extraction each have their own fallback chain configurable via the `auxiliary.*` config section.
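The failover rules in this hunk can be pictured as a loop over an ordered chain, with one chain per auxiliary task. Each provider is tried in turn, and credentials are refreshed once on 401/403. The Python below is a minimal sketch under assumptions, not Hermes' code. `Provider`, `AuthError`, `ProviderError`, and `run_with_fallback()` are hypothetical names, and real provider calls are replaced by plain callables.

```python
from dataclasses import dataclass
from typing import Callable, List


class AuthError(Exception):
    """Hypothetical: the provider rejected the credentials (401/403)."""


class ProviderError(Exception):
    """Hypothetical: a failure worth failing over on (429, 5xx)."""


@dataclass
class Provider:
    name: str
    call: Callable[[str], str]  # sends a prompt, returns the model's text
    # Returns True if fresh credentials were obtained; default: cannot refresh.
    refresh_credentials: Callable[[], bool] = lambda: False


def run_with_fallback(prompt: str, chain: List[Provider]) -> str:
    """Try providers in order; on auth errors, refresh once before moving on."""
    errors: List[str] = []
    for provider in chain:
        try:
            return provider.call(prompt)  # success: continue with this provider
        except ProviderError as exc:
            errors.append(f"{provider.name}: {exc}")
        except AuthError as exc:
            # 401/403: attempt a credential refresh before failing over.
            if provider.refresh_credentials():
                try:
                    return provider.call(prompt)
                except (AuthError, ProviderError) as retry_exc:
                    exc = retry_exc
            errors.append(f"{provider.name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Giving each auxiliary task its own `chain` list is what keeps, for example, a vision failure from affecting compression in this model.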

## Compression and Persistence

@@ -150,7 +150,6 @@ Auxiliary tasks such as:
- vision
- web extraction summarization
- context compression summaries
- session search summarization
- skills hub operations
- MCP helper operations
- memory flushes

@@ -7,7 +7,7 @@ sidebar_position: 3
Hermes uses two kinds of model slots:

- **Main model** — what the agent thinks with. Every user message, every tool-call loop, every streamed response goes through this model.
- **Auxiliary models** — smaller side-jobs the agent offloads. Context compression, vision (image analysis), web-page summarization, session search, approval scoring, MCP tool routing, session-title generation, and skill search. Each has its own slot and can be overridden independently.
- **Auxiliary models** — smaller side-jobs the agent offloads. Context compression, vision (image analysis), web-page summarization, approval scoring, MCP tool routing, session-title generation, and skill search. Each has its own slot and can be overridden independently.

This page covers configuring both from the dashboard. If you prefer config files or the CLI, jump to [Alternative methods](#alternative-methods) at the bottom.

@@ -52,7 +52,6 @@ Every auxiliary task defaults to `auto` — meaning Hermes uses your main model
| **Title Gen** | Almost always. A $0.10/M flash model writes session titles as well as Opus. Default config sets this to `google/gemini-3-flash-preview` on OpenRouter. |
| **Vision** | When your main model is a coding model without vision (e.g. Kimi, DeepSeek). Point it at `google/gemini-2.5-flash` or `gpt-4o-mini`. |
| **Compression** | When you're burning reasoning tokens on Opus/M2.7 just to summarize context. A fast chat model does the job at 1/50th the cost. |
| **Session Search** | When recall queries fan out — default max_concurrency is 3. A cheap model keeps the bill predictable. |
| **Approval** | For `approval_mode: smart` — a fast/cheap model (haiku, flash, gpt-5-mini) decides whether to auto-approve low-risk commands. Expensive models here are waste. |
| **Web Extract** | When you use `web_extract` heavily. Same logic as compression — summarization doesn't need reasoning. |
| **Skills Hub** | `hermes skills search` uses this. Usually fine at `auto`. |

@@ -242,7 +242,7 @@ default_permissions = ":read-only"

## Auxiliary tasks and ChatGPT subscription token cost

When this runtime is on with the `openai-codex` provider, **auxiliary tasks (title generation, context compression, vision auto-detect, session search summarization, the background self-improvement review fork) also flow through your ChatGPT subscription by default**, because Hermes' auxiliary client uses the main provider/model when no per-task override is set.
When this runtime is on with the `openai-codex` provider, **auxiliary tasks (title generation, context compression, vision auto-detect, the background self-improvement review fork) also flow through your ChatGPT subscription by default**, because Hermes' auxiliary client uses the main provider/model when no per-task override is set.

This isn't specific to `codex_app_server` — it's true for the existing `codex_responses` path too — but it's more visible here because you're explicitly opting in for the subscription billing.
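The billing consequence follows from a simple defaulting rule: an auxiliary task without its own override resolves to the main provider/model. Below is a minimal Python sketch of that lookup. It assumes the per-task config is a nested dict shaped like the `auxiliary:` YAML block. `resolve_aux_model()` is hypothetical and not part of Hermes' API.

```python
from typing import Dict, Optional, Tuple

# Loosely mirrors the `auxiliary:` YAML block, e.g.
# {"vision_detect": {"provider": "openrouter", "model": "google/gemini-3-flash-preview"}}
AuxConfig = Dict[str, Dict[str, str]]


def resolve_aux_model(task: str, main: Tuple[str, str],
                      auxiliary: Optional[AuxConfig] = None) -> Tuple[str, Optional[str]]:
    """Return (provider, model) for one auxiliary task (hypothetical helper).

    With no override, or an override left at "auto", the task reuses the main
    provider/model, so on `openai-codex` it is billed to the subscription.
    """
    override = (auxiliary or {}).get(task, {})
    provider = override.get("provider", "auto")
    if provider == "auto":
        return main
    # Assumption: a missing model means "let the provider pick its default".
    return provider, override.get("model")
```

In this model of the config, adding a per-task `provider`/`model` entry is what moves that one task off the subscription.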

@@ -259,9 +259,6 @@ auxiliary:
  vision_detect:
    provider: openrouter
    model: google/gemini-3-flash-preview
  session_search:
    provider: openrouter
    model: google/gemini-3-flash-preview
  goal_judge:
    provider: openrouter
    model: google/gemini-3-flash-preview

@@ -177,19 +177,23 @@ Memory entries are scanned for injection and exfiltration patterns before being
Beyond MEMORY.md and USER.md, the agent can search its past conversations using the `session_search` tool:

- All CLI and messaging sessions are stored in SQLite (`~/.hermes/state.db`) with FTS5 full-text search
- Search queries return relevant past conversations with Gemini Flash summarization
- Search queries return actual messages from the DB — no LLM summarization, no truncation
- The agent can find things it discussed weeks ago, even if they're not in its active memory
- The agent can also scroll forward/backward inside any session it finds

```bash
hermes sessions list # Browse past sessions
```

See [Session Search Tool](/docs/user-guide/sessions#session-search-tool) for the three calling shapes (discovery / scroll / browse) and the response format.

### session_search vs memory

| Feature | Persistent Memory | Session Search |
|---------|------------------|----------------|
| **Capacity** | ~1,300 tokens total | Unlimited (all sessions) |
| **Speed** | Instant (in system prompt) | Requires search + LLM summarization |
| **Speed** | Instant (in system prompt) | ~20ms FTS5 query, ~1ms scroll |
| **Cost** | Token cost in every prompt | Free — no LLM calls |
| **Use case** | Key facts always available | Finding specific past conversations |
| **Management** | Manually curated by agent | Automatic — all sessions stored |
| **Token cost** | Fixed per session (~1,300 tokens) | On-demand (searched when needed) |

@@ -366,25 +366,66 @@ For deeper analytics — token usage, cost estimates, tool breakdown, and activi

## Session Search Tool

The agent has a built-in `session_search` tool that performs full-text search across all past conversations using SQLite's FTS5 engine.
The agent has a built-in `session_search` tool that performs full-text search across all past conversations using SQLite's FTS5 engine — and lets the agent scroll through any session it finds. No LLM calls, no summarization, no truncation. Every shape returns actual messages from the DB.

### How It Works
### Three calling shapes

1. FTS5 searches matching messages ranked by relevance
2. Groups results by session, takes the top N unique sessions (default 3)
3. Loads each session's conversation, truncates to ~100K chars centered on matches
4. Sends to a fast summarization model for focused summaries
5. Returns per-session summaries with metadata and surrounding context
The tool infers what you want from which arguments you set. There's no `mode` parameter.

**1. Discovery — pass `query`:**

```python
session_search(query="auth refactor", limit=3)
```

Runs FTS5, dedupes hits by session lineage, returns the top N sessions. Each result carries:

- `session_id`, `title`, `when`, `source`
- `snippet` — FTS5-highlighted match excerpt
- `bookend_start` — first 3 user+assistant messages of the session (the goal/kickoff)
- `messages` — ±5 messages around the FTS5 match, with the anchor message flagged (the hit in context)
- `bookend_end` — last 3 user+assistant messages of the session (the resolution/decisions)
- `match_message_id`, `messages_before`, `messages_after`

Bookends + window together reconstruct goal → match → resolution without paying for the whole transcript. Typical wall time: 15–50ms on a real session DB.
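Here is a small Python sketch that assembles `bookend_start`, the anchored `messages` window, and `bookend_end` from one session's ordered message list. It assumes messages are dicts with `id`, `role`, and `content` keys, and it works on an in-memory list. `build_discovery_result()` is a hypothetical name, not the tool's implementation, which reads from SQLite.

```python
from typing import Dict, List

Message = Dict[str, object]


def build_discovery_result(messages: List[Message], match_id: int,
                           window: int = 5, bookend: int = 3) -> Dict[str, object]:
    """Bookends plus a +/-window slice around the FTS5 hit (sketch)."""
    # Bookends only count user+assistant turns, per the field descriptions.
    convo = [m for m in messages if m["role"] in ("user", "assistant")]
    idx = next(i for i, m in enumerate(messages) if m["id"] == match_id)
    lo = max(0, idx - window)
    hi = min(len(messages), idx + window + 1)
    # Copy each message in the slice and flag the one that matched.
    around = [dict(m, anchor=(m["id"] == match_id)) for m in messages[lo:hi]]
    return {
        "bookend_start": convo[:bookend],   # goal / kickoff
        "messages": around,                 # the hit in context
        "bookend_end": convo[-bookend:],    # resolution / decisions
        "match_message_id": match_id,
        "messages_before": idx - lo,
        "messages_after": hi - idx - 1,
    }
```

In this sketch, short sessions simply yield overlapping bookends, and `messages_before`/`messages_after` shrink near the session edges.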

**2. Scroll — pass `session_id` + `around_message_id`:**

```python
session_search(session_id="20260510_174648_805cc2", around_message_id=590803, window=10)
```

Returns a window of ±`window` messages centered on the anchor. No FTS5, no bookends — just the slice. Use after a discovery call when you need more context than the ±5 default window.

- To scroll **forward**: pass `messages[-1].id` back as `around_message_id`
- To scroll **backward**: pass `messages[0].id` back as `around_message_id`
- The boundary message appears in both windows as an orientation marker
- When `messages_before` or `messages_after` is less than `window`, you're at the start or end of the session

Typical wall time: 1–2ms per scroll call.
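The scrolling rules can be exercised against an in-memory stand-in. This Python sketch assumes a session is an ordered list of message dicts with unique `id`s. `scroll()` and `read_forward()` are hypothetical helpers that mimic the described window arithmetic, not the DB-backed tool.

```python
from typing import Dict, List

Message = Dict[str, object]


def scroll(messages: List[Message], around_message_id: int,
           window: int = 10) -> Dict[str, object]:
    """Slice +/-window messages around the anchor: no FTS5, no bookends."""
    idx = next(i for i, m in enumerate(messages) if m["id"] == around_message_id)
    lo = max(0, idx - window)
    hi = min(len(messages), idx + window + 1)
    return {
        "messages": messages[lo:hi],
        "messages_before": idx - lo,      # < window: reached the session start
        "messages_after": hi - idx - 1,   # < window: reached the session end
    }


def read_forward(messages: List[Message], start_id: int, window: int = 10) -> List[int]:
    """Follow messages[-1].id until the end of the session; return each anchor."""
    if window < 1:
        raise ValueError("window must be >= 1 or the scroll never advances")
    anchors = [start_id]
    page = scroll(messages, start_id, window)
    while page["messages_after"] >= window:
        next_id = page["messages"][-1]["id"]  # boundary message, shared by both windows
        anchors.append(next_id)
        page = scroll(messages, next_id, window)
    return anchors
```

Scrolling backward is symmetric: use `messages[0]["id"]` as the next anchor and stop once `messages_before` drops below `window`.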

**3. Browse — no args:**

```python
session_search()
```

Returns recent sessions chronologically (titles, previews, timestamps). Useful when the user asks "what was I working on" without naming a topic.
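Since the shape is chosen from which arguments are present, the dispatch can be pictured as a few ordered checks. Below is a minimal Python sketch. `infer_shape()` is hypothetical. Two choices are assumptions of this sketch rather than documented behaviour: a `query` takes precedence, and a `session_id` without `around_message_id` is rejected.

```python
from typing import Optional


def infer_shape(query: Optional[str] = None,
                session_id: Optional[str] = None,
                around_message_id: Optional[int] = None) -> str:
    """Pick a calling shape from the arguments that are set (sketch only)."""
    if query:
        return "discovery"          # FTS5 + bookends + match window
    if session_id and around_message_id is not None:
        return "scroll"             # plain slice around the anchor
    if session_id is None and around_message_id is None:
        return "browse"             # recent sessions, newest first
    # Assumption: partial scroll arguments are an error in this sketch.
    raise ValueError("scroll needs both session_id and around_message_id")
```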

### FTS5 Query Syntax

The search supports standard FTS5 query syntax:

- Simple keywords: `docker deployment`
- Simple keywords: `docker deployment` (FTS5 defaults to AND)
- Phrases: `"exact phrase"`
- Boolean: `docker OR kubernetes`, `python NOT java`
- Prefix: `deploy*`
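The syntax examples can be tried directly with Python's built-in `sqlite3` module, assuming its bundled SQLite was compiled with FTS5 (true of most current builds). The single-column `messages` table below is a made-up stand-in, not the actual `state.db` schema.

```python
import sqlite3
from typing import List

# Throwaway in-memory FTS5 table; the schema is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE messages USING fts5(content)")
conn.executemany(
    "INSERT INTO messages(content) VALUES (?)",
    [
        ("docker deployment finished on staging",),  # rowid 1
        ("kubernetes rollout stalled",),             # rowid 2
        ("deploying the python service",),           # rowid 3
        ("rewrote the java client in python",),      # rowid 4
        ("docker image build failed",),              # rowid 5
    ],
)


def search(query: str) -> List[int]:
    """Return rowids matching an FTS5 query, best bm25 rank first."""
    rows = conn.execute(
        "SELECT rowid FROM messages WHERE messages MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]


print(search("docker deployment"))     # implicit AND: [1]
print(search('"image build"'))         # phrase: [5]
print(search("docker OR kubernetes"))  # rows 1, 2 and 5, in rank order
print(search("python NOT java"))      # [3]
print(search("deploy*"))               # prefix matches "deployment" and "deploying"
```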

### Optional parameters

- `sort` — `newest` or `oldest`, on top of FTS5 ranking. Omit for relevance-only ordering (the default; suitable for exploratory recall). Use `newest` for "where did we leave X" questions, `oldest` for "how did X start" questions.
- `role_filter` — comma-separated roles to include. Discovery defaults to `user,assistant` (tool output is usually noise). Pass `user,assistant,tool` to include tool output (debugging tool behaviour) or `tool` to search tool output only.
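One way to picture how the two parameters combine is to filter by role first, rank by relevance, and then optionally re-order by time. This Python sketch assumes each hit is a dict with `role`, an FTS5-style `rank` (lower is better), and a numeric `timestamp`. `apply_options()` is hypothetical. Treating `sort` as a stable re-sort over the relevance order is one interpretation of "on top of FTS5 ranking", not confirmed behaviour.

```python
from typing import Dict, List, Optional

Hit = Dict[str, object]
DEFAULT_ROLES = "user,assistant"  # discovery default described above


def apply_options(hits: List[Hit], role_filter: str = DEFAULT_ROLES,
                  sort: Optional[str] = None) -> List[Hit]:
    """Filter hits by role, then order by relevance or by time (sketch)."""
    roles = {r.strip() for r in role_filter.split(",") if r.strip()}
    kept = [h for h in hits if h["role"] in roles]
    kept.sort(key=lambda h: h["rank"])  # relevance-only default
    # Python's sort is stable, so equal timestamps keep their relevance order.
    if sort == "newest":
        kept.sort(key=lambda h: h["timestamp"], reverse=True)
    elif sort == "oldest":
        kept.sort(key=lambda h: h["timestamp"])
    elif sort is not None:
        raise ValueError("sort must be 'newest', 'oldest', or omitted")
    return kept
```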

### When It's Used

The agent is prompted to use session search automatically:

@@ -853,7 +853,7 @@ Common gateway problems:
- **Windows-specific issues** (`Alt+Enter` newline, WinError 10106, UTF-8 BOM config, test suite, line endings): see the dedicated **Windows-Specific Quirks** section above.

### Auxiliary models not working
If `auxiliary` tasks (vision, compression, session_search) fail silently, the `auto` provider can't find a backend. Either set `OPENROUTER_API_KEY` or `GOOGLE_API_KEY`, or explicitly configure each auxiliary task's provider:
If `auxiliary` tasks (vision, compression) fail silently, the `auto` provider can't find a backend. Either set `OPENROUTER_API_KEY` or `GOOGLE_API_KEY`, or explicitly configure each auxiliary task's provider:
```bash
hermes config set auxiliary.vision.provider <your_provider>
hermes config set auxiliary.vision.model <model_name>