diff --git a/website/docs/developer-guide/agent-loop.md b/website/docs/developer-guide/agent-loop.md
index cf9cb1c1efd..fdc0cc3c8f9 100644
--- a/website/docs/developer-guide/agent-loop.md
+++ b/website/docs/developer-guide/agent-loop.md
@@ -194,7 +194,7 @@ When the primary model fails (429 rate limit, 5xx server error, 401/403 auth err
 3. On success, continue the conversation with the new provider
 4. On 401/403, attempt credential refresh before failing over
 
-The fallback system also covers auxiliary tasks independently — vision, compression, web extraction, and session search each have their own fallback chain configurable via the `auxiliary.*` config section.
+The fallback system also covers auxiliary tasks independently — vision, compression, and web extraction each have their own fallback chain configurable via the `auxiliary.*` config section.
 
 ## Compression and Persistence
 
diff --git a/website/docs/developer-guide/provider-runtime.md b/website/docs/developer-guide/provider-runtime.md
index 830382479ff..67c86b01c29 100644
--- a/website/docs/developer-guide/provider-runtime.md
+++ b/website/docs/developer-guide/provider-runtime.md
@@ -150,7 +150,6 @@ Auxiliary tasks such as:
 - vision
 - web extraction summarization
 - context compression summaries
-- session search summarization
 - skills hub operations
 - MCP helper operations
 - memory flushes
diff --git a/website/docs/user-guide/configuring-models.md b/website/docs/user-guide/configuring-models.md
index 4c12fa7e7d1..a4ce79eea3f 100644
--- a/website/docs/user-guide/configuring-models.md
+++ b/website/docs/user-guide/configuring-models.md
@@ -7,7 +7,7 @@ sidebar_position: 3
 Hermes uses two kinds of model slots:
 
 - **Main model** — what the agent thinks with. Every user message, every tool-call loop, every streamed response goes through this model.
-- **Auxiliary models** — smaller side-jobs the agent offloads. Context compression, vision (image analysis), web-page summarization, session search, approval scoring, MCP tool routing, session-title generation, and skill search. Each has its own slot and can be overridden independently.
+- **Auxiliary models** — smaller side-jobs the agent offloads. Context compression, vision (image analysis), web-page summarization, approval scoring, MCP tool routing, session-title generation, and skill search. Each has its own slot and can be overridden independently.
 
 This page covers configuring both from the dashboard. If you prefer config files or the CLI, jump to [Alternative methods](#alternative-methods) at the bottom.
 
@@ -52,7 +52,6 @@ Every auxiliary task defaults to `auto` — meaning Hermes uses your main model
 | **Title Gen** | Almost always. A $0.10/M flash model writes session titles as well as Opus. Default config sets this to `google/gemini-3-flash-preview` on OpenRouter. |
 | **Vision** | When your main model is a coding model without vision (e.g. Kimi, DeepSeek). Point it at `google/gemini-2.5-flash` or `gpt-4o-mini`. |
 | **Compression** | When you're burning reasoning tokens on Opus/M2.7 just to summarize context. A fast chat model does the job at 1/50th the cost. |
-| **Session Search** | When recall queries fan out — default max_concurrency is 3. A cheap model keeps the bill predictable. |
 | **Approval** | For `approval_mode: smart` — a fast/cheap model (haiku, flash, gpt-5-mini) decides whether to auto-approve low-risk commands. Expensive models here are waste. |
 | **Web Extract** | When you use `web_extract` heavily. Same logic as compression — summarization doesn't need reasoning. |
 | **Skills Hub** | `hermes skills search` uses this. Usually fine at `auto`. |
diff --git a/website/docs/user-guide/features/codex-app-server-runtime.md b/website/docs/user-guide/features/codex-app-server-runtime.md
index 575250d9b01..130e790f06e 100644
--- a/website/docs/user-guide/features/codex-app-server-runtime.md
+++ b/website/docs/user-guide/features/codex-app-server-runtime.md
@@ -242,7 +242,7 @@ default_permissions = ":read-only"
 
 ## Auxiliary tasks and ChatGPT subscription token cost
 
-When this runtime is on with the `openai-codex` provider, **auxiliary tasks (title generation, context compression, vision auto-detect, session search summarization, the background self-improvement review fork) also flow through your ChatGPT subscription by default**, because Hermes' auxiliary client uses the main provider/model when no per-task override is set.
+When this runtime is on with the `openai-codex` provider, **auxiliary tasks (title generation, context compression, vision auto-detect, the background self-improvement review fork) also flow through your ChatGPT subscription by default**, because Hermes' auxiliary client uses the main provider/model when no per-task override is set.
 
 This isn't specific to `codex_app_server` — it's true for the existing `codex_responses` path too — but it's more visible here because you're explicitly opting in for the subscription billing.
 
@@ -259,9 +259,6 @@ auxiliary:
   vision_detect:
     provider: openrouter
     model: google/gemini-3-flash-preview
-  session_search:
-    provider: openrouter
-    model: google/gemini-3-flash-preview
   goal_judge:
     provider: openrouter
     model: google/gemini-3-flash-preview
diff --git a/website/docs/user-guide/features/memory.md b/website/docs/user-guide/features/memory.md
index 77f74d28a8b..5c07df63578 100644
--- a/website/docs/user-guide/features/memory.md
+++ b/website/docs/user-guide/features/memory.md
@@ -177,19 +177,23 @@ Memory entries are scanned for injection and exfiltration patterns before being
 Beyond MEMORY.md and USER.md, the agent can search its past conversations using the `session_search` tool:
 
 - All CLI and messaging sessions are stored in SQLite (`~/.hermes/state.db`) with FTS5 full-text search
-- Search queries return relevant past conversations with Gemini Flash summarization
+- Search queries return actual messages from the DB — no LLM summarization, no truncation
 - The agent can find things it discussed weeks ago, even if they're not in its active memory
+- The agent can also scroll forward/backward inside any session it finds
 
 ```bash
 hermes sessions list # Browse past sessions
 ```
 
+See [Session Search Tool](/docs/user-guide/sessions#session-search-tool) for the three calling shapes (discovery / scroll / browse) and the response format.
+
 ### session_search vs memory
 
 | Feature | Persistent Memory | Session Search |
 |---------|------------------|----------------|
 | **Capacity** | ~1,300 tokens total | Unlimited (all sessions) |
-| **Speed** | Instant (in system prompt) | Requires search + LLM summarization |
+| **Speed** | Instant (in system prompt) | ~20ms FTS5 query, ~1ms scroll |
+| **Cost** | Token cost in every prompt | Free — no LLM calls |
 | **Use case** | Key facts always available | Finding specific past conversations |
 | **Management** | Manually curated by agent | Automatic — all sessions stored |
 | **Token cost** | Fixed per session (~1,300 tokens) | On-demand (searched when needed) |
diff --git a/website/docs/user-guide/sessions.md b/website/docs/user-guide/sessions.md
index e90c3f60bcb..2a663bf5ace 100644
--- a/website/docs/user-guide/sessions.md
+++ b/website/docs/user-guide/sessions.md
@@ -366,25 +366,66 @@ For deeper analytics — token usage, cost estimates, tool breakdown, and activi
 
 ## Session Search Tool
 
-The agent has a built-in `session_search` tool that performs full-text search across all past conversations using SQLite's FTS5 engine.
+The agent has a built-in `session_search` tool that performs full-text search across all past conversations using SQLite's FTS5 engine — and lets the agent scroll through any session it finds. No LLM calls, no summarization, no truncation. Every shape returns actual messages from the DB.
 
-### How It Works
+### Three calling shapes
 
-1. FTS5 searches matching messages ranked by relevance
-2. Groups results by session, takes the top N unique sessions (default 3)
-3. Loads each session's conversation, truncates to ~100K chars centered on matches
-4. Sends to a fast summarization model for focused summaries
-5. Returns per-session summaries with metadata and surrounding context
+The tool infers what you want from which arguments you set. There's no `mode` parameter.
+
+**1. Discovery — pass `query`:**
+
+```python
+session_search(query="auth refactor", limit=3)
+```
+
+Runs FTS5, dedupes hits by session lineage, returns the top N sessions. Each result carries:
+
+- `session_id`, `title`, `when`, `source`
+- `snippet` — FTS5-highlighted match excerpt
+- `bookend_start` — first 3 user+assistant messages of the session (the goal/kickoff)
+- `messages` — ±5 messages around the FTS5 match, with the anchor message flagged (the hit in context)
+- `bookend_end` — last 3 user+assistant messages of the session (the resolution/decisions)
+- `match_message_id`, `messages_before`, `messages_after`
+
+Bookends + window together reconstruct goal → match → resolution without paying for the whole transcript. Typical wall time: 15–50ms on a real session DB.
+
+**2. Scroll — pass `session_id` + `around_message_id`:**
+
+```python
+session_search(session_id="20260510_174648_805cc2", around_message_id=590803, window=10)
+```
+
+Returns a window of ±`window` messages centered on the anchor. No FTS5, no bookends — just the slice. Use after a discovery call when you need more context than the ±5 default window.
+
+- To scroll **forward**: pass `messages[-1].id` back as `around_message_id`
+- To scroll **backward**: pass `messages[0].id` back as `around_message_id`
+- The boundary message appears in both windows as an orientation marker
+- When `messages_before` or `messages_after` is less than `window`, you're at the start or end of the session
+
+Typical wall time: 1–2ms per scroll call.
+
+**3. Browse — no args:**
+
+```python
+session_search()
+```
+
+Returns recent sessions chronologically (titles, previews, timestamps). Useful when the user asks "what was I working on" without naming a topic.
 
 ### FTS5 Query Syntax
 
 The search supports standard FTS5 query syntax:
 
-- Simple keywords: `docker deployment`
+- Simple keywords: `docker deployment` (FTS5 defaults to AND)
 - Phrases: `"exact phrase"`
 - Boolean: `docker OR kubernetes`, `python NOT java`
 - Prefix: `deploy*`
 
+### Optional parameters
+
+- `sort` — `newest` or `oldest`, on top of FTS5 ranking. Omit for relevance-only ordering (the default; suitable for exploratory recall). Use `newest` for "where did we leave X" questions, `oldest` for "how did X start" questions.
+- `role_filter` — comma-separated roles to include. Discovery defaults to `user,assistant` (tool output is usually noise). Pass `user,assistant,tool` to include tool output (debugging tool behaviour) or `tool` to search tool output only.
+
 ### When It's Used
 
 The agent is prompted to use session search automatically:
diff --git a/website/docs/user-guide/skills/bundled/autonomous-ai-agents/autonomous-ai-agents-hermes-agent.md b/website/docs/user-guide/skills/bundled/autonomous-ai-agents/autonomous-ai-agents-hermes-agent.md
index 5f2c8d16a2a..ec0a4a92503 100644
--- a/website/docs/user-guide/skills/bundled/autonomous-ai-agents/autonomous-ai-agents-hermes-agent.md
+++ b/website/docs/user-guide/skills/bundled/autonomous-ai-agents/autonomous-ai-agents-hermes-agent.md
@@ -853,7 +853,7 @@ Common gateway problems:
 - **Windows-specific issues** (`Alt+Enter` newline, WinError 10106, UTF-8 BOM config, test suite, line endings): see the dedicated **Windows-Specific Quirks** section above.
 
 ### Auxiliary models not working
-If `auxiliary` tasks (vision, compression, session_search) fail silently, the `auto` provider can't find a backend. Either set `OPENROUTER_API_KEY` or `GOOGLE_API_KEY`, or explicitly configure each auxiliary task's provider:
+If `auxiliary` tasks (vision, compression) fail silently, the `auto` provider can't find a backend. Either set `OPENROUTER_API_KEY` or `GOOGLE_API_KEY`, or explicitly configure each auxiliary task's provider:
 ```bash
 hermes config set auxiliary.vision.provider
 hermes config set auxiliary.vision.model