hermes-agent/website/docs/user-guide/features/tool-search.md
teknium1 7427b9d581 fix(tool-search): scope bridge catalog + dispatch to the session's toolsets
Tool Search read its catalog from the global registry (get_tool_definitions
with no toolset scope = 'start with everything'), so a restricted-toolset
session — subagent, kanban worker, curated gateway session — could:

  1. tool_search the entire process registry, not just its granted tools, and
  2. tool_call any registered plugin/MCP tool it was never given, because
     registry.dispatch() has no enabled_tools gate for non-execute_code tools.

A scoped session (enabled_toolsets=['mcp-github']) reported total_available=26
and successfully invoked an out-of-scope plugin tool via tool_call.

Fix:
- handle_function_call gains enabled_toolsets/disabled_toolsets; the bridge
  dispatch scopes get_tool_definitions to them (also stops polluting the
  process-global _last_resolved_tool_names with out-of-scope tools, which
  leaked into execute_code's sandbox-tool fallback).
- A defense-in-depth gate rejects any tool_call'd name not in the scoped
  deferrable catalog.
- tool_executor's unwrap (both concurrent + sequential paths) enforces the
  same scope before dispatch, since it unwraps tool_call -> underlying name
  and bypasses the bridge branch. New _tool_search_scoped_names() helper,
  cached per-agent on registry generation + toolset scope.
- New scoped_deferrable_names() helper in tool_search.py shared by both sites.

Tests: 4 new regression tests in TestRegression_ToolsetScoping (scoped
catalog, out-of-scope tool_call rejection, no global pollution, helper).
2026-05-29 02:04:12 -07:00


---
title: Tool Search
sidebar_position: 95
---
# Tool Search
When you have many MCP servers or non-core plugin tools attached to a
session, their JSON schemas can consume a substantial fraction of the
context window on every turn — even when only a few of them are relevant
to what the user actually asked for.
**Tool Search** is Hermes' opt-in progressive-disclosure layer for that
problem. When activated, MCP and plugin tools are replaced in the
model-visible tools array by three bridge tools, and the model loads each
specific tool's schema on demand.
:::info Built-in Hermes tools never defer
The tools that make up Hermes' core capability set (`terminal`,
`read_file`, `write_file`, `patch`, `search_files`, `todo`, `memory`,
`browser_*`, `web_search`, `web_extract`, `clarify`, `execute_code`,
`delegate_task`, `session_search`, `send_message`, and the rest of
`_HERMES_CORE_TOOLS`) are *always* loaded directly. Only MCP tools and
non-core plugin tools are eligible for deferral.
:::
## How it works
When Tool Search activates for a turn, the model sees three new tools in
place of the deferred ones:
```
tool_search(query, limit?) — search the deferred-tool catalog
tool_describe(name) — load the full schema for one tool
tool_call(name, arguments) — invoke a deferred tool
```
A typical interaction looks like:
```
Model: tool_search("create a github issue")
→ { matches: [{ name: "mcp_github_create_issue", ... }, ...] }
Model: tool_describe("mcp_github_create_issue")
→ { parameters: { type: "object", properties: { ... } } }
Model: tool_call("mcp_github_create_issue", { title: "...", body: "..." })
→ { ok: true, issue_number: 42 }
```
When the model invokes `tool_call`, Hermes **unwraps the bridge** and
dispatches the underlying tool exactly as if the model had called it
directly. Pre-tool-call hooks, guardrails, approval prompts, and
post-tool-call hooks all run against the real tool name — not against
`tool_call`. The activity feed in the CLI and gateway also unwraps so you
see the underlying tool, not the bridge.
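
The unwrapping step can be sketched roughly as follows. This is a minimal illustration, not Hermes' actual code: `unwrap_bridge`, `dispatch`, the dict-based registry, and the hook signatures are all hypothetical names chosen for the example.

```python
from typing import Any, Callable, Iterable

BRIDGE_CALL = "tool_call"  # bridge tool name from the listing above


def unwrap_bridge(name: str, arguments: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return the (name, arguments) pair that should actually be dispatched."""
    if name != BRIDGE_CALL:
        return name, arguments  # direct calls pass through untouched
    inner_name = arguments.get("name")
    if not isinstance(inner_name, str) or not inner_name:
        raise ValueError("tool_call requires a non-empty string 'name'")
    inner_args = arguments.get("arguments") or {}
    if not isinstance(inner_args, dict):
        raise ValueError("tool_call 'arguments' must be an object")
    return inner_name, inner_args


def dispatch(
    name: str,
    arguments: dict[str, Any],
    registry: dict[str, Callable[..., Any]],  # hypothetical: tool name -> callable
    pre_hooks: Iterable[Callable[[str, dict[str, Any]], None]] = (),
    post_hooks: Iterable[Callable[[str, Any], None]] = (),
) -> Any:
    """Run hooks and the tool against the real tool name, never the bridge."""
    real_name, real_args = unwrap_bridge(name, arguments)
    for hook in pre_hooks:
        hook(real_name, real_args)
    result = registry[real_name](**real_args)
    for hook in post_hooks:
        hook(real_name, result)
    return result
```

Because the hooks receive the unwrapped name, an approval rule written for `mcp_github_create_issue` would apply whether the model calls the tool directly or through `tool_call`.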
## When does it activate?
By default Tool Search runs in `auto` mode: it activates only when the
deferrable tool schemas would consume at least 10% of the active model's
context window. Below that, the tools-array assembly is a pure
pass-through and you pay no overhead.
This decision is re-evaluated every time the tools array is built, so:
- A session with just a few MCP tools and a long context model never
activates Tool Search.
- A session with many MCP servers attached (15+ tools typically) starts
activating it.
- Removing MCP servers mid-session correctly returns to direct exposure
on the next assembly.
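
The activation rule above can be sketched as a pure function that is re-run on every tools-array assembly. This assumes schema size is measured in tokens; `should_activate` and its parameter names are hypothetical, not the real implementation.

```python
def should_activate(
    mode: str,
    deferrable_tokens: int,
    context_length: int,
    threshold_pct: float = 10.0,
    deferrable_count: int = 1,
) -> bool:
    """Decide whether the bridge tools replace the deferrable tools this turn."""
    if mode == "off" or deferrable_count == 0:
        return False  # nothing to defer, or the feature is disabled
    if mode == "on":
        return True
    if mode != "auto":
        raise ValueError(f"unknown mode: {mode!r}")
    if context_length <= 0:
        return False  # assumption: an unknown context size means pass-through
    # "at least threshold_pct of the context window", without a division
    return deferrable_tokens * 100 >= threshold_pct * context_length
```

With a 200,000-token context window and the default threshold, 20,000 tokens of deferrable schemas is the smallest amount that triggers activation in this sketch.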
## Configuration
```yaml
tools:
  tool_search:
    enabled: auto              # auto (default), on, or off
    threshold_pct: 10          # percentage of context, only used in auto mode
    search_default_limit: 5
    max_search_limit: 20
```
| Key | Default | Meaning |
| --- | --- | --- |
| `enabled` | `auto` | `auto` activates above threshold; `on` always activates if there's at least one deferrable tool; `off` disables entirely. |
| `threshold_pct` | `10` | Percentage of context length at which `auto` mode kicks in. Range 0–100. |
| `search_default_limit` | `5` | Hits returned when the model calls `tool_search` without a `limit`. |
| `max_search_limit` | `20` | Hard upper bound the model can request via `limit`. Range 1–50. |
You can also use the legacy boolean form:
```yaml
tools:
  tool_search: true    # equivalent to {enabled: auto}
```
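
A loader that accepts both shapes might normalize them along these lines. This is a sketch under stated assumptions: `normalize_tool_search_config` is a hypothetical helper, mapping `false` to `off` is a guess (only `true` is documented above), and clamping out-of-range values rather than rejecting them is also an assumption. The sketch also handles one YAML detail: YAML 1.1 parsers such as PyYAML read a bare `on` or `off` as a boolean, so `enabled` may arrive as `True`/`False` instead of a string.

```python
from typing import Any

DEFAULTS: dict[str, Any] = {
    "enabled": "auto",
    "threshold_pct": 10,
    "search_default_limit": 5,
    "max_search_limit": 20,
}


def _clamp(value, low, high):
    return max(low, min(high, value))


def normalize_tool_search_config(raw: Any) -> dict[str, Any]:
    """Turn the parsed `tools.tool_search` value into a full config dict."""
    if raw is None:
        return dict(DEFAULTS)
    if isinstance(raw, bool):
        # Legacy shape: true means auto (documented); false -> off is assumed.
        return {**DEFAULTS, "enabled": "auto" if raw else "off"}
    if not isinstance(raw, dict):
        raise TypeError("tool_search must be a boolean or a mapping")

    cfg = {**DEFAULTS, **raw}
    enabled = cfg["enabled"]
    if isinstance(enabled, bool):
        # YAML 1.1 parsed a bare `on` / `off` as a boolean.
        enabled = "on" if enabled else "off"
    enabled = str(enabled).lower()
    if enabled not in {"auto", "on", "off"}:
        raise ValueError(f"invalid tool_search.enabled: {enabled!r}")
    cfg["enabled"] = enabled

    # Ranges from the table above; clamping is an assumption of this sketch.
    cfg["threshold_pct"] = _clamp(cfg["threshold_pct"], 0, 100)
    cfg["max_search_limit"] = _clamp(int(cfg["max_search_limit"]), 1, 50)
    cfg["search_default_limit"] = _clamp(
        int(cfg["search_default_limit"]), 1, cfg["max_search_limit"]
    )
    return cfg
```

Quoting the value in YAML (`enabled: "on"`) sidesteps the boolean ambiguity entirely, whatever the loader does.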
## When NOT to use it
Tool Search trades a fixed per-turn token cost (the three bridge tool
schemas, ~300 tokens) and at least one extra round trip (search →
describe → call) for the savings on the deferred schemas. It's a clear
win when you have many tools and use few per turn; it's overhead when
you have few tools total.
The `auto` default handles this for you. If you set `enabled: on`
unconditionally, expect a slight per-turn cost on small toolsets.
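
As a back-of-envelope illustration of that trade, the function below estimates the net token effect over a session. It is a deliberately crude model, not a measurement: it takes the ~300-token bridge cost quoted above, assumes any schemas loaded via `tool_describe` are present in history on every turn (a worst case), and ignores search-result tokens, latency, and cache pricing. `net_tokens_saved` is a hypothetical name.

```python
BRIDGE_SCHEMA_TOKENS = 300  # approximate figure quoted in this section


def net_tokens_saved(
    deferred_schema_tokens: int,
    turns: int,
    loaded_schema_tokens: int = 0,
    bridge_tokens: int = BRIDGE_SCHEMA_TOKENS,
) -> int:
    """Estimated tokens saved over `turns` turns; negative means overhead."""
    per_turn = deferred_schema_tokens - bridge_tokens - loaded_schema_tokens
    return per_turn * turns


# Many tools, few used: 20,000 tokens deferred, 1,500 tokens of schemas loaded.
print(net_tokens_saved(20_000, turns=10, loaded_schema_tokens=1_500))
# A tiny toolset forced on: the bridge costs more than it hides.
print(net_tokens_saved(200, turns=10))
```

The second case comes out negative, which is the "slight per-turn cost on small toolsets" described above.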
## Trade-offs that don't go away
These come from the prompt-cache integrity invariant — they are inherent
to any progressive-disclosure design, not specific to this implementation:
- **Extra round trips on cold tools.** The first time the model needs
a deferred tool, it spends one or two extra model calls to find and
load the schema. The token savings on the static side are real, but a
portion is paid back at runtime.
- **No prefix-cache benefit on deferred schemas.** A loaded `tool_describe`
result enters the conversation history (so it does get cached on
subsequent turns) but it never benefits from the system-prompt cache
prefix.
- **Model-quality dependence.** Tool Search assumes the model can write a
reasonable search query for the tool it wants. Smaller models do this
less well; the published Anthropic numbers (49% → 74% on Opus 4 without
vs. with tool search) show the upside but also that ~26 points of
accuracy are still lost to retrieval failure.
- **Toolset edits invalidate the cache.** Adding or removing a tool
mid-session changes the bridge tools' descriptions (which include the
count of deferred tools) and the catalog, so the prompt cache is
invalidated. This is the same trade-off as any toolset edit.
## Implementation details
- **Retrieval:** BM25 over tokenized tool name + description + parameter
names. Falls back to a literal substring match on the tool name when
BM25 returns no positive-score hits, which protects against
zero-IDF degenerate cases (e.g. searching `"github"` against a
catalog where every tool name contains "github").
- **Catalog is stateless across turns.** It rebuilds from the current
tool-defs list every assembly — no session-keyed `Map`. This avoids
the class of bug where a stored catalog drifts out of sync with the
live tool registry.
- **The catalog is scoped to the session's toolsets.** `tool_search`,
`tool_describe`, and `tool_call` only ever see and invoke tools the
session was actually granted. A subagent, kanban worker, or gateway
session restricted to a subset of toolsets cannot use the bridge to
discover or call a tool outside that subset — the deferred catalog is
the deferrable slice of the session's own enabled/disabled toolsets,
not the whole process registry.
- **No JS sandbox.** Hermes uses the simpler "structured tools" mode
(search / describe / call as plain functions). The JS-sandbox "code
mode" some other implementations offer is a large surface area; we
skip it.
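
The retrieval bullet above can be illustrated with a small, self-contained BM25 ranker plus the substring fallback. This is a sketch under assumptions, not the code in `tools/tool_search.py`: the tokenizer, the `k1`/`b` values, and the IDF variant (classic Robertson/Sparck Jones, clamped at zero) are choices made for the example, and `search` / `tool_document` are hypothetical names.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Splits on anything that is not a letter or digit, so
    # "mcp_github_create_issue" -> ["mcp", "github", "create", "issue"].
    return re.findall(r"[a-z0-9]+", text.lower())


def tool_document(tool: dict) -> list[str]:
    """Tokens from tool name + description + parameter names."""
    params = tool.get("parameters", {}).get("properties", {})
    return tokenize(" ".join([tool["name"], tool.get("description", ""), *params]))


def search(tools: list[dict], query: str, limit: int = 5,
           k1: float = 1.5, b: float = 0.75) -> list[str]:
    docs = [tool_document(t) for t in tools]
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs or 1.0
    doc_freq = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))

    scored = []
    for tool, doc in zip(tools, docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            n_t = doc_freq[term]
            # Assumed IDF variant: clamped at zero, so a term that appears
            # in most or all tools contributes nothing to the score.
            idf = max(0.0, math.log((n_docs - n_t + 0.5) / (n_t + 0.5)))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scored.append((score, tool["name"]))

    if scored:
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [name for _, name in scored[:limit]]

    # Fallback: literal substring match on the tool name.
    needle = query.strip().lower()
    if not needle:
        return []
    return [t["name"] for t in tools if needle in t["name"].lower()][:limit]
```

In a catalog where every tool name contains "github", the query `"github"` scores zero for every tool under this IDF, so only the substring fallback returns anything. A query such as `"create a github issue"` still ranks normally, because the rarer terms carry positive weight.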
## See also
- `tools/tool_search.py` — the implementation
- `tests/tools/test_tool_search.py` — the regression suite
- The `openclaw-tool-search-report` PDF in the original implementation
PR for the research that shaped the design