Adds Tool Search, a structured-tools progressive-disclosure layer that
replaces MCP and non-core plugin tools in the model-visible tools array
with three bridge tools (tool_search / tool_describe / tool_call) when
the deferrable surface would consume more than a configurable percentage
of the active model's context window. Core Hermes tools are never deferred.
Default mode is 'auto' with a 10% context threshold, so small toolsets
pay no overhead. Set tools.tool_search.enabled to 'on' to force or 'off'
to disable.
The design accounts for the OpenClaw production failure modes
documented in the openclaw-tool-search-report:
- Core tools never defer (toolsets._HERMES_CORE_TOOLS). Addresses the
'tools silently missing from isolated cron turns' regression class
(openclaw#84141) by construction: there is no code path that can
drop a core tool.
- Catalog is stateless across turns — rebuilt from the live tool-defs
list on every assembly. No session-keyed Map that can drift out of
sync with the registry.
- tool_call unwraps the bridge call before any hook fires, so plugin
pre/post hooks, guardrails, approval flows, and the activity feed
all see the underlying tool name, not the bridge (addresses
openclaw#85588 and the verbose-mode complaint on openclaw#79823).
- The unwrap happens in both the parallel and sequential paths of
agent/tool_executor.py and also in handle_function_call, so direct
callers (sandboxed code, eval harnesses) are covered too.
- Bridge tools cannot invoke each other (recursion guard) and cannot
invoke core tools (those must be called directly).
- Tools mode only — no JS-sandbox code-mode. Keeps the surface small.
- Token estimation via cheap char/4 heuristic; precision isn't needed
for the threshold decision.
Files:
- tools/tool_search.py — new module (BM25 retrieval, classification,
threshold gate, bridge dispatch, unwrap helper).
- tests/tools/test_tool_search.py — 35 tests including the OpenClaw
#84141 regression guard.
- model_tools.py — wires assembly into _compute_tool_definitions as the
final step, adds skip_tool_search_assembly kwarg so the bridge can
see the real catalog, dispatches the three bridge tools.
- agent/tool_executor.py — unwraps tool_call in both parallel and
sequential parsing loops so checkpointing, guardrails, plugin hooks,
and tool-progress callbacks all observe the underlying tool name.
- hermes_cli/config.py — DEFAULT_CONFIG['tools']['tool_search'] block.
- website/docs/user-guide/features/tool-search.md — user docs.
Validation:
- 35/35 new tests pass.
- Existing tool/registry/model_tools/config/coercion/executor tests
(82 + 74 + small adjacents) green.
- Live E2E: 20 fake MCP tools registered, get_tool_definitions returns
3 bridges, tool_search returns top 3 hits, tool_describe returns
full schema, tool_call dispatches to the real underlying handler
and the underlying result is what the model sees.
- Reserved-name recursion guard verified live.
- Core-tool refusal via tool_call verified live.
| title | sidebar_position |
|---|---|
| Tool Search | 95 |
Tool Search
When you have many MCP servers or non-core plugin tools attached to a session, their JSON schemas can consume a substantial fraction of the context window on every turn — even when only a few of them are relevant to what the user actually asked for.
Tool Search is Hermes' progressive-disclosure layer for that problem. When activated, MCP and plugin tools are replaced in the model-visible tools array by three bridge tools, and the model loads each specific tool's schema on demand.
:::info Built-in Hermes tools never defer
The tools that make up Hermes' core capability set (terminal,
read_file, write_file, patch, search_files, todo, memory,
browser_*, web_search, web_extract, clarify, execute_code,
delegate_task, session_search, send_message, and the rest of
_HERMES_CORE_TOOLS) are always loaded directly. Only MCP tools and
non-core plugin tools are eligible for deferral.
:::
How it works
When Tool Search activates for a turn, the model sees three new tools in place of the deferred ones:
tool_search(query, limit?) — search the deferred-tool catalog
tool_describe(name) — load the full schema for one tool
tool_call(name, arguments) — invoke a deferred tool
A typical interaction looks like:
Model: tool_search("create a github issue")
→ { matches: [{ name: "mcp_github_create_issue", ... }, ...] }
Model: tool_describe("mcp_github_create_issue")
→ { parameters: { type: "object", properties: { ... } } }
Model: tool_call("mcp_github_create_issue", { title: "...", body: "..." })
→ { ok: true, issue_number: 42 }
When the model invokes tool_call, Hermes unwraps the bridge and
dispatches the underlying tool exactly as if the model had called it
directly. Pre-tool-call hooks, guardrails, approval prompts, and
post-tool-call hooks all run against the real tool name — not against
tool_call. The activity feed in the CLI and gateway also unwraps so you
see the underlying tool, not the bridge.
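The unwrap step can be pictured with a minimal sketch. It is not the code in tools/tool_search.py: the function name `unwrap_tool_call`, the `BridgeCallError` exception, and the shortened `CORE_TOOLS` set are hypothetical names chosen for illustration. It assumes the bridge payload has the shape `{"name": ..., "arguments": {...}}` shown in the interaction above, and it includes the two refusals described in the implementation notes (bridge tools cannot invoke each other, and core tools must be called directly).

```python
from typing import Any

# Hypothetical stand-ins; the real core set is toolsets._HERMES_CORE_TOOLS.
BRIDGE_TOOLS = frozenset({"tool_search", "tool_describe", "tool_call"})
CORE_TOOLS = frozenset({"terminal", "read_file", "write_file"})


class BridgeCallError(ValueError):
    """Raised when a tool_call payload targets a tool the bridge may not invoke."""


def unwrap_tool_call(name: str, arguments: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return the (name, arguments) pair that hooks and handlers should observe."""
    if name != "tool_call":
        return name, arguments  # direct calls pass through untouched

    target = arguments.get("name")
    inner = arguments.get("arguments")
    if inner is None:
        inner = {}
    if not isinstance(target, str) or not target:
        raise BridgeCallError("tool_call requires a non-empty 'name'")
    if not isinstance(inner, dict):
        raise BridgeCallError("tool_call 'arguments' must be an object")
    if target in BRIDGE_TOOLS:
        # Recursion guard: a bridge tool may not invoke another bridge tool.
        raise BridgeCallError(f"bridge tools cannot invoke {target!r}")
    if target in CORE_TOOLS:
        # Core tools are always exposed directly, so the bridge refuses them.
        raise BridgeCallError(f"{target!r} is a core tool; call it directly")
    return target, inner
```

Running this before any hook fires is what lets guardrails and approval prompts match on the underlying name: a hook keyed on `mcp_github_create_issue` behaves the same whether the model called the tool directly or through `tool_call`.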
When does it activate?
By default Tool Search runs in auto mode: it activates only when the
deferrable tool schemas would consume at least 10% of the active model's
context window. Below that, the tools-array assembly is a pure
pass-through and you pay no overhead.
This decision is re-evaluated every time the tools array is built, so:
- A session with just a few MCP tools and a long context model never activates Tool Search.
- A session with many MCP servers attached (15+ tools typically) starts activating it.
- Removing MCP servers mid-session correctly returns to direct exposure on the next assembly.
Configuration
tools:
  tool_search:
    enabled: auto            # auto (default), on, or off
    threshold_pct: 10        # percentage of context; only used in auto mode
    search_default_limit: 5
    max_search_limit: 20
| Key | Default | Meaning |
|---|---|---|
| enabled | auto | auto activates above threshold; on always activates if there's at least one deferrable tool; off disables entirely. |
| threshold_pct | 10 | Percentage of context length at which auto mode kicks in. Range 0–100. |
| search_default_limit | 5 | Hits returned when the model calls tool_search without a limit. |
| max_search_limit | 20 | Hard upper bound the model can request via limit. Range 1–50. |
The legacy boolean shape is also accepted:
tools:
  tool_search: true   # equivalent to {enabled: auto}
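A minimal sketch of how the two config shapes and the documented ranges could be normalized into one dict. The function name `normalize_tool_search` is hypothetical, and several behaviors are assumptions rather than documented facts: that a legacy `false` means off, that out-of-range numbers are clamped rather than rejected, and that a nested boolean `enabled` maps to on/off. The last case matters because YAML 1.1 loaders such as PyYAML parse an unquoted `on` or `off` as a boolean.

```python
from typing import Any

DEFAULTS: dict[str, Any] = {
    "enabled": "auto",
    "threshold_pct": 10,
    "search_default_limit": 5,
    "max_search_limit": 20,
}


def _clamp(value: Any, low: float, high: float, fallback: float) -> float:
    """Clamp a numeric setting into [low, high]; use fallback for non-numbers."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return fallback
    return max(low, min(high, value))


def normalize_tool_search(raw: Any) -> dict[str, Any]:
    """Accept the dict shape or the legacy boolean and return a full config."""
    if isinstance(raw, bool):
        # Documented: true == {enabled: auto}. Assumption: false == off.
        raw = {"enabled": "auto" if raw else "off"}
    elif not isinstance(raw, dict):
        raw = {}
    cfg = {**DEFAULTS, **raw}

    enabled = cfg["enabled"]
    if isinstance(enabled, bool):
        # Assumption: an unquoted YAML on/off arrives here as True/False.
        enabled = "on" if enabled else "off"
    if enabled not in ("auto", "on", "off"):
        enabled = DEFAULTS["enabled"]

    max_limit = int(_clamp(cfg["max_search_limit"], 1, 50, DEFAULTS["max_search_limit"]))
    default_limit = int(_clamp(cfg["search_default_limit"], 1, max_limit, min(5, max_limit)))
    return {
        "enabled": enabled,
        "threshold_pct": _clamp(cfg["threshold_pct"], 0, 100, DEFAULTS["threshold_pct"]),
        "search_default_limit": default_limit,
        "max_search_limit": max_limit,
    }
```

Keeping the default limit at or below the maximum is a design choice of this sketch; it avoids a config where a bare `tool_search(query)` would return more hits than an explicit `limit` is allowed to request.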
When NOT to use it
Tool Search trades a fixed per-turn token cost (the three bridge tool schemas, ~300 tokens) and at least one extra round trip (search → describe → call) for the savings on the deferred schemas. It's a clear win when you have many tools and use few per turn; it's overhead when you have few tools total.
The auto default handles this for you. If you set enabled: on
unconditionally, expect a slight per-turn cost on small toolsets.
Trade-offs that don't go away
These come from the prompt-cache integrity invariant — they are inherent to any progressive-disclosure design, not specific to this implementation:
- One extra round trip on cold tools. The first time the model needs a deferred tool, it spends one or two extra model calls to find and load the schema. The token savings on the static side are real, but a portion is paid back at runtime.
- No cache benefit on deferred schemas. A loaded tool_describe result enters the conversation history (so it does get cached on subsequent turns), but it never benefits from the system-prompt cache prefix.
- Model-quality dependence. Tool Search assumes the model can write a reasonable search query for the tool it wants. Smaller models do this less well; the published Anthropic numbers (49% → 74% on Opus 4, without vs. with tool search) show the upside, but also that ~26 points of accuracy are still lost to failures such as retrieval misses.
- Toolset edits invalidate cache. Adding or removing a tool mid-session changes the bridge tools' descriptions (which include the count of deferred tools) and the catalog, so the prompt cache is invalidated. This is the same trade-off as any toolset edit.
Implementation details
- Retrieval: BM25 over tokenized tool name + description + parameter names. Falls back to a literal substring match on the tool name when BM25 returns no positive-score hits, which protects against zero-IDF degenerate cases (e.g. searching "github" against a catalog where every tool name contains "github").
- Catalog is stateless across turns. It rebuilds from the current tool-defs list on every assembly, with no session-keyed Map. This avoids the class of bug where a stored catalog drifts out of sync with the live tool registry.
- No JS sandbox. Hermes uses the simpler "structured tools" mode (search / describe / call as plain functions). The JS-sandbox "code mode" some other implementations offer is a large surface area; we skip it.
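The retrieval step with its fallback can be sketched as below. This is an illustration rather than the module's code: `tokenize`, `tool_text`, and `search` are hypothetical names, the tool-definition shape (`name`, `description`, `parameters.properties`) is assumed, and the sketch uses the classic BM25 IDF floored at zero with the common defaults k1=1.5 and b=0.75, none of which the source confirms.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumerics, so snake_case names split too."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tool_text(tool: dict) -> str:
    """Concatenate the fields that get indexed: name, description, parameter names."""
    params = tool.get("parameters", {}).get("properties", {})
    return " ".join([tool["name"], tool.get("description", ""), *params])


def search(tools: list[dict], query: str, limit: int = 5,
           k1: float = 1.5, b: float = 0.75) -> list[str]:
    """Rank tool names by BM25; fall back to a substring match on the name."""
    # The index is rebuilt from the list passed in, so nothing persists across calls.
    docs = [Counter(tokenize(tool_text(t))) for t in tools]
    if not docs:
        return []
    n = len(docs)
    lengths = [sum(d.values()) for d in docs]
    avg_len = (sum(lengths) / n) or 1.0
    scores = [0.0] * n
    for term in set(tokenize(query)):
        df = sum(1 for d in docs if term in d)
        # Classic IDF, floored at zero: terms in half or more of the docs add nothing.
        idf = max(0.0, math.log((n - df + 0.5) / (df + 0.5)))
        if idf == 0.0:
            continue
        for i, d in enumerate(docs):
            tf = d[term]
            if tf:
                norm = tf + k1 * (1 - b + b * lengths[i] / avg_len)
                scores[i] += idf * tf * (k1 + 1) / norm
    ranked = sorted((i for i in range(n) if scores[i] > 0), key=lambda i: -scores[i])
    if not ranked:
        # Fallback for the zero-IDF case: literal substring match on the tool name.
        needle = query.strip().lower()
        ranked = [i for i, t in enumerate(tools) if needle and needle in t["name"].lower()]
    return [tools[i]["name"] for i in ranked[:limit]]
```

With a catalog in which every name contains "github", the query "github" has zero IDF under this formula, so BM25 scores nothing and the substring fallback returns the catalog entries in their original order, up to `limit`.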
See also
- tools/tool_search.py: the implementation
- tests/tools/test_tool_search.py: the regression suite
- The openclaw-tool-search-report PDF in the original implementation PR, for the research that shaped the design