security: sanitize tool error strings before injecting into model context (#26823)

Adds _sanitize_tool_error() in model_tools and routes both error paths
through it: registry.dispatch's try/except (the primary path for tool
exceptions) and handle_function_call's outer except (defense in depth).

Stripping targets structural framing tokens that the model can still
react to even though json.dumps already handles wire-layer escaping:
XML role tags (tool_call, function_call, result, response, output,
input, system, assistant, user), CDATA sections, and markdown code
fences. The message body is capped at 2000 chars and prefixed with
[TOOL_ERROR].
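
For illustration only, the stripping rules listed above could be sketched
roughly as follows. This is not the code added by this commit: the function
name, the constant names, and the exact regular expressions are assumptions,
as is the choice to drop whole CDATA sections and only the fence markers
(not the fenced text).

```python
import re

# Hypothetical names and patterns, chosen for this sketch only.
_ROLE_TAGS = (
    "tool_call", "function_call", "result", "response", "output",
    "input", "system", "assistant", "user",
)
# Opening or closing role tags, with optional attributes.
_ROLE_TAG_RE = re.compile(
    r"</?\s*(?:%s)\b[^>]*>" % "|".join(_ROLE_TAGS), re.IGNORECASE
)
# Whole CDATA sections, including their payload (an assumption).
_CDATA_RE = re.compile(r"<!\[CDATA\[.*?\]\]>", re.DOTALL)
# Runs of three or more backticks plus an optional language tag.
_FENCE_RE = re.compile(r"`{3,}[A-Za-z0-9_+-]*")
_MAX_LEN = 2000
_PREFIX = "[TOOL_ERROR] "


def sanitize_tool_error_sketch(message: str) -> str:
    """Strip framing tokens, cap the body, and add the prefix."""
    text = _CDATA_RE.sub("", message)
    text = _ROLE_TAG_RE.sub("", text)
    text = _FENCE_RE.sub("", text)
    # Cap the body only; the prefix is added on top of the 2000 chars.
    text = text.strip()[:_MAX_LEN]
    return _PREFIX + text


print(sanitize_tool_error_sketch("boom <tool_call>rm -rf</tool_call>"))
# prints: [TOOL_ERROR] boom rm -rf
```

The sketch makes a single pass, so a tag spliced around another tag could
survive as a new tag; a stricter version might repeat the substitutions
until the string stops changing.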

Defense-in-depth: a tool exception carrying '<tool_call>...' won't
break message framing (json escapes it), but the model still reads
those tokens and they nudge it toward role-confusion framing.
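
A small check of the point above, using only the standard library (the
exception text is a made-up example): json.dumps keeps the payload a
well-formed JSON string, yet the role-tag tokens pass through unchanged.

```python
import json

# Hypothetical exception text carrying a role tag.
raw = 'Tool execution failed: ValueError: <tool_call>{"name": "x"}</tool_call>'
wire = json.dumps({"error": raw})

# Framing holds: the payload round-trips as a single string value.
assert json.loads(wire)["error"] == raw

# The tokens are still there for the model to read; json.dumps escapes
# quotes and control characters, not angle brackets.
assert "<tool_call>" in wire
assert "</tool_call>" in wire
```

This is why the sanitizer works on the message text itself and does not
rely on serialization alone.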

Ported from ironclaw#1639 (one piece of #3838's three-feature scout).
The truncated-tool-call (#1632) and empty-response-recovery (#1677,
#1720) pieces are skipped because main now implements both far more
thoroughly (run_agent.py L8147/L12209/L13012 for truncation retry +
length rewrite; L4500/L15090+ for empty-response scaffolding stripper,
multi-stage nudge, fallback model activation).
Teknium 2026-05-16 00:57:39 -07:00 committed by GitHub
parent 70b663504f
commit 627f8a5f1d
GPG key ID: B5690EEEBB952194
3 changed files with 191 additions and 2 deletions

@@ -404,7 +404,16 @@ class ToolRegistry:
             return entry.handler(args, **kwargs)
         except Exception as e:
             logger.exception("Tool %s dispatch error: %s", name, e)
-            return json.dumps({"error": f"Tool execution failed: {type(e).__name__}: {e}"})
+            # Route through the sanitizer so framing tokens / CDATA / fences
+            # in exception strings don't reach the model as structural noise.
+            # See model_tools._sanitize_tool_error for rationale.
+            raw = f"Tool execution failed: {type(e).__name__}: {e}"
+            try:
+                from model_tools import _sanitize_tool_error
+                sanitized = _sanitize_tool_error(raw)
+            except Exception:
+                sanitized = raw  # defensive: never let the sanitizer block error propagation
+            return json.dumps({"error": sanitized})

     # ------------------------------------------------------------------
     # Query helpers (replace redundant dicts in model_tools.py)