fix: strip ANSI at the source — clean terminal output before it reaches the model

Root cause: terminal_tool, execute_code, and process_registry returned raw
subprocess output with ANSI escape sequences intact. The model saw these
in tool results and copied them into file writes.

Previous fix (PR #2532) stripped ANSI at the write point in file_tools.py,
but this was a band-aid — regex on file content risks corrupting legitimate
content, and doesn't prevent ANSI from wasting tokens in the model context.

Source-level fix:
- New tools/ansi_strip.py with comprehensive ECMA-48 regex covering CSI
  (incl. private-mode, colon-separated, intermediate bytes), OSC (both
  terminators), DCS/SOS/PM/APC strings, Fp/Fe/Fs/nF escapes, 8-bit C1
- terminal_tool.py: strip output before returning to model
- code_execution_tool.py: strip stdout/stderr before returning
- process_registry.py: strip output in poll/read_log/wait
- file_tools.py: remove _strip_ansi band-aid (no longer needed)

Verified: `ls --color=always` output returned as clean text to model,
file written from that output contains zero ESC bytes.
This commit is contained in:
Teknium 2026-03-23 06:50:39 -07:00
parent 6302e56e7c
commit 934fbe3c06
No known key found for this signature in database
7 changed files with 236 additions and 21 deletions

View file

@ -309,3 +309,6 @@ class TestSearchHints:
raw = search_tool(pattern="foo", offset=50, limit=50)
assert "[Hint:" in raw
assert "offset=100" in raw