The existing ASCII codec handler only sanitized conversation messages,
leaving tool schemas, system prompts, ephemeral prompts, prefill messages,
and HTTP headers as unhandled sources of non-ASCII content. On systems
with LANG=C or non-UTF-8 locale, Unicode symbols in tool descriptions
(e.g. arrows, em-dashes from prompt_builder) and system prompt content
would cause UnicodeEncodeError that fell through to the error path.
Changes:
- Add _sanitize_structure_non_ascii() generic recursive walker for
nested dict/list payloads
- Add _sanitize_tools_non_ascii() thin wrapper for tool schemas
- Add _force_ascii_payload flag: once ASCII locale is detected, all
subsequent API calls get proactively sanitized (prevents recurring
failures from new tool results bringing fresh Unicode each turn)
- Extend the ASCII codec error handler to sanitize: prefill_messages,
tool schemas (self.tools), system prompt, ephemeral system prompt,
and default HTTP headers
- Update stale comment that acknowledged the gap
Cherry-picked from PR #8834 (credential pool changes dropped as
separate concern).
Broaden the UnicodeEncodeError recovery to handle systems with ASCII-only
locale (LANG=C, Chromebooks) where ANY non-ASCII character causes encoding
failure, not just lone surrogates.
Changes:
- Add _strip_non_ascii() and _sanitize_messages_non_ascii() helpers that
strip all non-ASCII characters from message content, name, and tool_calls
- Update the UnicodeEncodeError handler to detect ASCII codec errors and
fall back to non-ASCII sanitization after surrogate check fails
- Sanitize tool_calls arguments and name fields (not just content)
- Fix bare .encode() in cli.py suspend handler to use explicit utf-8
- Add comprehensive test suite (17 tests)