fix: always retry on ASCII codec UnicodeEncodeError — don't gate on per-component sanitization

The recovery block previously only retried (continue) when one of the
per-component sanitization checks (messages, tools, system prompt,
headers, credentials) found and stripped non-ASCII content.  When the
non-ASCII lived only in api_messages' reasoning_content field (which
is built from messages['reasoning'] and not checked by the original
_sanitize_messages_non_ascii), all checks returned False and the
recovery fell through to the normal error path — burning a retry
attempt despite _force_ascii_payload being set.

Now the recovery always continues (retries) when _is_ascii_codec is
detected.  The _force_ascii_payload flag guarantees the next iteration
runs _sanitize_structure_non_ascii(api_kwargs) on the full API payload,
catching any remaining non-ASCII regardless of where it lives.

Also adds test for the 'reasoning' field on canonical messages.

Fixes #6843
This commit is contained in:
Teknium 2026-04-15 14:56:55 -07:00 committed by Teknium
parent 902f1e6ede
commit 93b6f45224
2 changed files with 36 additions and 9 deletions

View file

@ -9186,22 +9186,34 @@ class AIAgent:
force=True,
)
if (
# Always retry on ASCII codec detection —
# _force_ascii_payload guarantees the full
# api_kwargs payload is sanitized on the
# next iteration (line ~8475). Even when
# per-component checks above find nothing
# (e.g. non-ASCII only in api_messages'
# reasoning_content), the flag catches it.
# Bounded by _unicode_sanitization_passes < 2.
self._unicode_sanitization_passes += 1
_any_sanitized = (
_messages_sanitized
or _prefill_sanitized
or _tools_sanitized
or _system_sanitized
or _headers_sanitized
or _credential_sanitized
):
self._unicode_sanitization_passes += 1
)
if _any_sanitized:
self._vprint(
f"{self.log_prefix}⚠️ System encoding is ASCII — stripped non-ASCII characters from request payload. Retrying...",
force=True,
)
continue
# Nothing to sanitize in any payload component.
# Fall through to normal error path.
else:
self._vprint(
f"{self.log_prefix}⚠️ System encoding is ASCII — enabling full-payload sanitization for retry...",
force=True,
)
continue
status_code = getattr(api_error, "status_code", None)
error_context = self._extract_api_error_context(api_error)