fix: exclude json.JSONDecodeError and UnicodeError from local validation error check

json.JSONDecodeError inherits from ValueError. When the OpenAI SDK fails
to parse a provider response (empty body, truncated JSON), it raises
json.JSONDecodeError. The is_local_validation_error check in run_agent.py
catches it as isinstance(api_error, ValueError) and triggers a
non-retryable abort -- but the error is transient (provider-side), not a
programming bug.

Similarly, UnicodeDecodeError (also ValueError subclass) from garbled
provider responses was incorrectly flagged. The existing code only
excluded UnicodeEncodeError.

The error classifier already correctly returns retryable=True for these
errors; the bug was the isinstance check overriding the classifier.

Fix: Exclude json.JSONDecodeError and UnicodeError (parent of
Encode/Decode/Translate) from the is_local_validation_error check,
so they fall through to the classifier's retryable path.

Closes #14271

Co-authored-by: AJ <aj@users.noreply.github.com>
This commit is contained in:
AJ 2026-04-23 00:33:17 -04:00
parent 4fade39c90
commit 91e3dca8ce
2 changed files with 154 additions and 17 deletions

View file

@ -11335,25 +11335,30 @@ class AIAgent:
# already accounts for 413, 429, 529 (transient), context
# overflow, and generic-400 heuristics. Local validation
# errors (ValueError, TypeError) are programming bugs.
# Exclude UnicodeEncodeError — it's a ValueError subclass
# but is handled separately by the surrogate sanitization
# path above. Exclude json.JSONDecodeError — also a
# ValueError subclass, but it indicates a transient
# provider/network failure (malformed response body,
# truncated stream, routing layer corruption), not a
# local programming bug, and should be retried (#14782).
#
# Exclusions — ValueError subclasses that originate from
# the *provider*, not our code:
# - json.JSONDecodeError: provider returned empty/malformed
# JSON body; the OpenAI SDK raises this during response
# parsing. Transient — retry or failover.
# - UnicodeError (parent of Encode/Decode/Translate):
# garbled bytes from the provider. The original code
# only excluded UnicodeEncodeError but a garbled
# response body can also cause UnicodeDecodeError.
#
# ssl.SSLError (and its subclass SSLCertVerificationError)
# inherits from OSError *and* ValueError via Python MRO,
# so the isinstance(ValueError) check above would
# misclassify a TLS transport failure as a local
# programming bug and abort without retrying. Exclude
# ssl.SSLError explicitly so the error classifier's
# retryable=True mapping takes effect instead.
#
# COUPLING: tests/test_json_decode_error_misclassification.py
# mirrors this predicate — update both if changing.
is_local_validation_error = (
isinstance(api_error, (ValueError, TypeError))
and not isinstance(
api_error, (UnicodeEncodeError, json.JSONDecodeError)
)
# ssl.SSLError (and its subclass SSLCertVerificationError)
# inherits from OSError *and* ValueError via Python MRO,
# so the isinstance(ValueError) check above would
# misclassify a TLS transport failure as a local
# programming bug and abort without retrying. Exclude
# ssl.SSLError explicitly so the error classifier's
# retryable=True mapping takes effect instead.
and not isinstance(api_error, (UnicodeError, json.JSONDecodeError))
and not isinstance(api_error, ssl.SSLError)
)
is_client_error = (