fix(agent): exclude ssl.SSLError from is_local_validation_error to prevent non-retryable abort

ssl.SSLCertVerificationError, the ssl.SSLError subclass raised when
certificate validation fails, inherits from both OSError (via SSLError) and
ValueError. The is_local_validation_error check used
isinstance(api_error, (ValueError, TypeError)) to detect programming bugs
that should abort immediately, but this inadvertently caught certificate
verification failures, treating a TLS transport failure as a non-retryable
client error.
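
The inheritance can be checked directly with the standard library. A minimal
sketch, assuming Python 3.7+ (where SSLCertVerificationError was introduced):

```python
import ssl

# SSLCertVerificationError derives from SSLError (itself an OSError subclass)
# and also from ValueError, so an isinstance(..., ValueError) test matches it.
print(issubclass(ssl.SSLCertVerificationError, ssl.SSLError))  # True
print(issubclass(ssl.SSLCertVerificationError, OSError))       # True
print(issubclass(ssl.SSLCertVerificationError, ValueError))    # True

# The SSLError base class is an OSError but not a ValueError.
print(issubclass(ssl.SSLError, ValueError))                    # False

# A certificate failure therefore passes the old "local bug" test.
err = ssl.SSLCertVerificationError("certificate verify failed")
print(isinstance(err, (ValueError, TypeError)))                # True
```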

The error classifier already maps SSLCertVerificationError to
FailoverReason.timeout with retryable=True (its type name is in
_TRANSPORT_ERROR_TYPES), but the inline isinstance guard was overriding
that classification and triggering an unnecessary abort.
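
To illustrate the precedence problem, here is a minimal sketch of a
name-based classifier and the pre-fix guard. FailoverReason, Classification,
classify, should_abort_old and the contents of _TRANSPORT_ERROR_TYPES are
hypothetical stand-ins modeled on the names above; the real classifier's code
is not part of this commit.

```python
import ssl
from dataclasses import dataclass
from enum import Enum


class FailoverReason(Enum):
    # Hypothetical stand-in; the real enum is not shown in this commit.
    timeout = "timeout"
    client_error = "client_error"


@dataclass
class Classification:
    # Hypothetical result type for this sketch.
    reason: FailoverReason
    retryable: bool


# Hypothetical subset; the real _TRANSPORT_ERROR_TYPES contents are not shown.
_TRANSPORT_ERROR_TYPES = {"SSLCertVerificationError"}


def classify(error: BaseException) -> Classification:
    """Sketch of a name-based classifier: known transport errors are retryable."""
    if type(error).__name__ in _TRANSPORT_ERROR_TYPES:
        return Classification(FailoverReason.timeout, retryable=True)
    return Classification(FailoverReason.client_error, retryable=False)


def should_abort_old(error: BaseException) -> bool:
    """Pre-fix flow (hypothetical): the inline guard runs first and wins."""
    is_local_validation_error = isinstance(error, (ValueError, TypeError))
    if is_local_validation_error:
        return True  # the classifier's verdict is never consulted
    return not classify(error).retryable


err = ssl.SSLCertVerificationError("certificate verify failed")
print(classify(err))          # the classifier marks this retryable
print(should_abort_old(err))  # True: the guard overrides the classifier
```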

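Fix: import ssl and add ssl.SSLError to the exclusion list alongside the
existing UnicodeEncodeError / json.JSONDecodeError carve-out. Excluding the
SSLError base class also covers SSLCertVerificationError, so TLS errors fall
through to the classifier's retryable path.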
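
The corrected predicate, pulled out into a standalone function for
illustration. This is a simplified approximation: in the real code it is an
inline expression inside AIAgent whose full surrounding conditions are not
shown in this diff, and the function form here is hypothetical.

```python
import json
import ssl


def is_local_validation_error(api_error: BaseException) -> bool:
    """Hypothetical standalone version of the inline check after this fix."""
    return (
        isinstance(api_error, (ValueError, TypeError))
        # Existing carve-outs: these ValueError subclasses are not local bugs.
        and not isinstance(api_error, (UnicodeEncodeError, json.JSONDecodeError))
        # New carve-out: TLS failures, including SSLCertVerificationError,
        # go to the classifier's retryable path instead of aborting.
        and not isinstance(api_error, ssl.SSLError)
    )


print(is_local_validation_error(ValueError("bad argument")))  # True
print(is_local_validation_error(TypeError("wrong type")))     # True
print(is_local_validation_error(
    ssl.SSLCertVerificationError("certificate verify failed")))  # False
print(is_local_validation_error(json.JSONDecodeError("x", "{", 0)))  # False
```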

Closes #14367
Bartok9 2026-04-23 14:36:55 +07:00 committed by Teknium
parent ba44a3d256
commit 4e27e498f1

@@ -31,6 +31,7 @@ logger = logging.getLogger(__name__)
import os
import random
import re
+import ssl
import sys
import tempfile
import time
@@ -10923,6 +10924,14 @@ class AIAgent:
and not isinstance(
api_error, (UnicodeEncodeError, json.JSONDecodeError)
)
+# ssl.SSLCertVerificationError (a subclass of ssl.SSLError)
+# inherits from both OSError (via SSLError) and ValueError,
+# so the isinstance(ValueError) check above would
+# misclassify a TLS certificate failure as a local
+# programming bug and abort without retrying. Exclude
+# ssl.SSLError explicitly so the error classifier's
+# retryable=True mapping takes effect instead.
+and not isinstance(api_error, ssl.SSLError)
)
is_client_error = (
is_local_validation_error