fix(error-classifier): treat 5xx request-validation errors as non-retryable

Standard OpenAI returns request-validation failures (unknown/
unsupported parameter, malformed request) as 4xx. Some
OpenAI-compatible gateways return them as 5xx instead — codex.nekos.me
returns 502 for an unknown parameter.

The generic '5xx -> retryable server_error' rule then misfires: the
error is deterministic (every retry gets the identical rejection), so
the retry loop burns all 3 attempts, the transport-recovery path
resets the counter and burns 3 more, and the result is a request
flood against a request that can never succeed.

Fix: when a 500/502 body carries an unambiguous request-validation
signal — 'unknown parameter' / 'unsupported parameter' /
'invalid_request_error' in the message text, or invalid_request_error
/ unknown_parameter / unsupported_parameter as the structured error
code — classify as a non-retryable format_error so the loop fails
fast and falls back. Genuine 502 Bad Gateway with no such signal
stays retryable as before.

Origin: local-author
Upstream-PR: none
Patch-State: local-only
This commit is contained in:
Soju 2026-05-15 11:26:58 +09:00 committed by Teknium
parent 775a17284f
commit 6212e9ade8
2 changed files with 93 additions and 0 deletions

View file

@ -240,6 +240,24 @@ _MODEL_NOT_FOUND_PATTERNS = [
"unsupported model",
]
# Request-validation patterns — the request is malformed and will fail
# identically on every retry. Some OpenAI-compatible gateways (notably
# codex.nekos.me) return these as 5xx instead of the standard 4xx, which
# makes the generic "5xx → retryable server_error" rule misfire: the retry
# loop hammers the same deterministic rejection 3+ times, then the
# transport-recovery path resets the counter and does it again, producing
# a request flood. When a 5xx body carries one of these unambiguous
# request-validation signals, classify as a non-retryable format_error so
# the loop fails fast and falls back instead of looping.
_REQUEST_VALIDATION_PATTERNS = [
"unknown parameter",
"unsupported parameter",
"unrecognized request argument",
"invalid_request_error",
"unknown_parameter",
"unsupported_parameter",
]
# OpenRouter aggregator policy-block patterns.
#
# When a user's OpenRouter account privacy setting (or a per-request
@ -745,6 +763,23 @@ def _classify_by_status(
)
if status_code in {500, 502}:
# Some OpenAI-compatible gateways return request-validation errors
# with a 5xx status (codex.nekos.me returns 502 for unknown/
# unsupported parameters). These are deterministic — every retry
# gets the identical rejection — so the generic "5xx → retryable
# server_error" rule turns one bad request into a retry flood.
# Detect the unambiguous request-validation signals (in either the
# message text or the structured error code) and fail fast.
if (
any(p in error_msg for p in _REQUEST_VALIDATION_PATTERNS)
or error_code.lower() in {"invalid_request_error", "unknown_parameter",
"unsupported_parameter"}
):
return result_fn(
FailoverReason.format_error,
retryable=False,
should_fallback=True,
)
return result_fn(FailoverReason.server_error, retryable=True)
if status_code in {503, 529}: