fix: align auth-by-message classification with status-code path, decode URLs before secret check

error_classifier.py: Message-only auth errors ("invalid api key", "unauthorized",
etc.) were classified as retryable=True (line 707), inconsistent with the HTTP 401
path (line 432) which correctly uses retryable=False + should_fallback=True.  The
mismatch causes 3 wasted retries with the same broken credential before fallback,
while 401 errors immediately attempt fallback.  Align the message-based path to
match: retryable=False, should_fallback=True.
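
A minimal sketch of the aligned behavior, not the actual error_classifier.py
code. The names Classification, classify_by_status, classify_by_message and
_AUTH_MESSAGE_PATTERNS are hypothetical, and the pattern list holds only the
two strings quoted above:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Classification:
    """Hypothetical result type; the real classifier's type is not shown here."""
    reason: str
    retryable: bool
    should_fallback: bool


# Assumption: only the two patterns quoted in the commit message are listed.
_AUTH_MESSAGE_PATTERNS = ("invalid api key", "unauthorized")


def classify_by_status(status_code: int) -> Optional[Classification]:
    """Status-code path: a 401 skips retries and goes straight to fallback."""
    if status_code == 401:
        return Classification("auth", retryable=False, should_fallback=True)
    return None


def classify_by_message(message: str) -> Optional[Classification]:
    """Message-only path, aligned with the 401 path after this change."""
    lowered = message.lower()
    if any(pattern in lowered for pattern in _AUTH_MESSAGE_PATTERNS):
        return Classification("auth", retryable=False, should_fallback=True)
    return None
```

With both paths returning the same flags, a broken credential triggers
fallback at once whether the provider reports it as a 401 or only in the
error text.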

web_tools.py: The _PREFIX_RE secret-detection check in web_extract_tool() runs
against the raw URL string (line 1196).  Percent-encoded secrets such as
%73k-1234... (which decodes to sk-1234...) bypass the filter because the regex
matches only the literal, unencoded prefix.  Also run the check against
urllib.parse.unquote(url) so percent-encoded variants are caught.
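
A minimal sketch of the bypass and the fix. The regex below is a hypothetical
stand-in for agent.redact._PREFIX_RE, whose real pattern is not shown in this
commit, and url_contains_secret is an illustrative helper, not a function in
web_tools.py:

```python
import re
from urllib.parse import unquote

# Hypothetical stand-in for agent.redact._PREFIX_RE (real pattern not shown).
_PREFIX_RE = re.compile(r"sk-[A-Za-z0-9]{8,}")


def url_contains_secret(url: str) -> bool:
    """Check both the raw URL and its percent-decoded form."""
    return bool(_PREFIX_RE.search(url) or _PREFIX_RE.search(unquote(url)))


encoded = "https://example.com/?q=%73k-1234567890abcdef"

# The raw string has no literal "sk-", so a raw-only check misses it.
raw_only_hit = _PREFIX_RE.search(encoded) is not None

# Decoding turns %73 into "s", so the combined check catches it.
combined_hit = url_contains_secret(encoded)
```

unquote() decodes a single layer only, so a doubly encoded prefix such as
%2573k- becomes %73k- after one pass and would still not match in this sketch.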

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
aaronagent 2026-04-10 12:00:31 +08:00 committed by Teknium
parent 37bb4f807b
commit 738f0bac13
2 changed files with 5 additions and 2 deletions

@@ -1190,10 +1190,12 @@ async def web_extract_tool(
     Raises:
         Exception: If extraction fails or API key is not set
     """
-    # Block URLs containing embedded secrets (exfiltration prevention)
+    # Block URLs containing embedded secrets (exfiltration prevention).
+    # URL-decode first so percent-encoded secrets (%73k- = sk-) are caught.
     from agent.redact import _PREFIX_RE
+    from urllib.parse import unquote
     for _url in urls:
-        if _PREFIX_RE.search(_url):
+        if _PREFIX_RE.search(_url) or _PREFIX_RE.search(unquote(_url)):
             return json.dumps({
                 "success": False,
                 "error": "Blocked: URL contains what appears to be an API key or token. "