fix(error_classifier): don't classify generic 404 as model_not_found (#14013)

The 404 branch in _classify_by_status had dead code: the generic fallback below the _MODEL_NOT_FOUND_PATTERNS check returned the exact same classification (model_not_found + should_fallback=True), so every 404 — regardless of message — was treated as a missing model. This bites local-endpoint users (llama.cpp, Ollama, vLLM) whose 404s usually mean a wrong endpoint path, proxy routing glitch, or transient backend issue — not a missing model. Claiming 'model not found' misleads the next turn and silently falls back to another provider when the real problem was a URL typo the user should see. Fix: only classify 404 as model_not_found when the message actually matches _MODEL_NOT_FOUND_PATTERNS ("invalid model", "model not found", etc.). Otherwise fall through as unknown (retryable) so the real error surfaces in the retry loop. Test updated to match the new behavior. 103 error_classifier tests pass.
2026-04-25 00:51:20 +00:00 · 2026-04-22 06:11:47 -07:00 · 2026-04-22 06:11:47 -07:00 · 77e04a29d5
commit 77e04a29d5
parent 40619b393f
2 changed files with 16 additions and 5 deletions
--- a/agent/error_classifier.py
+++ b/agent/error_classifier.py
@ -470,11 +470,16 @@ def _classify_by_status(
                retryable=False,
                should_fallback=True,
            )
-        # Generic 404 — could be model or endpoint
+        # Generic 404 with no "model not found" signal — could be a wrong
+        # endpoint path (common with local llama.cpp / Ollama / vLLM when
+        # the URL is slightly misconfigured), a proxy routing glitch, or
+        # a transient backend issue.  Classifying these as model_not_found
+        # silently falls back to a different provider and tells the model
+        # the model is missing, which is wrong and wastes a turn.  Treat
+        # as unknown so the retry loop surfaces the real error instead.
        return result_fn(
-            FailoverReason.model_not_found,
-            retryable=False,
-            should_fallback=True,
+            FailoverReason.unknown,
+            retryable=True,
        )

    if status_code == 413: