fix(fallback): skip chain entries matching current provider/model/base_url (#22780)

_try_activate_fallback() walked the chain by index without comparing
the candidate entry against the currently-failing backend. As a result,
a misconfigured chain that listed the same provider+model as the
primary, or two custom_providers entries pointing at the same shim URL,
would repeat the same failure 3x against the same backend.

After the fix, _try_activate_fallback() skips:
  - entries where (provider, model) match the current agent's
  - entries with a base_url + model matching the current backend
    (catches two custom_providers names pointing at the same shim)
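
A minimal sketch of the two skip rules above as a standalone predicate.
The function names and the plain-dict shape of `entry` and `current` are
hypothetical, chosen for illustration; the real check lives inline in
AIAgent._try_activate_fallback() and reads attributes off the agent.

```python
def _norm_url(url):
    """Normalize a base URL for comparison: trim, drop trailing '/', lowercase."""
    return str(url or "").strip().rstrip("/").lower()


def matches_current(entry, current):
    """Return True if a fallback entry resolves to the backend that just failed.

    Both arguments are hypothetical dicts with optional "provider",
    "model", and "base_url" keys (an assumption, not the real data model).
    """
    fb_provider = (entry.get("provider") or "").strip().lower()
    fb_model = (entry.get("model") or "").strip()
    cur_provider = (current.get("provider") or "").strip().lower()
    cur_model = (current.get("model") or "").strip()

    # Rule 1: same (provider, model) pair as the current backend.
    if fb_provider == cur_provider and fb_model == cur_model:
        return True

    # Rule 2: same base_url and same model. Only applies when both URLs
    # are non-empty, so entries without a base_url never match by accident.
    fb_url = _norm_url(entry.get("base_url"))
    cur_url = _norm_url(current.get("base_url"))
    return bool(fb_url and cur_url and fb_url == cur_url and fb_model == cur_model)
```

The second rule requires the model to match as well, so two entries
behind one proxy URL that serve different models still count as distinct
fallbacks.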

On a skip, the method recurses through self._try_activate_fallback() to
continue with the next chain entry; if every remaining entry matches, it
returns False and the caller moves on without retrying the same broken
path.
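
The skip-and-recurse behavior can be sketched with a toy chain walker.
The FallbackChain class, its `index` field, and the provider/model names
in the usage are hypothetical; this only illustrates the control flow
and is not the actual AIAgent code.

```python
class FallbackChain:
    """Toy model of an index-based fallback chain (not the real AIAgent)."""

    def __init__(self, provider, model, chain):
        self.provider = provider
        self.model = model
        self.chain = chain  # list of {"provider": ..., "model": ...} dicts
        self.index = 0      # next chain entry to consider

    def try_activate_fallback(self):
        # Chain exhausted: report failure so the caller stops retrying.
        if self.index >= len(self.chain):
            return False
        fb = self.chain[self.index]
        self.index += 1
        # Entry is the backend that just failed: skip it and recurse
        # to the next entry instead of re-running the same failure.
        if (fb["provider"], fb["model"]) == (self.provider, self.model):
            return self.try_activate_fallback()
        # Otherwise switch to the fallback backend.
        self.provider, self.model = fb["provider"], fb["model"]
        return True
```

With a chain whose first entry duplicates the primary, the first call
lands on the second entry; with a chain made only of duplicates, the
call returns False after consuming every entry.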

Adds 3 regression tests covering the same-provider-same-model skip, the
same-base_url-same-model skip, and the exhaustion path where every entry
matches the current backend and the method returns False.

Closes #22548 (the Hermes-side portion). The 120s timeout itself in
the downstream claude-cli shim is a deployment concern documented in
that issue's wherewolf87 comment.
Teknium 2026-05-09 12:48:19 -07:00 committed by GitHub
parent 70bc52e408
commit e7c0d6ee53
2 changed files with 111 additions and 0 deletions

@@ -8042,6 +8042,32 @@ class AIAgent:
        if not fb_provider or not fb_model:
            return self._try_activate_fallback()  # skip invalid, try next
        # Skip entries that resolve to the current (provider, model) — falling
        # back to the same backend that just failed loops the failure. Compare
        # base_url too so two distinct custom_providers entries pointing at the
        # same shim/proxy URL also dedup. See issue #22548.
        current_provider = (getattr(self, "provider", "") or "").strip().lower()
        current_model = (getattr(self, "model", "") or "").strip()
        current_base_url = str(getattr(self, "base_url", "") or "").rstrip("/").lower()
        fb_base_url_for_dedup = (fb.get("base_url") or "").strip().rstrip("/").lower()
        if fb_provider == current_provider and fb_model == current_model:
            logging.warning(
                "Fallback skip: chain entry %s/%s matches current provider/model",
                fb_provider, fb_model,
            )
            return self._try_activate_fallback()
        if (
            fb_base_url_for_dedup
            and current_base_url
            and fb_base_url_for_dedup == current_base_url
            and fb_model == current_model
        ):
            logging.warning(
                "Fallback skip: chain entry base_url %s matches current backend",
                fb_base_url_for_dedup,
            )
            return self._try_activate_fallback()
        # Use centralized router for client construction.
        # raw_codex=True because the main agent needs direct responses.stream()
        # access for Codex providers.