fix(mcp): reset circuit breaker on successful OAuth reconnect

Previously the breaker was only cleared when the post-reconnect retry call itself succeeded (via _reset_server_error at the end of the try block). If OAuth recovery succeeded but the retry call then failed for a different reason, control fell through to the needs_reauth path, which called _bump_server_error. That added to an already-tripped count instead of starting from the fresh count the reconnect justified.

With fix #1 in place this would still self-heal on the next cooldown, but we should not pay a 60s stall when we already have positive evidence the server is viable.

Move _reset_server_error(server_name) up to immediately after the reconnect-and-ready-wait block, before retry_call(). The subsequent retry still goes through _bump_server_error on failure, so a genuinely broken server re-trips the breaker as normal, but the retry starts from a clean count (1 after a failure), not a stale one.
2026-06-17 09:41:58 +00:00 · 2026-04-21 19:20:15 +10:00 · 484d151e99
commit 484d151e99
parent 8cc3cebca2
1 changed file with 10 additions and 0 deletions
--- a/tools/mcp_tool.py
+++ b/tools/mcp_tool.py
@@ -1429,6 +1429,16 @@ def _handle_auth_error_and_retry(
                        break
                    time.sleep(0.25)

+        # A successful OAuth recovery is independent evidence that the
+        # server is viable again, so close the circuit breaker here —
+        # not only on retry success. Without this, a reconnect
+        # followed by a failing retry would leave the breaker pinned
+        # above threshold forever (the retry-exception branch below
+        # bumps the count again).  The post-reset retry still goes
+        # through _bump_server_error on failure, so a genuinely broken
+        # server will re-trip the breaker as normal.
+        _reset_server_error(server_name)
+
        try:
            result = retry_call()
            try: