mirror of https://github.com/NousResearch/hermes-agent.git
synced 2026-04-25 00:51:20 +00:00
fix(mcp): reset circuit breaker on successful OAuth reconnect
Previously the breaker was only cleared when the post-reconnect retry call itself succeeded (via _reset_server_error at the end of the try block). If OAuth recovery succeeded but the retry call happened to fail for a different reason, control fell through to the needs_reauth path, which called _bump_server_error, adding to an already-tripped count instead of the fresh count the reconnect justified. With fix #1 in place this would still self-heal on the next cooldown, but we should not pay a 60s stall when we already have positive evidence the server is viable.

Move _reset_server_error(server_name) up to immediately after the reconnect-and-ready-wait block, before the retry_call. The subsequent retry still goes through _bump_server_error on failure, so a genuinely broken server re-trips the breaker as normal, but the retry starts from a clean count (1 after a failure), not a stale one.
parent 8cc3cebca2
commit 484d151e99
1 changed file with 10 additions and 0 deletions
@@ -1429,6 +1429,16 @@ def _handle_auth_error_and_retry(
                     break
                 time.sleep(0.25)
 
+        # A successful OAuth recovery is independent evidence that the
+        # server is viable again, so close the circuit breaker here —
+        # not only on retry success. Without this, a reconnect
+        # followed by a failing retry would leave the breaker pinned
+        # above threshold forever (the retry-exception branch below
+        # bumps the count again). The post-reset retry still goes
+        # through _bump_server_error on failure, so a genuinely broken
+        # server will re-trips the breaker as normal.
+        _reset_server_error(server_name)
+
         try:
             result = retry_call()
             try: