fix(telegram): fall back to no thread_id on 'Message thread not found' (#3390)

python-telegram-bot's BadRequest inherits from NetworkError, so the
send() retry loop was catching 'Message thread not found' as a transient
network error and retrying 3 times before silently failing. This killed
all tool progress messages, streaming responses, and typing indicators
when the incoming message carried an invalid message_thread_id.

Now detect BadRequest inside the NetworkError handler:
- 'thread not found' + thread_id set → clear thread_id and retry once
  (message still reaches the chat, just without topic threading)
- Other BadRequest errors → raise immediately (permanent, don't retry)
- True NetworkError → retry as before (transient)

252 silent failures in gateway.log traced to this on 2026-03-26.

5 new tests for thread fallback, non-thread BadRequest, no-thread sends,
network retry, and multi-chunk fallback.
This commit is contained in:
Teknium 2026-03-27 06:07:28 -07:00 committed by GitHub
parent b7bcae49c6
commit 41d9d08078
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
2 changed files with 225 additions and 2 deletions

View file

@ -707,9 +707,15 @@ class TelegramAdapter(BasePlatformAdapter):
except ImportError:
_NetErr = OSError # type: ignore[misc,assignment]
try:
from telegram.error import BadRequest as _BadReq
except ImportError:
_BadReq = None # type: ignore[assignment,misc]
for i, chunk in enumerate(chunks):
should_thread = self._should_thread_reply(reply_to, i)
reply_to_id = int(reply_to) if should_thread else None
effective_thread_id = int(thread_id) if thread_id else None
msg = None
for _send_attempt in range(3):
@ -721,7 +727,7 @@ class TelegramAdapter(BasePlatformAdapter):
text=chunk,
parse_mode=ParseMode.MARKDOWN_V2,
reply_to_message_id=reply_to_id,
message_thread_id=int(thread_id) if thread_id else None,
message_thread_id=effective_thread_id,
)
except Exception as md_error:
# Markdown parsing failed, try plain text
@ -733,12 +739,30 @@ class TelegramAdapter(BasePlatformAdapter):
text=plain_chunk,
parse_mode=None,
reply_to_message_id=reply_to_id,
message_thread_id=int(thread_id) if thread_id else None,
message_thread_id=effective_thread_id,
)
else:
raise
break # success
except _NetErr as send_err:
# BadRequest is a subclass of NetworkError in
# python-telegram-bot but represents permanent errors
# (not transient network issues). Detect and handle
# specific cases instead of blindly retrying.
if _BadReq and isinstance(send_err, _BadReq):
err_lower = str(send_err).lower()
if "thread not found" in err_lower and effective_thread_id is not None:
# Thread doesn't exist — retry without
# message_thread_id so the message still
# reaches the chat.
logger.warning(
"[%s] Thread %s not found, retrying without message_thread_id",
self.name, effective_thread_id,
)
effective_thread_id = None
continue
# Other BadRequest errors are permanent — don't retry
raise
if _send_attempt < 2:
wait = 2 ** _send_attempt
logger.warning("[%s] Network error on send (attempt %d/3), retrying in %ds: %s",