fix: use UTF-16 length for Telegram stream consumer message splitting

The stream consumer measured message length using Python's len() (Unicode code points), but Telegram's actual limit is in UTF-16 code units. This caused messages with supplementary characters (emoji, CJK, etc.) to exceed Telegram's 4096-character limit, resulting in truncated messages with formatting artifacts. Changes: - Add message_len_fn property to BasePlatformAdapter (defaults to len) - Override in TelegramAdapter to return utf16_len - Stream consumer uses adapter.message_len_fn for: - safe_limit calculation - overflow detection - truncate_message calls - split point calculation (via _custom_unit_to_cp) - fallback final send chunking Fixes truncated messages with black square artifacts on Telegram when the model generates responses containing multi-byte Unicode characters.
2026-07-20 15:33:54 +00:00 · 2026-04-16 13:05:22 -05:00 · 2026-04-16 13:05:22 -05:00 · c0da5d09a6
commit c0da5d09a6
parent c5f1f863ac
3 changed files with 54 additions and 12 deletions
--- a/gateway/platforms/base.py
+++ b/gateway/platforms/base.py
@ -1311,6 +1311,15 @@ class BasePlatformAdapter(ABC):
        # _keep_typing skips send_typing when the chat_id is in this set.
        self._typing_paused: set = set()

+    @property
+    def message_len_fn(self) -> Callable[[str], int]:
+        """Return the length function for measuring message size on this platform.
+
+        Override in adapters whose platform counts characters differently from
+        Python ``len`` (e.g. Telegram counts UTF-16 code units).
+        """
+        return len
+
    @property
    def has_fatal_error(self) -> bool:
        return self._fatal_error_message is not None