fix: use UTF-16 length for Telegram stream consumer message splitting

The stream consumer measured message length with Python's len(), which counts Unicode code points, but Telegram enforces its 4096-character limit in UTF-16 code units. Characters outside the Basic Multilingual Plane (most emoji and some rare CJK ideographs) are one code point but two UTF-16 code units. Messages containing them could therefore pass the len() check while exceeding Telegram's limit, and they arrived truncated with formatting artifacts.

Changes:
- Add a message_len_fn property to BasePlatformAdapter (defaults to len)
- Override it in TelegramAdapter to return utf16_len
- Make the stream consumer use adapter.message_len_fn for:
  - safe_limit calculation
  - overflow detection
  - truncate_message calls
  - split point calculation (via _custom_unit_to_cp)
  - fallback final send chunking

Fixes truncated messages with black-square artifacts on Telegram when the model generates responses containing characters outside the BMP.
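The mismatch between len() and Telegram's measure, and the conversion of a limit in platform units back into a Python slice index, can be sketched as follows. This is a minimal sketch, not the project's code: the utf16_len body is an assumed equivalent of the helper the diff references (its real definition is elsewhere), and unit_to_cp is a hypothetical analogue of _custom_unit_to_cp.

```python
from typing import Callable


def utf16_len(text: str) -> int:
    """Count UTF-16 code units (assumed equivalent of the repo's utf16_len).

    Code points above U+FFFF (most emoji, rare CJK extension ideographs) are
    encoded as a surrogate pair and count as 2; everything else counts as 1.
    """
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text)


def unit_to_cp(text: str, budget: int, len_fn: Callable[[str], int] = utf16_len) -> int:
    """Return the longest code-point prefix length whose len_fn size fits in budget.

    Hypothetical analogue of _custom_unit_to_cp. Assumes len_fn is additive
    per code point, which holds for both len and utf16_len.
    """
    used = 0
    for i, ch in enumerate(text):
        used += len_fn(ch)
        if used > budget:
            return i  # text[:i] is the largest prefix that still fits
    return len(text)


if __name__ == "__main__":
    s = "a" * 4095 + "😀"
    print(len(s), utf16_len(s))  # 4096 4097: passes a len() check, exceeds Telegram's limit
    print(unit_to_cp(s, 4096))   # 4095: s[:4095] is the longest prefix that fits
```

Because the index is computed per code point, the resulting slice can never cut a surrogate pair in half.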
2026-07-19 15:18:03 +00:00 · 2026-04-16 13:05:22 -05:00 · 2026-04-16 13:05:22 -05:00 · c0da5d09a6
commit c0da5d09a6
parent c5f1f863ac
3 changed files with 54 additions and 12 deletions
--- a/gateway/platforms/base.py
+++ b/gateway/platforms/base.py
@@ -1311,6 +1311,15 @@ class BasePlatformAdapter(ABC):
        # _keep_typing skips send_typing when the chat_id is in this set.
        self._typing_paused: set = set()

+    @property
+    def message_len_fn(self) -> Callable[[str], int]:
+        """Return the length function for measuring message size on this platform.
+
+        Override in adapters whose platform counts characters differently from
+        Python ``len`` (e.g. Telegram counts UTF-16 code units).
+        """
+        return len
+
    @property
    def has_fatal_error(self) -> bool:
        return self._fatal_error_message is not None
--- a/gateway/platforms/telegram.py
+++ b/gateway/platforms/telegram.py
@@ -283,6 +283,11 @@ class TelegramAdapter(BasePlatformAdapter):
    MEDIA_GROUP_WAIT_SECONDS = 0.8
    _GENERAL_TOPIC_THREAD_ID = "1"

+    @property
+    def message_len_fn(self):
+        """Telegram measures message length in UTF-16 code units."""
+        return utf16_len
+
    def __init__(self, config: PlatformConfig):
        super().__init__(config, Platform.TELEGRAM)
        self._app: Optional[Application] = None