feat(gateway): extract auto-TTS markdown strip into prepare_tts_text() hook

Refactor the inlined `re.sub(...)[:4000].strip()` cleanup at the
auto-TTS site in `_process_message_background` into an overridable
method `BasePlatformAdapter.prepare_tts_text(text: str) -> str`.

The default implementation is byte-identical to the previous inline
expression — strip `* _ \` # [ ] ( )` and truncate to 4000 chars — so
every existing adapter (Telegram, Discord, Slack, Matrix, IRC, etc.)
gets exactly the same behaviour as before. Zero behaviour change for
any consumer that doesn't override the method.

Why add the hook: voice-first platform adapters need stricter
cleanup than text-bubble platforms. The default strips a handful of
markdown sigils, which is fine when the output goes into a Discord
embed or a Telegram message bubble — but read aloud by a TTS engine,
URLs (`https://example.com/foo`), fenced code blocks, file paths
(`/Users/x/foo.py`), and `MEDIA:` tags turn into long sequences of
unintelligible characters. With this hook an adapter can drop those
spans before TTS while leaving the data-channel transcript intact
for visual rendering.

Without the hook, voice adapters have to either
  - duplicate the auto-TTS flow inside their own `handle_response`
    pipeline, which means re-implementing the entire `extract_media`,
    `extract_images`, `extract_local_files`, attachment routing and
    error-handling sequence in `_process_message_background`, or
  - live with TTS speaking URLs character-by-character.

Both are worse than a 7-line method addition.

Example consumer:
  https://github.com/kortexa-ai/hermes-livekit — LiveKit WebRTC voice
  gateway plugin. Its `LiveKitAdapter.prepare_tts_text()` additionally
  strips fenced code blocks, inline code, URLs, file paths, and
  `MEDIA:` tags before TTS synthesis, while the full response still
  reaches connected clients via the data channel. Drop-in installable
  via `pip install git+https://github.com/kortexa-ai/hermes-livekit.git`.

Carved out of #3894 (LiveKit WebRTC gateway PR) so the generic hook
can land independently of the LiveKit platform itself.
This commit is contained in:
Franci Penov 2026-05-14 22:37:51 -07:00 committed by Teknium
parent a3017508bf
commit 0afab4a32b

View file

@ -2014,6 +2014,13 @@ class BasePlatformAdapter(ABC):
text = f"{caption}\n{text}"
return await self.send(chat_id=chat_id, content=text, reply_to=reply_to, metadata=metadata)
def prepare_tts_text(self, text: str) -> str:
"""Prepare text for TTS. Override to filter tool output, code, etc.
Default strips markdown formatting and truncates to 4000 chars.
"""
return re.sub(r'[*_`#\[\]()]', '', text)[:4000].strip()
async def play_tts(
self,
chat_id: str,
@ -3144,7 +3151,7 @@ class BasePlatformAdapter(ABC):
from tools.tts_tool import text_to_speech_tool, check_tts_requirements
if check_tts_requirements():
import json as _json
speech_text = re.sub(r'[*_`#\[\]()]', '', text_content)[:4000].strip()
speech_text = self.prepare_tts_text(text_content)
if not speech_text:
raise ValueError("Empty text after markdown cleanup")
tts_result_str = await asyncio.to_thread(