feat(gateway): support [[as_document]] directive for skill media routing

Skills that produce large or lossless images (e.g. info-graph, whose rendered JPG is 1-2 MB) currently lose quality in Telegram delivery. Membership in `_IMAGE_EXTS` routes the file through `send_multiple_images` → `sendMediaGroup`, and Telegram's server re-encodes the image to JPEG with a 1280px max edge. The original bytes survive only when the file goes through `send_document`, which the dispatch tables in three places (`_process_message_background`, `_deliver_media_from_response`, and the `send_message` tool's telegram path) reach only for files whose extension is NOT in `_IMAGE_EXTS`.

This commit adds an `[[as_document]]` directive that mirrors the existing `[[audio_as_voice]]` shape: a skill emits the directive once in its response, and every image-extension MEDIA: file in that response is delivered via `send_document` instead of `send_multiple_images` / `sendPhoto`. The directive is detected at the dispatch sites (which see the raw response), and `extract_media` strips the directive string from the user-visible cleaned text so it never leaks.

Granularity is intentionally all-or-nothing per response, matching the scope of `[[audio_as_voice]]`. Skills that need finer control can split their output into two responses.

Verified the targeted use case: info-graph emits `信息图已生成（...）` ("Infographic generated (...)") followed by `[[as_document]]` and `MEDIA:/tmp/info-graph-x/infographic.jpg` → Telegram receives `infographic.jpg` via sendDocument, with the original 1 MB JPEG bytes preserved and no recompression. Forwarding and download filenames stay clean (`infographic.jpg`).

Tests: +3 cases in TestExtractMedia covering directive strip, isolation from the voice flag, and coexistence with `[[audio_as_voice]]`. All 113 pre-existing media/extract/send tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
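The routing rule described above can be sketched roughly as follows. This is a minimal illustration, not the gateway's actual code: `plan_delivery`, `IMAGE_EXTS`, `MEDIA_RE`, and the returned method names are hypothetical stand-ins, and the real dispatch sites also handle audio, voice, and video, which this sketch collapses into `send_document`.

```python
import re
from pathlib import Path

# Hypothetical stand-ins; the real gateway's constants and helpers may differ.
AS_DOCUMENT = "[[as_document]]"
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".gif"}  # assumed contents
MEDIA_RE = re.compile(r"^MEDIA:(.+)$", re.MULTILINE)


def plan_delivery(raw_response: str) -> tuple[list[tuple[str, str]], str]:
    """Return ([(path, send_method), ...], cleaned_text) for one response."""
    # Detect the directive on the raw response, before anything is stripped.
    force_document = AS_DOCUMENT in raw_response

    plan = []
    for match in MEDIA_RE.finditer(raw_response):
        path = match.group(1).strip()
        is_image = Path(path).suffix.lower() in IMAGE_EXTS
        # All-or-nothing: one directive reroutes every image in the response.
        if is_image and not force_document:
            method = "send_multiple_images"
        else:
            method = "send_document"  # simplification: all non-images land here
        plan.append((path, method))

    # Strip MEDIA: lines and the directive so neither reaches the user.
    cleaned = MEDIA_RE.sub("", raw_response).replace(AS_DOCUMENT, "")
    cleaned = "\n".join(line for line in cleaned.splitlines() if line.strip())
    return plan, cleaned
```

Under these assumptions, a response containing the directive plans `send_document` for `/tmp/x.jpg`, while the same response without it plans `send_multiple_images`; in both cases the cleaned text carries neither the directive nor the MEDIA: line.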
2026-05-11 03:31:55 +00:00 · 2026-05-03 11:20:00 +08:00 · 2026-05-03 11:20:00 +08:00 · d34f03c32a
commit d34f03c32a
parent 8d363f8d54
4 changed files with 94 additions and 14 deletions
--- a/tests/gateway/test_platform_base.py
+++ b/tests/gateway/test_platform_base.py
@@ -329,6 +329,37 @@ class TestExtractMedia:
        assert media == [("/tmp/Jane Doe/speech.flac", False)]
        assert cleaned == ""

+    def test_as_document_directive_stripped_from_cleaned_text(self):
+        """[[as_document]] is a routing directive — strip it from
+        user-visible text just like [[audio_as_voice]]. Callers detect the
+        directive on the original content (before extract_media)."""
+        content = "Here is your infographic:\n[[as_document]]\nMEDIA:/tmp/x.jpg"
+        media, cleaned = BasePlatformAdapter.extract_media(content)
+        assert media == [("/tmp/x.jpg", False)]
+        assert "[[as_document]]" not in cleaned
+        assert "Here is your infographic" in cleaned
+
+    def test_as_document_directive_alone_does_not_attach_voice_flag(self):
+        """[[as_document]] is independent of [[audio_as_voice]]: on its
+        own it must leave the voice flag of every media tuple False."""
+        content = "[[as_document]]\nMEDIA:/tmp/x.jpg"
+        media, cleaned = BasePlatformAdapter.extract_media(content)
+        assert media == [("/tmp/x.jpg", False)]  # voice flag stays False
+        assert "[[as_document]]" not in cleaned
+
+    def test_both_directives_can_coexist(self):
+        """A response could (rarely) contain both [[audio_as_voice]] for an
+        ogg file AND [[as_document]] for an attached image. The voice flag
+        propagates per-tuple; [[as_document]] is detected at dispatch."""
+        content = "[[audio_as_voice]]\n[[as_document]]\nMEDIA:/tmp/x.ogg"
+        media, cleaned = BasePlatformAdapter.extract_media(content)
+        # Voice flag is propagated to every media tuple (this matches the
+        # existing extract_media contract)
+        assert media == [("/tmp/x.ogg", True)]
+        # Both directives stripped from cleaned text
+        assert "[[audio_as_voice]]" not in cleaned
+        assert "[[as_document]]" not in cleaned
+

 # ---------------------------------------------------------------------------
 # should_send_media_as_audio