fix(image-routing): sniff magic bytes for image MIME, ignore misleading suffix

Discord (and similar platforms) can hand us a PNG image that ends up
cached as discord_xxx.webp: the CDN reports content_type=image/webp for
proxied stickers, custom emoji, and certain bot-uploaded images even
when the actual bytes are PNG. Hermes' agent.image_routing._guess_mime
trusted the file suffix and declared media_type=image/webp to
Anthropic, which strictly validates the declared type against the
bytes and returns:

  HTTP 400 messages.N.content.M.image.source.base64:
  The image was specified using the image/webp media type,
  but the image appears to be a image/png image

The Discord image attachment never reaches the model; the whole turn
fails with no salvage path.

Fix: sniff magic bytes in _file_to_data_url before declaring MIME.
Suffix-based detection is kept as a fallback when bytes aren't
available. New helper _sniff_mime_from_bytes covers PNG, JPEG, GIF,
WEBP, BMP, and HEIC/HEIF.
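
For illustration only, here is a minimal sketch of the sniff-then-fallback
approach described above. It is not the code from this commit: the names
sniff_mime_from_bytes, file_to_data_url, and SUFFIX_MIME, the HEIC/HEIF
brand list, and the application/octet-stream default are all assumptions
made for the example.

```python
import base64
from pathlib import Path
from typing import Optional

# Hypothetical suffix table, used only when the bytes are not recognised.
SUFFIX_MIME = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
    ".bmp": "image/bmp",
}


def sniff_mime_from_bytes(head: bytes) -> Optional[str]:
    """Return an image MIME type from leading magic bytes, or None."""
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if head.startswith(b"\xff\xd8\xff"):  # JPEG SOI marker
        return "image/jpeg"
    if head.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WEBP is a RIFF container: "RIFF", 4-byte size, then "WEBP".
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "image/webp"
    # Two-byte signature, so this check is deliberately last among the
    # simple prefixes; it is the weakest match in the list.
    if head.startswith(b"BM"):
        return "image/bmp"
    # HEIC/HEIF: 4-byte box size, "ftyp", then a major brand.
    # The brand list below is a simplified assumption, not exhaustive.
    if head[4:8] == b"ftyp":
        brand = head[8:12]
        if brand in (b"heic", b"heix"):
            return "image/heic"
        if brand in (b"mif1", b"msf1"):
            return "image/heif"
    return None


def file_to_data_url(path: Path) -> str:
    """Build a data URL, trusting the bytes first and the suffix second."""
    data = path.read_bytes()
    mime = sniff_mime_from_bytes(data[:16])
    if mime is None:
        # Fallback: suffix-based guess (default value is an assumption).
        mime = SUFFIX_MIME.get(path.suffix.lower(), "application/octet-stream")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

With this ordering, a file named discord_cached.webp that starts with the
PNG signature is declared as image/png, which is the case the regression
test in this commit guards.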

Tests:
- Two existing tests asserted the old broken behaviour (PNG bytes in
  a .jpg/.webp file should report jpeg/webp); rewritten with real
  jpeg/webp magic bytes so they still cover suffix-aligned cases.
- New regression test test_mime_sniff_overrides_misleading_extension
  reproduces the exact Discord scenario (PNG bytes, .webp suffix) and
  asserts the data URL comes back as image/png.

All 28 tests in tests/agent/test_image_routing.py pass.
shashwatgokhe 2026-05-03 11:55:51 +00:00 committed by Teknium
parent 5ead126709
commit 5cf703245b
2 changed files with 63 additions and 4 deletions


@@ -217,19 +217,34 @@ class TestBuildNativeContentParts:
         assert str(img2) in text_part["text"]
 
     def test_mime_inference_jpg(self, tmp_path: Path):
+        # Real JPEG bytes (SOI marker FF D8 FF): sniffing now wins over suffix.
         img = tmp_path / "photo.jpg"
-        img.write_bytes(_png_bytes())  # bytes are PNG but extension is jpg
+        img.write_bytes(b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01" + b"\x00" * 32)
         parts, _ = build_native_content_parts("x", [str(img)])
         url = parts[1]["image_url"]["url"]
         assert url.startswith("data:image/jpeg;base64,")
 
     def test_mime_inference_webp(self, tmp_path: Path):
+        # Real WEBP bytes (RIFF....WEBP): sniffing now wins over suffix.
         img = tmp_path / "pic.webp"
-        img.write_bytes(_png_bytes())
+        img.write_bytes(b"RIFF\x24\x00\x00\x00WEBPVP8 " + b"\x00" * 32)
         parts, _ = build_native_content_parts("", [str(img)])
         url = parts[1]["image_url"]["url"]
         assert url.startswith("data:image/webp;base64,")
 
+    def test_mime_sniff_overrides_misleading_extension(self, tmp_path: Path):
+        """Discord-style bug: file is named .webp but contains PNG bytes.
+        Anthropic rejects on MIME mismatch (HTTP 400) so we MUST sniff.
+        Regression guard for the user-reported Discord PNG-as-WEBP failure.
+        """
+        img = tmp_path / "discord_cached.webp"
+        img.write_bytes(_png_bytes())  # bytes are PNG, suffix lies
+        parts, _ = build_native_content_parts("", [str(img)])
+        url = parts[1]["image_url"]["url"]
+        assert url.startswith("data:image/png;base64,"), (
+            f"Expected MIME sniffing to detect PNG bytes regardless of .webp suffix, got: {url[:60]}"
+        )
+
# ─── Oversize handling ───────────────────────────────────────────────────────