mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-21 10:22:18 +00:00
Android Signal delivers voice notes as raw ADTS AAC frames, which
share the `0xFF 0xFx` sync word with MPEG-1/2 Layer 3 (MP3). The
`_guess_extension` byte-signature test in gateway/platforms/signal.py
was matching both, so ADTS AAC was being misclassified as MP3 — saved
to disk with the wrong extension and rejected by every major STT API
(Groq, OpenAI) because their server-side format sniffers inspect the
actual codec, not the file extension.
Two changes:
1. Tighten the MP3 vs ADTS disambiguator. ADTS packs `ID`,
`layer`, and `protection_absent` into bits 3-0 of byte 1, where
`ID=0` and `layer=00` for AAC. Real MP3 has `ID=1` and
`layer` in {01, 10, 11}. The mask `0xF6` against target `0xF0`
cleanly separates them.
2. Remux raw ADTS AAC to MP4 container at the cache step via
`ffmpeg -c:a copy`. Single demux/remux, no re-encode, no quality
loss, sub-100ms on a Pi 5. The cached file is a normal `.m4a`
that all major STT providers accept. ffmpeg is a transitive
dependency of many other Hermes features (TTS, video skills) so
this isn't a new install requirement; the remux degrades
gracefully to a no-op if ffmpeg is missing.
The new helper `_remux_aac_to_m4a` is unit-tested with a real
Android voice note from the audio cache that originally triggered
the bug, plus synthetic ADTS frames for the byte-level
disambiguator and garbage-input graceful failure.
Closes the gap that broke transcription for any Android Signal user
sending voice messages to Hermes.
|
||
|---|---|---|
| .. | ||
| qqbot | ||
| __init__.py | ||
| _http_client_limits.py | ||
| ADDING_A_PLATFORM.md | ||
| api_server.py | ||
| base.py | ||
| bluebubbles.py | ||
| dingtalk.py | ||
| email.py | ||
| feishu.py | ||
| feishu_comment.py | ||
| feishu_comment_rules.py | ||
| feishu_meeting_invite.py | ||
| helpers.py | ||
| matrix.py | ||
| msgraph_webhook.py | ||
| signal.py | ||
| signal_format.py | ||
| signal_rate_limit.py | ||
| slack.py | ||
| sms.py | ||
| telegram.py | ||
| telegram_network.py | ||
| webhook.py | ||
| wecom.py | ||
| wecom_callback.py | ||
| wecom_crypto.py | ||
| weixin.py | ||
| whatsapp.py | ||
| whatsapp_cloud.py | ||
| whatsapp_common.py | ||
| yuanbao.py | ||
| yuanbao_media.py | ||
| yuanbao_proto.py | ||
| yuanbao_sticker.py | ||