Commit graph

5 commits

Author SHA1 Message Date
Albert G
ad2531be08 feat(telegram): skip-STT audio path + 2GB cap via local Bot API server
Two coordinated changes that unblock downstream audio pipelines
(diarization, custom transcription, archival) on attachments larger
than the public Bot API's 20MB getFile ceiling.

- `stt.enabled: false` no longer drops voice/audio with a generic
  "transcription disabled" note. The gateway probes the cached file's
  duration (wave → mutagen → ffprobe ladder) and surfaces
  `[The user sent a voice message: <abs path> (duration: M:SS)]` to
  the agent so a skill or tool can pick up the raw file. The previous
  placeholder is replaced rather than appended when present.

- `platforms.telegram.extra.base_url` set → adapter auto-lifts its
  document size cap from 20MB to 2GB (the local telegram-bot-api
  `--local` ceiling) and the "too large" reply reports the active
  limit dynamically. No new config knob; presence of `base_url` is the
  opt-in.

- `platforms.telegram.extra.local_mode: true` wires
  `Application.builder().local_mode(True)` on the python-telegram-bot
  builder. PTB then reads files from disk instead of HTTP, which is
  required when telegram-bot-api runs in `--local` mode (the server
  returns absolute filesystem paths, not `/file/bot...` URLs).

- gateway/run.py: rewrites the `stt.enabled: false` branch of
  `_enrich_message_with_transcription`. New `_format_duration` +
  `_probe_audio_duration` helpers.
- gateway/platforms/telegram.py: `_max_doc_bytes` instance attribute
  derived from `extra.base_url`; `local_mode` builder wiring;
  dynamic "too large" message.
- tests/gateway/test_stt_config.py: covers path-surfacing with and
  without an existing user message, and placeholder replacement.
- tests/gateway/test_telegram_max_doc_bytes.py: 3 cases — default 20MB
  without base_url, 2GB when set, empty-string base_url keeps default.
- website/docs/user-guide/messaging/telegram.md: new "Skipping STT"
  subsection under Voice Messages and a full "Large Files (>20MB) via
  Local Bot API Server" walkthrough (api_id/api_hash, docker-compose,
  one-time `logOut` migration, `platforms.telegram.extra` config, the
  `local_mode` disk-access requirement, the silent HTTP-fallback 404).
- website/docs/user-guide/features/voice-mode.md: documents the
  `stt.enabled` knob in the config reference.

- `pytest tests/gateway/test_telegram_max_doc_bytes.py
  tests/gateway/test_stt_config.py` → 9/9 passing.
- Verified end-to-end on a live deployment: gateway log shows
  `Using custom Telegram base_url: http://...` and
  `Using Telegram local_mode (read files from disk)` on startup;
  voice messages above 20MB cache to disk and surface their path to
  the agent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 22:59:40 -07:00
etcircle
72b345e068 fix(gateway): preserve queued voice events for STT 2026-04-11 14:43:53 -07:00
Teknium
0f597dd127
fix: STT provider-model mismatch — whisper-1 fed to faster-whisper (#7113)
Legacy flat stt.model config key (from cli-config.yaml.example and older
versions) was passed as a model override to transcribe_audio() by the
gateway, bypassing provider-specific model resolution. When the provider
was 'local' (faster-whisper), this caused:
  ValueError: Invalid model size 'whisper-1'

Changes:
- gateway/run.py, discord.py: stop passing model override — let
  transcribe_audio() handle provider-specific model resolution internally
- get_stt_model_from_config(): now provider-aware, reads from the correct
  nested section (stt.local.model, stt.openai.model, etc.); ignores
  legacy flat key for local provider to prevent model name mismatch
- cli-config.yaml.example: updated STT section to show nested provider
  config structure instead of legacy flat key
- config migration v13→v14: moves legacy stt.model to the correct
  provider section and removes the flat key

Reported by community user on Discord.
2026-04-10 03:27:30 -07:00
teknium1
1f72ce71b7 fix: restore local STT fallback for gateway voice notes
Restore local STT command fallback for voice transcription, detect whisper and ffmpeg in common local install paths, and avoid bogus no-provider messaging when only a backend-specific key is missing.
2026-03-15 21:51:40 -07:00
teyrebaz33
c36136084a fix(gateway): honor stt.enabled false for voice transcription
- bridge stt.enabled from config.yaml into gateway runtime config
- preserve the flag in GatewayConfig serialization
- skip gateway voice transcription when STT is disabled
- add regression tests for config loading and disabled transcription flow
2026-03-14 22:09:53 -07:00