mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-30 06:41:51 +00:00
feat(telegram): skip-STT audio path + 2GB cap via local Bot API server
Two coordinated changes that unblock downstream audio pipelines (diarization, custom transcription, archival) on attachments larger than the public Bot API's 20MB getFile ceiling. - `stt.enabled: false` no longer drops voice/audio with a generic "transcription disabled" note. The gateway probes the cached file's duration (wave → mutagen → ffprobe ladder) and surfaces `[The user sent a voice message: <abs path> (duration: M:SS)]` to the agent so a skill or tool can pick up the raw file. The previous placeholder is replaced rather than appended when present. - `platforms.telegram.extra.base_url` set → adapter auto-lifts its document size cap from 20MB to 2GB (the local telegram-bot-api `--local` ceiling) and the "too large" reply reports the active limit dynamically. No new config knob; presence of `base_url` is the opt-in. - `platforms.telegram.extra.local_mode: true` wires `Application.builder().local_mode(True)` on the python-telegram-bot builder. PTB then reads files from disk instead of HTTP, which is required when telegram-bot-api runs in `--local` mode (the server returns absolute filesystem paths, not `/file/bot...` URLs). - gateway/run.py: rewrites the `stt.enabled: false` branch of `_enrich_message_with_transcription`. New `_format_duration` + `_probe_audio_duration` helpers. - gateway/platforms/telegram.py: `_max_doc_bytes` instance attribute derived from `extra.base_url`; `local_mode` builder wiring; dynamic "too large" message. - tests/gateway/test_stt_config.py: covers path-surfacing with and without an existing user message, and placeholder replacement. - tests/gateway/test_telegram_max_doc_bytes.py: 3 cases — default 20MB without base_url, 2GB when set, empty-string base_url keeps default. - website/docs/user-guide/messaging/telegram.md: new "Skipping STT" subsection under Voice Messages and a full "Large Files (>20MB) via Local Bot API Server" walkthrough (api_id/api_hash, docker-compose, one-time `logOut` migration, `platforms.telegram.extra` config, the `local_mode` disk-access requirement, the silent HTTP-fallback 404). - website/docs/user-guide/features/voice-mode.md: documents the `stt.enabled` knob in the config reference. - `pytest tests/gateway/test_telegram_max_doc_bytes.py tests/gateway/test_stt_config.py` → 9/9 passing. - Verified end-to-end on a live deployment: gateway log shows `Using custom Telegram base_url: http://...` and `Using Telegram local_mode (read files from disk)` on startup; voice messages above 20MB cache to disk and surface their path to the agent. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
parent
6265b3a132
commit
ad2531be08
6 changed files with 327 additions and 14 deletions
|
|
@ -439,6 +439,14 @@ class TelegramAdapter(BasePlatformAdapter):
|
|||
self._dm_topic_chat_ids: Set[str] = {
|
||||
str(e["chat_id"]) for e in self._dm_topics_config if "chat_id" in e
|
||||
}
|
||||
# Document size cap. Telegram's public Bot API caps getFile at 20MB; a
|
||||
# locally-hosted telegram-bot-api server (configured via extra.base_url)
|
||||
# raises that to 2GB, so the presence of base_url is the opt-in.
|
||||
self._max_doc_bytes: int = (
|
||||
2 * 1024 * 1024 * 1024
|
||||
if self.config.extra.get("base_url")
|
||||
else 20 * 1024 * 1024
|
||||
)
|
||||
# Interactive model picker state per chat
|
||||
self._model_picker_state: Dict[str, dict] = {}
|
||||
# Approval button state: message_id → session_key
|
||||
|
|
@ -1315,6 +1323,14 @@ class TelegramAdapter(BasePlatformAdapter):
|
|||
"[%s] Using custom Telegram base_url: %s",
|
||||
self.name, custom_base_url,
|
||||
)
|
||||
# In local-mode telegram-bot-api, file_path is an absolute path on the
|
||||
# server's filesystem rather than a relative HTTP path. PTB needs
|
||||
# local_mode=True so download_*() reads from disk instead of issuing
|
||||
# an HTTP GET that would 404. Requires that the same path is
|
||||
# readable by the Hermes process (shared mount, same machine, etc.).
|
||||
if self.config.extra.get("local_mode"):
|
||||
builder = builder.local_mode(True)
|
||||
logger.info("[%s] Using Telegram local_mode (read files from disk)", self.name)
|
||||
|
||||
# PTB defaults (pool_timeout=1s) are too aggressive on flaky networks and
|
||||
# can trigger "Pool timeout: All connections in the connection pool are occupied"
|
||||
|
|
@ -4894,11 +4910,11 @@ class TelegramAdapter(BasePlatformAdapter):
|
|||
|
||||
# Check file size early so image documents cannot bypass the
|
||||
# document size limit by taking the image path.
|
||||
MAX_DOC_BYTES = 20 * 1024 * 1024
|
||||
if not doc.file_size or doc.file_size > MAX_DOC_BYTES:
|
||||
if not doc.file_size or doc.file_size > self._max_doc_bytes:
|
||||
limit_mb = self._max_doc_bytes // (1024 * 1024)
|
||||
event.text = (
|
||||
"The document is too large or its size could not be verified. "
|
||||
"Maximum: 20 MB."
|
||||
f"Maximum: {limit_mb} MB."
|
||||
)
|
||||
logger.info("[Telegram] Document too large: %s bytes", doc.file_size)
|
||||
await self.handle_message(event)
|
||||
|
|
|
|||
|
|
@ -918,6 +918,59 @@ def _build_media_placeholder(event) -> str:
|
|||
return "\n".join(parts)
|
||||
|
||||
|
||||
def _format_duration(seconds: float) -> str:
|
||||
total = int(round(seconds))
|
||||
if total < 0:
|
||||
total = 0
|
||||
hours, rem = divmod(total, 3600)
|
||||
minutes, secs = divmod(rem, 60)
|
||||
if hours:
|
||||
return f"{hours}:{minutes:02d}:{secs:02d}"
|
||||
return f"{minutes}:{secs:02d}"
|
||||
|
||||
|
||||
async def _probe_audio_duration(path: str) -> Optional[str]:
|
||||
"""Best-effort duration probe. Returns formatted MM:SS / HH:MM:SS, or None on failure."""
|
||||
ext = os.path.splitext(path)[1].lower()
|
||||
|
||||
if ext == ".wav":
|
||||
try:
|
||||
def _wav_duration() -> float:
|
||||
import wave
|
||||
with wave.open(path, "rb") as wf:
|
||||
frames = wf.getnframes()
|
||||
rate = wf.getframerate() or 1
|
||||
return frames / float(rate)
|
||||
secs = await asyncio.to_thread(_wav_duration)
|
||||
return _format_duration(secs)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
if ext in (".ogg", ".opus", ".oga"):
|
||||
try:
|
||||
def _ogg_duration() -> float:
|
||||
from mutagen.oggopus import OggOpus
|
||||
return float(OggOpus(path).info.length)
|
||||
secs = await asyncio.to_thread(_ogg_duration)
|
||||
return _format_duration(secs)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
try:
|
||||
proc = await asyncio.create_subprocess_exec(
|
||||
"ffprobe", "-v", "error", "-show_entries", "format=duration",
|
||||
"-of", "default=noprint_wrappers=1:nokey=1", path,
|
||||
stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE,
|
||||
)
|
||||
stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=5.0)
|
||||
if proc.returncode == 0:
|
||||
return _format_duration(float(stdout.decode().strip()))
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
return None
|
||||
|
||||
|
||||
def _dequeue_pending_event(adapter, session_key: str) -> MessageEvent | None:
|
||||
"""Consume and return the full pending event for a session.
|
||||
|
||||
|
|
@ -14151,16 +14204,25 @@ class GatewayRunner:
|
|||
The enriched message string with transcriptions prepended.
|
||||
"""
|
||||
if not getattr(self.config, "stt_enabled", True):
|
||||
disabled_note = "[The user sent voice message(s), but transcription is disabled in config."
|
||||
if self._has_setup_skill():
|
||||
disabled_note += (
|
||||
" You have a skill called hermes-agent-setup that can help "
|
||||
"users configure Hermes features including voice, tools, and more."
|
||||
)
|
||||
disabled_note += "]"
|
||||
notes = []
|
||||
for path in audio_paths:
|
||||
abs_path = os.path.abspath(path)
|
||||
duration_str = await _probe_audio_duration(abs_path)
|
||||
if duration_str:
|
||||
notes.append(
|
||||
f"[The user sent a voice message: {abs_path} (duration: {duration_str})]"
|
||||
)
|
||||
else:
|
||||
notes.append(f"[The user sent a voice message: {abs_path}]")
|
||||
if not notes:
|
||||
return user_text
|
||||
prefix = "\n\n".join(notes)
|
||||
_placeholder = "(The user sent a message with no text content)"
|
||||
if user_text and user_text.strip() == _placeholder:
|
||||
return prefix
|
||||
if user_text:
|
||||
return f"{disabled_note}\n\n{user_text}"
|
||||
return disabled_note
|
||||
return f"{prefix}\n\n{user_text}"
|
||||
return prefix
|
||||
|
||||
from tools.transcription_tools import transcribe_audio
|
||||
|
||||
|
|
|
|||
|
|
@ -33,25 +33,51 @@ def test_load_gateway_config_bridges_stt_enabled_from_config_yaml(tmp_path, monk
|
|||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_enrich_message_with_transcription_skips_when_stt_disabled():
|
||||
async def test_enrich_message_with_transcription_surfaces_path_when_stt_disabled():
|
||||
from gateway.run import GatewayRunner
|
||||
|
||||
runner = GatewayRunner.__new__(GatewayRunner)
|
||||
runner.config = GatewayConfig(stt_enabled=False)
|
||||
runner._has_setup_skill = lambda: True # Should NOT be consulted in disabled branch.
|
||||
|
||||
with patch(
|
||||
"tools.transcription_tools.transcribe_audio",
|
||||
side_effect=AssertionError("transcribe_audio should not be called when STT is disabled"),
|
||||
), patch(
|
||||
"gateway.run._probe_audio_duration",
|
||||
new=AsyncMock(return_value="0:12"),
|
||||
):
|
||||
result = await runner._enrich_message_with_transcription(
|
||||
"caption",
|
||||
["/tmp/voice.ogg"],
|
||||
)
|
||||
|
||||
assert "transcription is disabled" in result.lower()
|
||||
assert "/tmp/voice.ogg" in result
|
||||
assert "voice message" in result.lower()
|
||||
assert "(duration: 0:12)" in result
|
||||
assert "caption" in result
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_enrich_message_with_transcription_omits_duration_on_probe_failure():
|
||||
from gateway.run import GatewayRunner
|
||||
|
||||
runner = GatewayRunner.__new__(GatewayRunner)
|
||||
runner.config = GatewayConfig(stt_enabled=False)
|
||||
|
||||
with patch(
|
||||
"gateway.run._probe_audio_duration",
|
||||
new=AsyncMock(return_value=None),
|
||||
):
|
||||
result = await runner._enrich_message_with_transcription(
|
||||
"",
|
||||
["/tmp/voice.ogg"],
|
||||
)
|
||||
|
||||
assert "/tmp/voice.ogg" in result
|
||||
assert "duration" not in result.lower()
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_enrich_message_with_transcription_avoids_bogus_no_provider_message_for_backend_key_errors():
|
||||
from gateway.run import GatewayRunner
|
||||
|
|
|
|||
56
tests/gateway/test_telegram_max_doc_bytes.py
Normal file
56
tests/gateway/test_telegram_max_doc_bytes.py
Normal file
|
|
@ -0,0 +1,56 @@
|
|||
"""Tests for Telegram document-size cap.
|
||||
|
||||
The public Telegram Bot API caps `getFile` at 20MB. A locally-hosted
|
||||
`telegram-bot-api` server raises that ceiling to 2GB. We treat the presence
|
||||
of `extra.base_url` as the explicit opt-in to the higher cap.
|
||||
"""
|
||||
|
||||
import sys
|
||||
from unittest.mock import MagicMock
|
||||
|
||||
from gateway.config import PlatformConfig
|
||||
|
||||
|
||||
def _ensure_telegram_mock():
|
||||
if "telegram" in sys.modules and hasattr(sys.modules["telegram"], "__file__"):
|
||||
return
|
||||
|
||||
telegram_mod = MagicMock()
|
||||
telegram_mod.ext.ContextTypes.DEFAULT_TYPE = type(None)
|
||||
telegram_mod.constants.ParseMode.MARKDOWN_V2 = "MarkdownV2"
|
||||
telegram_mod.constants.ChatType.GROUP = "group"
|
||||
telegram_mod.constants.ChatType.SUPERGROUP = "supergroup"
|
||||
telegram_mod.constants.ChatType.CHANNEL = "channel"
|
||||
telegram_mod.constants.ChatType.PRIVATE = "private"
|
||||
|
||||
for name in ("telegram", "telegram.ext", "telegram.constants", "telegram.request"):
|
||||
sys.modules.setdefault(name, telegram_mod)
|
||||
|
||||
|
||||
_ensure_telegram_mock()
|
||||
|
||||
from gateway.platforms.telegram import TelegramAdapter # noqa: E402
|
||||
|
||||
|
||||
def test_max_doc_bytes_defaults_to_20mb_without_base_url():
|
||||
adapter = TelegramAdapter(PlatformConfig(enabled=True, token="***", extra={}))
|
||||
assert adapter._max_doc_bytes == 20 * 1024 * 1024
|
||||
|
||||
|
||||
def test_max_doc_bytes_raised_to_2gb_when_base_url_set():
|
||||
adapter = TelegramAdapter(
|
||||
PlatformConfig(
|
||||
enabled=True,
|
||||
token="***",
|
||||
extra={"base_url": "http://localhost:8081/bot"},
|
||||
)
|
||||
)
|
||||
assert adapter._max_doc_bytes == 2 * 1024 * 1024 * 1024
|
||||
|
||||
|
||||
def test_max_doc_bytes_empty_base_url_keeps_default():
|
||||
"""An empty/falsy `base_url` should not flip the cap — only a real URL does."""
|
||||
adapter = TelegramAdapter(
|
||||
PlatformConfig(enabled=True, token="***", extra={"base_url": ""}),
|
||||
)
|
||||
assert adapter._max_doc_bytes == 20 * 1024 * 1024
|
||||
|
|
@ -391,6 +391,11 @@ voice:
|
|||
|
||||
# Speech-to-Text
|
||||
stt:
|
||||
enabled: true # set to false to skip auto-transcription —
|
||||
# the gateway still caches the audio file and
|
||||
# passes its path to the agent as part of the
|
||||
# inbound message, useful for custom pipelines
|
||||
# (diarization, alignment, archival, etc.)
|
||||
provider: "local" # "local" (free) | "groq" | "openai"
|
||||
local:
|
||||
model: "base" # tiny, base, small, medium, large-v3
|
||||
|
|
|
|||
|
|
@ -276,6 +276,25 @@ Voice messages you send on Telegram are automatically transcribed by Hermes's co
|
|||
- `groq` uses Groq Whisper and requires `GROQ_API_KEY`
|
||||
- `openai` uses OpenAI Whisper and requires `VOICE_TOOLS_OPENAI_KEY`
|
||||
|
||||
#### Skipping STT: pass the raw audio file to the agent
|
||||
|
||||
If you'd rather have the **agent itself** handle audio — for diarization, a custom transcription tool, or just archiving the recording — set `stt.enabled: false` in `~/.hermes/config.yaml`:
|
||||
|
||||
```yaml
|
||||
stt:
|
||||
enabled: false
|
||||
```
|
||||
|
||||
With STT disabled, the gateway still downloads the voice/audio attachment into Hermes's audio cache, but **does not transcribe it**. The agent receives the message with a marker like:
|
||||
|
||||
```
|
||||
[The user sent a voice message: /home/<user>/.hermes/cache/audio/<hash>.ogg]
|
||||
```
|
||||
|
||||
Your tools or skills can then read that path directly (e.g., hand it off to a local diarization pipeline, a richer transcription model, or upload it to long-term storage). The file extension reflects the original format Telegram delivered (`.ogg` for voice notes, `.mp3`/`.m4a`/etc. for audio attachments).
|
||||
|
||||
This pairs naturally with the [local Bot API server](#large-files-20mb--via-local-bot-api-server) section below, which lifts Telegram's 20MB getFile ceiling to 2GB — useful when the recordings you want to process are longer than a couple of minutes.
|
||||
|
||||
### Outgoing Voice (Text-to-Speech)
|
||||
|
||||
When the agent generates audio via TTS, it's delivered as native Telegram **voice bubbles** — the round, inline-playable kind.
|
||||
|
|
@ -295,6 +314,135 @@ Without ffmpeg, Edge TTS audio is sent as a regular audio file (still playable,
|
|||
|
||||
Configure the TTS provider in your `config.yaml` under the `tts.provider` key.
|
||||
|
||||
## Large Files (>20MB) via Local Bot API Server
|
||||
|
||||
Telegram's **public** Bot API caps `getFile` downloads at **20 MB**, so any voice note, audio file, video, or document larger than that is silently rejected by Hermes with a "too large" reply. The documented way around this is to run a **local** [telegram-bot-api](https://github.com/tdlib/telegram-bot-api) daemon — the same server software Telegram uses, but running on your network. A local server raises the file ceiling to **2 GB** and Hermes auto-lifts its own internal cap when it sees a custom `base_url` configured.
|
||||
|
||||
This unlocks workflows like:
|
||||
|
||||
- Sending long voice memos (45-minute meetings, podcasts) to the bot
|
||||
- Uploading large videos for vision-tool processing
|
||||
- Archiving raw audio for offline pipelines like diarization, alignment, or training data
|
||||
|
||||
### Step 1: Obtain Telegram API credentials
|
||||
|
||||
The local server talks directly to Telegram's MTProto layer (not the public Bot API), so it needs **MTProto credentials**:
|
||||
|
||||
1. Visit [my.telegram.org/apps](https://my.telegram.org/apps) and sign in with your Telegram account.
|
||||
2. Create a new application (any name and short description will do).
|
||||
3. Copy the `api_id` and `api_hash` — both are required.
|
||||
|
||||
### Step 2: Run the telegram-bot-api server
|
||||
|
||||
The community-maintained [`aiogram/telegram-bot-api`](https://hub.docker.com/r/aiogram/telegram-bot-api) Docker image is the easiest path. A minimal `docker-compose.yaml` (use `--local` mode to enable the higher limits):
|
||||
|
||||
```yaml
|
||||
services:
|
||||
tg-bot-api:
|
||||
image: aiogram/telegram-bot-api:latest
|
||||
container_name: tg-bot-api
|
||||
restart: unless-stopped
|
||||
ports:
|
||||
- "127.0.0.1:8081:8081" # bind to loopback only; see security note
|
||||
environment:
|
||||
TELEGRAM_API_ID: "12345" # your api_id from Step 1
|
||||
TELEGRAM_API_HASH: "abcdef..." # your api_hash from Step 1
|
||||
TELEGRAM_LOCAL: "1" # enable --local mode (raises 20MB → 2GB)
|
||||
volumes:
|
||||
- ./tg-bot-api-data:/var/lib/telegram-bot-api
|
||||
```
|
||||
|
||||
Bring it up:
|
||||
|
||||
```bash
|
||||
docker compose up -d tg-bot-api
|
||||
docker logs --tail 20 tg-bot-api
|
||||
```
|
||||
|
||||
:::warning Security
|
||||
The local Bot API server takes your bot token in the URL path (e.g. `/bot<TOKEN>/getMe`) with **no additional auth**. Anyone who can reach the port can fully control your bot — read every message it can see, send messages as it, etc. Bind the container to `127.0.0.1` and/or front it with a reverse proxy on a private network. **Never expose port 8081 to the public internet.**
|
||||
:::
|
||||
|
||||
### Step 3: Log the bot out of the public API (one-time)
|
||||
|
||||
A bot can only be active on **one** Bot API server at a time. If your bot was already running against `api.telegram.org` (which it almost certainly was), you must explicitly log it out there before the local server will accept it:
|
||||
|
||||
```bash
|
||||
curl "https://api.telegram.org/bot<YOUR_BOT_TOKEN>/logOut"
|
||||
# expected response: {"ok":true,"result":true}
|
||||
```
|
||||
|
||||
This is a one-shot migration step — you don't repeat it on every restart. Telegram delivers any messages received after `logOut` through the new server instead.
|
||||
|
||||
Verify the local server can talk to Telegram on the bot's behalf:
|
||||
|
||||
```bash
|
||||
curl "http://127.0.0.1:8081/bot<YOUR_BOT_TOKEN>/getMe"
|
||||
# expected response: {"ok":true,"result":{"id":...,"is_bot":true,...}}
|
||||
```
|
||||
|
||||
### Step 4: Point Hermes at the local server
|
||||
|
||||
Add the URLs under `platforms.telegram.extra` in `~/.hermes/config.yaml`:
|
||||
|
||||
```yaml
|
||||
platforms:
|
||||
telegram:
|
||||
extra:
|
||||
base_url: "http://127.0.0.1:8081/bot"
|
||||
base_file_url: "http://127.0.0.1:8081/file/bot"
|
||||
local_mode: true # see Step 5 below — only set this if the bot's data
|
||||
# directory is readable by the Hermes process
|
||||
```
|
||||
|
||||
:::caution Use `platforms.telegram.extra`, not `telegram.extra`
|
||||
At the moment only the `platforms.<name>.extra` form is deep-merged into the platform config. Keys placed directly under a top-level `telegram.extra` block are silently dropped.
|
||||
:::
|
||||
|
||||
When `base_url` is set, Hermes:
|
||||
|
||||
- Builds the python-telegram-bot client against the local server
|
||||
- Auto-lifts its internal document/audio size cap from 20 MB → 2 GB
|
||||
- Reports the active limit in the "too large" error message (`Maximum: 2048 MB.`) so it's obvious which mode you're in
|
||||
|
||||
Restart the gateway and look for a confirmation log line:
|
||||
|
||||
```bash
|
||||
hermes gateway restart
|
||||
grep -E "Using custom Telegram base_url|Using Telegram local_mode" ~/.hermes/logs/gateway.log | tail
|
||||
```
|
||||
|
||||
### Step 5: `local_mode` — file access on disk
|
||||
|
||||
The local server has **two ways** to deliver files:
|
||||
|
||||
1. **Without `--local`** (the default): files are served over HTTP at `/file/bot<TOKEN>/<path>`, same as the public Bot API. The 20MB ceiling stays in effect. Useful as a network-fix only (e.g. when `api.telegram.org` is unreachable but you can self-host); not what you want for the size lift.
|
||||
2. **With `--local`** (set via `TELEGRAM_LOCAL=1` above): files are written to the server's filesystem and the `getFile` response returns an **absolute path** instead of an HTTP URL. The 20MB ceiling is lifted. Hermes must then read the bytes **from disk**, not over HTTP.
|
||||
|
||||
To make the disk-read path work, set `local_mode: true` in the config above **and** make sure the Hermes process can read the path the server returns. Two scenarios:
|
||||
|
||||
- **Same machine** — telegram-bot-api and Hermes run on the same host. Bind-mount the data volume to a directory that Hermes can read (e.g., `/var/lib/telegram-bot-api`), and make sure the file ownership matches. The container drops privileges to its internal `telegram-bot-api` user (uid varies by image); the simplest fix is to add `user: "<UID>:<GID>"` to the compose service so files are owned by a uid Hermes already runs as.
|
||||
- **Different machines** — the bot server runs on one host (e.g., a NAS, a separate VM) and Hermes on another. The server's data directory must be shared with the Hermes machine at the **same absolute path** the server reports (typically `/var/lib/telegram-bot-api`). NFS works well for this; CIFS/SMB with `uid=` mount remapping is friendlier if you don't want to deal with uid mismatches at the filesystem level.
|
||||
|
||||
If `local_mode: true` is set but Hermes can't `stat` the returned file path (permissions or wrong mount), python-telegram-bot silently falls back to an HTTP `getFile` against the local server — which in `--local` mode responds with `404 Not Found`. The symptom shows up in `gateway.log` as:
|
||||
|
||||
```
|
||||
[Telegram] Failed to cache voice: Not Found
|
||||
telegram.error.InvalidToken: Not Found
|
||||
```
|
||||
|
||||
If you see that, the cap-lift is working but the file-share isn't. Verify `ls -la /var/lib/telegram-bot-api/<TOKEN>/voice/` from the Hermes host as the user the gateway runs as, and confirm a single file is `cat`-able without a permission error.
|
||||
|
||||
### Step 6: Test it
|
||||
|
||||
Send the bot a voice note or audio file that's bigger than 20 MB. Tail the gateway log:
|
||||
|
||||
```bash
|
||||
tail -f ~/.hermes/logs/gateway.log | grep -iE "telegram|cache"
|
||||
```
|
||||
|
||||
You should see a `[Telegram] Cached user voice at /home/<user>/.hermes/cache/audio/...` line and **no** "too large" rejection. Combined with `stt.enabled: false` (above), the path to the original audio file then lands in the agent's inbound message for downstream processing.
|
||||
|
||||
## Group Chat Usage
|
||||
|
||||
Hermes Agent works in Telegram group chats with a few considerations:
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue