mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-25 00:51:20 +00:00
Add messaging platform enhancements: STT, stickers, Discord UX, Slack, pairing, hooks
Major feature additions inspired by OpenClaw/ClawdBot integration analysis: Voice Message Transcription (STT): - Auto-transcribe voice/audio messages via OpenAI Whisper API - Download voice to ~/.hermes/audio_cache/ on Telegram/Discord/WhatsApp - Inject transcript as text so all models can understand voice input - Configurable model (whisper-1, gpt-4o-mini-transcribe, gpt-4o-transcribe) Telegram Sticker Understanding: - Describe static stickers via vision tool with JSON-backed cache - Cache keyed by file_unique_id avoids redundant API calls - Animated/video stickers get emoji-based fallback description Discord Rich UX: - Native slash commands (/ask, /reset, /status, /stop) via app_commands - Button-based exec approvals (Allow Once / Always Allow / Deny) - ExecApprovalView with user authorization and timeout handling Slack Integration: - Full SlackAdapter using slack-bolt with Socket Mode - DMs, channel messages (mention-gated), /hermes slash command - File attachment handling with bot-token-authenticated downloads DM Pairing System: - Code-based user authorization as alternative to static allowlists - 8-char codes from unambiguous alphabet, 1-hour expiry - Rate limiting, lockout after failed attempts, chmod 0600 on data - CLI: hermes pairing list/approve/revoke/clear-pending Event Hook System: - File-based hook discovery from ~/.hermes/hooks/ - HOOK.yaml + handler.py per hook, sync/async handler support - Events: gateway:startup, session:start/reset, agent:start/step/end - Wildcard matching (command:* catches all command events) Cross-Channel Messaging: - send_message agent tool for delivering to any connected platform - Enables cron job delivery and cross-platform notifications Human-Like Response Pacing: - Configurable delays between message chunks (off/natural/custom) - HERMES_HUMAN_DELAY_MODE env var with min/max ms settings Warm Injection Message Style: - Retrofitted image vision messages with friendly kawaii-consistent tone - All new injection messages (STT, 
stickers, errors) use warm style Also: updated config migration to prompt for optional keys interactively, bumped config version, updated README, AGENTS.md, .env.example, cli-config.yaml.example, install scripts, pyproject.toml, and toolsets.
This commit is contained in:
parent
5404a8fcd8
commit
69aa35a51c
23 changed files with 2080 additions and 32 deletions
103
tools/transcription_tools.py
Normal file
103
tools/transcription_tools.py
Normal file
|
|
@ -0,0 +1,103 @@
|
|||
#!/usr/bin/env python3
|
||||
"""
|
||||
Transcription Tools Module
|
||||
|
||||
Provides speech-to-text transcription using OpenAI's Whisper API.
|
||||
Used by the messaging gateway to automatically transcribe voice messages
|
||||
sent by users on Telegram, Discord, WhatsApp, and Slack.
|
||||
|
||||
Supported models:
|
||||
- whisper-1 (cheapest, good quality)
|
||||
- gpt-4o-mini-transcribe (better quality, higher cost)
|
||||
- gpt-4o-transcribe (best quality, highest cost)
|
||||
|
||||
Supported input formats: mp3, mp4, mpeg, mpga, m4a, wav, webm, ogg
|
||||
|
||||
Usage:
|
||||
from tools.transcription_tools import transcribe_audio
|
||||
|
||||
result = transcribe_audio("/path/to/audio.ogg")
|
||||
if result["success"]:
|
||||
print(result["transcript"])
|
||||
"""
|
||||
|
||||
import os
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
|
||||
# Default STT model -- cheapest and widely available
|
||||
DEFAULT_STT_MODEL = "whisper-1"
|
||||
|
||||
|
||||
def transcribe_audio(file_path: str, model: Optional[str] = None) -> dict:
    """
    Transcribe a single audio file with OpenAI's Audio Transcriptions API.

    Talks to the OpenAI endpoint directly (Whisper-family models are not
    exposed through OpenRouter), so a valid ``OPENAI_API_KEY`` must be set
    in the environment.

    Args:
        file_path: Absolute path of the audio file to transcribe.
        model: STT model name; when omitted, ``DEFAULT_STT_MODEL`` is used.

    Returns:
        dict with keys:
            - "success" (bool): True if the transcription completed.
            - "transcript" (str): Transcribed text ("" on failure).
            - "error" (str, only on failure): Human-readable reason.
    """

    def _failure(message: str) -> dict:
        # Uniform shape for every error return.
        return {"success": False, "transcript": "", "error": message}

    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        return _failure("OPENAI_API_KEY not set")

    audio_path = Path(file_path)
    if not audio_path.is_file():
        return _failure(f"Audio file not found: {file_path}")

    # Caller's choice wins; otherwise fall back to the cheapest default.
    chosen_model = DEFAULT_STT_MODEL if model is None else model

    try:
        # Imported lazily so the module loads even without the openai package.
        from openai import OpenAI

        client = OpenAI(api_key=api_key)

        with open(file_path, "rb") as audio_file:
            raw = client.audio.transcriptions.create(
                model=chosen_model,
                file=audio_file,
                response_format="text",
            )

        # With response_format="text" the API hands back a plain string.
        text = str(raw).strip()

        print(f"[STT] Transcribed {audio_path.name} ({len(text)} chars)", flush=True)

        return {
            "success": True,
            "transcript": text,
        }

    except Exception as e:
        # Covers a missing openai package, network errors, and API failures.
        print(f"[STT] Transcription error: {e}", flush=True)
        return _failure(str(e))
|
||||
|
||||
|
||||
def check_stt_requirements() -> bool:
    """Report whether speech-to-text can run, i.e. an OpenAI key is present.

    An unset or empty ``OPENAI_API_KEY`` both count as "not available".
    """
    return bool(os.environ.get("OPENAI_API_KEY"))
|
||||
Loading…
Add table
Add a link
Reference in a new issue