Mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-04-25 00:51:20 +00:00)
Adds Google Gemini TTS as the seventh voice provider, with 30 prebuilt voices (Zephyr, Puck, Kore, Enceladus, Gacrux, etc.) and natural-language prompt control. Integrates through the existing provider chain:

- tools/tts_tool.py: new _generate_gemini_tts() calls the generativelanguage REST endpoint with responseModalities=[AUDIO], wraps the returned 24kHz mono 16-bit PCM (L16) in a WAV RIFF header, then ffmpeg-converts to MP3 or Opus depending on the output extension. For .ogg output, libopus is forced explicitly so Telegram voice bubbles get Opus (ffmpeg defaults to Vorbis for .ogg).
- hermes_cli/tools_config.py: exposes 'Google Gemini TTS' as a provider option in the curses-based 'hermes tools' UI.
- hermes_cli/setup.py: adds gemini to the setup wizard picker, tool status display, and API key prompt branch (accepts an existing GEMINI_API_KEY or GOOGLE_API_KEY, falls back to Edge if neither is set).
- tests/tools/test_tts_gemini.py: 15 unit tests covering WAV header wrap correctness, env var fallback (GEMINI/GOOGLE), voice/model overrides, snake_case vs camelCase inlineData handling, HTTP error surfacing, and empty-audio edge cases.
- docs: TTS features page updated to list seven providers, with the new gemini config block and ffmpeg notes.

Live-tested with an API key against gemini-2.5-flash-preview-tts: .wav, .mp3, and Telegram-compatible .ogg (Opus codec) all produce valid, playable audio.

| File | ||
|---|---|---|
| _category_.json | ||
| acp.md | ||
| api-server.md | ||
| batch-processing.md | ||
| browser.md | ||
| code-execution.md | ||
| context-files.md | ||
| context-references.md | ||
| credential-pools.md | ||
| cron.md | ||
| dashboard-plugins.md | ||
| delegation.md | ||
| fallback-providers.md | ||
| honcho.md | ||
| hooks.md | ||
| image-generation.md | ||
| mcp.md | ||
| memory-providers.md | ||
| memory.md | ||
| overview.md | ||
| personality.md | ||
| plugins.md | ||
| provider-routing.md | ||
| rl-training.md | ||
| skills.md | ||
| skins.md | ||
| tool-gateway.md | ||
| tools.md | ||
| tts.md | ||
| vision.md | ||
| voice-mode.md | ||
| web-dashboard.md | ||
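
The commit message above describes two mechanical steps in tools/tts_tool.py: wrapping raw 24kHz mono 16-bit PCM in a WAV RIFF header, and forcing libopus when the target is .ogg. The following is a minimal sketch of those two steps, not the repository's actual code. The function names `wrap_pcm_in_wav` and `ffmpeg_command` are hypothetical, and the exact ffmpeg arguments used by `_generate_gemini_tts()` are an assumption; only the Python standard-library `wave` module and the well-known ffmpeg flags `-y`, `-i`, and `-c:a` are relied on.

```python
import io
import wave


def wrap_pcm_in_wav(pcm: bytes, sample_rate: int = 24000,
                    channels: int = 1, sample_width: int = 2) -> bytes:
    """Return `pcm` prefixed with a RIFF/WAVE header (hypothetical helper).

    Defaults mirror the format named in the commit message:
    24 kHz, mono, 16-bit samples (sample_width is in bytes).
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)  # the header sizes are filled in on close
    return buf.getvalue()


def ffmpeg_command(wav_path: str, out_path: str) -> list[str]:
    """Build an ffmpeg argument list for the conversion (hypothetical helper).

    For .ogg targets the audio codec is pinned to libopus, because ffmpeg
    would otherwise pick Vorbis for that container. Other extensions are
    left to ffmpeg's own defaults.
    """
    cmd = ["ffmpeg", "-y", "-i", wav_path]
    if out_path.lower().endswith(".ogg"):
        cmd += ["-c:a", "libopus"]
    cmd.append(out_path)
    return cmd
```

Using the `wave` module rather than hand-packing the header with `struct` keeps the chunk sizes consistent automatically. The command list from `ffmpeg_command` could be passed to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed.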