mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-28 01:21:43 +00:00
Adds Google Gemini TTS as the seventh voice provider, with 30 prebuilt voices (Zephyr, Puck, Kore, Enceladus, Gacrux, etc.) and natural-language prompt control. Integrates through the existing provider chain: - tools/tts_tool.py: new _generate_gemini_tts() calls the generativelanguage REST endpoint with responseModalities=[AUDIO], wraps the returned 24kHz mono 16-bit PCM (L16) in a WAV RIFF header, then ffmpeg-converts to MP3 or Opus depending on output extension. For .ogg output, libopus is forced explicitly so Telegram voice bubbles get Opus (ffmpeg defaults to Vorbis for .ogg). - hermes_cli/tools_config.py: exposes 'Google Gemini TTS' as a provider option in the curses-based 'hermes tools' UI. - hermes_cli/setup.py: adds gemini to the setup wizard picker, tool status display, and API key prompt branch (accepts existing GEMINI_API_KEY or GOOGLE_API_KEY, falls back to Edge if neither set). - tests/tools/test_tts_gemini.py: 15 unit tests covering WAV header wrap correctness, env var fallback (GEMINI/GOOGLE), voice/model overrides, snake_case vs camelCase inlineData handling, HTTP error surfacing, and empty-audio edge cases. - docs: TTS features page updated to list seven providers with the new gemini config block and ffmpeg notes. Live-tested against api key against gemini-2.5-flash-preview-tts: .wav, .mp3, and Telegram-compatible .ogg (Opus codec) all produce valid playable audio. |
||
|---|---|---|
| .. | ||
| docs | ||
| scripts | ||
| src | ||
| static | ||
| .gitignore | ||
| docusaurus.config.ts | ||
| package-lock.json | ||
| package.json | ||
| README.md | ||
| sidebars.ts | ||
| tsconfig.json | ||
Website
This website is built using Docusaurus, a modern static website generator.
Installation
yarn
Local Development
yarn start
This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.
Build
yarn build
This command generates static content into the build directory and can be served using any static contents hosting service.
Deployment
Using SSH:
USE_SSH=true yarn deploy
Not using SSH:
GIT_USER=<Your GitHub username> yarn deploy
If you are using GitHub pages for hosting, this command is a convenient way to build the website and push to the gh-pages branch.
Diagram Linting
CI runs ascii-guard to lint docs for ASCII box diagrams. Use Mermaid (````mermaid`) or plain lists/tables instead of ASCII boxes to avoid CI failures.