mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-05-14 04:02:26 +00:00)
docs(voice): add Doubao speech integration examples (TTS + STT)
parent ca8e68822d
commit 39560c948d
1 changed file with 43 additions and 1 deletion

@@ -235,6 +235,30 @@ tts:
```yaml
      output_format: wav
```

#### Example: Doubao (Chinese seed-tts-2.0)

For high-quality Chinese TTS via ByteDance's [seed-tts-2.0](https://www.volcengine.com/docs/6561/1257544) bidirectional-streaming API, install the [`doubao-speech`](https://pypi.org/project/doubao-speech/) PyPI package and wire it in as a command provider:

```bash
pip install doubao-speech
export VOLCENGINE_APP_ID="your-app-id"
export VOLCENGINE_ACCESS_TOKEN="your-access-token"
```

```yaml
tts:
  provider: doubao
  providers:
    doubao:
      type: command
      command: "doubao-speech say --text-file {input_path} --out {output_path}"
      output_format: mp3
      max_text_length: 1024
      timeout: 30
```

Credentials come from your shell environment (`VOLCENGINE_APP_ID` / `VOLCENGINE_ACCESS_TOKEN`) or `~/.doubao-speech/config.yaml`. Pick a voice by adding `--voice zh-female-warm` (or any other alias from `doubao-speech list-voices`) to the command. `doubao-speech` also bundles streaming ASR — see the [STT section below](#example-doubao--volcengine-asr) for Hermes integration. Source and full docs: [github.com/Hypnus-Yuan/doubao-speech](https://github.com/Hypnus-Yuan/doubao-speech).
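
A missing credential would otherwise only show up when the command provider runs, so a pre-flight check in the shell that launches Hermes can be handy. The sketch below is not part of Hermes or `doubao-speech`: the function name `doubao_credentials_ok` is hypothetical, it only tests the two sources named above (both environment variables set, or the config file present), and checking the environment first is an assumption rather than documented precedence.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when either credential source described
# above is available. It does not validate the credentials themselves.
doubao_credentials_ok() {
  # Source 1: both environment variables are set and non-empty.
  if [ -n "${VOLCENGINE_APP_ID:-}" ] && [ -n "${VOLCENGINE_ACCESS_TOKEN:-}" ]; then
    echo "env"
    return 0
  fi
  # Source 2: the per-user config file exists.
  if [ -f "${HOME}/.doubao-speech/config.yaml" ]; then
    echo "config"
    return 0
  fi
  echo "missing" >&2
  return 1
}
```

The function prints which source it found (`env` or `config`) and returns non-zero when neither is present, so it can gate a launch script.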

#### Placeholders

Your command template can reference these placeholders. Hermes substitutes them at render time and shell-quotes each value for the surrounding context (bare / single-quoted / double-quoted), so paths with spaces and other shell-sensitive characters are safe.

@@ -323,7 +347,25 @@ stt:

**xAI Grok STT** — Requires `XAI_API_KEY`. Posts to `https://api.x.ai/v1/stt` as multipart/form-data. Good choice if you're already using xAI for chat or TTS and want one API key for everything. Auto-detection order puts it after Groq — explicitly set `stt.provider: xai` to force it.

**Custom local CLI fallback** — Set `HERMES_LOCAL_STT_COMMAND` if you want Hermes to call a local transcription command directly. The command template supports `{input_path}`, `{output_dir}`, `{language}`, and `{model}` placeholders. Your command must write a `.txt` transcript somewhere under `{output_dir}`.
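
The contract has two halves: read the audio at `{input_path}` and leave a `.txt` file under `{output_dir}`. For a transcriber that only prints to standard output, a small wrapper can bridge the gap. This is a minimal sketch under stated assumptions: `stt_wrapper`, the `TRANSCRIBER` variable, and `my-transcriber` are hypothetical names for illustration, not Hermes or third-party APIs.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for a transcriber that prints text to stdout.
# Usage: stt_wrapper <input_path> <output_dir>
# TRANSCRIBER names the real tool; "my-transcriber" is a placeholder.
stt_wrapper() {
  local input_path="$1"
  local output_dir="$2"
  [ -f "$input_path" ] || { echo "no such audio file: $input_path" >&2; return 1; }
  mkdir -p "$output_dir" || return 1
  # Write to a temporary name first so a failed run never leaves a
  # partial .txt file behind for the caller to pick up.
  local tmp="$output_dir/transcript.partial"
  if "${TRANSCRIBER:-my-transcriber}" "$input_path" > "$tmp"; then
    mv "$tmp" "$output_dir/transcript.txt"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Saved as a script that ends with `stt_wrapper "$@"`, it could be referenced as `HERMES_LOCAL_STT_COMMAND='/path/to/stt-wrapper.sh {input_path} {output_dir}'`, where the script path is again only an example.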

#### Example: Doubao / Volcengine ASR

If you use [`doubao-speech`](https://pypi.org/project/doubao-speech/) for Doubao TTS (see [above](#example-doubao-chinese-seed-tts-20)), the same package handles speech-to-text via the local-command STT surface:

```bash
pip install doubao-speech
export VOLCENGINE_APP_ID="your-app-id"
export VOLCENGINE_ACCESS_TOKEN="your-access-token"
export HERMES_LOCAL_STT_COMMAND='doubao-speech transcribe {input_path} --out {output_dir}/transcript.txt'
```

```yaml
stt:
  provider: local_command
```

Hermes writes the incoming voice message to `{input_path}`, runs the command, and reads the `.txt` file produced under `{output_dir}`. Language is auto-detected by the Volcengine bigmodel endpoint.
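
To check the command by hand before pointing Hermes at it, one option is to run it with real paths in place of the placeholders and then look for the transcript the way the description above implies. The helper below sketches only that last step. `read_transcript` is a hypothetical name, and because the text above does not say how Hermes chooses among several `.txt` files, the sketch simply takes the first one in sorted order.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first .txt transcript found under a directory.
# Usage: read_transcript <output_dir>
read_transcript() {
  local output_dir="$1"
  local first
  # Search recursively, since the file may sit anywhere under the directory.
  first=$(find "$output_dir" -type f -name '*.txt' | sort | head -n 1)
  [ -n "$first" ] || { echo "no .txt transcript under $output_dir" >&2; return 1; }
  cat "$first"
}
```

For example, after `doubao-speech transcribe sample.ogg --out "$out/transcript.txt"` (with `sample.ogg` standing in for any local recording and `$out` for a scratch directory), `read_transcript "$out"` should print the recognized text, and a non-zero exit means the command did not leave a transcript where Hermes would look.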

### Fallback Behavior
