PR #13743 replaced the global MAX_TEXT_LENGTH=4000 with a per-provider
table and a user-override 'max_text_length:' key, but the user-guide
TTS page documented no length behaviour at all. Users hitting truncation
had no way to discover the new caps or the override.
Add an 'Input length limits' subsection after the existing Configuration
YAML block: provider default caps (Edge 5000 / OpenAI 4096 / xAI 15000 /
MiniMax 10000 / Mistral 4000 / Gemini 5000 / ElevenLabs model-aware /
NeuTTS,KittenTTS 2000), ElevenLabs model_id -> cap table (5k-40k), an
override example, and the validation rules (non-positive / non-integer /
boolean values fall through to the provider default).
Point users to xAI's custom voices feature — clone your voice in the
console, paste the voice_id into tts.xai.voice_id. No code changes
needed; the existing TTS pipeline already handles arbitrary voice IDs.
- config.py: link to xAI custom voices docs in voice_id comment
- setup.py: prompt accepts custom voice IDs during xAI TTS setup
- tts.md: short section linking to xAI console and docs
Piper (OHF-Voice/piper1-gpl) is a fast, local neural TTS engine from the
Home Assistant project that supports 44 languages with zero API keys.
Adds it as a native built-in provider alongside edge/neutts/kittentts,
installable via 'hermes tools' with one keystroke.
What ships:
- New 'piper' built-in provider in tools/tts_tool.py
- Lazy import via _import_piper()
- Module-level voice cache keyed on (model_path, use_cuda) so switching
voices doesn't invalidate older cached voices
- _resolve_piper_voice_path() accepts either an absolute .onnx path or a
voice name (auto-downloaded on first use via 'python -m
piper.download_voices --download-dir <cache>')
- Voice cache at ~/.hermes/cache/piper-voices/ (profile-aware via
get_hermes_dir)
- Optional SynthesisConfig knobs: length_scale, noise_scale,
noise_w_scale, volume, normalize_audio, use_cuda — passed through
only when configured, so older piper-tts versions aren't broken
- WAV output then ffmpeg conversion path (same as neutts/kittentts) so
Telegram voice bubbles work when ffmpeg is present
- Piper added to BUILTIN_TTS_PROVIDERS so a user's
tts.providers.piper.command cannot shadow the native provider
(regression test included)
- 'hermes tools' wizard entry
- Piper appears under Voice and TTS as local free, with
'pip install piper-tts' auto-install via post_setup handler
- Prints voice-catalog URL and default-voice info after install
- config.yaml defaults
- tts.piper.voice defaults to en_US-lessac-medium
- Commented advanced knobs for discoverability
- Docs
- New 'Piper (local, 44 languages)' section in features/tts.md
explaining install path, voice switching, pre-downloaded voices,
and advanced knobs
- Piper listed in the ten-provider table and ffmpeg table
- Custom-command-providers section updated to drop the Piper example
(now native) and add a piper-custom example for users with their own
trained .onnx models
- overview.md bumps provider count to ten
- Tests (tests/tools/test_tts_piper.py, 16 tests)
- Registration (BUILTIN_TTS_PROVIDERS, PROVIDER_MAX_TEXT_LENGTH)
- _resolve_piper_voice_path across every branch: direct .onnx path,
cached voice name, fresh download with correct CLI args, download
failure, successful-exit-but-missing-files, empty voice to default
- _generate_piper_tts: loads voice once, reuses cache, voice-name
download wiring, advanced knobs flow through SynthesisConfig
- text_to_speech_tool end-to-end dispatch and missing-package error
- check_tts_requirements: piper availability toggles the return value
- Regression guard: piper cannot be shadowed by a command provider
with the same name
- Pre-existing test_tts_mistral test broadened to mock the new
piper/kittentts/command-provider checks (otherwise it false-passes
when piper is installed in the test venv)
E2E verification (live):
Actual pip install piper-tts, config piper + en_US-lessac-low,
text_to_speech_tool call, voice auto-downloaded from HuggingFace,
WAV synthesized, ffmpeg-converted to Ogg/Opus. Second call hits the
cache (~60ms). Cache dir populated with .onnx and .onnx.json.
This caught a real bug during development: the first pass used '-d' as
the download-dir flag; the actual piper.download_voices CLI wants
'--download-dir'. Fixed before PR opened.
Reshape of PR #17211 (@versun). Lets users wire any local or external
TTS CLI into Hermes without adding engine-specific Python code. Users
declare any number of named providers in config.yaml and switch between
them with tts.provider: <name>, alongside the built-ins (edge, openai,
elevenlabs, …).
Config shape:
tts:
provider: piper-en
providers:
piper-en:
type: command
command: 'piper -m ~/model.onnx -f {output_path} < {input_path}'
output_format: wav
Placeholders: {input_path}, {text_path}, {output_path}, {format},
{voice}, {model}, {speed}. Use {{ / }} for literal braces.
Key behavior:
- Built-in provider names always win — a tts.providers.openai entry
cannot shadow the native OpenAI provider.
- type: command is the default when command: is set.
- Placeholder values are shell-quote-aware (bare / single / double
context), so paths with spaces and shell metacharacters are safe.
- Default delivery is a regular audio attachment. voice_compatible: true
opts in to Telegram voice-bubble delivery via ffmpeg Opus conversion.
- Command failures (non-zero exit, timeout, empty output) surface to
the agent with stderr/stdout included so you can debug from chat.
- Process-tree kill on timeout (Unix killpg, Windows taskkill /T).
- max_text_length defaults to 5000 for command providers; override
under tts.providers.<name>.max_text_length.
Tests: tests/tools/test_tts_command_providers.py — 42 new tests cover
provider resolution, shell-quote context, placeholder rendering with
injection payloads, timeout, non-zero exit, empty output, voice_compatible
opt-in, and end-to-end dispatch through text_to_speech_tool. All 88
pre-existing TTS tests still pass.
Docs: new "Custom command providers" section in
website/docs/user-guide/features/tts.md with three worked examples
(Piper, VoxCPM, MLX-Kokoro), placeholder reference, optional keys,
behavior notes, and security caveat.
E2E-verified live: isolated HERMES_HOME, command provider declared in
config.yaml, text_to_speech_tool dispatches through the registered
shell command and the output file is produced as expected.
Co-authored-by: Versun <me+github7604@versun.org>
Builds on @AxDSan's PR #2109 to finish the KittenTTS wiring so the
provider behaves like every other TTS backend end to end.
- tools/tts_tool.py: `_check_kittentts_available()` helper and wire
into `check_tts_requirements()`; extend Opus-conversion list to
include kittentts (WAV → Opus for Telegram voice bubbles); point the
missing-package error at `hermes setup tts`.
- hermes_cli/tools_config.py: add KittenTTS entry to the "Text-to-Speech"
toolset picker, with a `kittentts` post_setup hook that auto-installs
the wheel + soundfile via pip.
- hermes_cli/setup.py: `_install_kittentts_deps()`, new choice + install
flow in `_setup_tts_provider()`, provider_labels entry, and status row
in the `hermes setup` summary.
- website/docs/user-guide/features/tts.md: add KittenTTS to the provider
table, config example, ffmpeg note, and the zero-config voice-bubble tip.
- tests/tools/test_tts_kittentts.py: 10 unit tests covering generation,
model caching, config passthrough, ffmpeg conversion, availability
detection, and the missing-package dispatcher branch.
E2E verified against the real `kittentts` wheel:
- WAV direct output (pcm_s16le, 24kHz mono)
- MP3 conversion via ffmpeg (from WAV)
- Telegram flow (provider in Opus-conversion list) produces
`codec_name=opus`, 48kHz mono, `voice_compatible=True`, and the
`[[audio_as_voice]]` marker
- check_tts_requirements() returns True when kittentts is installed
Comprehensive audit of every reference/messaging/feature doc page against the
live code registries (PROVIDER_REGISTRY, OPTIONAL_ENV_VARS, COMMAND_REGISTRY,
TOOLSETS, tool registry, on-disk skills). Every fix was verified against code
before writing.
### Wrong values fixed (users would paste-and-fail)
- reference/environment-variables.md:
- DASHSCOPE_BASE_URL default was `coding-intl.dashscope.aliyuncs.com/v1` \u2192
actual `dashscope-intl.aliyuncs.com/compatible-mode/v1`.
- MINIMAX_BASE_URL and MINIMAX_CN_BASE_URL defaults were `/v1` \u2192 actual
`/anthropic` (Hermes calls MiniMax via its Anthropic Messages endpoint).
- reference/toolsets-reference.md MCP example used the non-existent nested
`mcp: servers:` key \u2192 real key is the flat `mcp_servers:`.
- reference/skills-catalog.md listed ~20 bundled skills that no longer exist
on disk (all moved to `optional-skills/`). Regenerated the whole bundled
section from `skills/**/SKILL.md` \u2014 79 skills, accurate paths and names.
- messaging/slack.md ":::info" callout claimed Slack has no
`free_response_channels` equivalent; both the env var and the yaml key are
in fact read.
- messaging/qqbot.md documented `QQ_MARKDOWN_SUPPORT` as an env var, but the
adapter only reads `extra.markdown_support` from config.yaml. Removed the
env var row and noted config-only nature.
- messaging/qqbot.md `hermes setup gateway` \u2192 `hermes gateway setup`.
### Missing coverage added
- Providers: AWS Bedrock and Qwen Portal (qwen-oauth) \u2014 both in
PROVIDER_REGISTRY but undocumented everywhere. Added sections to
integrations/providers.md, rows to quickstart.md and fallback-providers.md.
- integrations/providers.md "Fallback Model" provider list now includes
gemini, google-gemini-cli, qwen-oauth, xai, nvidia, ollama-cloud, bedrock.
- reference/cli-commands.md `--provider` enum and HERMES_INFERENCE_PROVIDER
enum in env-vars now include the same set.
- reference/slash-commands.md: added `/agents` (alias `/tasks`) and `/copy`.
Removed duplicate rows for `/snapshot`, `/fast` (\u00d72), `/debug`.
- reference/tools-reference.md: fixed "47 built-in tools" \u2192 52. Added
`feishu_doc` and `feishu_drive` toolset sections.
- reference/toolsets-reference.md: added `feishu_doc` / `feishu_drive` core
rows + all missing `hermes-<platform>` toolsets in the platform table
(bluebubbles, dingtalk, feishu, qqbot, wecom, wecom-callback, weixin,
homeassistant, webhook, gateway). Fixed the `debugging` composite to
describe the actual `includes=[...]` mechanism.
- reference/optional-skills-catalog.md: added `fitness-nutrition`.
- reference/environment-variables.md: added NOUS_BASE_URL,
NOUS_INFERENCE_BASE_URL, NVIDIA_API_KEY/BASE_URL, OLLAMA_API_KEY/BASE_URL,
XAI_API_KEY/BASE_URL, MISTRAL_API_KEY, AWS_REGION/AWS_PROFILE,
BEDROCK_BASE_URL, HERMES_QWEN_BASE_URL, DISCORD_ALLOWED_CHANNELS,
DISCORD_PROXY, TELEGRAM_REPLY_TO_MODE, MATRIX_DEVICE_ID, MATRIX_REACTIONS,
QQBOT_HOME_CHANNEL_NAME, QQ_SANDBOX.
- messaging/discord.md: documented DISCORD_ALLOWED_CHANNELS, DISCORD_PROXY,
HERMES_DISCORD_TEXT_BATCH_DELAY_SECONDS and HERMES_DISCORD_TEXT_BATCH_SPLIT
_DELAY_SECONDS (all actively read by the adapter).
- messaging/matrix.md: documented MATRIX_REACTIONS (default true).
- messaging/telegram.md: removed the redundant second Webhook Mode section
that invented a `telegram.webhook_mode: true` yaml key the adapter does
not read.
- user-guide/features/hooks.md: added `on_session_finalize` and
`on_session_reset` (both emitted via invoke_hook but undocumented).
- user-guide/features/api-server.md: documented GET /health/detailed, the
`/api/jobs/*` CRUD surface, POST /v1/runs, and GET /v1/runs/{id}/events
(10 routes that were live but undocumented).
- user-guide/features/fallback-providers.md: added `approval` and
`title_generation` auxiliary-task rows; added gemini, bedrock, qwen-oauth
to the supported-providers table.
- user-guide/features/tts.md: "seven providers" \u2192 "eight" (post-xAI add
oversight in #11942).
- user-guide/configuration.md: TTS provider enum gains `xai` and `gemini`;
yaml example block gains `mistral:`, `gemini:`, `xai:` subsections.
Auxiliary-provider enum now enumerates all real registry entries.
- reference/faq.md: stale AIAgent/config examples bumped from
`nous/hermes-3-llama-3.1-70b` and `claude-sonnet-4.6` to
`claude-opus-4.7`.
### Docs-site integrity
- guides/build-a-hermes-plugin.md referenced two nonexistent hooks
(`pre_api_request`, `post_api_request`). Replaced with the real
`on_session_finalize` / `on_session_reset` entries.
- messaging/open-webui.md and features/api-server.md had pre-existing
broken links to `/docs/user-guide/features/profiles` (actual path is
`/docs/user-guide/profiles`). Fixed.
- reference/skills-catalog.md had one `<1%` literal that MDX parsed as a
JSX tag. Escaped to `<1%`.
### False positives filtered out (not changed, verified correct)
- `/set-home` is a registered alias of `/sethome` \u2014 docs were fine.
- `hermes setup gateway` is valid syntax (`hermes setup \<section\>`);
changed in qqbot.md for cross-doc consistency, not as a bug fix.
- Telegram reactions "disabled by default" matches code (default `"false"`).
- Matrix encryption "opt-in" matches code (empty env default \u2192 disabled).
- `pre_api_request` / `post_api_request` hooks do NOT exist in current code;
documented instead the real `on_session_finalize` / `on_session_reset`.
- SIGNAL_IGNORE_STORIES is already in env-vars.md (subagent missed it).
Validation:
- `docusaurus build` \u2014 passes (only pre-existing nix-setup anchor warning).
- `ascii-guard lint docs` \u2014 124 files, 0 errors.
- 22 files changed, +317 / \u2212158.
Adds Google Gemini TTS as the seventh voice provider, with 30 prebuilt
voices (Zephyr, Puck, Kore, Enceladus, Gacrux, etc.) and natural-language
prompt control. Integrates through the existing provider chain:
- tools/tts_tool.py: new _generate_gemini_tts() calls the
generativelanguage REST endpoint with responseModalities=[AUDIO],
wraps the returned 24kHz mono 16-bit PCM (L16) in a WAV RIFF header,
then ffmpeg-converts to MP3 or Opus depending on output extension.
For .ogg output, libopus is forced explicitly so Telegram voice
bubbles get Opus (ffmpeg defaults to Vorbis for .ogg).
- hermes_cli/tools_config.py: exposes 'Google Gemini TTS' as a provider
option in the curses-based 'hermes tools' UI.
- hermes_cli/setup.py: adds gemini to the setup wizard picker, tool
status display, and API key prompt branch (accepts existing
GEMINI_API_KEY or GOOGLE_API_KEY, falls back to Edge if neither set).
- tests/tools/test_tts_gemini.py: 15 unit tests covering WAV header
wrap correctness, env var fallback (GEMINI/GOOGLE), voice/model
overrides, snake_case vs camelCase inlineData handling, HTTP error
surfacing, and empty-audio edge cases.
- docs: TTS features page updated to list seven providers with the new
gemini config block and ffmpeg notes.
Live-tested against api key against gemini-2.5-flash-preview-tts: .wav,
.mp3, and Telegram-compatible .ogg (Opus codec) all produce valid
playable audio.
- New page: user-guide/features/tool-gateway.md covering eligibility,
setup (hermes model, hermes tools, manual config), how use_gateway
works, precedence, switching back, status checking, self-hosted
gateway env vars, and FAQ
- Added to sidebar under Features (top-level, before Core category)
- Cross-references from: overview.md, tools.md, browser.md,
image-generation.md, tts.md, providers.md, environment-variables.md
- Added Nous Tool Gateway subsection to env vars reference with
TOOL_GATEWAY_DOMAIN, TOOL_GATEWAY_SCHEME, TOOL_GATEWAY_USER_TOKEN,
and FIRECRAWL_GATEWAY_URL
Restore local STT command fallback for voice transcription, detect whisper and ffmpeg in common local install paths, and avoid bogus no-provider messaging when only a backend-specific key is missing.
* fix: Home Assistant event filtering now closed by default
Previously, when no watch_domains or watch_entities were configured,
ALL state_changed events passed through to the agent, causing users
to be flooded with notifications for every HA entity change.
Now events are dropped by default unless the user explicitly configures:
- watch_domains: list of domains to monitor (e.g. climate, light)
- watch_entities: list of specific entity IDs to monitor
- watch_all: true (new option — opt-in to receive all events)
A warning is logged at connect time if no filters are configured,
guiding users to set up their HA platform config.
All 49 gateway HA tests + 52 HA tool tests pass.
* docs: update Home Assistant integration documentation
- homeassistant.md: Fix event filtering docs to reflect closed-by-default
behavior. Add watch_all option. Replace Python dict config example with
YAML. Fix defaults table (was incorrectly showing 'all'). Add required
configuration warning admonition.
- environment-variables.md: Add HASS_TOKEN and HASS_URL to Messaging section.
- messaging/index.md: Add Home Assistant to description, architecture
diagram, platform toolsets table, and Next Steps links.
* fix(terminal): strip provider env vars from background and PTY subprocesses
Extends the env var blocklist from #1157 to also cover the two remaining
leaky paths in process_registry.py:
- spawn_local() PTY path (line 156)
- spawn_local() background Popen path (line 197)
Both were still using raw os.environ, leaking provider vars to background
processes and interactive PTY sessions. Now uses the same dynamic
_HERMES_PROVIDER_ENV_BLOCKLIST from local.py.
Explicit env_vars passed to spawn_local() still override the blocklist,
matching the existing behavior for callers that intentionally need these.
Gap identified by PR #1004 (@PeterFile).
* feat(delegate): add observability metadata to subagent results
Enrich delegate_task results with metadata from the child AIAgent:
- model: which model the child used
- exit_reason: completed | interrupted | max_iterations
- tokens.input / tokens.output: token counts
- tool_trace: per-tool-call trace with byte sizes and ok/error status
Tool trace uses tool_call_id matching to correctly pair parallel tool
calls with their results, with a fallback for messages without IDs.
Cherry-picked from PR #872 by @omerkaz, with fixes:
- Fixed parallel tool call trace pairing (was always updating last entry)
- Removed redundant 'iterations' field (identical to existing 'api_calls')
- Added test for parallel tool call trace correctness
Co-authored-by: omerkaz <omerkaz@users.noreply.github.com>
* feat(stt): add free local whisper transcription via faster-whisper
Replace OpenAI-only STT with a dual-provider system mirroring the TTS
architecture (Edge TTS free / ElevenLabs paid):
STT: faster-whisper local (free, default) / OpenAI Whisper API (paid)
Changes:
- tools/transcription_tools.py: Full rewrite with provider dispatch,
config loading, local faster-whisper backend, and OpenAI API backend.
Auto-downloads model (~150MB for 'base') on first voice message.
Singleton model instance reused across calls.
- pyproject.toml: Add faster-whisper>=1.0.0 as core dependency
- hermes_cli/config.py: Expand stt config to match TTS pattern with
provider selection and per-provider model settings
- agent/context_compressor.py: Fix .strip() crash when LLM returns
non-string content (dict from llama.cpp, None). Fixes#1100 partially.
- tests/: 23 new tests for STT providers + 2 for compressor fix
- docs/: Updated Voice & TTS page with STT provider table, model sizes,
config examples, and fallback behavior
Fallback behavior:
- Local not installed → OpenAI API (if key set)
- OpenAI key not set → local whisper (if installed)
- Neither → graceful error message to user
Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com>
---------
Co-authored-by: omerkaz <omerkaz@users.noreply.github.com>
Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com>
- 25 documentation pages covering Getting Started, User Guide, Developer Guide, and Reference
- Docusaurus with custom amber/gold theme matching the landing page branding
- GitHub Actions workflow to deploy landing page + docs to GitHub Pages
- Landing page at root, docs at /docs/ on hermes-agent.nousresearch.com
- Content extracted and restructured from existing repo docs (README, AGENTS.md, CONTRIBUTING.md, docs/)
- Auto-deploy on push to main when website/ or landingpage/ changes