Builds on @AxDSan's PR #2109 to finish the KittenTTS wiring so the
provider behaves like every other TTS backend end to end.
- tools/tts_tool.py: `_check_kittentts_available()` helper and wire
into `check_tts_requirements()`; extend Opus-conversion list to
include kittentts (WAV → Opus for Telegram voice bubbles); point the
missing-package error at `hermes setup tts`.
- hermes_cli/tools_config.py: add KittenTTS entry to the "Text-to-Speech"
toolset picker, with a `kittentts` post_setup hook that auto-installs
the wheel + soundfile via pip.
- hermes_cli/setup.py: `_install_kittentts_deps()`, new choice + install
flow in `_setup_tts_provider()`, provider_labels entry, and status row
in the `hermes setup` summary.
- website/docs/user-guide/features/tts.md: add KittenTTS to the provider
table, config example, ffmpeg note, and the zero-config voice-bubble tip.
- tests/tools/test_tts_kittentts.py: 10 unit tests covering generation,
model caching, config passthrough, ffmpeg conversion, availability
detection, and the missing-package dispatcher branch.
E2E verified against the real `kittentts` wheel:
- WAV direct output (pcm_s16le, 24kHz mono)
- MP3 conversion via ffmpeg (from WAV)
- Telegram flow (provider in Opus-conversion list) produces
`codec_name=opus`, 48kHz mono, `voice_compatible=True`, and the
`[[audio_as_voice]]` marker
- check_tts_requirements() returns True when kittentts is installed
Comprehensive audit of every reference/messaging/feature doc page against the
live code registries (PROVIDER_REGISTRY, OPTIONAL_ENV_VARS, COMMAND_REGISTRY,
TOOLSETS, tool registry, on-disk skills). Every fix was verified against code
before writing.
### Wrong values fixed (users would paste-and-fail)
- reference/environment-variables.md:
- DASHSCOPE_BASE_URL default was `coding-intl.dashscope.aliyuncs.com/v1` \u2192
actual `dashscope-intl.aliyuncs.com/compatible-mode/v1`.
- MINIMAX_BASE_URL and MINIMAX_CN_BASE_URL defaults were `/v1` \u2192 actual
`/anthropic` (Hermes calls MiniMax via its Anthropic Messages endpoint).
- reference/toolsets-reference.md MCP example used the non-existent nested
`mcp: servers:` key \u2192 real key is the flat `mcp_servers:`.
- reference/skills-catalog.md listed ~20 bundled skills that no longer exist
on disk (all moved to `optional-skills/`). Regenerated the whole bundled
section from `skills/**/SKILL.md` \u2014 79 skills, accurate paths and names.
- messaging/slack.md ":::info" callout claimed Slack has no
`free_response_channels` equivalent; both the env var and the yaml key are
in fact read.
- messaging/qqbot.md documented `QQ_MARKDOWN_SUPPORT` as an env var, but the
adapter only reads `extra.markdown_support` from config.yaml. Removed the
env var row and noted config-only nature.
- messaging/qqbot.md `hermes setup gateway` \u2192 `hermes gateway setup`.
### Missing coverage added
- Providers: AWS Bedrock and Qwen Portal (qwen-oauth) \u2014 both in
PROVIDER_REGISTRY but undocumented everywhere. Added sections to
integrations/providers.md, rows to quickstart.md and fallback-providers.md.
- integrations/providers.md "Fallback Model" provider list now includes
gemini, google-gemini-cli, qwen-oauth, xai, nvidia, ollama-cloud, bedrock.
- reference/cli-commands.md `--provider` enum and HERMES_INFERENCE_PROVIDER
enum in env-vars now include the same set.
- reference/slash-commands.md: added `/agents` (alias `/tasks`) and `/copy`.
Removed duplicate rows for `/snapshot`, `/fast` (\u00d72), `/debug`.
- reference/tools-reference.md: fixed "47 built-in tools" \u2192 52. Added
`feishu_doc` and `feishu_drive` toolset sections.
- reference/toolsets-reference.md: added `feishu_doc` / `feishu_drive` core
rows + all missing `hermes-<platform>` toolsets in the platform table
(bluebubbles, dingtalk, feishu, qqbot, wecom, wecom-callback, weixin,
homeassistant, webhook, gateway). Fixed the `debugging` composite to
describe the actual `includes=[...]` mechanism.
- reference/optional-skills-catalog.md: added `fitness-nutrition`.
- reference/environment-variables.md: added NOUS_BASE_URL,
NOUS_INFERENCE_BASE_URL, NVIDIA_API_KEY/BASE_URL, OLLAMA_API_KEY/BASE_URL,
XAI_API_KEY/BASE_URL, MISTRAL_API_KEY, AWS_REGION/AWS_PROFILE,
BEDROCK_BASE_URL, HERMES_QWEN_BASE_URL, DISCORD_ALLOWED_CHANNELS,
DISCORD_PROXY, TELEGRAM_REPLY_TO_MODE, MATRIX_DEVICE_ID, MATRIX_REACTIONS,
QQBOT_HOME_CHANNEL_NAME, QQ_SANDBOX.
- messaging/discord.md: documented DISCORD_ALLOWED_CHANNELS, DISCORD_PROXY,
HERMES_DISCORD_TEXT_BATCH_DELAY_SECONDS and HERMES_DISCORD_TEXT_BATCH_SPLIT
_DELAY_SECONDS (all actively read by the adapter).
- messaging/matrix.md: documented MATRIX_REACTIONS (default true).
- messaging/telegram.md: removed the redundant second Webhook Mode section
that invented a `telegram.webhook_mode: true` yaml key the adapter does
not read.
- user-guide/features/hooks.md: added `on_session_finalize` and
`on_session_reset` (both emitted via invoke_hook but undocumented).
- user-guide/features/api-server.md: documented GET /health/detailed, the
`/api/jobs/*` CRUD surface, POST /v1/runs, and GET /v1/runs/{id}/events
(10 routes that were live but undocumented).
- user-guide/features/fallback-providers.md: added `approval` and
`title_generation` auxiliary-task rows; added gemini, bedrock, qwen-oauth
to the supported-providers table.
- user-guide/features/tts.md: "seven providers" \u2192 "eight" (post-xAI add
oversight in #11942).
- user-guide/configuration.md: TTS provider enum gains `xai` and `gemini`;
yaml example block gains `mistral:`, `gemini:`, `xai:` subsections.
Auxiliary-provider enum now enumerates all real registry entries.
- reference/faq.md: stale AIAgent/config examples bumped from
`nous/hermes-3-llama-3.1-70b` and `claude-sonnet-4.6` to
`claude-opus-4.7`.
### Docs-site integrity
- guides/build-a-hermes-plugin.md referenced two nonexistent hooks
(`pre_api_request`, `post_api_request`). Replaced with the real
`on_session_finalize` / `on_session_reset` entries.
- messaging/open-webui.md and features/api-server.md had pre-existing
broken links to `/docs/user-guide/features/profiles` (actual path is
`/docs/user-guide/profiles`). Fixed.
- reference/skills-catalog.md had one `<1%` literal that MDX parsed as a
JSX tag. Escaped to `<1%`.
### False positives filtered out (not changed, verified correct)
- `/set-home` is a registered alias of `/sethome` \u2014 docs were fine.
- `hermes setup gateway` is valid syntax (`hermes setup \<section\>`);
changed in qqbot.md for cross-doc consistency, not as a bug fix.
- Telegram reactions "disabled by default" matches code (default `"false"`).
- Matrix encryption "opt-in" matches code (empty env default \u2192 disabled).
- `pre_api_request` / `post_api_request` hooks do NOT exist in current code;
documented instead the real `on_session_finalize` / `on_session_reset`.
- SIGNAL_IGNORE_STORIES is already in env-vars.md (subagent missed it).
Validation:
- `docusaurus build` \u2014 passes (only pre-existing nix-setup anchor warning).
- `ascii-guard lint docs` \u2014 124 files, 0 errors.
- 22 files changed, +317 / \u2212158.
Adds Google Gemini TTS as the seventh voice provider, with 30 prebuilt
voices (Zephyr, Puck, Kore, Enceladus, Gacrux, etc.) and natural-language
prompt control. Integrates through the existing provider chain:
- tools/tts_tool.py: new _generate_gemini_tts() calls the
generativelanguage REST endpoint with responseModalities=[AUDIO],
wraps the returned 24kHz mono 16-bit PCM (L16) in a WAV RIFF header,
then ffmpeg-converts to MP3 or Opus depending on output extension.
For .ogg output, libopus is forced explicitly so Telegram voice
bubbles get Opus (ffmpeg defaults to Vorbis for .ogg).
- hermes_cli/tools_config.py: exposes 'Google Gemini TTS' as a provider
option in the curses-based 'hermes tools' UI.
- hermes_cli/setup.py: adds gemini to the setup wizard picker, tool
status display, and API key prompt branch (accepts existing
GEMINI_API_KEY or GOOGLE_API_KEY, falls back to Edge if neither set).
- tests/tools/test_tts_gemini.py: 15 unit tests covering WAV header
wrap correctness, env var fallback (GEMINI/GOOGLE), voice/model
overrides, snake_case vs camelCase inlineData handling, HTTP error
surfacing, and empty-audio edge cases.
- docs: TTS features page updated to list seven providers with the new
gemini config block and ffmpeg notes.
Live-tested against api key against gemini-2.5-flash-preview-tts: .wav,
.mp3, and Telegram-compatible .ogg (Opus codec) all produce valid
playable audio.
- New page: user-guide/features/tool-gateway.md covering eligibility,
setup (hermes model, hermes tools, manual config), how use_gateway
works, precedence, switching back, status checking, self-hosted
gateway env vars, and FAQ
- Added to sidebar under Features (top-level, before Core category)
- Cross-references from: overview.md, tools.md, browser.md,
image-generation.md, tts.md, providers.md, environment-variables.md
- Added Nous Tool Gateway subsection to env vars reference with
TOOL_GATEWAY_DOMAIN, TOOL_GATEWAY_SCHEME, TOOL_GATEWAY_USER_TOKEN,
and FIRECRAWL_GATEWAY_URL
Restore local STT command fallback for voice transcription, detect whisper and ffmpeg in common local install paths, and avoid bogus no-provider messaging when only a backend-specific key is missing.
* fix: Home Assistant event filtering now closed by default
Previously, when no watch_domains or watch_entities were configured,
ALL state_changed events passed through to the agent, causing users
to be flooded with notifications for every HA entity change.
Now events are dropped by default unless the user explicitly configures:
- watch_domains: list of domains to monitor (e.g. climate, light)
- watch_entities: list of specific entity IDs to monitor
- watch_all: true (new option — opt-in to receive all events)
A warning is logged at connect time if no filters are configured,
guiding users to set up their HA platform config.
All 49 gateway HA tests + 52 HA tool tests pass.
* docs: update Home Assistant integration documentation
- homeassistant.md: Fix event filtering docs to reflect closed-by-default
behavior. Add watch_all option. Replace Python dict config example with
YAML. Fix defaults table (was incorrectly showing 'all'). Add required
configuration warning admonition.
- environment-variables.md: Add HASS_TOKEN and HASS_URL to Messaging section.
- messaging/index.md: Add Home Assistant to description, architecture
diagram, platform toolsets table, and Next Steps links.
* fix(terminal): strip provider env vars from background and PTY subprocesses
Extends the env var blocklist from #1157 to also cover the two remaining
leaky paths in process_registry.py:
- spawn_local() PTY path (line 156)
- spawn_local() background Popen path (line 197)
Both were still using raw os.environ, leaking provider vars to background
processes and interactive PTY sessions. Now uses the same dynamic
_HERMES_PROVIDER_ENV_BLOCKLIST from local.py.
Explicit env_vars passed to spawn_local() still override the blocklist,
matching the existing behavior for callers that intentionally need these.
Gap identified by PR #1004 (@PeterFile).
* feat(delegate): add observability metadata to subagent results
Enrich delegate_task results with metadata from the child AIAgent:
- model: which model the child used
- exit_reason: completed | interrupted | max_iterations
- tokens.input / tokens.output: token counts
- tool_trace: per-tool-call trace with byte sizes and ok/error status
Tool trace uses tool_call_id matching to correctly pair parallel tool
calls with their results, with a fallback for messages without IDs.
Cherry-picked from PR #872 by @omerkaz, with fixes:
- Fixed parallel tool call trace pairing (was always updating last entry)
- Removed redundant 'iterations' field (identical to existing 'api_calls')
- Added test for parallel tool call trace correctness
Co-authored-by: omerkaz <omerkaz@users.noreply.github.com>
* feat(stt): add free local whisper transcription via faster-whisper
Replace OpenAI-only STT with a dual-provider system mirroring the TTS
architecture (Edge TTS free / ElevenLabs paid):
STT: faster-whisper local (free, default) / OpenAI Whisper API (paid)
Changes:
- tools/transcription_tools.py: Full rewrite with provider dispatch,
config loading, local faster-whisper backend, and OpenAI API backend.
Auto-downloads model (~150MB for 'base') on first voice message.
Singleton model instance reused across calls.
- pyproject.toml: Add faster-whisper>=1.0.0 as core dependency
- hermes_cli/config.py: Expand stt config to match TTS pattern with
provider selection and per-provider model settings
- agent/context_compressor.py: Fix .strip() crash when LLM returns
non-string content (dict from llama.cpp, None). Fixes#1100 partially.
- tests/: 23 new tests for STT providers + 2 for compressor fix
- docs/: Updated Voice & TTS page with STT provider table, model sizes,
config examples, and fallback behavior
Fallback behavior:
- Local not installed → OpenAI API (if key set)
- OpenAI key not set → local whisper (if installed)
- Neither → graceful error message to user
Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com>
---------
Co-authored-by: omerkaz <omerkaz@users.noreply.github.com>
Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com>
- 25 documentation pages covering Getting Started, User Guide, Developer Guide, and Reference
- Docusaurus with custom amber/gold theme matching the landing page branding
- GitHub Actions workflow to deploy landing page + docs to GitHub Pages
- Landing page at root, docs at /docs/ on hermes-agent.nousresearch.com
- Content extracted and restructured from existing repo docs (README, AGENTS.md, CONTRIBUTING.md, docs/)
- Auto-deploy on push to main when website/ or landingpage/ changes