Commit graph

5006 commits

Author SHA1 Message Date
Teknium
6fa197f973 Merge pull request #1298 from NousResearch/hermes/hermes-aa653753
fix: clearer terminal backend requirement errors
2026-03-14 06:05:58 -07:00
Oktay Aydin
00a0f18544 fix: clearer terminal backend requirement errors
Salvaged from PR #979 onto current main.

Preserve the current terminal backend checks while surfacing actionable
preflight errors for unknown TERMINAL_ENV values, missing SSH host/user
configuration, and missing Modal credentials/config. Tighten the modal
regression test so it deterministically exercises the config-missing
path.
2026-03-14 06:04:39 -07:00
teknium1
523a1b6faf merge: salvage PR #327 voice mode branch
Merge contributor branch feature/voice-mode onto current main for follow-up fixes.
2026-03-14 06:03:07 -07:00
teknium1
dd6a5732e7 docs: fix salvaged PR #980 troubleshooting details
Correct the PowerShell UTF-8 snippet in the new Windows encoding tip
and soften the Docker CLI wording to match Hermes' actual lookup
behavior.
2026-03-14 06:02:57 -07:00
aydnOktay
767b5463f9 docs: add terminal backend and windows troubleshooting
2026-03-14 06:01:22 -07:00
Teknium
acc669645f Merge pull request #1294 from NousResearch/hermes/hermes-315847fd
fix(update): salvage autostash update flow from PR #978
2026-03-14 05:59:03 -07:00
teknium1
42c778b5eb fix(update): warn and prompt before restoring autostash
Add a restore prompt for interactive updates, keep the stash when the user declines, and print a post-restore warning that local changes were reapplied on top of updated code.
2026-03-14 05:50:18 -07:00
smillunchick
f764c7135d fix: auto-stash local changes during updates
2026-03-14 05:44:48 -07:00
Teknium
b646440ca0 fix(mcp): resolve npx stdio connection failures (#1291)
Salvaged from PR #977 onto current main.
Preserves the MCP stdio command resolution and improved error diagnostics,
with deterministic regression tests for the npx/node PATH cases.

Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-14 05:44:00 -07:00
0xbyt4
92c14ec4b0 fix(test): add missing voice state attrs to CLI stub in skin tests
The rebase added voice prompt checks to _get_tui_prompt_fragments but
the test stub was missing _voice_recording, _voice_processing and
_voice_mode attributes, causing AttributeError.
2026-03-14 15:00:45 +03:00
0xbyt4
eb34c0b09a fix: voice pipeline hardening — 7 bug fixes with tests
1. Anthropic + ElevenLabs TTS silence: forward full response to TTS
   callback for non-streaming providers (choices first, then native
   content blocks fallback).

2. Subprocess timeout kill: play_audio_file now kills the process on
   TimeoutExpired instead of leaving zombie processes.

3. Discord disconnect cleanup: leave all voice channels before closing
   the client to prevent leaked state.

4. Audio stream leak: close InputStream if stream.start() fails.

5. Race condition: read/write _on_silence_stop under lock in audio
   callback thread.

6. _vprint force=True: show API error, retry, and truncation messages
   even during streaming TTS.

7. _refresh_level lock: read _voice_recording under _voice_lock.
2026-03-14 14:27:21 +03:00
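Item 2 of eb34c0b09a above (kill the player on TimeoutExpired) can be sketched roughly as follows. This is a minimal illustration, not the repository's play_audio_file: the function name run_with_timeout and its return convention are assumptions made for the example.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its exit code, or None if it had to be killed.

    Hypothetical helper: it only illustrates the kill-then-reap pattern.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # stop the child instead of leaving it running
        proc.wait()  # reap it so no zombie process remains
        return None


if __name__ == "__main__":
    # A child that sleeps far longer than the timeout gets killed.
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], 0.5))
```

The second wait() after kill() matters: without it the killed child stays in the process table until the parent exits.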
0xbyt4
7a24168080 fix: add missing choices/Choice to discord mock in test_discord_free_response
The mock's app_commands SimpleNamespace lacked choices and Choice attrs,
causing xdist test ordering failures when this mock loaded before
test_discord_slash_commands.
2026-03-14 14:27:21 +03:00
0xbyt4
cc0a453476 fix: address PR review round 5 — streaming guard, VC auth, history prefix, auto-TTS control
1. Gate _streaming_api_call to chat_completions mode only — Anthropic and
   Codex fall back to _interruptible_api_call. Preserve Anthropic base_url
   across all client rebuild paths (interrupt, fallback, 401 refresh).

2. Discord VC synthetic events now use chat_type="channel" instead of
   defaulting to "dm" — prevents session bleed into DM context.
   Authorization runs before echoing transcript. Sanitize @everyone/@here
   in voice transcripts.

3. CLI voice prefix ("[Voice input...]") is now API-call-local only —
   stripped from returned history so it never persists to session DB or
   resumed sessions.

4. /voice off now disables base adapter auto-TTS via _auto_tts_disabled_chats
   set — voice input no longer triggers TTS when voice mode is off.
2026-03-14 14:27:21 +03:00
0xbyt4
35748a2fb0 fix: address PR review round 4 — remove web UI, fix audio/import/interface issues
Remove web UI gateway (web.py, tests, docs, toolset, env vars, Platform.WEB
enum) per maintainer request — Nous is building their own official chat UI.

Fix 1: Replace sd.wait() with polling pattern in play_audio_file() to prevent
indefinite hang when audio device stalls (consistent with play_beep()).

Fix 2: Use importlib.util.find_spec() for faster_whisper/openai availability
checks instead of module-level imports that trigger heavy native library
loading (CUDA/cuDNN) at import time.

Fix 3: Remove inspect.signature() hack in _send_voice_reply() — add **kwargs
to Telegram send_voice() so all adapters accept metadata uniformly.

Fix 4: Make session loading resilient to removed platform enum values — skip
entries with unknown platforms instead of crashing the entire gateway.
2026-03-14 14:27:21 +03:00
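Fix 2 of 35748a2fb0 above relies on importlib.util.find_spec() to test for a package without importing it. A minimal sketch of that kind of check, with is_installed as a hypothetical helper name:

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located.

    find_spec() only locates the module; it does not execute it, so heavy
    native libraries are not loaded by this check.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("faster_whisper", "openai"):
        print(name, "available" if is_installed(name) else "missing")
```

For dotted names such as "pkg.sub", find_spec() imports the parent package, so this sketch is intended for top-level names only.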
0xbyt4
1ad5e0ed15 feat: add voice channel awareness — inject participant and speaking state into agent context
2026-03-14 14:27:21 +03:00
0xbyt4
49f3f0fc62 fix: add choices/Choice to discord mock for /voice slash command test
2026-03-14 14:27:21 +03:00
0xbyt4
e3126aeb40 fix: STT consistency — web.py model param, error matching, local provider key
- web.py: pass stt_model from config like discord.py and run.py do
- run.py: match new error messages (No STT provider / not set)
- _transcribe_local: add missing "provider": "local" to return dict
2026-03-14 14:27:21 +03:00
0xbyt4
41162e0aca fix: prevent shutdown deadlock and unblockable Ctrl+C on exit
Move stream close outside the lock in shutdown() to prevent deadlock
when audio callback tries to acquire the same lock. Replace single
t.join(timeout) with a polling loop (0.1s intervals) so KeyboardInterrupt
is not blocked during stream cleanup.
2026-03-14 14:27:21 +03:00
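The polling join described in 41162e0aca above might look something like the sketch below. The helper name join_interruptibly and its return value are assumptions. The idea is that many short waits give the main thread frequent opportunities to raise a pending KeyboardInterrupt, where one long join may not.

```python
import threading
import time


def join_interruptibly(thread, timeout_s, poll_s=0.1):
    """Wait up to timeout_s for thread, in poll_s slices.

    Returns True if the thread finished, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout_s
    while thread.is_alive() and time.monotonic() < deadline:
        thread.join(poll_s)  # short wait, then re-check
    return not thread.is_alive()


if __name__ == "__main__":
    worker = threading.Thread(target=time.sleep, args=(0.3,))
    worker.start()
    print(join_interruptibly(worker, timeout_s=2.0))
```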
0xbyt4
69cb373864 fix: update /voice status to show correct STT provider
Voice status was hardcoded to check API keys only. Now uses the actual
provider resolution (local/groq/openai) so it correctly shows
"local faster-whisper" when installed instead of "Groq" or "MISSING".
2026-03-14 14:27:21 +03:00
0xbyt4
eb052b1b42 fix: add explicit metadata param to Discord send_voice signature
2026-03-14 14:27:21 +03:00
0xbyt4
b8f8d3ef9e feat: integrate faster-whisper local STT with three-provider fallback
Merge main's faster-whisper (local, free) with our Groq support into a
unified three-provider STT pipeline: local > groq > openai.

Provider priority ensures free options are tried first. Each provider
has its own transcriber function with model auto-correction, env-
overridable endpoints, and proper error handling.

74 tests cover the full provider matrix, fallback chains, model
correction, config loading, validation edge cases, and dispatch.
2026-03-14 14:27:21 +03:00
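The local > groq > openai priority from b8f8d3ef9e above can be illustrated with a small resolver. This is only a sketch: resolve_stt_provider is a hypothetical name, and the GROQ_API_KEY and OPENAI_API_KEY variable names are assumptions rather than something the commit message states.

```python
import importlib.util
import os


def resolve_stt_provider(env=None, has_local=None):
    """Pick the first usable provider in the order local > groq > openai.

    env and has_local are injectable so the order can be tested without
    touching the real environment. Key names below are assumed.
    """
    env = os.environ if env is None else env
    if has_local is None:
        has_local = importlib.util.find_spec("faster_whisper") is not None
    if has_local:
        return "local"  # free, no API key needed
    if env.get("GROQ_API_KEY"):
        return "groq"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    return None  # nothing configured


if __name__ == "__main__":
    print(resolve_stt_provider())
```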
0xbyt4
c433c89d7d fix: demote RTP debug logs to DEBUG and isolate web sessions
- Change RTP packet logging from INFO to DEBUG level to reduce noise
  (SPEAKING events remain at INFO as they are important lifecycle events)
- Use per-session chat_id (web_{session_id}) instead of shared "web"
  to isolate conversation context between simultaneous web users
2026-03-14 14:27:21 +03:00
0xbyt4
fa2c825e2f fix: isolate WEB_UI_HOST env var in test and handle empty string
- Patch WEB_UI_HOST in test_web_defaults to avoid env leak
- Handle empty WEB_UI_HOST string in config (fall back to 127.0.0.1)
2026-03-14 14:27:21 +03:00
0xbyt4
5b47b87c42 fix: show only reachable URLs in Web UI startup message
When bound to 127.0.0.1, only show localhost URL instead of listing
unreachable network interfaces. Add hint about WEB_UI_HOST=0.0.0.0
for phone/tablet access. Add VPN/multi-interface and token exposure
tests (11 new tests).
2026-03-14 14:27:21 +03:00
0xbyt4
a21f518c0b fix: hide configured token value in Web UI startup log
Only print the access token when auto-generated (user needs it to
log in). When set via WEB_UI_TOKEN env var, just confirm it is set
without exposing the value in console output.
2026-03-14 14:27:21 +03:00
0xbyt4
44abe852fb fix: add macOS Homebrew Opus fallback and fix shutdown dict iteration
- Add Homebrew library path fallback when ctypes.util.find_library fails
  on macOS (Apple Silicon + Intel paths, guarded by platform check)
- Fix RuntimeError in gateway stop() by iterating over dict copy
- Update Opus tests to verify find_library-first + conditional fallback
2026-03-14 14:27:21 +03:00
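The dict-copy fix in 44abe852fb above addresses the RuntimeError Python raises when a dict changes size while it is being iterated. A minimal sketch of the pattern, with stop_all and the adapters mapping as hypothetical stand-ins for the gateway's own state:

```python
def stop_all(adapters: dict) -> list:
    """Remove every adapter from the dict while walking it.

    list(...) snapshots the items first, so mutating the dict inside the
    loop no longer raises "dictionary changed size during iteration".
    """
    stopped = []
    for name, _adapter in list(adapters.items()):
        adapters.pop(name, None)  # safe: we iterate over the snapshot
        stopped.append(name)
    return stopped


if __name__ == "__main__":
    print(stop_all({"discord": object(), "telegram": object()}))
```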
0xbyt4
c797314fcf test: add security and hardening tests for voice mode fixes
- Path traversal sanitization (Path.name strips ../)
- Media endpoint authentication (401 without token, 404 on traversal)
- hmac.compare_digest usage verification (no == for tokens)
- DOMPurify XSS prevention in HTML template
- Default bind 127.0.0.1 (adapter and config)
- /remote-control token hiding in group chats
- Opus find_library instead of hardcoded paths
- Opus decode error logging (no silent swallow)
- Interrupt _vprint force=True on all 6 calls
- Anthropic interrupt handler in both API call paths
- Update test_web_defaults for new 127.0.0.1 default
2026-03-14 14:27:21 +03:00
0xbyt4
0ff1b4ade2 fix: harden web gateway security and fix error swallowing
- Use hmac.compare_digest for timing-safe token comparison (3 endpoints)
- Default bind to 127.0.0.1 instead of 0.0.0.0
- Sanitize upload filenames with Path.name to prevent path traversal
- Add DOMPurify to sanitize marked.parse() output against XSS
- Replace add_static with authenticated media handler
- Hide token in group chats for /remote-control command
- Use ctypes.util.find_library for Opus instead of hardcoded paths
- Add force=True to 5 interrupt _vprint calls for visibility
- Log Opus decode errors and voice restart failures instead of swallowing
2026-03-14 14:27:21 +03:00
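Two of the hardening steps in 0ff1b4ade2 above, timing-safe token comparison and filename sanitization, can be sketched with the standard library alone. The function names are hypothetical, and the extra rejection of empty or dot-only names is an added assumption, not something the commit describes.

```python
import hmac
from pathlib import Path


def token_matches(supplied: str, expected: str) -> bool:
    """Compare tokens with hmac.compare_digest rather than ==."""
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))


def safe_upload_name(filename: str) -> str:
    """Keep only the final path component of an uploaded filename."""
    name = Path(filename).name
    if name in ("", ".", ".."):  # extra guard, assumed for this sketch
        raise ValueError("unusable filename")
    return name


if __name__ == "__main__":
    print(token_matches("s3cret", "s3cret"))
    print(safe_upload_name("../../etc/passwd"))
```

Encoding to bytes before comparing keeps compare_digest usable for tokens that contain non-ASCII characters.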
0xbyt4
d646442692 fix: restore Anthropic interrupt handler in _interruptible_api_call
Rebase auto-merge silently overwrote main's Anthropic-aware interrupt
handler with the older OpenAI-only version. Without this fix, interrupting
an Anthropic API call closes the wrong client and leaves token generation
running on the Anthropic side.
2026-03-14 14:27:21 +03:00
0xbyt4
0a8985acf9 fix: add missing load_config import in _show_voice_status
2026-03-14 14:27:21 +03:00
0xbyt4
2c84979d77 refactor: extract get_stt_model_from_config helper to eliminate DRY violation
Duplicated YAML config parsing for stt.model existed in gateway/run.py
and gateway/platforms/discord.py. Moved to a single helper in
transcription_tools.py and added 5 tests covering all edge cases.
2026-03-14 14:27:21 +03:00
0xbyt4
3260413cc7 docs: add STT override env vars to .env.example
2026-03-14 14:27:20 +03:00
0xbyt4
238a431545 fix: make STT config env-overridable and fix doc issues
Code fixes:
- STT model, Groq base URL, and OpenAI STT base URL are now
  configurable via env vars (STT_GROQ_MODEL, STT_OPENAI_MODEL,
  GROQ_BASE_URL, STT_OPENAI_BASE_URL) instead of hardcoded
- Gateway and Discord VC now read stt.model from config.yaml
  (previously only CLI did this — gateway always used defaults)

Doc fixes:
- voice-mode.md: move Web UI troubleshooting to web.md (was duplicated)
- voice-mode.md: simplify "How It Works" for end users (remove NaCl,
  DAVE, RTP internals)
- voice-mode.md: clarify STT priority (OpenAI used first if both keys
  set, Groq recommended for free tier)
- voice-mode.md: document new STT env overrides in config reference
- web.md: remove duplicate Quick Start / Step 1-3 sections
- web.md: add mobile HTTPS mic workarounds (moved from voice-mode.md)
- web.md: clarify STT fallback order
2026-03-14 14:27:20 +03:00
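The env-override behavior from 238a431545 above could be sketched as a lookup that prefers the environment and falls back to a default. The four variable names come from the commit message. The default values and the stt_setting helper are placeholders assumed for illustration and may not match the project's real defaults.

```python
import os

# Placeholder defaults, assumed for this sketch only.
DEFAULTS = {
    "STT_GROQ_MODEL": "whisper-large-v3-turbo",
    "STT_OPENAI_MODEL": "whisper-1",
    "GROQ_BASE_URL": "https://api.groq.com/openai/v1",
    "STT_OPENAI_BASE_URL": "https://api.openai.com/v1",
}


def stt_setting(name: str, env=None) -> str:
    """Return the env override for name, or its default if unset or blank."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    return value or DEFAULTS[name]


if __name__ == "__main__":
    for key in DEFAULTS:
        print(key, "=", stt_setting(key))
```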
0xbyt4
79ed0effdd docs: fix 3 inaccuracies found during code-vs-docs audit
- voice-mode.md: Discord sends native voice bubbles (OGG/Opus flags=8192),
  not MP3 file attachments. Falls back to file only if voice API fails.
- discord.md: Bot requires @mention by default in server channels
  (DISCORD_REQUIRE_MENTION=true). Previous text incorrectly said no
  mention needed.
- index.md: Fix broken ASCII architecture diagram alignment after
  adding Web adapter box.
2026-03-14 14:27:20 +03:00
0xbyt4
9722bd8be0 fix: 8 voice pipeline bugs with tests proving each fix
1. VoiceReceiver.stop() now acquires _lock before clearing shared state
   to prevent race with _on_packet on the socket reader thread
2. _packet_debug_count moved from class-level to instance-level to avoid
   cross-instance race condition in multi-guild setups
3. play_in_voice_channel uses asyncio.get_running_loop() instead of
   deprecated asyncio.get_event_loop()
4. _send_voice_reply uses uuid for filenames instead of time-based names
   that can collide when two replies happen in the same second
5. Voice timeout now notifies runner via _on_voice_disconnect callback
   so runner cleans up _voice_mode state (prevents orphaned TTS replies)
6. play_in_voice_channel adds PLAYBACK_TIMEOUT (120s) to prevent
   infinite blocking when FFmpeg callback is never called
7. _send_voice_reply moves temp file cleanup to finally block so files
   are always cleaned up even when send_voice/play raises
8. Base adapter auto-TTS wraps play_tts in try/finally with os.remove
   to clean up generated audio files after playback

18 new tests (120 total voice tests)
2026-03-14 14:27:20 +03:00
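Items 4 and 7 of 9722bd8be0 above (uuid-based filenames and cleanup in a finally block) combine naturally. The sketch below is an assumption-laden illustration: with_temp_audio, the voice_ prefix, and the .ogg suffix are invented for the example.

```python
import os
import tempfile
import uuid


def with_temp_audio(data: bytes, send) -> None:
    """Write data to a uniquely named temp file, pass its path to send,
    and always delete the file afterwards, even if send raises."""
    # uuid4 avoids the collisions that second-resolution timestamps allow.
    path = os.path.join(tempfile.gettempdir(), f"voice_{uuid.uuid4().hex}.ogg")
    try:
        with open(path, "wb") as fh:
            fh.write(data)
        send(path)
    finally:
        if os.path.exists(path):
            os.remove(path)


if __name__ == "__main__":
    with_temp_audio(b"\x00" * 16, lambda p: print("sending", p))
```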
0xbyt4
c925d2ee76 fix: voice pipeline thread safety and error handling bugs
- Add lock protection around VoiceReceiver buffer writes in _on_packet
  to prevent race condition with check_silence on different threads
- Wire _voice_input_callback BEFORE join_voice_channel to avoid
  losing voice input during the join window
- Add try/except around leave_voice_channel to ensure state cleanup
  (voice_mode, callback) even if leave raises an exception
- Guard against empty text after markdown stripping in base.py auto-TTS
- Add 11 tests proving each bug and verifying the fix
2026-03-14 14:27:20 +03:00
0xbyt4
34c324ff59 fix(test): use real _strip_markdown_for_tts instead of duplicated copy
- Import from tools.tts_tool instead of reimplementing the logic
- Fix test_truncates_long_text: truncation is the caller's job, not the function's
- Remove unused re import
2026-03-14 14:27:20 +03:00
0xbyt4
86ddaaee9c fix: extract voice reply logic and add comprehensive tests
- Fix tempfile.mktemp() TOCTOU race in Discord voice input (use NamedTemporaryFile)
- Extract voice reply decision from _handle_message into _should_send_voice_reply()
- Rewrite TestAutoVoiceReply to call real method instead of testing a copy
- Add 59 new tests: VoiceReceiver, VC commands, adapter methods, streaming TTS
2026-03-14 14:27:20 +03:00
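The mktemp() to NamedTemporaryFile change in 86ddaaee9c above closes the gap between choosing a name and creating the file. A minimal sketch, with write_temp_audio as a hypothetical helper and the .wav suffix assumed:

```python
import os
import tempfile


def write_temp_audio(data: bytes, suffix: str = ".wav") -> str:
    """Create the temp file and write to it in one step.

    tempfile.mktemp() only returns a name, so another process could create
    that path first. NamedTemporaryFile creates the file atomically.
    delete=False leaves removal to the caller.
    """
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as fh:
        fh.write(data)
        return fh.name


if __name__ == "__main__":
    path = write_temp_audio(b"RIFF")
    print(path)
    os.remove(path)
```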
0xbyt4
0d56b79685 docs: add firewall and mobile HTTPS troubleshooting for Web UI
- macOS firewall may block LAN access to Web UI
- Mobile browsers require HTTPS for microphone API
- Document workarounds: Android Chrome flag, mkcert self-signed cert,
  Caddy reverse proxy, SSH tunnel for iOS
2026-03-14 14:27:20 +03:00
0xbyt4
3431f73c96 fix: show mic button on mobile Web UI with HTTPS warning
Mobile browsers require HTTPS for navigator.mediaDevices API.
Instead of hiding the mic button (confusing UX), show it as dimmed
and display an informative message when tapped explaining the HTTPS
requirement.
2026-03-14 14:27:20 +03:00
0xbyt4
fbf47e9ff6 fix: allow voice reply in Discord VC despite skip_double guard
When the bot is in a Discord voice channel, both the base auto-TTS and the
Discord play_tts override skip audio. The skip_double guard was also blocking
the runner's _send_voice_reply, resulting in zero audio output in VC.

Now skip_double is overridden when the bot is actively connected to a
voice channel, allowing play_in_voice_channel to handle TTS.

Add comprehensive test matrix covering all platform x input x mode
combinations with full decision table documentation.
2026-03-14 14:27:20 +03:00
0xbyt4
dcb84a8d30 test: add double TTS prevention tests for voice reply logic
- Update TestAutoVoiceReply to include skip_double logic: voice input
  is handled by base adapter auto-TTS, gateway runner skips to prevent
  duplicate audio
- Add TestDiscordPlayTtsSkip: verifies Discord adapter skips play_tts
  when bot is in a voice channel (VC playback handled by runner)
- Add TestWebPlayTts: verifies Web adapter sends invisible play_audio
  instead of voice bubble
2026-03-14 14:27:20 +03:00
0xbyt4
095815d520 fix: skip gateway voice reply for all platforms on voice input
Base adapter auto-TTS already generates and sends audio for voice
messages in _process_message_background. The gateway runner's
_send_voice_reply was causing double audio on all platforms (not
just Web). Now skip_double applies to any voice input regardless
of platform.
2026-03-14 14:27:20 +03:00
0xbyt4
62e75cd158 fix: skip duplicate TTS file attachment when bot is in Discord voice channel
Override play_tts in DiscordAdapter to no-op when connected to a voice
channel for the same guild. The gateway runner already plays TTS audio
in the VC via play_in_voice_channel, so the base adapter's fallback
to send_voice (file attachment) was causing double audio output.
2026-03-14 14:27:20 +03:00
0xbyt4
815e83952e fix: prevent double TTS on Web UI voice messages
When voice mode is enabled and user sends a voice message on Web UI,
both the base adapter auto-TTS (play_audio) and the gateway voice reply
(send_voice) would fire, causing duplicate audio playback. Skip the
gateway voice reply for Web platform voice input since base adapter
already handles it.
2026-03-14 14:27:20 +03:00
0xbyt4
e21a13488b docs: add Discord DM usage and mention requirement to voice mode guide
- Document DM vs server channel interaction modes
- Explain @mention requirement and how to select bot user vs role
- Add DISCORD_REQUIRE_MENTION and DISCORD_FREE_RESPONSE_CHANNELS config
- Add troubleshooting entry for bot not responding in server channels
2026-03-14 14:27:20 +03:00
0xbyt4
1b10c3711d fix: accept **kwargs in send_voice for Discord and Slack adapters
play_tts base class forwards metadata via **kwargs to send_voice,
but Discord and Slack adapters did not accept extra keyword arguments,
causing TypeError and silent message handling failure.

Also fix test_web_defaults to patch correct env var (WEB_UI_TOKEN).
2026-03-14 14:27:20 +03:00
0xbyt4
f078cb4038 fix(test): isolate WEB_TOKEN env var in test_web_defaults
2026-03-14 14:27:20 +03:00
0xbyt4
6205f061fe test: add comprehensive tests for web gateway adapter
32 tests covering:
- Platform enum and config env overrides
- WebAdapter init, port/host/token parsing, auto-token generation
- aiohttp server lifecycle (connect/disconnect)
- HTML serving on GET /
- WebSocket auth handshake (success/failure)
- WebSocket text message routing to handler
- send/send_voice/play_tts broadcast payloads
- hermes-web toolset registration
- Groq STT fallback in transcription_tools
- LAN IP detection
- Media directory management
2026-03-14 14:27:20 +03:00
0xbyt4
c477f660da feat: add continuous voice mode with VAD silence detection
- Voice mode: press mic once to enter, press again to exit
- VAD (Voice Activity Detection) auto-stops recording after 1.5s silence
- Continuous loop: speak → transcribe → agent responds → TTS plays → auto-listen
- Voice mode UI: input bar hides, large mic button centered
- Auto-restart listening when TTS playback finishes
- Fallback: restart listening on text response if no TTS arrives
2026-03-14 14:27:20 +03:00
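The auto-stop rule in c477f660da above (stop recording after 1.5s of silence) can be illustrated with a simple energy-based detector. This is not the project's VAD: the RMS threshold approach, the function names, and the rule that silence only counts after some speech has been heard are all assumptions made for the sketch.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of integer samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def frame_to_stop_at(frames, frame_s, threshold, silence_s=1.5):
    """Return the index of the frame at which recording would stop,
    or None if the silence window is never reached."""
    silent_frames = 0
    heard_speech = False
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            heard_speech = True
            silent_frames = 0  # speech resets the silence window
        elif heard_speech:
            silent_frames += 1
            if silent_frames * frame_s >= silence_s:
                return i
    return None


if __name__ == "__main__":
    loud, quiet = [1000] * 4, [0] * 4
    print(frame_to_stop_at([loud, quiet, quiet, quiet], frame_s=0.5, threshold=100))
```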