hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-26 01:01:40 +00:00

Author	SHA1	Message	Date
0xbyt4	cbe4c23efa	fix: Discord voice bubble + edge-tts mp3/ogg format mismatch - Send Discord voice messages with flags=8192 and waveform metadata so they render as native voice bubbles instead of file attachments - Use .mp3 output path for TTS so edge-tts opus conversion works correctly (edge always outputs mp3, convert was skipped for .ogg) - Use actual file_path from TTS result after potential opus conversion	2026-03-14 14:27:20 +03:00
0xbyt4	f6cf4ca826	feat: add /voice slash command to Discord + fix cross-platform send_voice - Register /voice as Discord slash command with mode choices - Fix _send_voice_reply to handle adapters that don't accept metadata parameter (Discord) by inspecting the method signature at runtime	2026-03-14 14:27:20 +03:00
0xbyt4	d80da5ddd8	feat: add /voice command for auto voice reply in Telegram gateway - /voice on: reply with voice when user sends voice messages - /voice tts: reply with voice to all messages - /voice off: disable, text-only replies - /voice status: show current mode - Per-chat state persisted to gateway_voice_mode.json - Dedup: skips auto-reply if agent already called text_to_speech tool - drop_pending_updates=True to ignore stale Telegram messages on restart - 25 tests covering command handler, reply logic, and edge cases	2026-03-14 14:27:20 +03:00
0xbyt4	8aab13d12d	refactor: remove dead _generation counter from AudioRecorder The counter was incremented in start/stop/cancel but never read anywhere in the codebase. The race condition it was meant to guard against is practically impossible with the persistent stream design.	2026-03-14 14:27:20 +03:00
0xbyt4	39a77431e2	fix: use shutdown() instead of cancel() on CLI exit to release persistent audio stream	2026-03-14 14:27:20 +03:00
0xbyt4	eb79dda04b	fix: persistent audio stream and silence detection improvements - Keep InputStream alive across recordings to avoid CoreAudio hang on repeated open/close cycles on macOS. New _ensure_stream() creates the stream once; start()/stop()/cancel() only toggle frame collection. - Add _close_stream_with_timeout() with daemon thread to prevent stream.stop()/close() from blocking indefinitely. - Add generation counter to detect stale stream-open completions after cancel or restart. - Run recorder.cancel() in background thread from Ctrl+C handler to keep the event loop responsive. - Add shutdown() method called on /voice off to release audio resources. - Fix silence timer reset during active speech: use dip tolerance for _resume_start tracker so natural speech pauses (< 0.3s) don't prevent the silence timer from being reset. - Update tests to match persistent stream behavior.	2026-03-14 14:27:20 +03:00
0xbyt4	eec04d180a	fix(test): update play_beep test to match polling-based implementation play_beep was changed from sd.wait() to a poll loop + sd.stop() in 302e1fe but the test was not updated. Now asserts sd.stop() instead of sd.wait().	2026-03-14 14:27:20 +03:00
0xbyt4	8b57a3cb7e	fix: add max recording timeout to prevent infinite wait in quiet environments AudioRecorder now auto-stops after 15 seconds if no speech is detected (_has_spoken remains False). In quiet environments where ambient RMS never exceeds the silence threshold (200), the recording would wait indefinitely. The new _max_wait parameter fires the silence callback after the timeout, triggering the normal "No speech detected" flow.	2026-03-14 14:27:20 +03:00
0xbyt4	c3dc4448bf	fix: disable STT retries and stop continuous mode after 3 silent cycles - Set max_retries=0 on the STT OpenAI client. The SDK default (2) honors Groq's retry-after header (often 53s), blocking the thread for up to ~106s on rate limits. Voice STT should fail fast, not retry silently. - Stop continuous recording mode after 3 consecutive no-speech cycles to prevent infinite restart loops when nobody is talking.	2026-03-14 14:27:20 +03:00
0xbyt4	0a89933f9b	fix: add STT timeout, move finally restart to thread, guard exit on recording - Set OpenAI client timeout=30s in transcribe_audio() — default 600s blocks _voice_processing for 10 min if Groq/OpenAI stalls - Move _voice_start_recording in _voice_stop_and_transcribe finally block to a daemon thread (same pattern as Ctrl+B handler and process_loop) - Add _should_exit guard at top of _voice_start_recording so all 4 call sites respect shutdown without individual checks	2026-03-14 14:27:20 +03:00
0xbyt4	bcf4513cb3	fix: add timeout to play_beep sd.wait and wrap silence callback in try-except - Replace sd.wait() with a poll loop + sd.stop() in play_beep(). sd.wait() calls Event.wait() without timeout — hangs forever if the audio device stalls. Poll with a 2s ceiling and force-stop instead. - Wrap _on_silence callback in try-except so exceptions are logged instead of silently lost in the daemon thread. Prevents recording state from becoming inconsistent on unexpected errors.	2026-03-14 14:27:20 +03:00
0xbyt4	9d58cafec9	fix: move process_loop voice restart to daemon thread, use _cprint consistently - process_loop's continuous mode restart called _voice_start_recording() directly, blocking the loop if play_beep/sd.wait hangs — queued user input would stall silently. Dispatch to daemon thread like Ctrl+B handler. - Replace print() with _cprint() in _handle_voice_command for consistency with the rest of the voice mode code.	2026-03-14 14:27:20 +03:00
0xbyt4	d0e3b39e69	fix: prevent Ctrl+B key handler from blocking prompt_toolkit event loop The handle_voice_record key binding runs in prompt_toolkit's event-loop thread. When silence auto-stopped recording, _voice_recording was False but recorder.stop() still held AudioRecorder._lock. A concurrent Ctrl+B press entered the START path and blocked on that lock, freezing all keyboard input. Three changes: - Set _voice_processing atomically with _voice_recording=False in _voice_stop_and_transcribe to close the race window - Add _voice_processing guard in the START path to prevent starting while stop/transcribe is still running - Dispatch _voice_start_recording to a daemon thread so play_beep (sd.wait) and AudioRecorder.start (lock acquire) never block the event loop	2026-03-14 14:27:20 +03:00
0xbyt4	ecc3dd7c63	test: add comprehensive voice mode test coverage (86 tests) - Add TestStreamingApiCall (11 tests) for _streaming_api_call in test_run_agent.py - Add regression tests for all 7 bug fixes (edge_tts lazy import, output_stream cleanup, ctrl+c continuous reset, disable stops TTS, config key, chat cleanup, browser_tool signal handler removal) - Add real behavior tests for CLI voice methods via _make_voice_cli() fixture: TestHandleVoiceCommandReal (7), TestEnableVoiceModeReal (7), TestDisableVoiceModeReal (6), TestVoiceSpeakResponseReal (7), TestVoiceStopAndTranscribeReal (12)	2026-03-14 14:27:20 +03:00
0xbyt4	6e51729c4c	fix: remove browser_tool signal handlers that cause voice mode deadlock browser_tool.py registered SIGINT/SIGTERM handlers that called sys.exit() at module import time. When a signal arrived during a lock acquisition (e.g. AudioRecorder._lock in voice mode), SystemExit was raised inside prompt_toolkit's async event loop, corrupting coroutine state and making the process unkillable (required SIGKILL). atexit handler already ensures browser sessions are cleaned up on any normal exit path, so the signal handlers were redundant and harmful.	2026-03-14 14:27:20 +03:00
0xbyt4	ddfd6e0c59	fix: resolve 6 voice mode bugs found during audit - edge_tts NameError: _generate_edge_tts now calls _import_edge_tts() instead of referencing bare module name (tts_tool.py) - TTS thread leak: chat() finally block sends sentinel to text_queue, sets stop_event, and joins tts_thread on exception paths (cli.py) - output_stream leak: moved close() into finally block so audio device is released even on exception (tts_tool.py) - Ctrl+C continuous mode: cancel handler now resets _voice_continuous to prevent auto-restart after user cancels recording (cli.py) - _disable_voice_mode: now calls stop_playback() and sets _voice_tts_done so TTS stops when voice mode is turned off (cli.py) - _show_voice_status: reads record key from config instead of hardcoding Ctrl+B (cli.py)	2026-03-14 14:27:20 +03:00
0xbyt4	a78249230c	fix: address voice mode PR review (streaming TTS, prompt cache, _vprint) Bug A: Replace stale _HAS_ELEVENLABS/_HAS_AUDIO boolean imports with lazy import function calls (_import_elevenlabs, _import_sounddevice). The old constants no longer exist in tts_tool -- the try/except silently swallowed the ImportError, leaving streaming TTS dead. Bug B: Use user message prefix instead of modifying system prompt for voice mode instruction. Changing ephemeral_system_prompt mid-session invalidates the prompt cache. Now the concise-response hint is prepended to the user_message passed to run_conversation while conversation_history keeps the original text. Minor: Add force parameter to _vprint so critical error messages (max retries, non-retryable errors, API failures) are always shown even during streaming TTS playback. Tests: 15 new tests in test_voice_cli_integration.py covering all three fixes -- lazy import activation, message prefix behavior, history cleanliness, system prompt stability, and AST verification that all critical _vprint calls use force=True.	2026-03-14 14:27:20 +03:00
0xbyt4	fc893f98f4	fix: wrap sd.InputStream in try-except and fix config key name - AudioRecorder.start() now catches InputStream errors gracefully with a clear error message about microphone availability - Fix config key mismatch: cli.py was reading "push_to_talk_key" but config.py defines "record_key" -- now consistent - Add format conversion from config format ("ctrl+b") to prompt_toolkit format ("c-b")	2026-03-14 14:27:20 +03:00
0xbyt4	a8838a7ae5	fix: replace all hardcoded Ctrl+R references with Ctrl+B	2026-03-14 14:27:20 +03:00
0xbyt4	b859dfab16	fix: address voice mode review feedback 1. Fully lazy imports: sounddevice, numpy, elevenlabs, edge_tts, and openai are never imported at module level. Each is imported only when the feature is explicitly activated, preventing crashes in headless environments (SSH, Docker, WSL, no PortAudio). 2. No core agent loop changes: streaming TTS path extracted from _interruptible_api_call() into separate _streaming_api_call() method. The original method is restored to its upstream form. 3. Configurable key binding: push-to-talk key changed from Ctrl+R (conflicts with readline reverse-search) to Ctrl+B by default. Configurable via voice.push_to_talk_key in config.yaml. 4. Environment detection: new detect_audio_environment() function checks for SSH, Docker, WSL, and missing audio devices before enabling voice mode. Auto-disables with clear warnings in incompatible environments. 5. Graceful degradation: every audio touchpoint (sd.play, sd.InputStream, sd.OutputStream) wrapped in try/except with ImportError/OSError handling. Failures produce warnings, not crashes.	2026-03-14 14:27:20 +03:00
0xbyt4	143cc68946	fix(test): add /voice to EXPECTED_COMMANDS set in test_commands.py	2026-03-14 14:27:20 +03:00
0xbyt4	46db7aeffd	fix: streaming tool call parsing, error handling, and fake HA state mutation - Fix Gemini streaming tool call merge bug: multiple tool calls with same index but different IDs are now parsed as separate calls instead of concatenating names (e.g. ha_call_serviceha_call_service) - Handle partial results in voice mode: show error and stop continuous mode when agent returns partial/failed results with empty response - Fix error display during streaming TTS: error messages are shown in full response box even when streaming box was already opened - Add duplicate sentence filter in TTS: skip near-duplicate sentences from LLM repetition - Fix fake HA server state mutation: turn_on/turn_off/set_temperature correctly update entity states; temperature sensor simulates change when thermostat is adjusted	2026-03-14 14:27:20 +03:00
0xbyt4	404123aea7	feat: add persistent voice mode status bar below input area Shows voice state (recording, transcribing, TTS/continuous toggles) as a persistent toolbar using prompt_toolkit ConditionalContainer.	2026-03-14 14:27:20 +03:00
0xbyt4	b00c5949fc	fix: suppress verbose logs during streaming TTS, improve hallucination filter, stop continuous mode on errors - Add _vprint() helper to suppress log output when stream_callback is active - Expand Whisper hallucination filter with multi-language phrases and regex pattern for repetitive text - Stop continuous voice mode when agent returns a failed result (e.g. 429 rate limit)	2026-03-14 14:26:55 +03:00
0xbyt4	3a1b35ed92	fix: voice mode race conditions, temp file leak, think tag parsing - Atomic check-and-set for _voice_recording flag with _voice_lock - Guard _voice_stop_and_transcribe against concurrent invocation - Remove premature flag clearing from Ctrl+R handler - Clean up temp WAV files in finally block (_play_via_tempfile) - Use buffer-level regex for <think> block filtering (handles chunked tags) - Prevent /voice on prompt accumulation on repeated calls - Include Groq in STT key error message	2026-03-14 14:26:55 +03:00
0xbyt4	7d4b4e95f1	feat: sync text display with TTS audio playback Move screen output from stream_callback to display_callback called by TTS consumer thread. Text now appears sentence-by-sentence in sync with audio instead of streaming ahead at LLM speed. Removes quiet_mode hack.	2026-03-14 14:26:55 +03:00
0xbyt4	a15fa85248	fix: catch OSError on sounddevice import in voice_mode.py Same PortAudio fix as tts_tool.py — sounddevice raises OSError when the native library is missing on CI runners.	2026-03-14 14:26:30 +03:00
0xbyt4	fd4f229eab	fix: catch OSError on sounddevice import for CI without PortAudio sounddevice raises OSError (not ImportError) when the PortAudio C library is missing. This broke test collection on CI runners that have the Python package installed but lack the native library.	2026-03-14 14:26:30 +03:00
0xbyt4	179d9e1a22	feat: add streaming sentence-by-sentence TTS via ElevenLabs Stream audio to speaker as the agent generates tokens instead of waiting for the full response. First sentence plays within ~1-2s of agent starting to respond. - run_agent: add stream_callback to run_conversation/chat, streaming path in _interruptible_api_call accumulates chunks into mock ChatCompletion while forwarding content deltas to callback - tts_tool: add stream_tts_to_speaker() with sentence buffering, think block filtering, markdown stripping, ElevenLabs pcm_24000 streaming to sounddevice OutputStream - cli: wire up streaming TTS pipeline in chat(), detect elevenlabs provider + sounddevice availability, skip batch TTS when streaming is active, signal stop on interrupt Falls back to batch TTS for Edge/OpenAI providers or when elevenlabs/sounddevice are not available. Zero impact on non-voice mode (callback defaults to None).	2026-03-14 14:26:30 +03:00
0xbyt4	d7425343ee	fix: fix voice recording stuck in continuous mode - Track submitted state locally instead of using racy qsize() check - Allow Ctrl+R to stop recording even while agent is running - Add double-start guard to prevent concurrent recording attempts	2026-03-14 14:26:30 +03:00
0xbyt4	dad865e920	fix: fix silence detection bugs and add Phase 4 voice mode features Fix 3 critical bugs in silence detection: - Micro-pause tolerance now tracks dip duration (not time since speech start) - Peak RMS check in stop() prevents discarding recordings with real speech - Reduced min_speech_duration from 0.5s to 0.3s for reliable speech confirmation Phase 4 features: configurable silence params, visual audio level indicator, voice system prompt, tool call audio cues, TTS interrupt, continuous mode auto-restart, interruptable playback via Popen tracking.	2026-03-14 14:26:30 +03:00
0xbyt4	32b033c11c	feat: add silence filter, hallucination guard, and continuous mode control - Skip silent recordings before STT call (RMS check in AudioRecorder.stop) - Filter known Whisper hallucinations ("Thank you.", "Bye." etc.) - Continuous mode: Ctrl+R starts loop, Ctrl+R during recording exits it - Wait for TTS to finish before auto-restart to avoid recording speaker - Silence timeout increased to 3s for natural pauses - Tests: hallucination filter, silent recording skip, real speech passthrough	2026-03-14 14:25:28 +03:00
0xbyt4	bfd9c97705	feat: add Phase 4 low-latency features for voice mode - Audio cues: beep on record start (880Hz), double beep on stop (660Hz) - Silence detection: auto-stop recording after 3s of silence (RMS-based) - Continuous mode: auto-restart recording after agent responds - Ctrl+R starts continuous mode, Ctrl+R during recording exits it - Waits for TTS to finish before restarting to avoid recording speaker - Tests: 7 new tests for beep generation and silence detection	2026-03-14 14:25:28 +03:00
0xbyt4	a69bd55b5a	fix: isolate GROQ_API_KEY in test_missing_stt_key test The test was failing because GROQ_API_KEY leaked from the environment. Now both VOICE_TOOLS_OPENAI_KEY and GROQ_API_KEY are removed to properly test the "no STT key" scenario.	2026-03-14 14:25:28 +03:00
0xbyt4	c23928d089	fix: improve voice mode robustness and add integration tests - Show TTS errors to user instead of silently logging - Improve markdown stripping: code blocks, URLs, links, horizontal rules - Fix stripping order: process markdown links before removing URLs - Add threading.Lock for voice state variables (cross-thread safety) - Add 14 CLI integration tests (markdown stripping, command parsing, thread safety) - Total: 47 voice-related tests	2026-03-14 14:25:28 +03:00
0xbyt4	37b01ab964	test: add transcription_tools tests for multi-provider STT - Provider resolution: OpenAI priority, Groq fallback, no keys - Model auto-correction: Groq corrects OpenAI models and vice versa - Success path: transcription, API errors, whitespace stripping - 12 new tests, 33 total voice-related tests	2026-03-14 14:25:28 +03:00
0xbyt4	ea5b89825a	fix: voice mode TTS playback and keybinding issues - Change record key from c-@ to c-r (Ctrl+R) for macOS compatibility - Add missing tempfile and time imports that caused silent TTS crash - Use MP3 output for CLI TTS playback (afplay doesn't handle OGG well) - Strip markdown formatting from text before sending to TTS - Remove duplicate transcript echo in voice pipeline	2026-03-14 14:25:28 +03:00
0xbyt4	ec32e9a540	feat: add Groq STT support and fix voice mode keybinding - Add multi-provider STT support (OpenAI > Groq fallback) in transcription_tools - Auto-correct model selection when provider doesn't support the configured model - Change voice record key from Ctrl+Space to Ctrl+R (macOS compatibility) - Fix duplicate transcript echo in voice pipeline - Add GROQ_API_KEY to .env.example	2026-03-14 14:25:28 +03:00
0xbyt4	1a6fbef8a9	feat: add voice mode with push-to-talk and TTS output for CLI Implements Issue #314 Phase 2 & 3: - /voice command to toggle voice mode (on/off/tts/status) - Ctrl+Space push-to-talk recording via sounddevice - Whisper STT transcription via existing transcription_tools - Optional TTS response playback via existing tts_tool - Visual indicators in prompt (recording/transcribing/voice) - 21 unit tests, all mocked (no real mic/API) - Optional deps: sounddevice, numpy (pip install hermes-agent[voice])	2026-03-14 14:25:28 +03:00
Teknium	1a857123b3	feat(skills): add optional telephony skill with Twilio, SMS, and AI calls (#1289 ) * feat: improve context compaction handoff summaries Adapt PR #916 onto current main by replacing the old context summary marker with a clearer handoff wrapper, updating the summarization prompt for resume-oriented summaries, and preserving the current call_llm-based compression path. * fix: clearer error when docker backend is unavailable * fix: preserve docker discovery in backend preflight Follow up on salvaged PR #940 by reusing find_docker() during the new availability check so non-PATH Docker Desktop installs still work. Add a regression test covering the resolved executable path. * test: make gateway async tests xdist-safe Replace sync test usage of asyncio.get_event_loop().run_until_complete() with asyncio.run() so tests do not depend on an ambient current event loop. Also create the email disconnect poll task inside a running loop. This fixes xdist/CI failures where workers have no current loop in MainThread. * feat(skills): add phone-calls skill for outbound AI voice calls Reformulated from core tool (PR #847 feedback) into a skill with a standalone helper script. No new dependencies — uses only Python stdlib. Two providers supported: - Bland.ai (default): simple setup, one API key - Vapi: flexible, better voice quality via ElevenLabs/Deepgram + Twilio Includes: - SKILL.md with full procedure, safety rules, provider docs, pitfalls - scripts/phone_call.py CLI helper (call, status, diagnose commands) * feat(skills): expand phone-calls into optional telephony skill Follow up on salvaged PR #965 by moving the capability into optional-skills and broadening it from outbound AI calling to a full telephony skill. Add Twilio number provisioning, env/state persistence, SMS/MMS, inbound SMS polling, Vapi import helpers, and a provider decision tree while keeping telephony out of core runtime code. * docs(skills): clarify Hermes TTS telephony workflow --------- Co-authored-by: aydnOktay <xaydinoktay@gmail.com> Co-authored-by: mormio <morganemoss@gmai.com>	2026-03-14 04:16:48 -07:00
Teknium	02752c83b4	Merge pull request #1287 from NousResearch/hermes/hermes-cc060dd9 fix(gateway): avoid slash-command crash with GatewayConfig	2026-03-14 04:13:56 -07:00
Teknium	a48ebc68f4	Merge pull request #1288 from NousResearch/hermes/hermes-de3d4e49-pr976 fix: reliably notify gateway users when updates finish	2026-03-14 04:13:13 -07:00
Teknium	b42ee3050e	Merge pull request #1290 from NousResearch/hermes/hermes-f48b210a fix(send_message): salvage and complete MEDIA delivery from #971	2026-03-14 04:12:54 -07:00
teknium1	5c9a84219d	fix: complete send_message MEDIA delivery salvage - prevent raw MEDIA tag leakage outside the gateway pipeline - make extract_media handle quoted/backticked paths and optional whitespace - send Telegram media natively with explicit error/warning handling - add regression tests for Telegram media dispatch and MEDIA parsing	2026-03-14 04:02:03 -07:00
quabug	50d6659392	fix: handle MEDIA tags in send_message tool for native file delivery The send_message tool's _send_telegram() sent MEDIA:<path> tags as literal text instead of delivering actual files. This fixes it by extracting MEDIA tags via BasePlatformAdapter.extract_media() and routing files to the appropriate Telegram Bot API method by extension. Changes: - send_message_tool: extract MEDIA tags and send files natively as photo/video/voice/audio/document based on file extension - send_message_tool: add per-file error handling and missing-file logging - send_message_tool: use cleaned text in fallback to avoid leaking tags - base.py extract_media: handle optional space after MEDIA: colon - base.py extract_media: strip surrounding backticks/quotes from paths Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 04:02:03 -07:00
Teknium	9525db913f	feat(skills): add X/Twitter xitter skill via upstream x-cli (#1285 ) * feat(skills): salvage xitter skill from PR #1065 Adapt the X/Twitter skill onto current main without vendoring an external CLI. Use upstream x-cli installation instructions, add a social-media category, and align credential/setup guidance with Hermes conventions. * docs(skills): explain X credential requirements in xitter skill Clarify why the official X flow needs five credentials and call out the setup/cost friction explicitly.	2026-03-14 04:00:27 -07:00
clabbe-bot	3126c60885	fix: notify gateway users when updates finish or fail	2026-03-14 03:59:05 -07:00
Teknium	cac238c2a3	Merge pull request #1286 from NousResearch/hermes/hermes-315847fd fix(patch): avoid corrupting pipe chars in v4a patch apply	2026-03-14 03:58:27 -07:00
teknium1	7e52e8eb54	fix(gateway): bridge quick commands into GatewayConfig runtime Follow-up on salvaged PR #975. Bridge quick_commands from config.yaml into load_gateway_config(), normalize non-dict quick command config at runtime, and add coverage for GatewayConfig round-trips plus config.yaml bridging. This makes the GatewayConfig quick-command fix complete for the real user-facing config path implicated by issue #973.	2026-03-14 03:57:25 -07:00
teknium1	96c250e538	test: cover pipe characters in v4a patch apply Add a regression test for apply_v4a_operations when read content contains a literal pipe character outside a line-number prefix.	2026-03-14 03:54:46 -07:00

... 2 3 4 5 6 ...

1794 commits