mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-25 00:51:20 +00:00
fix(tui): voice TTS speak-back + transcript-key bug + auto-submit
Three issues surfaced during end-to-end testing of the CLI-parity voice
loop and are fixed together because they all blocked "speak → agent
responds → TTS reads it back" from working at all:
1. Wrong result key (hermes_cli/voice.py)
transcribe_recording() returns {"success": bool, "transcript": str},
matching cli.py:_voice_stop_and_transcribe. The wrapper was reading
result.get("text"), which is None, so every successful Groq / local
STT response was thrown away and the 3-strikes halt fired after
three silent-looking cycles. Fixed by reading "transcript" and also
honouring "success" like the CLI does. Updated the loop simulation
tests to return the correct shape.
2. TTS speak-back was missing (tui_gateway/server.py + hermes_cli/voice.py)
The TUI had a voice.toggle "tts" subcommand but nothing downstream
actually read the flag — agent replies never spoke. Mirrored
cli.py:8747-8754's dispatch: on message.complete with status ==
"complete", if _voice_tts_enabled() is true, spawn a daemon thread
running speak_text(response). Rewrote speak_text as a full port of
cli.py:_voice_speak_response — same markdown-strip regex pipeline
(code blocks, links, bold/italic, inline code, headers, list bullets,
horizontal rules, excessive newlines), same 4000-char cap, same
explicit mp3 output path, same MP3-over-OGG playback choice (afplay
misbehaves on OGG), same cleanup of both extensions. Keeps TUI TTS
audible output byte-for-byte identical to the classic CLI.
3. Auto-submit swallowed on non-empty composer (createGatewayEventHandler.ts)
The voice.transcript handler branched on prev input via a setInput
updater and fired submitRef.current inside the updater when prev was
empty. React strict mode double-invokes state updaters, which would
queue the submit twice; and when the composer had any content the
transcript was merely appended — the agent never saw it. CLI
_pending_input.put(transcript) unconditionally feeds the transcript
as the next turn, so match that: always clear the composer and
setTimeout(() => submitRef.current(text), 0) outside any updater.
Side effect can't run twice this way, and a half-typed draft on the
rare occasion is a fair trade vs. silently dropping the turn.
Also added peak_rms to the rec.stop debug line so "recording too quiet"
is diagnosable at a glance when HERMES_VOICE_DEBUG=1.
This commit is contained in:
parent
04c489b587
commit
42ff785771
3 changed files with 84 additions and 38 deletions
|
|
@ -2126,6 +2126,28 @@ def _(rid, params: dict) -> dict:
|
|||
if rendered:
|
||||
payload["rendered"] = rendered
|
||||
_emit("message.complete", sid, payload)
|
||||
|
||||
# CLI parity: when voice-mode TTS is on, speak the agent reply
|
||||
# (cli.py:_voice_speak_response). Only the final text — tool
|
||||
# calls / reasoning already stream separately and would be
|
||||
# noisy to read aloud.
|
||||
if (
|
||||
status == "complete"
|
||||
and isinstance(raw, str)
|
||||
and raw.strip()
|
||||
and _voice_tts_enabled()
|
||||
):
|
||||
try:
|
||||
from hermes_cli.voice import speak_text
|
||||
|
||||
spoken = raw
|
||||
threading.Thread(
|
||||
target=speak_text, args=(spoken,), daemon=True
|
||||
).start()
|
||||
except ImportError:
|
||||
logger.warning("voice TTS skipped: hermes_cli.voice unavailable")
|
||||
except Exception as e:
|
||||
logger.warning("voice TTS dispatch failed: %s", e)
|
||||
except Exception as e:
|
||||
_emit("error", sid, {"message": str(e)})
|
||||
finally:
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue