fix(tui): restore voice push-to-talk parity (#20897)

* fix(tui): restore classic CLI voice push-to-talk parity

(cherry picked from commit 93b9ae301b)

* fix(tui): harden voice push-to-talk stop flow

Address review feedback from PR #16189 by stopping the active recorder before background transcription, documenting single-shot voice capture, and covering the TUI gateway flags with regression tests.

* fix(tui): preserve silent voice strike tracking

Keep single-shot voice recording's no-speech counter alive across starts so the TUI can still emit the three-strikes auto-disable event, and bind the auto-restart state at module scope for type checking.
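
For reviewers, the strike tracking described above can be sketched roughly as follows. This is an illustration only, not the code in `hermes_cli.voice`: `SilentStrikeCounter`, its `record` method, and the default limit of three are hypothetical names chosen for the sketch. The point is that the counter lives outside any single capture, so it survives across single-shot starts.

```python
class SilentStrikeCounter:
    """Hypothetical helper: counts consecutive captures that contain no speech."""

    def __init__(self, limit: int = 3) -> None:
        self.limit = limit
        self.strikes = 0  # persists across single-shot starts

    def record(self, transcript: str) -> bool:
        """Register one finished capture; return True when the limit is reached."""
        if transcript.strip():
            # Any real speech resets the run of silent captures.
            self.strikes = 0
            return False
        self.strikes += 1
        if self.strikes >= self.limit:
            # Reset so a later re-enable starts from zero.
            self.strikes = 0
            return True
        return False


counter = SilentStrikeCounter()
silent_run = [counter.record(text) for text in ["", "  ", ""]]
mixed_run = [counter.record(text) for text in ["", "hello", "", ""]]
```

In this sketch only the third consecutive silent capture reports the limit, and a capture with speech in between restarts the count.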

* fix(tui): clean up voice stop failure path

Address follow-up review by naming the TUI flow as single-shot push-to-talk and cancelling the recorder when forced stop cannot produce a WAV.

* fix(tui): report busy voice capture starts

Return explicit start state from the voice wrapper so the TUI gateway does not report recording while forced-stop transcription is still cleaning up.
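
A minimal sketch of the explicit start state, assuming a lock-guarded busy flag. `SingleShotRecorder`, `finish`, and `record_status` are hypothetical names for illustration; only the `busy` and `recording` status strings and the `started is False` check come from this change.

```python
import threading


class SingleShotRecorder:
    """Hypothetical stand-in for the voice wrapper's start/cleanup state."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._busy = False

    def start(self) -> bool:
        """Return True if a capture started, False if one is still in flight."""
        with self._lock:
            if self._busy:
                return False
            self._busy = True
            return True

    def finish(self) -> None:
        """Mark cleanup (for example forced-stop transcription) as done."""
        with self._lock:
            self._busy = False


def record_status(recorder: SingleShotRecorder) -> dict:
    """Map the start result to the status the gateway would report."""
    started = recorder.start()
    if started is False:
        return {"status": "busy"}
    return {"status": "recording"}


recorder = SingleShotRecorder()
first = record_status(recorder)
second = record_status(recorder)  # previous capture has not finished yet
recorder.finish()
third = record_status(recorder)
```

The second call reports `busy` because the earlier capture has not been released, which is the situation the gateway previously misreported as `recording`.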

* fix(tui): handle busy voice record responses

Apply the gateway busy status immediately in the TUI and route forced-stop voice events to the session that sent the stop request.

* fix(tui): clear voice recording on null response

Treat a null voice.record RPC result as a failed optimistic start so the REC badge cannot stick after gateway-side errors.
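
The client-side rule can be sketched as below. The TUI itself may not be written in Python; this is only a language-neutral illustration, and `rec_badge_after_response` is a hypothetical name. It assumes the badge is switched on optimistically before the RPC returns and must be reconciled against the reply.

```python
from typing import Optional


def rec_badge_after_response(result: Optional[dict]) -> bool:
    """Decide whether the REC badge stays on after a voice.record reply.

    Assumption: the badge was turned on optimistically before the call.
    """
    if result is None:
        # Null result: treat as a failed start so the badge cannot stick.
        return False
    # Only an explicit "recording" status confirms the optimistic start;
    # "busy", "stopped", or anything unexpected clears the badge.
    return result.get("status") == "recording"


after_null = rec_badge_after_response(None)
after_busy = rec_badge_after_response({"status": "busy"})
after_ok = rec_badge_after_response({"status": "recording"})
```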

* fix(tui): count silent manual voice stops

Preserve single-shot voice no-speech strikes through forced stop transcription so empty push-to-talk captures still trigger the three-strikes guard.

---------

Co-authored-by: Montbra <montbra@gmail.com>
brooklyn! 2026-05-06 15:49:59 -07:00 committed by GitHub
parent 5ccab51fa8
commit 04cf4788cc
7 changed files with 527 additions and 57 deletions

@@ -5619,14 +5619,13 @@ def _(rid, params: dict) -> dict:
@method("voice.record")
def _(rid, params: dict) -> dict:
-"""VAD-driven continuous record loop, CLI-parity.
+"""VAD-bounded push-to-talk capture, CLI-parity.
-``start`` turns on a VAD loop that emits ``voice.transcript`` events
-for each detected utterance and auto-restarts for the next turn.
-``stop`` halts the loop (manual stop; matches cli.py's Ctrl+B-while-
-recording branch clearing ``_voice_continuous``). Three consecutive
-silent cycles stop the loop automatically and emit a
-``voice.transcript`` with ``no_speech_limit=True``.
+``start`` begins one VAD-bounded capture and emits ``voice.transcript``
+after silence stops the recorder. ``stop`` forces transcription of the
+active buffer, matching classic CLI push-to-talk. The voice wrapper retains
+no-speech counts across single-shot starts, so three consecutive silent
+captures emit ``voice.transcript`` with ``no_speech_limit=True``.
"""
action = params.get("action", "start")
@@ -5665,7 +5664,7 @@ def _(rid, params: dict) -> dict:
if isinstance(duration, (int, float)) and not isinstance(duration, bool)
else 3.0
)
-start_continuous(
+started = start_continuous(
on_transcript=lambda t: _voice_emit("voice.transcript", {"text": t}),
on_status=lambda s: _voice_emit("voice.status", {"state": s}),
on_silent_limit=lambda: _voice_emit(
@@ -5673,13 +5672,19 @@ def _(rid, params: dict) -> dict:
),
silence_threshold=safe_threshold,
silence_duration=safe_duration,
+auto_restart=False,
)
+if started is False:
+return _ok(rid, {"status": "busy"})
return _ok(rid, {"status": "recording"})
# action == "stop"
+with _voice_sid_lock:
+_voice_event_sid = params.get("session_id") or _voice_event_sid
from hermes_cli.voice import stop_continuous
-stop_continuous()
+stop_continuous(force_transcribe=True)
return _ok(rid, {"status": "stopped"})
except ImportError:
return _err(