mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-24 05:41:40 +00:00
feat(tui): match CLI's voice slash + VAD-continuous recording model
The TUI had drifted from the CLI's voice model in two ways:
- /voice on was lighting up the microphone immediately and Ctrl+B was
interpreted as a mode toggle. The CLI separates the two: /voice on
just flips the umbrella bit, recording only starts once the user
presses Ctrl+B, which also sets _voice_continuous so the VAD loop
auto-restarts until the user presses Ctrl+B again or three silent
cycles pass.
- /voice tts was missing entirely, so users couldn't turn agent reply
speech on/off from inside the TUI.
This commit brings the TUI to parity.
Python
- hermes_cli/voice.py: continuous-mode API (start_continuous,
stop_continuous, is_continuous_active) layered on the existing PTT
wrappers. The silence callback transcribes, fires on_transcript,
tracks consecutive no-speech cycles, and auto-restarts — mirroring
cli.py:_voice_stop_and_transcribe + _restart_recording.
- tui_gateway/server.py:
- voice.toggle now supports on / off / tts / status. The umbrella
bit lives in HERMES_VOICE + display.voice_enabled; tts lives in
HERMES_VOICE_TTS + display.voice_tts. /voice off also tears down
any active continuous loop so a toggle-off really releases the
microphone.
- voice.record start/stop now drives start_continuous/stop_continuous.
start is refused with a clear error when the mode is off, matching
cli.py:handle_voice_record's early return on `not _voice_mode`.
- New voice.transcript / voice.status events emit through
_voice_emit (remembers the sid that last enabled the mode so
events land in the right session).
TypeScript
- gatewayTypes.ts: voice.status + voice.transcript event
discriminants; VoiceToggleResponse gains tts; VoiceRecordResponse
gains status for the new "started/stopped" responses.
- interfaces.ts: GatewayEventHandlerContext gains composer.setInput +
submission.submitRef + voice.{setRecording, setProcessing,
setVoiceEnabled}; InputHandlerContext.voice gains enabled +
setVoiceEnabled for the mode-aware Ctrl+B handler.
- createGatewayEventHandler.ts: voice.status drives REC/STT badges;
voice.transcript auto-submits when the composer is empty (CLI
_pending_input.put parity) and appends when a draft is in flight.
no_speech_limit flips voice off + sys line.
- useInputHandlers.ts: Ctrl+B now calls voice.record (start/stop),
not voice.toggle, and nudges the user with a sys line when the
mode is off instead of silently flipping it on.
- useMainApp.ts: wires the new event-handler context fields.
- slash/commands/session.ts: /voice handles on / off / tts / status
with CLI-matching output ("voice: mode on · tts off").
Backward compat preserved for voice.record (was always PTT shape;
gateway still honours start/stop with mode-gating added).
This commit is contained in:
parent
0bb460b070
commit
04c489b587
10 changed files with 861 additions and 78 deletions
|
|
@ -134,45 +134,43 @@ export function useInputHandlers(ctx: InputHandlerContext): InputHandlerResult {
|
|||
}
|
||||
}
|
||||
|
||||
const voiceStop = () => {
|
||||
voice.setRecording(false)
|
||||
voice.setProcessing(true)
|
||||
// CLI parity: Ctrl+B toggles the VAD-driven continuous recording loop
|
||||
// (NOT the voice-mode umbrella bit). The mode is enabled via /voice on;
|
||||
// Ctrl+B while the mode is off sys-nudges the user. While the mode is
|
||||
// on, the first press starts a continuous loop (gateway → start_continuous,
|
||||
// VAD auto-stop → transcribe → auto-restart), a subsequent press stops it.
|
||||
// The gateway publishes voice.status + voice.transcript events that
|
||||
// createGatewayEventHandler turns into UI badges and composer injection.
|
||||
const voiceRecordToggle = () => {
|
||||
if (!voice.enabled) {
|
||||
return actions.sys('voice: mode is off — enable with /voice on')
|
||||
}
|
||||
|
||||
const starting = !voice.recording
|
||||
const action = starting ? 'start' : 'stop'
|
||||
|
||||
// Optimistic UI — flip the REC badge immediately so the user gets
|
||||
// feedback while the RPC round-trips; the voice.status event is the
|
||||
// authoritative source and may correct us.
|
||||
if (starting) {
|
||||
voice.setRecording(true)
|
||||
} else {
|
||||
voice.setRecording(false)
|
||||
voice.setProcessing(false)
|
||||
}
|
||||
|
||||
gateway
|
||||
.rpc<VoiceRecordResponse>('voice.record', { action: 'stop' })
|
||||
.then(r => {
|
||||
if (!r) {
|
||||
return
|
||||
.rpc<VoiceRecordResponse>('voice.record', { action })
|
||||
.catch((e: Error) => {
|
||||
// Revert optimistic UI on failure.
|
||||
if (starting) {
|
||||
voice.setRecording(false)
|
||||
}
|
||||
|
||||
const transcript = String(r.text || '').trim()
|
||||
|
||||
if (!transcript) {
|
||||
return actions.sys('voice: no speech detected')
|
||||
}
|
||||
|
||||
cActions.setInput(prev => (prev ? `${prev}${/\s$/.test(prev) ? '' : ' '}${transcript}` : transcript))
|
||||
})
|
||||
.catch((e: Error) => actions.sys(`voice error: ${e.message}`))
|
||||
.finally(() => {
|
||||
voice.setProcessing(false)
|
||||
patchUiState({ status: 'ready' })
|
||||
actions.sys(`voice error: ${e.message}`)
|
||||
})
|
||||
}
|
||||
|
||||
const voiceStart = () =>
|
||||
gateway
|
||||
.rpc<VoiceRecordResponse>('voice.record', { action: 'start' })
|
||||
.then(r => {
|
||||
if (!r) {
|
||||
return
|
||||
}
|
||||
|
||||
voice.setRecording(true)
|
||||
patchUiState({ status: 'recording…' })
|
||||
})
|
||||
.catch((e: Error) => actions.sys(`voice error: ${e.message}`))
|
||||
|
||||
useInput((ch, key) => {
|
||||
const live = getUiState()
|
||||
|
||||
|
|
@ -371,7 +369,7 @@ export function useInputHandlers(ctx: InputHandlerContext): InputHandlerResult {
|
|||
}
|
||||
|
||||
if (isVoiceToggleKey(key, ch)) {
|
||||
return voice.recording ? voiceStop() : voiceStart()
|
||||
return voiceRecordToggle()
|
||||
}
|
||||
|
||||
if (isAction(key, ch, 'g')) {
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue