feat(tui): match CLI's voice slash + VAD-continuous recording model

The TUI had drifted from the CLI's voice model in two ways:

- /voice on was lighting up the microphone immediately and Ctrl+B was
  interpreted as a mode toggle.  The CLI separates the two: /voice on
  just flips the umbrella bit, recording only starts once the user
  presses Ctrl+B, which also sets _voice_continuous so the VAD loop
  auto-restarts until the user presses Ctrl+B again or three silent
  cycles pass.
- /voice tts was missing entirely, so users couldn't turn agent reply
  speech on/off from inside the TUI.

This commit brings the TUI to parity.

Python

- hermes_cli/voice.py: continuous-mode API (start_continuous,
  stop_continuous, is_continuous_active) layered on the existing PTT
  wrappers. The silence callback transcribes, fires on_transcript,
  tracks consecutive no-speech cycles, and auto-restarts — mirroring
  cli.py:_voice_stop_and_transcribe + _restart_recording.
- tui_gateway/server.py:
  - voice.toggle now supports on / off / tts / status.  The umbrella
    bit lives in HERMES_VOICE + display.voice_enabled; tts lives in
    HERMES_VOICE_TTS + display.voice_tts.  /voice off also tears down
    any active continuous loop so a toggle-off really releases the
    microphone.
  - voice.record start/stop now drives start_continuous/stop_continuous.
    start is refused with a clear error when the mode is off, matching
    cli.py:handle_voice_record's early return on `not _voice_mode`.
  - New voice.transcript / voice.status events emit through
    _voice_emit (remembers the sid that last enabled the mode so
    events land in the right session).

TypeScript

- gatewayTypes.ts: voice.status + voice.transcript event
  discriminants; VoiceToggleResponse gains tts; VoiceRecordResponse
  gains status for the new "started/stopped" responses.
- interfaces.ts: GatewayEventHandlerContext gains composer.setInput +
  submission.submitRef + voice.{setRecording, setProcessing,
  setVoiceEnabled}; InputHandlerContext.voice gains enabled +
  setVoiceEnabled for the mode-aware Ctrl+B handler.
- createGatewayEventHandler.ts: voice.status drives REC/STT badges;
  voice.transcript auto-submits when the composer is empty (CLI
  _pending_input.put parity) and appends when a draft is in flight.
  no_speech_limit flips voice off + sys line.
- useInputHandlers.ts: Ctrl+B now calls voice.record (start/stop),
  not voice.toggle, and nudges the user with a sys line when the
  mode is off instead of silently flipping it on.
- useMainApp.ts: wires the new event-handler context fields.
- slash/commands/session.ts: /voice handles on / off / tts / status
  with CLI-matching output ("voice: mode on · tts off").

Backward compat preserved for voice.record (was always PTT shape;
gateway still honours start/stop with mode-gating added).
This commit is contained in:
0xbyt4 2026-04-24 00:55:17 +03:00 committed by Teknium
parent 0bb460b070
commit 04c489b587
10 changed files with 861 additions and 78 deletions

View file

@ -184,15 +184,64 @@ export const sessionCommands: SlashCommand[] = [
},
{
help: 'toggle voice input',
help: 'voice mode: [on|off|tts|status]',
name: 'voice',
run: (arg, ctx) => {
const action = arg === 'on' || arg === 'off' ? arg : 'status'
const normalized = (arg ?? '').trim().toLowerCase()
const action =
normalized === 'on' || normalized === 'off' || normalized === 'tts' || normalized === 'status'
? normalized
: 'status'
ctx.gateway.rpc<VoiceToggleResponse>('voice.toggle', { action }).then(
ctx.guarded<VoiceToggleResponse>(r => {
ctx.voice.setVoiceEnabled(!!r.enabled)
ctx.transcript.sys(`voice: ${r.enabled ? 'on — press Ctrl+B to record' : 'off'}`)
// Match CLI's _show_voice_status / _enable_voice_mode /
// _toggle_voice_tts output shape so users don't have to learn
// two vocabularies.
if (action === 'status') {
const mode = r.enabled ? 'ON' : 'OFF'
const tts = r.tts ? 'ON' : 'OFF'
ctx.transcript.sys('Voice Mode Status')
ctx.transcript.sys(` Mode: ${mode}`)
ctx.transcript.sys(` TTS: ${tts}`)
ctx.transcript.sys(' Record key: Ctrl+B')
// CLI's "Requirements:" block — surfaces STT/audio setup issues
// so the user sees "STT provider: MISSING ..." instead of
// silently failing on every Ctrl+B press.
if (r.details) {
ctx.transcript.sys('')
ctx.transcript.sys(' Requirements:')
for (const line of r.details.split('\n')) {
if (line.trim()) {
ctx.transcript.sys(` ${line}`)
}
}
}
return
}
if (action === 'tts') {
ctx.transcript.sys(`Voice TTS ${r.tts ? 'enabled' : 'disabled'}.`)
return
}
// on/off — mirror cli.py:_enable_voice_mode's 3-line output
if (r.enabled) {
const tts = r.tts ? ' (TTS enabled)' : ''
ctx.transcript.sys(`Voice mode enabled${tts}`)
ctx.transcript.sys(' Ctrl+B to start/stop recording')
ctx.transcript.sys(' /voice tts to toggle speech output')
ctx.transcript.sys(' /voice off to disable voice mode')
} else {
ctx.transcript.sys('Voice mode disabled.')
}
})
)
}