Voice Input for Terminal

Decisions

Which STT engine to use? (local vs cloud) · Phase 1 · ready
Push-to-talk or continuous listening? · Phase 1 · ready

User Tasks

Test voice input latency in real session · Phase 1 · ready

Summary

Enable speaking to the terminal instead of typing, using speech-to-text to convert voice into text input.

Problem / Motivation

Typing longer instructions and background context is slow. Voice input allows faster, more natural communication, which is especially useful when thinking out loud or multitasking.

Proposed Solution

Add a speech-to-text layer that captures microphone input and feeds the transcribed text into the terminal as if it had been typed. It could work as a standalone wrapper script or be integrated into the existing workflow.
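
The capture, transcribe, and deliver flow described above can be sketched as three pluggable steps. This is a minimal illustration, not a settled design: `VoicePipeline`, `record`, `transcribe`, and `emit` are hypothetical names, and the real microphone, STT, and terminal-injection code would be supplied as the callables.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Hypothetical glue object: record audio, transcribe it, deliver the text."""

    record: Callable[[], bytes]         # assumed: returns captured audio bytes
    transcribe: Callable[[bytes], str]  # assumed: converts audio bytes to text
    emit: Callable[[str], None]         # assumed: delivers text to the terminal

    def run_once(self) -> str:
        """Run one capture/transcribe/emit cycle and return the delivered text."""
        audio = self.record()
        if not audio:
            # Nothing was captured, so skip the STT call entirely.
            return ""
        text = self.transcribe(audio).strip()
        if text:
            # Only deliver non-empty transcripts.
            self.emit(text)
        return text
```

Keeping the three steps as injected callables means the same loop could serve either a standalone wrapper script or an integrated workflow, and each step can be replaced by a stub in tests.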

Open Questions

1. STT Engine

Question: Which speech-to-text engine to use?

Option	Description
A) Whisper (local)	OpenAI’s Whisper model running locally. Free, private, works offline. Likely needs a GPU for near-real-time transcription
B) Whisper API	OpenAI cloud API. Low latency, cheap, requires internet
C) Deepgram / AssemblyAI	Cloud STT services. Very fast, streaming support
D) Windows Speech Recognition	Built-in, zero setup. Lower accuracy

Recommendation: Start with the Whisper API for simplicity, with the option to switch to local Whisper later.
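
To keep the later switch between engines cheap, the wrapper could hide each engine behind one small interface. The sketch below is an assumption about how that might look: `Transcriber`, `register_engine`, `get_engine`, and `FakeTranscriber` are hypothetical names, and no real cloud or local engine is called here.

```python
from typing import Callable, Dict, Protocol


class Transcriber(Protocol):
    """Assumed common interface that every STT engine wrapper would implement."""

    def transcribe(self, wav_bytes: bytes) -> str: ...


class FakeTranscriber:
    """Stand-in engine for tests; always returns a canned transcript."""

    def __init__(self, text: str) -> None:
        self.text = text

    def transcribe(self, wav_bytes: bytes) -> str:
        return self.text


# Registry mapping an engine name to a factory that builds the engine.
_ENGINES: Dict[str, Callable[[], Transcriber]] = {}


def register_engine(name: str, factory: Callable[[], Transcriber]) -> None:
    """Make an engine selectable by name, for example from a config value."""
    _ENGINES[name] = factory


def get_engine(name: str) -> Transcriber:
    """Build the engine registered under `name`, or raise ValueError."""
    try:
        factory = _ENGINES[name]
    except KeyError:
        known = ", ".join(sorted(_ENGINES)) or "none"
        raise ValueError(f"unknown STT engine {name!r} (known: {known})") from None
    return factory()
```

With this shape, moving from the cloud API to a local model would mean registering a second factory and changing one configuration value, while the recorder and CLI stay untouched.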

Decision:

2. Activation Mode

Question: How to activate voice input?

Option	Description
A) Push-to-talk hotkey	Hold a key to record, release to transcribe. Simple, no false triggers
B) Continuous listening	Always on with wake word. More natural but more complex
C) Toggle mode	Press once to start, press again to stop

Recommendation: Option A. Push-to-talk is the simplest to implement and avoids false triggers.
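
Push-to-talk reduces to a small state machine: key down starts a recording, audio chunks accumulate while the key is held, and key up returns the audio for transcription. The class below is a hedged sketch with hypothetical names (`PushToTalk`, `key_down`, `feed`, `key_up`); wiring it to a real hotkey listener and microphone stream is left out.

```python
from typing import List, Optional


class PushToTalk:
    """Hypothetical push-to-talk state machine, independent of any hotkey library."""

    def __init__(self) -> None:
        self._recording = False
        self._chunks: List[bytes] = []

    @property
    def recording(self) -> bool:
        return self._recording

    def key_down(self) -> None:
        """Start a new recording; repeated key-down events are ignored."""
        if self._recording:
            # Many systems send auto-repeat events while a key is held.
            return
        self._recording = True
        self._chunks = []

    def feed(self, chunk: bytes) -> None:
        """Accept an audio chunk; chunks arriving while idle are dropped."""
        if self._recording:
            self._chunks.append(chunk)

    def key_up(self) -> Optional[bytes]:
        """Stop recording and return the captured audio, or None if idle."""
        if not self._recording:
            return None
        self._recording = False
        audio = b"".join(self._chunks)
        self._chunks = []
        return audio
```

Toggle mode (option C) could reuse the same class by calling `key_down` on the first press and `key_up` on the second.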

Decision:

Phase Overview

Phase	Description	Status
Phase 1	Basic push-to-talk with STT transcription	—
Phase 2	Language detection (Dutch speech → English text option)	—
Phase 3	Streaming / real-time transcription	—

Phase 1: Basic Voice Input —

Goal: Record audio via hotkey, transcribe it, and paste the resulting text into the terminal.

File / Feature	Details	Owner	Status
`src/opus/voice/recorder.py`	Microphone capture with push-to-talk	opus	—
`src/opus/voice/transcribe.py`	STT engine wrapper (Whisper API initially)	opus	—
`src/opus/voice/cli.py`	CLI entry point / hotkey listener	opus	—
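
Between the recorder and the STT wrapper, the captured samples usually need to be packaged in a container format. A minimal sketch using only the standard library `wave` module follows; it assumes raw 16-bit mono PCM at 16 kHz, which is an assumption about the recorder output and not something this plan specifies. The function names are hypothetical.

```python
import io
import wave


def pcm_to_wav(
    pcm: bytes,
    sample_rate: int = 16_000,  # assumed capture rate
    channels: int = 1,          # assumed mono
    sample_width: int = 2,      # assumed 16-bit samples (2 bytes each)
) -> bytes:
    """Wrap raw PCM samples in an in-memory WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Read the duration back from the WAV header, e.g. to reject empty clips."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Working with in-memory bytes avoids temporary files, and the duration check gives the CLI a cheap way to skip accidental key taps before any STT request is made.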

Phase 2: Language Handling —

Goal: Detect Dutch speech and optionally auto-translate to English for commands.

File / Feature	Details	Owner	Status
`src/opus/voice/translate.py`	Optional Dutch→English translation layer	opus	—
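
The optional translation step can be expressed as a small routing function: translate only when the feature is switched on and the detected language is Dutch. This is a sketch under assumptions: `route_transcript` is a hypothetical name, the detected language is assumed to arrive as a language code such as `nl` or `nl-NL`, and `translate` stands in for whatever translation backend is eventually chosen.

```python
from typing import Callable


def route_transcript(
    text: str,
    detected_lang: str,
    translate_enabled: bool,
    translate: Callable[[str], str],  # assumed: Dutch text in, English text out
    source_lang: str = "nl",
) -> str:
    """Return translated text for Dutch input when enabled, else the text as-is."""
    # Accept plain and regional codes such as "nl", "nl-NL", or "NL_be".
    code = detected_lang.strip().lower().replace("_", "-")
    is_source = code == source_lang or code.startswith(source_lang + "-")
    if translate_enabled and is_source:
        return translate(text)
    return text
```

Keeping the decision in one pure function makes the "optional" part easy to test without running any STT or translation service.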

Test

Manual tests

Test	Expected	Actual	Last
Record 5s audio and transcribe	Text output matches spoken words	pending	-
Push-to-talk hotkey works	Recording starts/stops cleanly	pending	-
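
To make "text output matches spoken words" measurable, the manual test could compare the transcript against a known script using word error rate (word-level edit distance divided by the number of reference words). The helper below is an illustrative sketch; `word_error_rate` is a hypothetical name, and the case-insensitive whitespace tokenisation is a simplifying assumption that ignores punctuation handling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the texts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        # Convention chosen here: an empty reference only matches an empty hypothesis.
        return 0.0 if not hyp else 1.0

    # Classic dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(
                min(
                    prev[j] + 1,                           # word missing from hypothesis
                    cur[j - 1] + 1,                        # extra word in hypothesis
                    prev[j - 1] + (ref_word != hyp_word),  # substitution or match
                )
            )
        prev = cur
    return prev[-1] / len(ref)
```

A value of 0.0 means the transcript matches the script word for word; what threshold counts as a pass is left open here.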

AI-verified tests

Scenario	Expected behavior	Verification method
…	…	…

E2E tests

Scenario	Assertion
…	…

Integration tests

Component	Coverage
…	…

Unit tests

Component	Tests	Coverage
…	…	…

History

Date	Event	Details
2026-03-16	Created	User request for voice input in terminal

References

OpenAI Whisper
