Decisions

  • Which STT engine to use? (local vs cloud) · Phase 1 · ready
  • Push-to-talk or continuous listening? · Phase 1 · ready

User Tasks

  • Test voice input latency in real session · Phase 1 · ready

Summary

Let the user speak to the terminal instead of typing, using speech-to-text (STT) to convert voice into text input.

Problem / Motivation

Typing is slow for longer instructions and context. Voice input is faster and more natural, especially when thinking out loud or multitasking.

Proposed Solution

Add a speech-to-text layer that captures microphone input and feeds the transcribed text into the terminal as if it had been typed. This could run as a standalone wrapper script or be integrated into the existing workflow.


Open Questions

1. STT Engine

Question: Which speech-to-text engine to use?

Option | Description
A) Whisper (local) | OpenAI’s Whisper running locally. Free, private, works offline. Needs GPU for real-time
B) Whisper API | OpenAI cloud API. Low latency, cheap, requires internet
C) Deepgram / AssemblyAI | Cloud STT services. Very fast, streaming support
D) Windows Speech Recognition | Built-in, zero setup. Lower accuracy

Recommendation: Start with the Whisper API (Option B) for simplicity, with the option to switch to local Whisper (Option A) later.
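Because the recommendation leaves room to swap engines later, it may help to hide the engine behind a small interface. The following is a minimal sketch under that assumption. The names `Transcriber`, `FakeTranscriber`, `ENGINES`, `register_engine`, and `get_engine` are hypothetical and not part of any existing code. No real Whisper call is made here, and a real backend would replace the stand-in.

```python
from typing import Callable, Dict, Protocol


class Transcriber(Protocol):
    """Anything that turns recorded audio bytes into text."""

    def transcribe(self, audio: bytes) -> str: ...


class FakeTranscriber:
    """Stand-in engine for local testing; a real backend would call an STT service here."""

    def __init__(self, canned_text: str) -> None:
        self.canned_text = canned_text

    def transcribe(self, audio: bytes) -> str:
        return self.canned_text


# Hypothetical registry: engine name -> zero-argument factory.
ENGINES: Dict[str, Callable[[], Transcriber]] = {}


def register_engine(name: str, factory: Callable[[], Transcriber]) -> None:
    """Make an engine selectable by name (for example from a config value)."""
    ENGINES[name] = factory


def get_engine(name: str) -> Transcriber:
    """Build the engine registered under `name`, or fail with a clear error."""
    try:
        return ENGINES[name]()
    except KeyError:
        raise ValueError(f"unknown STT engine: {name!r}") from None
```

With this shape, switching from a cloud engine to a local one would only change which name is passed to `get_engine`, while the recorder and CLI code stay untouched.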

Decision:

2. Activation Mode

Question: How to activate voice input?

Option | Description
A) Push-to-talk hotkey | Hold a key to record, release to transcribe. Simple, no false triggers
B) Continuous listening | Always on with wake word. More natural but more complex
C) Toggle mode | Press once to start, press again to stop

Recommendation: Option A. Push-to-talk is the simplest and most reliable mode.
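The push-to-talk behavior (hold to record, release to transcribe) can be modeled as a two-state machine that is independent of any hotkey or audio library. This is a sketch only. The class `PushToTalk` and its method names are hypothetical, and it assumes the hotkey listener calls `key_down` and `key_up` while the microphone callback calls `feed`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PushToTalk:
    """Collects audio chunks only while the hotkey is held down."""

    recording: bool = False
    _chunks: List[bytes] = field(default_factory=list)

    def key_down(self) -> None:
        # Held keys often auto-repeat, so a second key_down must not reset the buffer.
        if not self.recording:
            self.recording = True
            self._chunks = []

    def feed(self, chunk: bytes) -> None:
        # Audio that arrives while the key is up is dropped.
        if self.recording:
            self._chunks.append(chunk)

    def key_up(self) -> Optional[bytes]:
        # Returns the captured audio, or None if no recording was in progress.
        if not self.recording:
            return None
        self.recording = False
        audio = b"".join(self._chunks)
        self._chunks = []
        return audio
```

Keeping the state machine free of I/O means the "recording starts/stops cleanly" behavior can be checked without a microphone or a real keyboard hook.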

Decision:


Phase Overview

Phase | Description | Status
Phase 1 | Basic push-to-talk with STT transcription |
Phase 2 | Language detection (Dutch speech → English text option) |
Phase 3 | Streaming / real-time transcription |

Phase 1: Basic Voice Input

Goal: Record audio via hotkey, transcribe, and paste into terminal.

File / Feature | Details | Owner | Status
src/opus/voice/recorder.py | Microphone capture with push-to-talk | opus |
src/opus/voice/transcribe.py | STT engine wrapper (Whisper API initially) | opus |
src/opus/voice/cli.py | CLI entry point / hotkey listener | opus |
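Between recording and transcription, the captured audio usually needs to be packaged in a container format that the STT engine accepts. As an illustration, the helper below wraps raw PCM samples in a WAV container using only the standard library. It assumes 16-bit mono audio at 16 kHz, which is a common choice for speech but is not specified in this plan. The function name `pcm_to_wav` is hypothetical.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM samples in an in-memory WAV file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

Building the file in memory avoids temporary files, so the recorder could hand the resulting bytes straight to the transcription wrapper.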

Phase 2: Language Handling

Goal: Detect Dutch speech and optionally auto-translate to English for commands.

File / Feature | Details | Owner | Status
src/opus/voice/translate.py | Optional Dutch→English translation layer | opus |
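One possible shape for the optional translation step is a small routing function that only translates when the detected language differs from the target and the option is switched on. This is a sketch under assumptions: the function `route_transcript`, its parameters, and the language codes `"nl"` and `"en"` are illustrative, and the `translate` callable stands in for whatever translation backend is eventually chosen.

```python
from typing import Callable


def route_transcript(
    text: str,
    detected_language: str,
    translate: Callable[[str], str],
    translate_enabled: bool = True,
    target_language: str = "en",
) -> str:
    """Return the text to send to the terminal, translating only when needed."""
    # Pass the text through when translation is off or the speech is already in the target language.
    if not translate_enabled or detected_language == target_language:
        return text
    return translate(text)
```

Passing the translator in as a callable keeps the routing rule testable with a fake translator, before any real translation service is wired up.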

Test

Manual tests

Test | Expected | Actual | Last
Record 5s audio and transcribe | Text output matches spoken words | pending | -
Push-to-talk hotkey works | Recording starts/stops cleanly | pending | -
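To make "text output matches spoken words" measurable during manual testing, one option is word error rate (WER): the word-level edit distance between the expected and the transcribed text, divided by the number of expected words. The sketch below is one way to compute it. The function name `word_error_rate` is hypothetical, and the case-folding and whitespace tokenization are simplifying assumptions (punctuation is not normalized).

```python
def word_error_rate(expected: str, actual: str) -> float:
    """Word-level Levenshtein distance divided by the number of expected words."""
    ref = expected.lower().split()
    hyp = actual.lower().split()
    if not ref:
        # With nothing expected, any output counts as fully wrong.
        return 0.0 if not hyp else 1.0

    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

A result of 0.0 means an exact word-for-word match. A pass threshold for the manual test would still need to be agreed on.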

AI-verified tests

Scenario | Expected behavior | Verification method

E2E tests

Scenario | Assertion

Integration tests

Component | Coverage

Unit tests

Component | Tests | Coverage

History

Date | Event | Details
2026-03-16 | Created | User request for voice input in terminal

References