Decisions
- Which STT engine to use? (local vs cloud) · Phase 1 · ready
- Push-to-talk or continuous listening? · Phase 1 · ready
User Tasks
- Test voice input latency in real session · Phase 1 · ready
Summary
Enable speaking to the terminal instead of typing, using speech-to-text to convert voice into text input.
Problem / Motivation
Typing is slow for longer instructions and context. Voice input allows faster, more natural communication — especially useful when thinking out loud or multitasking.
Proposed Solution
Add a speech-to-text layer that captures microphone input and feeds the transcribed text into the terminal as if it had been typed. This could run as a standalone wrapper script or be integrated into the existing workflow.
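The flow described above can be sketched as three pluggable steps. This is a minimal illustration, not the planned implementation: `voice_to_input` and its `record`, `transcribe`, and `emit` parameters are hypothetical names, and the real capture, STT, and terminal-injection code would be supplied by the caller.

```python
from typing import Callable


def voice_to_input(
    record: Callable[[], bytes],          # hypothetical: returns captured audio
    transcribe: Callable[[bytes], str],   # hypothetical: audio -> text
    emit: Callable[[str], None],          # hypothetical: sends text to the terminal
) -> str:
    """Run one capture -> transcribe -> emit cycle and return the emitted text."""
    audio = record()
    if not audio:
        # Nothing was recorded (e.g. the key was tapped), so skip the STT call.
        return ""
    # Collapse whitespace and newlines into single spaces.
    text = " ".join(transcribe(audio).split())
    if text:
        emit(text)
    return text


if __name__ == "__main__":
    voice_to_input(lambda: b"\x00\x00", lambda audio: " list  files\n", print)
```

Collapsing whitespace is a deliberate choice in this sketch: a stray newline in the transcript would otherwise act like pressing Enter and submit the input early.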
Open Questions
1. STT Engine
Question: Which speech-to-text engine to use?
| Option | Description |
|---|---|
| A) Whisper (local) | OpenAI's Whisper running locally. Free, private, works offline. Needs a GPU for real-time transcription |
| B) Whisper API | OpenAI cloud API. Low latency, cheap, requires internet |
| C) Deepgram / AssemblyAI | Cloud STT services. Very fast, streaming support |
| D) Windows Speech Recognition | Built-in, zero setup. Lower accuracy |
Recommendation: Start with the Whisper API for simplicity, keeping the option to switch to local Whisper later.
Decision:
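Whichever option is chosen, keeping the engine behind a small interface would make the "switch later" part of the recommendation cheap. The sketch below is one possible shape, under stated assumptions: `Transcriber`, `CallableTranscriber`, `register`, and `get_engine` are hypothetical names, and the registered engines are stubs standing in for real cloud or local calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


class Transcriber(Protocol):
    """Anything that turns raw audio bytes into text."""

    def transcribe(self, audio: bytes) -> str: ...


@dataclass
class CallableTranscriber:
    """Adapts a plain function (cloud call, local model call) to the interface."""

    name: str
    fn: Callable[[bytes], str]

    def transcribe(self, audio: bytes) -> str:
        return self.fn(audio).strip()


_REGISTRY: Dict[str, Transcriber] = {}


def register(engine: CallableTranscriber) -> None:
    _REGISTRY[engine.name] = engine


def get_engine(name: str) -> Transcriber:
    """Look up an engine by its configured name."""
    try:
        return _REGISTRY[name]
    except KeyError:
        known = sorted(_REGISTRY)
        raise ValueError(f"unknown STT engine {name!r}; known: {known}") from None


# Stub engines for illustration only; real ones would call the chosen service.
register(CallableTranscriber("whisper-api", lambda audio: " hello world "))
register(CallableTranscriber("whisper-local", lambda audio: "hello world"))
```

With this layout, switching engines becomes a configuration change (`get_engine("whisper-local")`) rather than a code change in the recorder or CLI.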
2. Activation Mode
Question: How to activate voice input?
| Option | Description |
|---|---|
| A) Push-to-talk hotkey | Hold a key to record, release to transcribe. Simple, no false triggers |
| B) Continuous listening | Always on with wake word. More natural but more complex |
| C) Toggle mode | Press once to start, press again to stop |
Recommendation: Option A — push-to-talk is simplest and most reliable.
Decision:
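Options A and C differ only in how key events map to start and stop, so both can be modelled with one small state machine. The following is a sketch under assumptions: `Mode`, `Activation`, and the `key_down` / `key_up` / `feed` methods are hypothetical names, and the actual hotkey listener and audio source are left out.

```python
from enum import Enum, auto
from typing import Callable, List


class Mode(Enum):
    PUSH_TO_TALK = auto()  # option A: record while the key is held
    TOGGLE = auto()        # option C: press to start, press again to stop


class Activation:
    """Tracks recording state and hands the captured audio to a callback."""

    def __init__(self, mode: Mode, on_finish: Callable[[bytes], None]) -> None:
        self.mode = mode
        self.on_finish = on_finish
        self.recording = False
        self._held = False
        self._chunks: List[bytes] = []

    def key_down(self) -> None:
        if self._held:
            return  # ignore OS key auto-repeat while the key stays down
        self._held = True
        if self.mode is Mode.PUSH_TO_TALK or not self.recording:
            self._start()
        else:
            self._stop()

    def key_up(self) -> None:
        self._held = False
        if self.mode is Mode.PUSH_TO_TALK and self.recording:
            self._stop()

    def feed(self, chunk: bytes) -> None:
        """Called by the audio source; chunks outside a recording are dropped."""
        if self.recording:
            self._chunks.append(chunk)

    def _start(self) -> None:
        self.recording = True
        self._chunks = []

    def _stop(self) -> None:
        self.recording = False
        self.on_finish(b"".join(self._chunks))
```

Continuous listening (option B) does not fit this model, since it needs wake-word detection rather than key events, which is part of why it is the more complex option.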
Phase Overview
| Phase | Description | Status |
|---|---|---|
| Phase 1 | Basic push-to-talk with STT transcription | — |
| Phase 2 | Language detection (Dutch speech → English text option) | — |
| Phase 3 | Streaming / real-time transcription | — |
Phase 1: Basic Voice Input —
Goal: Record audio via hotkey, transcribe, and paste into terminal.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/voice/recorder.py | Microphone capture with push-to-talk | opus | — |
| src/opus/voice/transcribe.py | STT engine wrapper (Whisper API initially) | opus | — |
| src/opus/voice/cli.py | CLI entry point / hotkey listener | opus | — |
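One detail the recorder will likely need is packaging raw microphone samples into a file format that an STT service accepts. The sketch below uses only the standard library and assumes 16 kHz, mono, 16-bit PCM, a common choice for speech models; `pcm_to_wav` is a hypothetical helper name, and the capture format actually used is not decided in this document.

```python
import io
import wave


def pcm_to_wav(
    frames: bytes,
    sample_rate: int = 16000,  # assumption: 16 kHz capture
    channels: int = 1,         # assumption: mono microphone
    sample_width: int = 2,     # assumption: 16-bit samples (2 bytes)
) -> bytes:
    """Wrap raw PCM frames in an in-memory WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return buf.getvalue()


if __name__ == "__main__":
    one_second_of_silence = b"\x00\x00" * 16000
    print(len(pcm_to_wav(one_second_of_silence)), "bytes")
```

Building the WAV in memory avoids temporary files, which keeps the push-to-talk path short between key release and the transcription request.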
Phase 2: Language Handling —
Goal: Detect Dutch speech and optionally auto-translate to English for commands.
| File / Feature | Details | Owner | Status |
|---|---|---|---|
| src/opus/voice/translate.py | Optional Dutch→English translation layer | opus | — |
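Because translation is meant to be optional, it can sit as a pass-through step after transcription. A minimal sketch, assuming the STT step reports a language code alongside the text; `to_command_text` and the `translate` callable are hypothetical, and no particular translation service is implied.

```python
from typing import Callable, Optional

# Hypothetical signature: (text, source_lang, target_lang) -> translated text
Translator = Callable[[str, str, str], str]


def to_command_text(
    text: str,
    detected_lang: str,
    translate: Optional[Translator] = None,
    target_lang: str = "en",
) -> str:
    """Return text unchanged unless translation is enabled and actually needed."""
    if translate is None or detected_lang == target_lang:
        return text
    return translate(text, detected_lang, target_lang)


if __name__ == "__main__":
    # Stand-in translator for illustration; it only tags the text.
    fake = lambda t, src, dst: f"[{src}->{dst}] {t}"
    print(to_command_text("toon bestanden", "nl", fake))
    print(to_command_text("toon bestanden", "nl"))  # translation disabled
```

Passing `translate=None` keeps Phase 1 behaviour intact, so the layer can be added without touching the recorder or transcriber.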
Test
Manual tests
| Test | Expected | Actual | Last |
|---|---|---|---|
| Record 5s audio and transcribe | Text output matches spoken words | pending | - |
| Push-to-talk hotkey works | Recording starts/stops cleanly | pending | - |
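To make "text output matches spoken words" measurable rather than a judgment call, the transcript could be scored against the intended sentence using word error rate (WER). This is a suggested aid, not part of the plan: `word_error_rate` is a hypothetical helper, and any pass threshold would still need to be agreed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from transcript
                curr[j - 1] + 1,     # extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("list all files", "list old files"))
```

The comparison is case-insensitive but does not strip punctuation, so transcripts with added commas or full stops would need normalising first.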
AI-verified tests
| Scenario | Expected behavior | Verification method |
|---|---|---|
| … | … | … |
E2E tests
| Scenario | Assertion |
|---|---|
| … | … |
Integration tests
| Component | Coverage |
|---|---|
| … | … |
Unit tests
| Component | Tests | Coverage |
|---|---|---|
| … | … | … |
History
| Date | Event | Details |
|---|---|---|
| 2026-03-16 | Created | User request for voice input in terminal |