A Python-based voice input system for Ubuntu that uses OpenAI's Whisper for offline speech-to-text transcription. The system runs continuously in the background and can be triggered with a keyboard shortcut.
This project was AI-generated with Claude Code in a vibe coding approach. Human review is pending (07/2025); use at your own risk.
- Offline Speech Recognition: Uses OpenAI Whisper for local transcription
- Keyboard Shortcut Control: Press a hotkey to start/stop recording
- Background Operation: Can run as a system service/daemon. For now, run it as a script; transcription stops when the script is terminated. The system has no user interface.
- Cross-Application: Works in any text input field by sending simulated keyboard events through a keyboard-control library.
- Configurable: Adjustable Whisper model size and recording settings
- The program runs continuously in the background
- Press the configured hotkey (default: Ctrl+Shift+V) to start recording
- Speak your message while the recording indicator is active
- Press the hotkey again to stop recording and trigger transcription
- The transcribed text is automatically typed at the current cursor position
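The start/stop flow described above can be sketched as a small toggle. This is an illustration only, not the code in voice_input.py: the class name VoiceInputToggle and the transcribe and type_text callbacks are hypothetical stand-ins for the Whisper call and the keyboard output.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VoiceInputToggle:
    """Hypothetical helper: tracks whether a recording is in progress."""

    transcribe: Callable[[List[float]], str]  # stand-in for the Whisper call
    type_text: Callable[[str], None]          # stand-in for keyboard output
    recording: bool = False
    samples: List[float] = field(default_factory=list)

    def on_hotkey(self) -> None:
        """First press starts recording; second press stops and types the text."""
        if not self.recording:
            self.samples.clear()
            self.recording = True
            return
        self.recording = False
        text = self.transcribe(self.samples)
        if text:  # skip typing when nothing was recognised
            self.type_text(text)

    def on_audio(self, chunk: List[float]) -> None:
        """Called for each captured audio chunk; chunks are dropped while idle."""
        if self.recording:
            self.samples.extend(chunk)
```

Passing the two callbacks in keeps the toggle logic testable without a microphone, a model, or a display.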
Install Python 3
This package requires PortAudio:
sudo apt-get update
# https://www.portaudio.com/
sudo apt-get install portaudio19-dev
The recorder raises OSError: PortAudio library not found if you skip this step.
Alternative: pyalsaaudio (for ALSA) does not appear to be actively maintained.
# Clone or download the project
git clone <repository-url>
cd mic-to-keyboard
# Install Python requirements
pip install -r requirements.txt
# Make the main script executable
chmod +x voice_input.py
The system supports different Whisper model sizes. Choose based on your hardware capabilities:
- tiny: Fastest, least accurate (~39 MB)
- base: Good balance (~74 MB) - Recommended
- small: Better accuracy (~244 MB)
- medium: High accuracy (~769 MB)
- large: Best accuracy (~1550 MB)
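As a rough illustration of choosing a model from the list above, the sketch below picks the largest model whose approximate download size fits a given budget. The function pick_model and the budget parameter are hypothetical and not part of voice_input.py. Download size is only a proxy: memory use at runtime is higher than the file size.

```python
# Approximate download sizes in MB, copied from the model list above.
MODEL_SIZES_MB = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large": 1550,
}


def pick_model(budget_mb: float) -> str:
    """Return the largest model that fits budget_mb, or the smallest if none fit."""
    fitting = [name for name, size in MODEL_SIZES_MB.items() if size <= budget_mb]
    if not fitting:
        return min(MODEL_SIZES_MB, key=MODEL_SIZES_MB.get)
    return max(fitting, key=MODEL_SIZES_MB.get)


print(pick_model(100))  # base
```

The returned name is the kind of string that whisper.load_model accepts as its first argument.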
Default hotkey is Ctrl+Shift+V. You can modify this in the configuration section of voice_input.py.
Default recording settings:
- Sample Rate: 16kHz
- Channels: Mono
- Format: 32-bit float
- Max recording duration: 30 seconds
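To give a sense of scale, the settings above imply a fairly small audio buffer. The arithmetic below is a worked example; the constant and function names are illustrative and may not match those in voice_input.py.

```python
SAMPLE_RATE_HZ = 16_000     # 16 kHz
CHANNELS = 1                # mono
BYTES_PER_SAMPLE = 4        # 32-bit float
MAX_RECORDING_SECONDS = 30


def buffer_bytes(seconds: float) -> int:
    """Bytes of raw audio captured in `seconds` at the settings above."""
    return int(seconds * SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE)


print(buffer_bytes(MAX_RECORDING_SECONDS))  # 1920000
```

A full 30-second recording therefore occupies about 1.83 MiB of raw audio, which is negligible next to the memory needed by the Whisper model itself.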
# Run in foreground (for testing)
python3 voice_input.py
# Run in background
python3 voice_input.py &
# Run as systemd service (recommended for permanent use)
sudo systemctl start voice-input
sudo systemctl enable voice-input # Auto-start on boot
- Start Recording: Press Ctrl+Shift+V
  - You'll see a brief notification or hear a beep
  - Speak clearly into your microphone
- Stop Recording & Transcribe: Press Ctrl+Shift+V again
  - Recording stops immediately
  - Transcription begins (may take 1-3 seconds)
  - Text is automatically typed at cursor position
1. Open any text editor (gedit, LibreOffice, browser, etc.)
2. Position cursor where you want text
3. Press Ctrl+Shift+V
4. Say: "Hello, this is a test of the voice input system."
5. Press Ctrl+Shift+V again
6. Watch as the text appears: "Hello, this is a test of the voice input system."
- Speak clearly and at normal pace
- Use a good quality microphone
- Minimize background noise
- Keep recordings under 15 seconds for faster processing
- Use tiny or base model for faster transcription
- Ensure sufficient RAM (2GB+ recommended for larger models)
- Close unnecessary applications during heavy usage
Create a desktop entry:
# Create autostart directory if it doesn't exist
mkdir -p ~/.config/autostart
# Create desktop entry
cat > ~/.config/autostart/voice-input.desktop << EOF
[Desktop Entry]
Type=Application
Name=Voice Input System
Exec=python3 /home/$USER/code/mic-to-keyboard/voice_input.py
Hidden=false
NoDisplay=false
X-GNOME-Autostart-enabled=true
EOF
For system-wide availability, create a systemd service:
# Create service file
sudo tee /etc/systemd/system/voice-input.service << EOF
[Unit]
Description=Voice Input System
After=sound.target
[Service]
Type=simple
User=$USER
Environment=DISPLAY=:0
ExecStart=/usr/bin/python3 /home/$USER/code/mic-to-keyboard/voice_input.py
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
# Enable and start service
sudo systemctl daemon-reload
sudo systemctl enable voice-input.service
sudo systemctl start voice-input.service
Microphone Not Detected
# Check available audio devices
python3 -c "import pyaudio; p = pyaudio.PyAudio(); [print(f'{i}: {p.get_device_info_by_index(i)}') for i in range(p.get_device_count())]"
Permission Denied for Audio
# Add user to audio group
sudo usermod -a -G audio $USER
# Log out and back in
Keyboard Input Not Working
- Ensure the program has proper permissions for input simulation
- Some applications may block programmatic keyboard input
High CPU Usage
- Try using a smaller Whisper model (tiny or base)
- Reduce maximum recording duration
- Check for background processes consuming resources
Run with debug output:
python3 voice_input.py --debug
Edit voice_input.py and modify the hotkey configuration:
# Change this line to customize your hotkey
HOTKEY = {keyboard.Key.ctrl, keyboard.Key.shift, keyboard.KeyCode.from_char('v')}
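A set-based hotkey such as the one above is typically matched by tracking which keys are currently held. The sketch below shows one way this could work. It is an assumption, not the logic in voice_input.py: HotkeyTracker is a hypothetical class, and it uses plain strings instead of pynput key objects so that it runs without a display. With pynput, press and release would be wired to the on_press and on_release callbacks of keyboard.Listener.

```python
from typing import Callable, Iterable, Set


class HotkeyTracker:
    """Hypothetical helper: fires a callback once per full press of a key combo."""

    def __init__(self, combo: Iterable[str], on_activate: Callable[[], None]) -> None:
        self.combo = frozenset(combo)
        self.on_activate = on_activate
        self.pressed: Set[str] = set()
        self.armed = True  # guards against key auto-repeat firing repeatedly

    def press(self, key: str) -> None:
        self.pressed.add(key)
        # Fire only when every key of the combo is held and we are armed.
        if self.armed and self.combo <= self.pressed:
            self.armed = False
            self.on_activate()

    def release(self, key: str) -> None:
        self.pressed.discard(key)
        # Re-arm as soon as the combo is no longer fully held.
        if not (self.combo <= self.pressed):
            self.armed = True
```

The armed flag matters because holding a key usually produces repeated press events, which would otherwise start and stop recording in quick succession.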
Adjust maximum recording time:
# In voice_input.py, modify this constant
MAX_RECORDING_DURATION = 30 # seconds
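One way a duration cap like this could be enforced is to stop accepting samples once the buffer reaches the limit. The sketch below is illustrative only: append_capped is a hypothetical helper, and the 16 kHz sample rate is taken from the default recording settings.

```python
from typing import List

SAMPLE_RATE_HZ = 16_000          # default sample rate from the recording settings
MAX_RECORDING_DURATION = 30      # seconds


def append_capped(buffer: List[float], chunk: List[float], limit: int) -> bool:
    """Append chunk to buffer without exceeding limit samples.

    Returns True while the buffer still has room, False once it is full.
    """
    room = limit - len(buffer)
    if room <= 0:
        return False
    buffer.extend(chunk[:room])  # drop any samples beyond the cap
    return len(buffer) < limit


# Sample budget for one recording at the defaults above.
MAX_SAMPLES = SAMPLE_RATE_HZ * MAX_RECORDING_DURATION
```

A recording loop could call append_capped for each audio chunk and trigger transcription as soon as it returns False.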
Specify custom model location:
# Custom model path
MODEL_PATH = "/path/to/your/whisper/model"
model = whisper.load_model("base", download_root=MODEL_PATH)
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
MIT License - see LICENSE file for details
- OpenAI for the Whisper speech recognition model
- PyAudio team for audio interface
- pynput developers for keyboard/mouse control