Voice-to-Keyboard Input System

A Python-based voice input system for Ubuntu that uses OpenAI's Whisper for offline speech-to-text transcription. The system runs continuously in the background and can be triggered with a keyboard shortcut.

This project was AI-generated with Claude Code using a vibe coding approach. Human review is pending (07/2025); use at your own risk.

Features

Offline Speech Recognition: Uses OpenAI Whisper for local transcription
Keyboard Shortcut Control: Press a hotkey to start/stop recording
Background Operation: Designed to run as a system service/daemon. For now, run it as a script instead; transcription stops when the script is terminated. The system has no user interface.
Cross-Application: Works in any text input field that accepts keyboard events, since the transcription is typed through simulated key presses. This depends on a reliable input-simulation library (pynput).
Configurable: Adjustable Whisper model size and recording settings

How It Works

The program runs continuously in the background
Press the configured hotkey (default: Ctrl+Shift+V) to start recording
Speak your message while the recording indicator is active
Press the hotkey again to stop recording and trigger transcription
The transcribed text is automatically typed at the current cursor position
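The start/stop cycle above amounts to a two-state toggle driven by the hotkey. The sketch below is a hypothetical illustration rather than the project's actual code: the class name RecordingToggle and its start / stop_and_transcribe callbacks are assumptions standing in for the recorder and the Whisper call.

```python
from typing import Callable, Optional


class RecordingToggle:
    """Hypothetical sketch: each hotkey press flips between idle and recording."""

    def __init__(self, start: Callable[[], None],
                 stop_and_transcribe: Callable[[], str]) -> None:
        self._start = start
        self._stop_and_transcribe = stop_and_transcribe
        self.recording = False

    def on_hotkey(self) -> Optional[str]:
        """Start recording when idle; otherwise stop and return the transcribed text."""
        if not self.recording:
            self._start()
            self.recording = True
            return None
        self.recording = False
        return self._stop_and_transcribe()


# Usage: the first press starts recording, the second returns the text to type.
toggle = RecordingToggle(lambda: print("recording..."), lambda: "hello")
toggle.on_hotkey()
text = toggle.on_hotkey()
```

Keeping the state in one object means the hotkey listener only ever calls on_hotkey(), whichever phase the cycle is in.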

Installation

Prerequisites

Install Python 3

PyAudio, used for recording, needs the PortAudio development package:

sudo apt-get update
# https://www.portaudio.com/
sudo apt-get install portaudio19-dev

If you skip this step, the recorder fails with OSError: PortAudio library not found.

Alternative considered: pyalsaaudio (ALSA bindings), which does not appear to have been actively maintained recently.

Download and Setup

# Clone or download the project
git clone <repository-url>
cd mic-to-keyboard

# Install Python requirements
pip install -r requirements.txt

# Make the main script executable
chmod +x voice_input.py

Configuration

Whisper Model Selection

The system supports different Whisper model sizes. Choose based on your hardware capabilities:

tiny: Fastest, least accurate (~39M parameters)
base: Good balance (~74M parameters) - Recommended
small: Better accuracy (~244M parameters)
medium: High accuracy (~769M parameters)
large: Best accuracy (~1550M parameters)
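If the model size is read from a setting, it helps to validate it before loading. A minimal sketch, assuming the model name is a plain string; the constant and function names are hypothetical, and the name list covers only the five sizes above (openai-whisper also ships other variants, such as English-only ".en" models, which this sketch ignores).

```python
from typing import Optional

# The five model sizes listed above; "base" is the recommended default.
WHISPER_MODELS = ("tiny", "base", "small", "medium", "large")
DEFAULT_MODEL = "base"


def choose_model(requested: Optional[str]) -> str:
    """Return a validated model name, using the default when none is given."""
    if requested is None:
        return DEFAULT_MODEL
    name = requested.strip().lower()
    if name not in WHISPER_MODELS:
        # Fail loudly instead of silently loading a different model.
        raise ValueError(f"unknown Whisper model {requested!r}; expected one of {WHISPER_MODELS}")
    return name
```

The returned name can then be passed to whisper.load_model(). Raising on an unknown name, rather than falling back, avoids quietly downloading a model the user did not ask for.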

Keyboard Shortcut

Default hotkey is Ctrl+Shift+V. You can modify this in the configuration section of voice_input.py.

Audio Settings

Default recording settings:

Sample Rate: 16 kHz
Channels: Mono
Format: 32-bit float
Max recording duration: 30 seconds
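These settings fix the buffer size: 30 seconds of 16 kHz mono audio is 480,000 samples, and at 4 bytes per 32-bit float that is 1,920,000 bytes. The sketch below assumes the recorder collects raw paFloat32 chunks (the bytes PyAudio's stream.read returns) and joins them with the standard-library array module; the names are hypothetical, not taken from audio_recorder.py.

```python
from array import array
from typing import Iterable

SAMPLE_RATE = 16_000          # Hz, the rate Whisper models expect
CHANNELS = 1                  # mono
MAX_RECORDING_DURATION = 30   # seconds

MAX_SAMPLES = SAMPLE_RATE * CHANNELS * MAX_RECORDING_DURATION  # 480,000


def chunks_to_samples(chunks: Iterable[bytes]) -> array:
    """Join raw float32 chunks into one sample buffer, capped at MAX_SAMPLES."""
    # array "f" is a C float: 4 bytes, native byte order on common platforms,
    # which matches the layout PyAudio uses for paFloat32.
    samples = array("f")
    for chunk in chunks:
        samples.frombytes(chunk)
    # Drop anything recorded past the maximum duration.
    del samples[MAX_SAMPLES:]
    return samples
```

With NumPy installed, numpy.frombuffer(b"".join(chunks), dtype=numpy.float32) produces the equivalent array in the form Whisper's transcribe() accepts directly.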

Usage

Running the Program

# Run in foreground (for testing)
python3 voice_input.py

# Run in background
python3 voice_input.py &

# Run as systemd service (recommended for permanent use; requires the unit file from System Service below)
sudo systemctl start voice-input
sudo systemctl enable voice-input  # Auto-start on boot

Using Voice Input

Start Recording: Press Ctrl+Shift+V
- You'll see a brief notification or hear a beep
- Speak clearly into your microphone
Stop Recording & Transcribe: Press Ctrl+Shift+V again
- Recording stops immediately
- Transcription begins (may take 1-3 seconds, depending on model size and hardware)
- Text is automatically typed at cursor position
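The final step, typing the text at the cursor, can be sketched as below. This assumes a pynput-style controller object exposing a type(str) method (pynput.keyboard.Controller has one); the function name type_transcription is hypothetical. Whisper's raw text may begin with a leading space, so the sketch strips surrounding whitespace and skips empty results.

```python
from typing import Protocol


class Typer(Protocol):
    """Anything with a type(str) method, e.g. pynput.keyboard.Controller."""

    def type(self, string: str) -> None: ...


def type_transcription(text: str, controller: Typer) -> bool:
    """Type the transcribed text at the cursor; return False if there was nothing to type."""
    cleaned = text.strip()
    if not cleaned:
        # Silence or an empty transcription: send no key events at all.
        return False
    controller.type(cleaned)
    return True
```

In the running program this would be called as type_transcription(text, keyboard.Controller()) after from pynput import keyboard. Passing the controller in also lets the logic be tested without a display.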

Example Workflow

1. Open any text editor (gedit, LibreOffice, browser, etc.)
2. Position cursor where you want text
3. Press Ctrl+Shift+V
4. Say: "Hello, this is a test of the voice input system."
5. Press Ctrl+Shift+V again
6. Watch as the text appears: "Hello, this is a test of the voice input system."

Performance Tips

For Better Accuracy

Speak clearly and at normal pace
Use a good quality microphone
Minimize background noise
Keep recordings under 15 seconds for faster processing

For Better Performance

Use tiny or base model for faster transcription
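Ensure sufficient memory: the Whisper README lists roughly 1 GB of VRAM for tiny and base, 2 GB for small, 5 GB for medium and 10 GB for large; plan for a similar amount of RAM when running on CPU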
Close unnecessary applications during heavy usage

System Integration

Auto-Start on Login

Create a desktop entry:

# Create autostart directory if it doesn't exist
mkdir -p ~/.config/autostart

# Create desktop entry
cat > ~/.config/autostart/voice-input.desktop << EOF
[Desktop Entry]
Type=Application
Name=Voice Input System
Exec=python3 /home/$USER/code/mic-to-keyboard/voice_input.py
Hidden=false
NoDisplay=false
X-GNOME-Autostart-enabled=true
EOF

System Service (Advanced)

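For availability at boot, create a systemd service. A system service runs outside your desktop session, so it may not be able to reach your X display or your session's audio server; the unit below assumes display :0.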

# Create service file
sudo tee /etc/systemd/system/voice-input.service << EOF
[Unit]
Description=Voice Input System
After=sound.target

[Service]
Type=simple
User=$USER
Environment=DISPLAY=:0
ExecStart=/usr/bin/python3 /home/$USER/code/mic-to-keyboard/voice_input.py
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

# Enable and start service
sudo systemctl daemon-reload
sudo systemctl enable voice-input.service
sudo systemctl start voice-input.service

Troubleshooting

Common Issues

Microphone Not Detected

# Check available audio devices
python3 -c "import pyaudio; p = pyaudio.PyAudio(); [print(f'{i}: {p.get_device_info_by_index(i)}') for i in range(p.get_device_count())]"
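The one-liner above prints every device, including output-only ones. A slightly more readable sketch that keeps only devices able to record; it assumes a pyaudio.PyAudio instance and relies only on its get_device_count() and get_device_info_by_index() methods, whose info dicts include "index", "name", "maxInputChannels" and "defaultSampleRate". The function name is hypothetical.

```python
from typing import Any, Dict, List


def list_input_devices(pa: Any) -> List[Dict[str, Any]]:
    """Return info dicts for devices that can record (maxInputChannels > 0)."""
    devices = []
    for index in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(index)
        if info.get("maxInputChannels", 0) > 0:
            devices.append(info)
    return devices


# Usage (requires PyAudio and PortAudio):
#   import pyaudio
#   p = pyaudio.PyAudio()
#   for d in list_input_devices(p):
#       print(d["index"], d["name"], int(d["defaultSampleRate"]))
#   p.terminate()
```

If your microphone is missing from this list, the problem is at the PortAudio/ALSA level rather than in the voice input script.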

Permission Denied for Audio

# Add user to audio group
sudo usermod -a -G audio $USER
# Log out and back in

Keyboard Input Not Working

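Keyboard simulation goes through pynput, whose default Linux backend talks to the X server: it needs an X11 session with DISPLAY set, and native Wayland applications may ignore simulated input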
Some applications may block programmatic keyboard input

High CPU Usage

Try using a smaller Whisper model (tiny or base)
Reduce maximum recording duration
Check for background processes consuming resources

Debug Mode

Run with debug output:

python3 voice_input.py --debug

Advanced Configuration

Custom Hotkey

Edit voice_input.py and modify the hotkey configuration:

from pynput import keyboard

# Change this line to customize your hotkey
HOTKEY = {keyboard.Key.ctrl, keyboard.Key.shift, keyboard.KeyCode.from_char('v')}
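pynput reports individual press and release events, so matching a set like HOTKEY means tracking which keys are currently held. The tracker below is a hypothetical sketch, not the project's hotkey_listener.py: keys are any hashable values (pynput Key/KeyCode objects in practice). Left/right modifier variants such as Key.ctrl_l should be normalized first, for example with pynput's Listener.canonical(key), and pynput's own keyboard.HotKey and keyboard.GlobalHotKeys classes offer a ready-made alternative.

```python
from collections.abc import Hashable
from typing import Callable, Iterable, Set


class HotkeyTracker:
    """Hypothetical sketch: fire a callback once when every key in the combo is held."""

    def __init__(self, combo: Iterable[Hashable], on_activate: Callable[[], None]) -> None:
        self.combo = frozenset(combo)
        self.on_activate = on_activate
        self.pressed: Set[Hashable] = set()
        self._fired = False

    def press(self, key: Hashable) -> None:
        self.pressed.add(key)
        # Fire only on the transition into the full combo, not on key auto-repeat.
        if self.combo <= self.pressed and not self._fired:
            self._fired = True
            self.on_activate()

    def release(self, key: Hashable) -> None:
        self.pressed.discard(key)
        # Re-arm once the combo is broken, so the next full press fires again.
        if not self.combo <= self.pressed:
            self._fired = False
```

Wired to a pynput Listener, press and release would be passed as its on_press and on_release callbacks, with on_activate toggling recording.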

Recording Duration

Adjust maximum recording time:

# In voice_input.py, modify this constant
MAX_RECORDING_DURATION = 30  # seconds
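One way to enforce this cap is to convert it into a number of buffer reads. The sketch assumes a chunk size of 1024 frames (an illustrative value, not taken from the project) and takes the read and stop checks as callables; with PyAudio these could be lambda: stream.read(CHUNK_SIZE, exception_on_overflow=False) and a threading.Event's is_set. All names here are hypothetical.

```python
import math
from typing import Callable, List

SAMPLE_RATE = 16_000          # Hz
CHUNK_SIZE = 1024             # frames per read (assumed value)
MAX_RECORDING_DURATION = 30   # seconds


def max_chunks(duration_s: float = MAX_RECORDING_DURATION,
               rate: int = SAMPLE_RATE, chunk: int = CHUNK_SIZE) -> int:
    """Number of reads needed to cover the maximum duration, rounded up."""
    return math.ceil(duration_s * rate / chunk)


def record(read_chunk: Callable[[], bytes],
           stop_requested: Callable[[], bool]) -> List[bytes]:
    """Read chunks until the hotkey asks to stop or the duration cap is reached."""
    chunks: List[bytes] = []
    for _ in range(max_chunks()):
        if stop_requested():
            break
        chunks.append(read_chunk())
    return chunks
```

With these defaults the loop performs at most 469 reads (30 × 16000 / 1024 = 468.75, rounded up), so a forgotten second hotkey press cannot grow the buffer without bound.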

Whisper Model Path

Specify a custom directory for downloaded model files:

import whisper

# Directory where Whisper downloads and caches model checkpoints
MODEL_PATH = "/path/to/your/whisper/model"
model = whisper.load_model("base", download_root=MODEL_PATH)
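Once loaded, the model's transcribe() method accepts a file path or a 16 kHz mono float32 array and returns a dict whose "text" key holds the full transcription. A minimal wrapper, assuming a model from whisper.load_model() as above; the function name is hypothetical. Passing fp16=False avoids Whisper's "FP16 is not supported on CPU" warning on CPU-only machines, and language=None lets Whisper detect the language.

```python
from typing import Any, Optional


def transcribe(model: Any, audio: Any, language: Optional[str] = None) -> str:
    """Run Whisper on 16 kHz mono float32 audio and return the stripped text."""
    # fp16 and language are forwarded to Whisper's decoding options.
    result = model.transcribe(audio, fp16=False, language=language)
    # Whisper's text may begin with a space; strip it before typing.
    return result["text"].strip()
```

On a machine with a CUDA GPU, leaving fp16 at its default (True) is usually faster; forcing False simply keeps CPU runs quiet.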

Contributing

Fork the repository
Create a feature branch
Make your changes
Test thoroughly
Submit a pull request

License

MIT License - see LICENSE file for details

Acknowledgments

OpenAI for the Whisper speech recognition model
PyAudio team for audio interface
pynput developers for keyboard/mouse control
