A Windows desktop application that provides instant voice-to-text transcription using OpenAI's Whisper API.
- 🎤 One-click voice recording with `Ctrl + Space` hotkey
- 📝 Real-time waveform visualization
- ⚡ Instant transcription
- 📋 Automatic clipboard copy
- 🔑 Global hotkey support
- 🎨 Modern, minimalist UI
- Visit OpenAI's website
- Create an account or sign in
- Go to API Keys section
- Click "Create new secret key"
- Copy your API key (keep it secure!)
- Create a file named `.env` in the application directory and add: `OPENAI_API_KEY=your_api_key_here`
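A quick way to confirm the `.env` step above worked is to check the file before launching the app. The following is a minimal sketch using only the standard library; the helper names `read_env_file` and `api_key_problem` are hypothetical and are not part of the application, which may load the key differently (for example through python-dotenv).

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def api_key_problem(path=".env"):
    """Return a short description of the problem, or None if the entry looks fine."""
    if not Path(path).is_file():
        return f"{path} not found"
    key = read_env_file(path).get("OPENAI_API_KEY", "")
    if not key or key == "your_api_key_here":
        return "OPENAI_API_KEY is missing or still the placeholder"
    return None


if __name__ == "__main__":
    print(api_key_problem() or "API key entry looks fine")
```

This only checks that the entry exists and is not the placeholder; it does not verify that OpenAI accepts the key.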
- Python 3.8 or higher (Download Python)
- Windows 10 or higher
- Download the latest release from the Releases page
- Extract the ZIP file to your desired location
- Create the `.env` file with your OpenAI API key (as shown above)
- Double-click `Windows Whisper.exe` to start
- Clone the repository: `git clone https://github.com/yourusername/windows-whisper.git`, then `cd windows-whisper`
- Install dependencies: `pip install -r requirements.txt`
- Run the application: `python main.py`
- Start Recording
  - Press `Ctrl + Space` from anywhere
  - Or click the system tray icon and select "Start Recording"
- During Recording
  - Speak clearly into your microphone
  - Watch the real-time waveform visualization
  - Press Space or click "Done" when finished
  - Click "×" or press Escape to cancel
- After Recording
  - The text will be automatically transcribed
  - Transcribed text is copied to your clipboard
  - Click "Record Again" for another recording
  - Or close the window to finish
- API Key Issues
  - Ensure your `.env` file is in the correct location
  - Check if the API key is valid
  - Verify you have sufficient OpenAI credits
- Audio Recording Issues
  - Check if your microphone is set as the default recording device
  - Ensure no other application is using the microphone
  - Try restarting the application
- Transcription Language Issues
  - By default, the app uses English ("en") for transcription
  - If you're getting transcriptions in the wrong language, add `WHISPER_LANGUAGE=en` to your `.env` file
  - For other languages, use the appropriate language code (e.g., "fr" for French, "de" for German)
  - If translations occur regardless of setting, try adding a more specific prompt in your `.env` file: `WHISPER_PROMPT="Transcribe exactly as spoken. Do not translate."`
- Application Won't Start
  - Verify all dependencies are installed
  - Check if Python is in your system PATH
  - Run from command line to see error messages
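To illustrate the language settings described above, here is a sketch of how `WHISPER_LANGUAGE` and `WHISPER_PROMPT` could be turned into request parameters. OpenAI's transcription endpoint accepts `model`, `language`, and `prompt` fields; how this application actually forwards them is an assumption, and `transcription_params` is a hypothetical helper.

```python
import os


def transcription_params(env=os.environ):
    """Build keyword arguments for a Whisper transcription request."""
    params = {
        "model": "whisper-1",
        # English ("en") is the default stated in this README.
        "language": env.get("WHISPER_LANGUAGE", "en").strip().lower(),
    }
    # Only include a prompt when one is configured.
    prompt = env.get("WHISPER_PROMPT", "").strip()
    if prompt:
        params["prompt"] = prompt
    return params


if __name__ == "__main__":
    print(transcription_params())
```

The resulting dictionary would be sent together with the recorded audio file. Passing an explicit `language` tells the model which language to expect, which is why setting it usually resolves wrong-language output.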
- `No module named 'xyz'`: Run `pip install -r requirements.txt` again
- `API key not found`: Check your `.env` file setup
- `PortAudio error`: Restart your computer or check audio devices
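When a `No module named ...` error appears, it can help to see every missing package at once instead of fixing them one run at a time. The sketch below assumes the import names `PyQt5`, `pyaudio`, `numpy`, and `dotenv` for the packages this README lists; the actual `requirements.txt` may contain more, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Assumed import names for PyQt5, PyAudio, NumPy, and python-dotenv.
REQUIRED_MODULES = ["PyQt5", "pyaudio", "numpy", "dotenv"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
        print("Try: pip install -r requirements.txt")
    else:
        print("All listed modules were found")
```

`find_spec` only locates a module without importing it, so the check does not touch audio devices or open any windows.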
Edit `config.py` or add to your `.env` file to modify:
- Default hotkey combination (`SHORTCUT_KEY`)
- Audio recording parameters (`SAMPLE_RATE`, `MAX_RECORDING_SECONDS`)
- Language settings (`WHISPER_LANGUAGE`)
- UI appearance settings (`UI_THEME`, `UI_OPACITY`)
- Temporary file locations
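As a rough picture of how these settings might be read, the sketch below pulls each variable from the environment and falls back to a default. The variable names come from the list above, but every default value shown here is an assumption for illustration, not the value used in `config.py`, and `load_settings` is a hypothetical helper.

```python
import os


def load_settings(env=os.environ):
    """Read the documented settings, falling back to illustrative defaults."""
    settings = {
        "shortcut_key": env.get("SHORTCUT_KEY", "ctrl+space"),
        "sample_rate": int(env.get("SAMPLE_RATE", "16000")),  # assumed default (Hz)
        "max_recording_seconds": int(env.get("MAX_RECORDING_SECONDS", "60")),  # assumed
        "whisper_language": env.get("WHISPER_LANGUAGE", "en"),
        "ui_theme": env.get("UI_THEME", "dark"),  # assumed default
        "ui_opacity": float(env.get("UI_OPACITY", "0.9")),  # assumed default
    }
    # Keep opacity inside the valid 0.0 to 1.0 range.
    settings["ui_opacity"] = min(1.0, max(0.0, settings["ui_opacity"]))
    return settings


if __name__ == "__main__":
    for name, value in load_settings().items():
        print(f"{name} = {value}")
```

Converting values with `int` and `float` at load time means a typo such as `SAMPLE_RATE=abc` fails immediately with a `ValueError` rather than later during recording.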
Minimum:
- Windows 10 (64-bit)
- 4GB RAM
- Python 3.8+
- Microphone
- Internet connection
Recommended:
- Windows 10/11 (64-bit)
- 8GB RAM
- Python 3.10+
- High-quality microphone
- Stable internet connection
- API Key Security
  - Never share your API key
  - Don't commit the `.env` file to version control
  - Regularly rotate your API key
  - Set usage limits in OpenAI dashboard
- Data Privacy
  - Audio is processed locally before sending to OpenAI
  - Only the audio data is sent, no personal information
  - Transcribed text is stored only in clipboard
  - No data is permanently stored
- Check the GitHub repository for updates
- Submit issues for bugs or feature requests
- Join our community discussions
This project is licensed under the MIT License - a permissive open source license that allows for:
- ✅ Commercial use
- ✅ Modification
- ✅ Distribution
- ✅ Private use
Key points of the MIT License:
- You can freely use, modify, and distribute this software
- You must include the original copyright notice and license
- The software comes with no warranties
- The authors are not liable for any damages
See the LICENSE file for the full license text.
This project was developed with the assistance of:
- AI Development Support:
  - Cursor IDE's AI pair programming features
  - Anthropic's Claude (3.5/3.7 Sonnet) for code generation and problem-solving
- Core Technologies:
  - OpenAI Whisper API - Speech-to-text engine
  - PyQt5 - UI framework
  - PyAudio - Audio recording
  - NumPy - Audio processing
  - python-dotenv - Environment management
Contributions are welcome! Please feel free to submit issues or pull requests. When contributing, please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
All contributions will be released under the MIT License.