A standalone Text-to-Speech application using the Orpheus TTS model with a modern Gradio interface.
- 🎧 High-quality Text-to-Speech using the Orpheus TTS model
- 💻 Completely standalone - no external services or API keys needed
- 🔊 Multiple voice options (tara, leah, jess, leo, dan, mia, zac, zoe)
- 💾 Save audio to WAV files
- 🎨 Modern Gradio web interface
- 🔧 Adjustable generation parameters (temperature, top_p, repetition penalty)
- 😊 Emotive speech generation with natural expressions
Listen to a sample of the generated speech: Sample Audio
- Install Python 3.8 or higher
- Install dependencies:
```bash
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
- Run the application:
```bash
python gradio_orpheus.py
```
The application will automatically:
- Download the Orpheus TTS model on first run
- Download and initialize the SNAC audio codec
- Start the Gradio web interface
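For reference, here is a minimal sketch of what that startup path might look like, assuming the app loads the GGUF model with llama-cpp-python's `Llama.from_pretrained` and the codec with `SNAC.from_pretrained`. The repository and file names below are illustrative placeholders, not the exact values used by `gradio_orpheus.py`:

```python
# Illustrative sketch only -- repo_id and filename are hypothetical placeholders.
from llama_cpp import Llama
from snac import SNAC

def load_models():
    # Download the Orpheus GGUF model from the Hugging Face Hub on first run;
    # later launches reuse the locally cached copy.
    llm = Llama.from_pretrained(
        repo_id="your-org/orpheus-tts-gguf",  # hypothetical repo id
        filename="orpheus-tts.Q4_K_M.gguf",   # hypothetical file name
        n_ctx=4096,
    )
    # Download and initialize the SNAC audio codec used to turn tokens into audio.
    snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval()
    return llm, snac_model
```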
- Open your web browser and navigate to the URL shown in the terminal (usually http://127.0.0.1:7860)
- Enter the text you want to convert to speech
- Select a voice from the dropdown menu
- Adjust generation parameters if desired (see the sketch after these steps):
  - Temperature: higher values produce more varied, less predictable speech (0.0-1.0)
  - Top P: nucleus-sampling threshold that limits sampling to the most probable tokens (0.0-1.0)
  - Repetition Penalty: values above 1.0 discourage repeated phrases (1.0-2.0)
- Click "Generate Speech" to create the audio
- Play the generated audio directly in the browser or download it
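As a rough illustration of how these parameters map onto the underlying llama-cpp-python call: the `"voice: text"` prompt format and the default values shown are assumptions, and the real formatting lives in `gradio_orpheus.py`.

```python
# Hedged sketch: prompt convention and defaults are assumptions, not the app's exact code.
def generate_speech_tokens(llm, text, voice="tara",
                           temperature=0.6, top_p=0.9, repetition_penalty=1.1):
    prompt = f"{voice}: {text}"  # assumed Orpheus prompt convention
    output = llm(
        prompt,
        max_tokens=2048,
        temperature=temperature,            # randomness (0.0-1.0)
        top_p=top_p,                        # nucleus-sampling diversity (0.0-1.0)
        repeat_penalty=repetition_penalty,  # discourages repeats (1.0-2.0)
    )
    # The completion text contains the audio tokens that SNAC later decodes.
    return output["choices"][0]["text"]
```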
Welcome to our presentation. Today, we'll be discussing the latest developments in artificial intelligence and machine learning.
<giggle>Oh, that's hilarious!</giggle> I can't believe what just happened. <laugh>This is the funniest thing I've seen all day!</laugh>
<sigh>But seriously though,</sigh> we need to focus on the task at hand. <gasp>Look at what we've accomplished!</gasp>
- tara - Best overall voice for general use (default)
- leah
- jess
- leo
- dan
- mia
- zac
- zoe
You can add emotion to the speech by inserting the following tags into your text:
- `<giggle>`
- `<laugh>`
- `<chuckle>`
- `<sigh>`
- `<cough>`
- `<sniffle>`
- `<groan>`
- `<yawn>`
- `<gasp>`
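Tagged text is passed to the generator just like plain text. Using the hypothetical `generate_speech_tokens` sketch above, for example:

```python
# Emotion tags are embedded directly in the input text.
text = "<sigh>Back to work, I suppose.</sigh> <laugh>That was a great break though!</laugh>"
tokens = generate_speech_tokens(llm, text, voice="tara")
```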
This implementation:
- Uses `llama-cpp-python` to run the Orpheus model locally
- Uses the SNAC neural audio codec for high-quality audio generation
- Processes generated tokens in chunks of 28 for optimal audio quality (see the sketch below)
- Supports both CPU and GPU (CUDA/MPS) acceleration
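The chunked decoding step might look roughly like this, assuming each 28-token chunk holds four 7-code SNAC frames and that the tokens have already been normalized to raw SNAC codebook indices. The layer mapping shown is an assumption, not necessarily the exact one used in `gradio_orpheus.py`:

```python
import torch

def decode_chunk(snac_model, codes_28):
    # Split the 28 codes into SNAC's three codebook layers
    # (1, 2, and 4 codes per 7-code frame, respectively).
    layer_1, layer_2, layer_3 = [], [], []
    for i in range(0, 28, 7):
        frame = codes_28[i:i + 7]
        layer_1.append(frame[0])
        layer_2.extend([frame[1], frame[4]])
        layer_3.extend([frame[2], frame[3], frame[5], frame[6]])
    codes = [
        torch.tensor(layer_1).unsqueeze(0),  # shape (1, 4)
        torch.tensor(layer_2).unsqueeze(0),  # shape (1, 8)
        torch.tensor(layer_3).unsqueeze(0),  # shape (1, 16)
    ]
    with torch.inference_mode():
        audio = snac_model.decode(codes)  # 24 kHz waveform tensor
    return audio.squeeze().cpu().numpy()
```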
- Python 3.8 or higher
- 8GB RAM minimum (16GB recommended)
- CUDA-capable GPU (optional, for faster generation)
- See `requirements.txt` for Python package dependencies
Apache 2.0