v0.3.0

yixin0829 released this 02 Aug 16:54

PushToTalk v0.3.0 - Enhanced Threading Architecture & Streamlined Experience

What's New

Enhanced Threading Architecture

  • Added comprehensive threading documentation with a detailed Mermaid sequence diagram showing multi-threaded operation
  • Improved thread safety with better threading.Lock() usage for concurrent operations
  • Implemented non-blocking audio processing using daemon threads to prevent UI freezing
  • Added parallel audio feedback for immediate user response during recording operations
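The pattern described above (a lock guarding shared state, plus a daemon thread so the hotkey handler returns immediately) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Recorder` class, its method names, and the `time.sleep` stand-in for transcription are all assumptions made for the example.

```python
import threading
import time


class Recorder:
    """Hypothetical recorder; names are illustrative, not from the PushToTalk source."""

    def __init__(self):
        self._lock = threading.Lock()   # guards _is_processing and results
        self._is_processing = False
        self.results = []

    def on_hotkey_release(self, audio):
        """Called from the hotkey thread; hands work off and returns at once."""
        with self._lock:
            if self._is_processing:
                return None             # a previous clip is still being processed
            self._is_processing = True
        # Daemon thread: it will not keep the process alive when the GUI exits.
        worker = threading.Thread(target=self._process, args=(audio,), daemon=True)
        worker.start()
        return worker

    def _process(self, audio):
        try:
            time.sleep(0.05)            # stand-in for the slow transcription step
            text = f"transcribed {len(audio)} samples"
            with self._lock:
                self.results.append(text)
        finally:
            with self._lock:
                self._is_processing = False
```

Because the flag is set under the lock before the worker starts, a second hotkey release that arrives during processing is rejected rather than racing with the first.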

Streamlined User Experience

  • Simplified entry point: Consolidated to a single main.py for cleaner project structure
  • Removed deprecated files: Cleaned up old console-specific files and examples
  • Updated documentation: Comprehensive README updates with better organization and clearer instructions
  • Enhanced project structure: More intuitive file organization for easier development and deployment

Technical Improvements

  • Cross-platform audio feedback: Migrated from Windows-specific winsound to pygame for universal compatibility
  • Smart audio processing: Advanced silence removal and pitch-preserving speed adjustment for faster transcription
  • Better configuration management: Improved GUI settings persistence and validation
  • Enhanced logging: More detailed log messages and improved file-only logging in GUI mode
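These notes do not say how silence removal is implemented. As one possible approach, a simple amplitude-threshold filter drops frames whose peak level is below a cutoff, so less audio is sent for transcription. The function name, frame size, and threshold below are assumptions chosen for illustration, and the pitch-preserving speed adjustment is not shown.

```python
def remove_silence(samples, frame_size=4, threshold=0.02):
    """Drop frames whose peak absolute amplitude is below `threshold`.

    `samples` is a sequence of floats in [-1.0, 1.0]. The tiny default
    frame size is only for demonstration; real audio would use frames
    of a few hundred samples or more.
    """
    kept = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if max(abs(s) for s in frame) >= threshold:
            kept.extend(frame)
    return kept
```

For example, `remove_silence([0.0, 0.0, 0.0, 0.0, 0.5, -0.4, 0.3, 0.1])` keeps only the second frame of four samples.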

Key Features

  • Push-to-Talk & Toggle Recording with customizable hotkeys
  • OpenAI Whisper Integration for accurate speech-to-text
  • AI Text Refinement using GPT models
  • Auto Text Insertion with multiple methods (clipboard/sendkeys)
  • Cross-platform Audio Feedback with clean start/stop cues
  • Smart Audio Processing for faster transcription
  • Persistent GUI Interface with real-time status monitoring
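The difference between push-to-talk and toggle recording comes down to how key events map to recording state. The sketch below shows one way to model it; the `RecordingController` class and the mode strings are hypothetical and not taken from the application.

```python
class RecordingController:
    """Hypothetical state holder for the two recording modes."""

    def __init__(self, mode="push_to_talk"):
        if mode not in ("push_to_talk", "toggle"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.recording = False

    def on_key_down(self):
        if self.mode == "push_to_talk":
            self.recording = True                 # record while the key is held
        else:
            self.recording = not self.recording   # each press flips the state
        return self.recording

    def on_key_up(self):
        if self.mode == "push_to_talk":
            self.recording = False                # releasing the key stops recording
        return self.recording                     # toggle mode ignores key release
```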

Security Notice

Windows SmartScreen Warning

When first running PushToTalk.exe, Windows SmartScreen may display a warning because the executable is not digitally signed. This is common for open-source applications distributed without a code signing certificate.

To proceed safely:

  1. Click "More info" when the SmartScreen dialog appears
  2. Click "Run anyway" to launch the application

The application contains no malicious code. The warning appears only because the executable lacks a commercial code signing certificate.

Installation

  1. Download PushToTalk.zip from the assets below
  2. Extract the ZIP file to your preferred location
  3. Run PushToTalk.exe (click "Run anyway" if the SmartScreen warning appears)
  4. Configure your OpenAI API key and preferences through the GUI

Upgrade from v0.2.0

  • Your existing push_to_talk_config.json will be automatically migrated
  • All settings and preferences are preserved
  • The new streamlined interface provides the same functionality with improved performance
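The notes do not describe the migration mechanism. A common way to migrate a JSON settings file while preserving existing values is to merge the saved settings over a set of current defaults, as in the sketch below. The `migrate_config` function and the keys in `DEFAULTS` are hypothetical; only the file name push_to_talk_config.json comes from the notes above.

```python
import json
from pathlib import Path

# Hypothetical defaults; the real application's keys may differ.
DEFAULTS = {
    "hotkey": "ctrl+shift+space",
    "insertion_method": "clipboard",
    "audio_feedback": True,
}


def migrate_config(path, defaults=DEFAULTS):
    """Merge an existing config file over the defaults and write it back.

    Values already saved by the user win; keys missing from the old file
    are filled in from `defaults`.
    """
    path = Path(path)
    existing = {}
    if path.exists():
        existing = json.loads(path.read_text(encoding="utf-8"))
    merged = {**defaults, **existing}
    path.write_text(json.dumps(merged, indent=2), encoding="utf-8")
    return merged
```

Calling `migrate_config("push_to_talk_config.json")` on an older file would keep every stored setting and add any keys introduced since it was written.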

Full documentation and source code available at: https://github.com/yixin0829/push-to-talk

Minimum Requirements: Windows 10+, Microphone access, OpenAI API key