This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Development Commands

### Installation and Setup
```bash
# Install dependencies using uv
uv sync

# Run GUI application
uv run python main_gui.py

# Run console application
uv run python main_console.py
```

### Building and Packaging
```bash
# Build Windows executable (GUI version)
.\build.bat

# Manual build with PyInstaller
uv run pyinstaller push_to_talk.spec

# For console executable: modify push_to_talk.spec to use main_console.py and set console=True
```

### Code Quality
```bash
# Format code with ruff
uv run ruff format

# Lint code with ruff
uv run ruff check

# Fix linting issues automatically
uv run ruff check --fix
```

## Architecture Overview

This is a Windows push-to-talk speech-to-text application with two interfaces (GUI and console). It uses OpenAI's API for transcription and text refinement.

### Core Components
- **PushToTalkApp** (`src/push_to_talk.py`): Main orchestrator with configuration management and dynamic updates
- **ConfigurationGUI** (`src/config_gui.py`): Persistent GUI interface with real-time status management
- **AudioRecorder** (`src/audio_recorder.py`): PyAudio-based recording with configurable audio settings
- **Transcriber** (`src/transcription.py`): OpenAI Whisper integration for speech-to-text
- **TextRefiner** (`src/text_refiner.py`): GPT-based text improvement and correction
- **TextInserter** (`src/text_inserter.py`): Windows text insertion via clipboard or sendkeys
- **HotkeyService** (`src/hotkey_service.py`): Global hotkey detection requiring admin privileges

### Entry Points
- **main_gui.py**: GUI application with persistent configuration interface
- **main_console.py**: Console-based application for command-line usage
- **Built executable**: `dist/PushToTalk.exe` (GUI version, no console window)

### Data Flow
1. User presses hotkey → Audio recording starts with optional audio feedback
2. User releases hotkey → Recording stops, audio saved to temp file
3. Audio sent to OpenAI Whisper for transcription
4. Raw text optionally refined using GPT models
5. Final text (refined when refinement is enabled, raw otherwise) inserted into active window via Windows API
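
Steps 3 to 5 above can be sketched as a small function that takes each stage as a callable. This is a minimal illustration, not the code in `src/push_to_talk.py`: `run_pipeline` and its parameters are hypothetical names, and the stand-in lambdas replace the real OpenAI and Windows calls.

```python
from typing import Callable, Optional


def run_pipeline(
    audio_path: str,
    transcribe: Callable[[str], str],
    insert: Callable[[str], None],
    refine: Optional[Callable[[str], str]] = None,
) -> Optional[str]:
    """Transcribe the audio, optionally refine the text, then insert it."""
    raw_text = transcribe(audio_path).strip()
    if not raw_text:
        return None  # nothing was recognized, so nothing is inserted
    # Refinement is optional: fall back to the raw transcription when it is off.
    final_text = refine(raw_text) if refine is not None else raw_text
    insert(final_text)
    return final_text


# Usage with stand-in callables (no network or Windows API involved).
typed = []
result = run_pipeline(
    "recording.wav",
    transcribe=lambda path: " hello world ",
    insert=typed.append,
    refine=lambda text: text.capitalize() + ".",
)
```

Passing the stages in as callables keeps the flow testable without a microphone, an API key, or a Windows desktop.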
66
+
67
+
### Configuration System
68
+
-**File-based**: `push_to_talk_config.json` for persistent settings
69
+
-**Environment**: `OPENAI_API_KEY` environment variable support
70
+
-**GUI**: Real-time configuration with validation and testing
71
+
-**Dynamic updates**: Application can update configuration without restart
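
As a rough sketch of how the file-based and environment sources above might combine, the helpers below read the JSON file and fall back to `OPENAI_API_KEY` when the file holds no key. `load_settings` and `save_settings` are hypothetical names, and the precedence (file first, then environment) is an assumption rather than the application's documented behavior.

```python
import json
import os
from pathlib import Path


def load_settings(config_path: Path) -> dict:
    """Read the JSON config file, then fill in the API key from the environment if absent."""
    settings: dict = {}
    if config_path.exists():
        settings = json.loads(config_path.read_text(encoding="utf-8"))
    # Assumption: a key stored in the file takes precedence over OPENAI_API_KEY.
    if not settings.get("openai_api_key"):
        settings["openai_api_key"] = os.environ.get("OPENAI_API_KEY", "")
    return settings


def save_settings(config_path: Path, settings: dict) -> None:
    """Persist settings so they survive a restart."""
    config_path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
```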

## Key Technical Details

### Windows-Specific Requirements
- **Administrator privileges**: Required for global hotkey detection
- **pywin32**: Used for Windows text insertion and audio feedback
- **Audio permissions**: Microphone access required for recording
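
Because global hotkey detection depends on administrator privileges, an early elevation check can produce a clearer warning than a silently dead hotkey. The sketch below is one possible approach, assuming the standard-library `ctypes` route to the Windows `IsUserAnAdmin` shell function; `is_running_as_admin` is a hypothetical helper, not a function from this repository.

```python
import ctypes
import sys


def is_running_as_admin() -> bool:
    """Return True when the current process is elevated on Windows."""
    if sys.platform != "win32":
        return False  # the app targets Windows; other platforms count as not elevated
    try:
        return bool(ctypes.windll.shell32.IsUserAnAdmin())
    except (AttributeError, OSError):
        return False  # treat a failed lookup as "not elevated"


if not is_running_as_admin():
    print("Warning: global hotkeys may not work without administrator privileges.")
```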

### Audio Processing
- **Sample rates**: 8 kHz to 44.1 kHz supported; 16 kHz recommended for Whisper
- **Formats**: WAV files for temporary audio storage
- **Feedback**: Optional audio cues using Python's Windows-only `winsound` module
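
To illustrate the temporary WAV storage described above, here is a minimal sketch that writes raw PCM frames with Python's standard `wave` module at the recommended 16 kHz. The mono, 16-bit layout and the `save_wav` helper are assumptions for illustration; the actual recorder in `src/audio_recorder.py` may use different settings.

```python
import os
import tempfile
import wave

SAMPLE_RATE = 16_000  # Hz, the rate recommended above for Whisper
CHANNELS = 1          # assumption: mono
SAMPLE_WIDTH = 2      # assumption: 16-bit PCM (2 bytes per sample)


def save_wav(frames: bytes, sample_rate: int = SAMPLE_RATE) -> str:
    """Write raw PCM frames to a temporary WAV file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # wave reopens the file by path
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
    return path


# Half a second of silence stands in for recorded audio.
silence = b"\x00\x00" * (SAMPLE_RATE // 2)
wav_path = save_wav(silence)
```

The caller is responsible for deleting the file once transcription has finished.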

### Text Insertion Methods
- **sendkeys**: Character-by-character simulation, better for special characters
- **clipboard**: Faster method using Ctrl+V, may not work in all applications

### Configuration Parameters
Key settings in `PushToTalkConfig` class:
- `openai_api_key`: Required for transcription and refinement
- `stt_model`: "gpt-4o-transcribe" or "whisper-1"
- `refinement_model`: "gpt-4.1-nano", "gpt-4o-mini", or "gpt-4o"
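
The three settings above could be modeled and validated roughly as follows. `ConfigSketch` is a hypothetical stand-in for `PushToTalkConfig`, limited to the fields listed here; the default values and the `validate` method are assumptions, not taken from the source.

```python
from dataclasses import dataclass

STT_MODELS = ("gpt-4o-transcribe", "whisper-1")
REFINEMENT_MODELS = ("gpt-4.1-nano", "gpt-4o-mini", "gpt-4o")


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for PushToTalkConfig with only the listed settings."""

    openai_api_key: str
    stt_model: str = "whisper-1"           # assumed default
    refinement_model: str = "gpt-4o-mini"  # assumed default

    def validate(self) -> list:
        """Return a list of human-readable problems; empty means valid."""
        problems = []
        if not self.openai_api_key:
            problems.append("openai_api_key is required")
        if self.stt_model not in STT_MODELS:
            problems.append(f"unsupported stt_model: {self.stt_model}")
        if self.refinement_model not in REFINEMENT_MODELS:
            problems.append(f"unsupported refinement_model: {self.refinement_model}")
        return problems
```

Returning a list of problems instead of raising on the first one lets a GUI show every invalid field at once.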