Commit c09d469

Merge pull request #3 from yixin0829/codex/diagnose-nonetype-application-thread-error
Fix stop app race condition
2 parents 77ab5b1 + ddc3865 commit c09d469

6 files changed, +229 -113 lines changed

CLAUDE.md

Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Development Commands
+
+### Installation and Setup
+```bash
+# Install dependencies using uv
+uv sync
+
+# Run GUI application
+uv run python main_gui.py
+
+# Run console application
+uv run python main_console.py
+```
+
+### Building and Packaging
+```bash
+# Build Windows executable (GUI version)
+.\build.bat
+
+# Manual build with PyInstaller
+uv run pyinstaller push_to_talk.spec
+
+# For console executable: modify push_to_talk.spec to use main_console.py and set console=True
+```
+
+### Code Quality
+```bash
+# Format code with ruff
+uv run ruff format
+
+# Lint code with ruff
+uv run ruff check
+
+# Fix linting issues automatically
+uv run ruff check --fix
+```
+
+## Architecture Overview
+
+This is a Windows push-to-talk speech-to-text application with dual interfaces (GUI and console) that uses OpenAI's API for transcription and text refinement.
+
+### Core Components
+- **PushToTalkApp** (`src/push_to_talk.py`): Main orchestrator with configuration management and dynamic updates
+- **ConfigurationGUI** (`src/config_gui.py`): Persistent GUI interface with real-time status management
+- **AudioRecorder** (`src/audio_recorder.py`): PyAudio-based recording with configurable audio settings
+- **Transcriber** (`src/transcription.py`): OpenAI Whisper integration for speech-to-text
+- **TextRefiner** (`src/text_refiner.py`): GPT-based text improvement and correction
+- **TextInserter** (`src/text_inserter.py`): Windows text insertion via clipboard or sendkeys
+- **HotkeyService** (`src/hotkey_service.py`): Global hotkey detection requiring admin privileges
+
+### Entry Points
+- **main_gui.py**: GUI application with persistent configuration interface
+- **main_console.py**: Console-based application for command-line usage
+- **Built executable**: `dist/PushToTalk.exe` (GUI version, no console window)
+
+### Data Flow
+1. User presses hotkey → Audio recording starts with optional audio feedback
+2. User releases hotkey → Recording stops, audio saved to temp file
+3. Audio sent to OpenAI Whisper for transcription
+4. Raw text optionally refined using GPT models
+5. Refined text inserted into active window via Windows API
+
+### Configuration System
+- **File-based**: `push_to_talk_config.json` for persistent settings
+- **Environment**: `OPENAI_API_KEY` environment variable support
+- **GUI**: Real-time configuration with validation and testing
+- **Dynamic updates**: Application can update configuration without restart
+
+## Key Technical Details
+
+### Windows-Specific Requirements
+- **Administrator privileges**: Required for global hotkey detection
+- **pywin32**: Used for Windows text insertion and audio feedback
+- **Audio permissions**: Microphone access required for recording
+
+### Audio Processing
+- **Sample rates**: 8kHz-44.1kHz supported, 16kHz recommended for Whisper
+- **Formats**: WAV files for temporary audio storage
+- **Feedback**: Optional audio cues using Windows winsound module
+
+### Text Insertion Methods
+- **sendkeys**: Character-by-character simulation, better for special characters
+- **clipboard**: Faster method using Ctrl+V, may not work in all applications
+
+### Configuration Parameters
+Key settings in `PushToTalkConfig` class:
+- `openai_api_key`: Required for transcription and refinement
+- `stt_model`: "gpt-4o-transcribe" or "whisper-1"
+- `refinement_model`: "gpt-4.1-nano", "gpt-4o-mini", or "gpt-4o"
+- `hotkey`/`toggle_hotkey`: Customizable key combinations
+- `insertion_method`: "sendkeys" or "clipboard"
+- `enable_text_refinement`: Toggle GPT text improvement
+
+## Development Workflow
+
+### Making Changes
+1. Test changes with both GUI and console applications
+2. Ensure admin privileges are handled correctly for hotkey functionality
+3. Validate OpenAI API integration with proper error handling
+4. Test text insertion in various Windows applications
+
+### Building for Distribution
+1. Use `build.bat` for standard GUI executable
+2. Modify `push_to_talk.spec` for console builds or customization
+3. Test executable on clean Windows system without Python installed
+4. Consider antivirus false positives with PyInstaller executables
+
+### Configuration Testing
+- Use GUI "Test Configuration" button for API validation
+- Test hotkey combinations don't conflict with system shortcuts
+- Verify text insertion works in target applications (text editors, browsers, etc.)
+- Check audio settings produce clear recordings for transcription accuracy
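
Reviewer sketch (not part of the commit): the "Data Flow" steps listed in CLAUDE.md can be pictured as a small pipeline. The `Pipeline` class and its callables below are hypothetical stand-ins for the Transcriber, TextRefiner, and TextInserter components; the real classes in `src/` may be wired differently. Only the `enable_text_refinement` name comes from the documented configuration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Pipeline:
    """Hypothetical stand-in for the transcribe -> refine -> insert flow."""

    transcribe: Callable[[str], Optional[str]]  # audio file path -> raw text
    refine: Callable[[str], Optional[str]]      # raw text -> refined text
    insert: Callable[[str], bool]               # text -> success flag
    enable_text_refinement: bool = True

    def process(self, audio_path: str) -> bool:
        raw_text = self.transcribe(audio_path)
        if not raw_text:
            return False  # nothing was transcribed, so nothing to insert

        text = raw_text
        if self.enable_text_refinement:
            # Fall back to the raw transcript if refinement returns nothing.
            text = self.refine(raw_text) or raw_text

        return self.insert(text)
```

Passing the stages in as callables keeps the sketch independent of the OpenAI client and of any keyboard or clipboard library, so the ordering of the steps can be exercised without hardware or network access.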

build.bat

Lines changed: 3 additions & 3 deletions
@@ -2,9 +2,9 @@
 echo Building PushToTalk GUI Windows Executable...
 echo.

-REM Clean previous builds
-if exist "dist" rmdir /s /q "dist"
-if exist "build" rmdir /s /q "build"
+REM Clean previous .exe and .zip files
+if exist "dist\PushToTalk.exe" del /f /q "dist\PushToTalk.exe"
+if exist "dist\PushToTalk.zip" del /f /q "dist\PushToTalk.zip"

 REM Build the executable
 echo Building GUI application with PyInstaller...

pyproject.toml

Lines changed: 2 additions & 2 deletions
@@ -8,8 +8,8 @@ dependencies = [
     "keyboard>=0.13.5",
     "openai>=1.97.1",
     "pyaudio>=0.2.14",
-    "pywin32>=309",
-    "websocket-client>=1.8.0",
+    "pyautogui>=0.9.54",
+    "pyperclip>=1.9.0",
 ]

 [dependency-groups]

src/config_gui.py

Lines changed: 5 additions & 1 deletion
@@ -622,7 +622,7 @@ def _run_application_thread(self):
         self.app_instance.start(setup_signals=False)

         # Keep running until stopped
-        while self.app_instance.is_running:
+        while self.app_instance and self.app_instance.is_running:
             import time

             time.sleep(0.1)
@@ -658,6 +658,10 @@ def _stop_application(self):
         )
         self._update_status_display()

+        # Wait for the background thread to finish before clearing references
+        if self.app_thread and self.app_thread.is_alive():
+            self.app_thread.join(timeout=1)
+
         self.app_instance = None
         self.app_thread = None
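
Reviewer sketch (not part of the commit): the two hunks above address a race in which the GUI thread sets `app_instance` to `None` while the background loop is still reading `app_instance.is_running`. The minimal reproduction below shows the same two safeguards in isolation. `Worker` and `Controller` are hypothetical names; only `app_instance`, `app_thread`, and `is_running` mirror the diff.

```python
import threading
import time
from typing import Optional


class Worker:
    """Hypothetical stand-in for the application object run by the GUI."""

    def __init__(self) -> None:
        self.is_running = False

    def start(self) -> None:
        self.is_running = True

    def stop(self) -> None:
        self.is_running = False


class Controller:
    """Hypothetical stand-in for the GUI class that owns the thread."""

    def __init__(self) -> None:
        self.app_instance: Optional[Worker] = None
        self.app_thread: Optional[threading.Thread] = None
        self.errors: list = []  # exceptions raised inside the thread

    def _run(self) -> None:
        try:
            self.app_instance.start()
            # Guard: the stopping thread may clear the reference at any time.
            while self.app_instance and self.app_instance.is_running:
                time.sleep(0.01)
        except Exception as exc:
            self.errors.append(exc)

    def start(self) -> None:
        self.app_instance = Worker()
        self.app_thread = threading.Thread(target=self._run, daemon=True)
        self.app_thread.start()

    def stop(self) -> None:
        if self.app_instance:
            self.app_instance.stop()
        # Join first so the loop observes is_running == False and exits
        # before the references are cleared.
        if self.app_thread and self.app_thread.is_alive():
            self.app_thread.join(timeout=1)
        self.app_instance = None
        self.app_thread = None
```

The loop condition reads `self.app_instance` twice, so the `None` check alone still leaves a narrow window between the two reads. The `join` narrows that window in the common case; copying the reference into a local variable before the loop would be one way to close it entirely.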

src/text_inserter.py

Lines changed: 21 additions & 93 deletions
@@ -1,10 +1,10 @@
 import time
 import logging
+import sys
 from typing import Optional
-import win32gui
-import win32con
-import win32clipboard
-import win32api
+
+import pyautogui
+import pyperclip

 logger = logging.getLogger(__name__)

@@ -48,38 +48,21 @@ def insert_text(self, text: str, method: str = "clipboard") -> bool:
             return False

     def _insert_via_clipboard(self, text: str) -> bool:
-        """
-        Insert text by copying to clipboard and pasting.
-        This is generally more reliable for longer texts.
-        """
-        try:
-            # Get the current active window
-            active_window = win32gui.GetForegroundWindow()
-            if not active_window:
-                logger.error("No active window found")
-                return False
-
-            # Save current clipboard content
-            original_clipboard = self._get_clipboard_text()
+        """Insert text by copying to clipboard and pasting."""

-            # Copy text to clipboard
-            self._set_clipboard_text(text)
+        try:
+            original_clipboard = pyperclip.paste()
+            pyperclip.copy(text)

-            # Small delay to ensure clipboard is set
             time.sleep(0.05)

-            # Send Ctrl+V to paste
-            win32api.keybd_event(win32con.VK_CONTROL, 0, 0, 0)
-            win32api.keybd_event(ord("V"), 0, 0, 0)
-            win32api.keybd_event(ord("V"), 0, win32con.KEYEVENTF_KEYUP, 0)
-            win32api.keybd_event(win32con.VK_CONTROL, 0, win32con.KEYEVENTF_KEYUP, 0)
+            paste_keys = ["command", "v"] if sys.platform == "darwin" else ["ctrl", "v"]
+            pyautogui.hotkey(*paste_keys)

-            # Small delay before restoring clipboard
             time.sleep(0.1)

-            # Restore original clipboard content
-            if original_clipboard is not None:
-                self._set_clipboard_text(original_clipboard)
+            if original_clipboard:
+                pyperclip.copy(original_clipboard)

             logger.info(f"Text inserted via clipboard: {len(text)} characters")
             return True
@@ -89,54 +72,10 @@ def _insert_via_clipboard(self, text: str) -> bool:
             return False

     def _insert_via_sendkeys(self, text: str) -> bool:
-        """
-        Insert text by simulating individual keystrokes.
-        Better for short texts but slower for longer ones.
-        """
-        try:
-            active_window = win32gui.GetForegroundWindow()
-            if not active_window:
-                logger.error("No active window found")
-                return False
-
-            # Send each character individually
-            for char in text:
-                if char == "\n":
-                    # Send Enter for newlines
-                    win32api.keybd_event(win32con.VK_RETURN, 0, 0, 0)
-                    win32api.keybd_event(
-                        win32con.VK_RETURN, 0, win32con.KEYEVENTF_KEYUP, 0
-                    )
-                elif char == "\t":
-                    # Send Tab for tabs
-                    win32api.keybd_event(win32con.VK_TAB, 0, 0, 0)
-                    win32api.keybd_event(
-                        win32con.VK_TAB, 0, win32con.KEYEVENTF_KEYUP, 0
-                    )
-                else:
-                    # Convert character to virtual key code
-                    vk_code = win32api.VkKeyScan(char)
-                    if vk_code != -1:
-                        # Handle shift modifier for uppercase letters and symbols
-                        if vk_code & 0x100:  # Shift key needed
-                            win32api.keybd_event(win32con.VK_SHIFT, 0, 0, 0)
-                            win32api.keybd_event(vk_code & 0xFF, 0, 0, 0)
-                            win32api.keybd_event(
-                                vk_code & 0xFF, 0, win32con.KEYEVENTF_KEYUP, 0
-                            )
-                            win32api.keybd_event(
-                                win32con.VK_SHIFT, 0, win32con.KEYEVENTF_KEYUP, 0
-                            )
-                        else:
-                            win32api.keybd_event(vk_code & 0xFF, 0, 0, 0)
-                            win32api.keybd_event(
-                                vk_code & 0xFF, 0, win32con.KEYEVENTF_KEYUP, 0
-                            )
-
-                # Small delay between keystrokes
-                if self.insertion_delay > 0:
-                    time.sleep(self.insertion_delay)
+        """Insert text by simulating individual keystrokes."""

+        try:
+            pyautogui.write(text, interval=self.insertion_delay)
             logger.info(f"Text inserted via sendkeys: {len(text)} characters")
             return True

@@ -147,23 +86,13 @@ def _insert_via_sendkeys(self, text: str) -> bool:
     def _get_clipboard_text(self) -> Optional[str]:
         """Get current clipboard text content."""
         try:
-            win32clipboard.OpenClipboard()
-            data = win32clipboard.GetClipboardData(win32con.CF_TEXT)
-            win32clipboard.CloseClipboard()
-            return data.decode("utf-8") if isinstance(data, bytes) else data
+            return pyperclip.paste()
         except Exception:
-            try:
-                win32clipboard.CloseClipboard()
-            except Exception:
-                pass
             return None

-    def _set_clipboard_text(self, text: str):
+    def _set_clipboard_text(self, text: str) -> None:
         """Set clipboard text content."""
-        win32clipboard.OpenClipboard()
-        win32clipboard.EmptyClipboard()
-        win32clipboard.SetClipboardText(text)
-        win32clipboard.CloseClipboard()
+        pyperclip.copy(text)

     def get_active_window_title(self) -> Optional[str]:
         """
@@ -173,10 +102,9 @@ def get_active_window_title(self) -> Optional[str]:
             Window title or None if no active window
         """
         try:
-            active_window = win32gui.GetForegroundWindow()
-            if active_window:
-                window_title = win32gui.GetWindowText(active_window)
-                return window_title if window_title else None
+            window = pyautogui.getActiveWindow()
+            if window:
+                return window.title if window.title else None
             return None
         except Exception as e:
             logger.error(f"Failed to get active window title: {e}")
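
Reviewer sketch (not part of the commit): the clipboard path in `_insert_via_clipboard` follows a save, copy, paste, restore sequence. The function below isolates that sequence with the clipboard and keystroke operations passed in as callables, so it can be exercised without a display or a real clipboard. `paste_with_restore` and its parameter names are hypothetical; in the committed code the corresponding calls are `pyperclip.paste`, `pyperclip.copy`, and `pyautogui.hotkey`.

```python
import time
from typing import Callable, Optional


def paste_with_restore(
    text: str,
    read_clipboard: Callable[[], str],
    write_clipboard: Callable[[str], None],
    send_paste: Callable[[], None],
    settle_seconds: float = 0.05,
) -> bool:
    """Paste `text` through the clipboard, then put the previous contents back."""
    try:
        original: Optional[str] = read_clipboard()
    except Exception:
        original = None  # clipboard unreadable, so skip the restore step

    write_clipboard(text)
    time.sleep(settle_seconds)  # give the OS time to register the new contents
    send_paste()
    time.sleep(settle_seconds)  # let the target application consume the paste

    # `is not None` also restores an empty clipboard, unlike a truthiness check.
    if original is not None:
        write_clipboard(original)
    return True
```

One design point worth checking against the diff: the commit changes the restore condition from `is not None` to a truthiness test. If the clipboard was empty beforehand and the read returns an empty string, the truthiness test skips the restore and the inserted text stays on the clipboard. The sketch keeps the `is not None` form to show the alternative.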
