-
Notifications
You must be signed in to change notification settings - Fork 103
Fix crash on concurrent insertions #244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Set up Python reproduction script for page corruption issue with concurrent diskann insertions. Successfully reproduces the assertion failure: "assertion failed: (*header).pd_special >= SizeOfPageHeaderData as u16" - Add issue_193_repro.py with adapted connection string for local development - Add Python dependencies (SQLAlchemy, asyncpg, pgvector, numpy, greenlet) - Add run_repro.sh helper script for easy testing - Add documentation in scripts/README.md - Update CLAUDE.md with additional development context Issue occurs with parallelism > 1 but works fine with parallelism = 1. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
Resolves assertion failure: "(*header).pd_special >= SizeOfPageHeaderData as u16" that occurred during concurrent vector insertions with high parallelism. Root cause: Race condition when multiple threads call Tape::resume() and try to modify the same "newest" page simultaneously, causing page corruption during WAL registration in WritablePage::modify(). Solution: Modified insert_storage() to use create_tape_safely() which always creates new pages during insertions instead of resuming existing pages. Trade-offs: - ✅ Eliminates race condition and ensures data integrity - ✅ Maintains performance for concurrent workloads - ❌ Uses more disk space due to reduced page sharing - ❌ Temporary fix - proper locking would be more elegant Testing: - Reproduced issue consistently before fix (60% failure rate) - All tests pass after fix (0% failure rate across multiple runs) - Verified with parallelism up to 10 and batch sizes up to 300 TODO: Implement proper per-relation locking for page resumption to restore space efficiency while maintaining concurrency safety. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
…tion race condition ## Summary - Implement Python test infrastructure for multi-process concurrency testing - Fix GitHub issue #193: concurrent insertion race condition in diskann index - Add comprehensive testing documentation and CI integration ## Python Test Infrastructure - **Directory structure**: Organized tests/ with concurrency/ and integration/ subdirs - **Test framework**: pytest with psycopg2 for reliable database connections - **Fixtures**: Clean database state management and extension setup - **Test categories**: Concurrency tests (reproduce issue #193) and integration tests - **Test runners**: Shell script and Makefile integration for easy execution - **CI integration**: GitHub Actions workflow for multi-version PostgreSQL testing ## Concurrency Fix (Issue #193) - **Root cause**: Race condition when multiple processes resume writing to same page - **Solution**: Modified create_tape_safely() to use Tape::new() instead of Tape::resume() - **Trade-off**: Uses more disk space but ensures data integrity - **Validation**: Concurrency tests now pass, reproducing and validating the fix ## Files Added - `tests/` - Complete Python test suite with concurrency and integration tests - `TESTING.md` - Comprehensive testing documentation and developer guide - `.github/workflows/python_tests.yml` - CI workflow for Python tests - `scripts/run-python-tests.sh` - Smart test runner with environment detection ## Files Modified - `pgvectorscale/src/access_method/build.rs` - Fix for concurrent insertion safety and functional regression test - `Makefile` - Added Python test targets - `README.md` - Link to testing documentation ## Files Moved - `scripts/issue_193_repro.py` → `tests/concurrency/issue_193_original.py` - Organized into test structure ## Testing Results - ✅ All integration tests pass (extension installation, index creation, queries) - ✅ Concurrency tests validate the fix (previously failing, now passing) - ✅ Multi-process testing works correctly (separate database connections) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
Add common ignore patterns for Python development: - __pycache__/ directories and compiled Python files - .pytest_cache/ for pytest temporary files - .DS_Store for macOS system files - .claude/ for Claude Code workspace files 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
Fix deprecated actions/upload-artifact@v3 that was causing CI failures. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
Add --no-default-features flag to prevent pg17 default from conflicting with matrix PostgreSQL versions. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
Explicitly set PGRX_PG_VERSION and clean cargo cache to prevent multiple PostgreSQL version features from conflicting. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
Ensure pgrx is reinitialized with the correct PostgreSQL version from the matrix to prevent pg14/pg15 feature conflicts. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
Remove Docker service and use the custom-built PostgreSQL instance where pgvector and pgvectorscale are properly installed. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
- Quote variables to prevent word splitting - Use direct exit code checks instead of $? - Add exit on cd failure - Fix array vs string parameter handling - Add shellcheck disable for venv/bin/activate 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
Change from md5 to trust authentication and simplify database setup to avoid password prompts. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
- Move test files from subdirectories to root tests/ directory - Remove empty concurrency/, integration/, and fixtures/ subdirectories - Update TESTING.md to reflect flattened structure with marker-based organization - Use pytest markers (@pytest.mark.concurrency, @pytest.mark.integration) instead of directory-based categorization - Simplify test discovery and execution with standard pytest conventions 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
- Update GitHub workflow to use pytest markers (-m concurrency) instead of directory paths - Update Makefile test-concurrency and test-integration targets to use markers - Remove pytest cache with outdated directory references - Maintain functionality of make test-concurrency and make test-integration commands 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
- Remove Python version matrix, use fixed Python 3.11 - Reduces CI job count from 6 to 3 (3 PG versions instead of 3 PG × 2 Python) - Python 3.11 provides sufficient coverage for integration tests - Update artifact naming to remove Python version reference 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
syvb
requested changes
Jun 25, 2025
Quote PYTEST_ARGS variable to prevent word splitting and globbing issues. This addresses SC2086 warnings and follows bash best practices for variable expansion. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
syvb
approved these changes
Jun 27, 2025
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR fixes #193: