Fix crash on concurrent insertions #244
Merged
21 commits
c7d408b Add reproduction setup for GitHub issue #193 (tjgreen42)
637e465 Fix concurrent page access issue in diskann inserts (GitHub issue #193) (tjgreen42)
944c858 Add comprehensive Python test infrastructure and fix concurrent inser… (tjgreen42)
590f78c Update .gitignore for Python development (tjgreen42)
32aeed0 Update upload-artifact from v3 to v4 in Python tests workflow (tjgreen42)
a34c87f Fix cargo build command to avoid multiple PostgreSQL feature conflicts (tjgreen42)
56e260e Add cargo clean and PGRX_PG_VERSION env var to fix feature conflicts (tjgreen42)
8a67501 Fix PostgreSQL version mismatch in pgrx initialization (tjgreen42)
65199dd Replace Docker PostgreSQL service with custom-built PostgreSQL (tjgreen42)
cfd8a69 Fix shellcheck issues in shell scripts (tjgreen42)
846c9ef Fix PostgreSQL authentication using trust for all connections (tjgreen42)
b3260da Use transaction-level lock; cleanup (tjgreen42)
f973b0d Flatten tests directory structure and update documentation (tjgreen42)
670d321 Update CI/CD configuration for flattened test structure (tjgreen42)
63bbc69 Tidy up in preparation for review (tjgreen42)
cae0f4c Simplify Python version matrix in CI workflow (tjgreen42)
627e8bd More polish (tjgreen42)
0242610 Merge branch 'main' into tj/concurrent_insertion_safety (tjgreen42)
4467eca Address PR feedback; fix test instability when running locally due to… (tjgreen42)
fd68dfa Fix shellcheck warnings in Python test script (tjgreen42)
79f99f6 Merge branch 'main' into tj/concurrent_insertion_safety (tjgreen42)
@@ -0,0 +1,121 @@
name: Python Integration Tests
on: [push, pull_request, workflow_dispatch]

permissions:
  contents: read

jobs:
  python-tests:
    runs-on: ubuntu-22.04
    strategy:
      fail-fast: false
      matrix:
        pgvector:
          - version: 0.7.4
        pg:
          - major: 15
            minor: 7
          - major: 16
            minor: 3
          - major: 17
            minor: 0

    env:
      PG_SRC_DIR: pgbuild
      PG_INSTALL_DIR: postgresql
      MAKE_JOBS: 6
      PG_CONFIG_PATH: postgresql/bin/pg_config
      PGDATA: /tmp/pgdata
      PGPORT: 5432

    steps:
      - name: Checkout pgvectorscale
        uses: actions/checkout@v4

      - name: Set up Python 3.11
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Cache pip dependencies
        uses: actions/cache@v3
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('tests/requirements.txt') }}
          restore-keys: |
            ${{ runner.os }}-pip-

      - name: Install Linux Packages
        uses: ./.github/actions/install-packages

      - name: Install PostgreSQL ${{ matrix.pg.major }}
        uses: ./.github/actions/install-postgres
        with:
          pg-version: ${{ matrix.pg.major }}.${{ matrix.pg.minor }}
          pg-src-dir: ~/${{ env.PG_SRC_DIR }}
          pg-install-dir: ~/${{ env.PG_INSTALL_DIR }}

      - name: Install pgvector ${{ matrix.pgvector.version }}
        uses: ./.github/actions/install-pgvector
        with:
          pgvector-version: ${{ matrix.pgvector.version }}
          pg-install-dir: ~/${{ env.PG_INSTALL_DIR }}

      - name: Install pgrx
        uses: ./.github/actions/install-pgrx
        with:
          pg-install-dir: ~/${{ env.PG_INSTALL_DIR }}
          pgrx-version: 0.12.9

      - name: Build and install pgvectorscale
        run: |
          cd pgvectorscale
          cargo clean
          # Ensure we use the correct PostgreSQL version that was installed
          export PATH=~/${{ env.PG_INSTALL_DIR }}/bin:$PATH
          export PG_CONFIG=~/${{ env.PG_INSTALL_DIR }}/bin/pg_config
          # Reinitialize pgrx to ensure it uses the correct PostgreSQL version
          cargo pgrx init --pg${{ matrix.pg.major }}=$PG_CONFIG
          # Install with explicit version matching
          cargo pgrx install --no-default-features --features pg${{ matrix.pg.major }}

      - name: Install Python dependencies
        run: |
          pip install -r tests/requirements.txt

      - name: Initialize and start PostgreSQL
        run: |
          export PATH=~/${{ env.PG_INSTALL_DIR }}/bin:$PATH
          # Initialize the database with trust authentication for all connections
          initdb -D ${{ env.PGDATA }} --auth-local=trust --auth-host=trust
          # Start PostgreSQL server
          pg_ctl -D ${{ env.PGDATA }} -l /tmp/postgres.log start
          # Wait for PostgreSQL to start
          sleep 5
          # Create test user and database (using current user, no password needed with trust auth)
          createuser -s postgres || true # may already exist
          createdb test_db || true # may already exist

      - name: Setup test database with extensions
        run: |
          export PATH=~/${{ env.PG_INSTALL_DIR }}/bin:$PATH
          # Install extensions in the test database (no -U needed with trust auth)
          psql -h localhost -p 5432 -d test_db -c "CREATE EXTENSION IF NOT EXISTS vector;"
          psql -h localhost -p 5432 -d test_db -c "CREATE EXTENSION IF NOT EXISTS vectorscale;"

      - name: Run Python tests
        env:
          DATABASE_URL: postgresql+asyncpg://postgres@localhost:5432/test_db
        run: |
          export PATH=~/${{ env.PG_INSTALL_DIR }}/bin:$PATH
          pytest tests/ -v --tb=short

      - name: Upload test results
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: python-test-results-pg${{ matrix.pg.major }}
          path: |
            pytest.log
            test-results.xml
          retention-days: 7
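
The workflow waits for the server with a fixed `sleep 5`. A test harness could instead poll until the port accepts connections. The following is a minimal sketch and is not part of this PR: `wait_for_port` is a hypothetical helper, and a successful TCP connect only shows that something is listening on the port, not that PostgreSQL is ready to run queries (the `pg_isready` utility checks that more directly).

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves that something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: give up once the deadline has passed.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the values from the workflow, a call such as `wait_for_port("localhost", 5432)` could stand in for the fixed delay.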
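
The `DATABASE_URL` in the workflow uses the `postgresql+asyncpg://` scheme, which is SQLAlchemy's dialect+driver URL form. If a test helper ever needs to hand the same URL to a client that expects a plain `postgresql://` DSN, the driver suffix has to be dropped first. This is a minimal sketch under that assumption (the PR excerpt does not show whether the tests need it); `to_plain_dsn` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit


def to_plain_dsn(url: str) -> str:
    """Drop a SQLAlchemy-style '+driver' suffix from the URL scheme."""
    parts = urlsplit(url)
    # "postgresql+asyncpg" -> "postgresql"; a scheme without "+" is unchanged.
    base_scheme = parts.scheme.split("+", 1)[0]
    return urlunsplit((base_scheme, parts.netloc, parts.path, parts.query, parts.fragment))
```

User, host, port, and database name pass through untouched; only the scheme changes.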
@@ -0,0 +1,88 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Development Commands

### Building and Installation
- **Build development version**: `cd pgvectorscale && cargo pgrx install --features pg17`
- **Build release version**: `cd pgvectorscale && cargo pgrx install --release --features pg17`
- **Package extension**: `cd pgvectorscale && cargo pgrx package --features pg17`
- **Initialize PGRX environment**: `cd pgvectorscale && cargo pgrx init --pg17 pg_config`

### Testing
- **Run Rust unit tests**: `cd pgvectorscale && cargo test`
- **Run PGRX integration tests**: `cd pgvectorscale && cargo pgrx test pg17`
- **Run specific test**: `cd pgvectorscale && cargo pgrx test pg17 test_name`
- **Run tests for specific PostgreSQL version**: `cd pgvectorscale && cargo pgrx test pg16` (or pg13, pg14, pg15, pg17)

### Code Quality
- **Format code**: `cd pgvectorscale && cargo fmt`
- **Check formatting**: `cd pgvectorscale && cargo fmt --check`
- **Run linter**: `cd pgvectorscale && cargo clippy --all-targets --features pg17`
- **Format shell scripts**: `make shfmt`
- **Check shell scripts**: `make shellcheck`

### Makefile Commands
- **Format Rust code**: `make format`
- **Build debug**: `make build`
- **Install debug**: `make install-debug`
- **Install release**: `make install-release`

## Architecture Overview

pgvectorscale is a PostgreSQL extension written in Rust using the PGRX framework that provides high-performance vector indexing and search capabilities. It builds on pgvector with new index types and compression methods.

### Core Components

**Access Method Implementation** (`src/access_method/`):
- **StreamingDiskANN Index**: Main index algorithm based on Microsoft's DiskANN research
- **SBQ (Statistical Binary Quantization)**: Compression method for memory-efficient storage
- **Plain Storage**: Uncompressed vector storage option
- **Label-based Filtering**: Efficient filtered vector search using smallint arrays
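
To give a feel for what a binary quantization scheme with statistical thresholds does, here is a deliberately simplified sketch. It is not the extension's SBQ code: it assumes one bit per dimension with the per-dimension mean as the threshold and compares codes by Hamming distance, while the real implementation in `access_method/sbq/` may use different thresholds and more bits per dimension. All function names are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence


def dimension_means(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Per-dimension mean over a set of training vectors."""
    return [fmean(column) for column in zip(*vectors)]


def quantize(vector: Sequence[float], means: Sequence[float]) -> int:
    """Pack one bit per dimension into an int: 1 if the value exceeds that dimension's mean."""
    bits = 0
    for value, mean in zip(vector, means):
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two quantized vectors."""
    return bin(a ^ b).count("1")
```

The appeal of this kind of compression is that each vector shrinks to a few machine words and candidate comparison becomes an XOR plus a bit count, at the cost of some accuracy.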

**Key Modules**:
- `access_method/mod.rs`: Main access method registration and interface
- `access_method/build.rs`: Index building logic and construction algorithms
- `access_method/scan.rs`: Query execution and graph traversal during search
- `access_method/sbq/`: Statistical Binary Quantization implementation for compression
- `access_method/plain/`: Plain (uncompressed) storage implementation
- `access_method/labels/`: Label-based filtering system for efficient metadata filtering
- `access_method/distance/`: Optimized distance calculations with SIMD support

**Storage Architecture**:
- Uses PostgreSQL's access method API for integration
- Supports both compressed (SBQ) and uncompressed (plain) storage layouts
- Graph-based index structure stored across PostgreSQL pages
- Label arrays stored as smallint[] for efficient filtering
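
The label arrays mentioned above can be pictured with a small sketch. It assumes that a filtered search keeps rows whose label array shares at least one element with the requested labels (the semantics of PostgreSQL's array overlap operator `&&`) and that labels must fit the 2-byte `smallint` range. The function names and the `(id, labels)` row shape are hypothetical and not taken from the repository.

```python
from typing import Iterable, List, Sequence, Tuple

# PostgreSQL smallint is a signed 2-byte integer.
SMALLINT_MIN, SMALLINT_MAX = -32768, 32767


def validate_labels(labels: Iterable[int]) -> List[int]:
    """Check that every label fits in a smallint and return them as a list."""
    checked = list(labels)
    for label in checked:
        if not SMALLINT_MIN <= label <= SMALLINT_MAX:
            raise ValueError(f"label {label} does not fit in smallint")
    return checked


def filter_by_labels(
    rows: Sequence[Tuple[int, Sequence[int]]], wanted: Iterable[int]
) -> List[Tuple[int, Sequence[int]]]:
    """Keep (id, labels) rows whose labels share at least one element with `wanted`."""
    wanted_set = set(validate_labels(wanted))
    return [row for row in rows if wanted_set.intersection(row[1])]
```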

**Distance Support**:
- Cosine distance (`<=>`) with `vector_cosine_ops`
- L2 distance (`<->`) with `vector_l2_ops`
- Inner product (`<#>`) with `vector_ip_ops`
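
For reference, the three pgvector operators listed above compute the following quantities. This is a plain-Python sketch for small examples, not the SIMD code in `access_method/distance/`; note that pgvector's `<#>` returns the negative inner product so that smaller values mean closer vectors. The function names are illustrative, and the cosine function assumes neither vector is all zeros.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance, the quantity behind `<->`."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Negated dot product, the quantity behind `<#>`."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus cosine similarity, the quantity behind `<=>` (undefined for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```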

### Build Configuration

The project uses a workspace structure with the main extension in `pgvectorscale/` and derive macros in `pgvectorscale_derive/`. PostgreSQL version support is controlled via Cargo features (pg13-pg17). The default feature is pg17, but you can build for other versions using `--features pg16`, `--features pg15`, etc.

**Version Dependencies**:
- PGRX version: 0.12.9 (must match cargo-pgrx version)
- Supports PostgreSQL 13, 14, 15, 16, and 17
- Requires pgvector extension as a dependency
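
Because pg17 is the default feature, selecting another version with `--features` alone would leave two PostgreSQL features enabled at once; the CI workflow in this PR avoids that by also passing `--no-default-features`. A small sketch of a script that assembles the install command that way is shown here. The helper name and the tuple of supported versions are assumptions for illustration; the flags themselves are the ones that appear in the workflow and in the commands above.

```python
from typing import List

# Versions listed under "Version Dependencies" above.
SUPPORTED_PG_MAJORS = (13, 14, 15, 16, 17)


def pgrx_install_args(pg_major: int, release: bool = False) -> List[str]:
    """Build the `cargo pgrx install` argument list for one PostgreSQL major version."""
    if pg_major not in SUPPORTED_PG_MAJORS:
        raise ValueError(f"unsupported PostgreSQL major version: {pg_major}")
    args = ["cargo", "pgrx", "install"]
    if release:
        args.append("--release")
    # Disable the default pg17 feature so that exactly one pgNN feature is active.
    args += ["--no-default-features", "--features", f"pg{pg_major}"]
    return args
```

The resulting list can be passed to `subprocess.run(..., cwd="pgvectorscale")` from the repository root.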

### Testing Strategy

Tests are primarily Rust unit tests within modules and PGRX integration tests. The extension includes benchmark suites for distance calculations and graph operations.

**CI/CD Process**:
- Code formatting is checked via `cargo fmt --check` in CI
- Clippy linting runs on all PostgreSQL versions (pg13-pg17)
- Full test suite runs on both AMD64 and ARM64 platforms
- Tests run against multiple PostgreSQL versions and pgvector 0.7.4

### Development Notes

**Important Limitations**:
- Building on macOS X86 (Intel) is currently not supported (use ARM Mac, Linux, or Docker)
- Index creation on UNLOGGED tables is not yet implemented
- The StreamingDiskANN index uses relaxed ordering (results may be slightly out of order by distance)
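
Since results may come back slightly out of order, a caller that needs strictly ascending distances can re-sort what the query returned. This is a minimal client-side sketch, assuming rows arrive as `(id, distance)` tuples; the helper name and the row shape are hypothetical and not part of the repository.

```python
from typing import Hashable, List, Sequence, Tuple


def strict_top_k(rows: Sequence[Tuple[Hashable, float]], k: int) -> List[Tuple[Hashable, float]]:
    """Re-sort (id, distance) rows by ascending distance and keep the k nearest."""
    return sorted(rows, key=lambda row: row[1])[:k]
```

Fetching a few more rows than needed before re-sorting leaves room for a true neighbor that the index returned slightly late.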