Merged
21 commits
c7d408b
Add reproduction setup for GitHub issue #193
tjgreen42 Jun 23, 2025
637e465
Fix concurrent page access issue in diskann inserts (GitHub issue #193)
tjgreen42 Jun 23, 2025
944c858
Add comprehensive Python test infrastructure and fix concurrent inser…
tjgreen42 Jun 23, 2025
590f78c
Update .gitignore for Python development
tjgreen42 Jun 23, 2025
32aeed0
Update upload-artifact from v3 to v4 in Python tests workflow
tjgreen42 Jun 23, 2025
a34c87f
Fix cargo build command to avoid multiple PostgreSQL feature conflicts
tjgreen42 Jun 23, 2025
56e260e
Add cargo clean and PGRX_PG_VERSION env var to fix feature conflicts
tjgreen42 Jun 23, 2025
8a67501
Fix PostgreSQL version mismatch in pgrx initialization
tjgreen42 Jun 23, 2025
65199dd
Replace Docker PostgreSQL service with custom-built PostgreSQL
tjgreen42 Jun 23, 2025
cfd8a69
Fix shellcheck issues in shell scripts
tjgreen42 Jun 23, 2025
846c9ef
Fix PostgreSQL authentication using trust for all connections
tjgreen42 Jun 23, 2025
b3260da
Use transaction-level lock; cleanup
tjgreen42 Jun 23, 2025
f973b0d
Flatten tests directory structure and update documentation
tjgreen42 Jun 23, 2025
670d321
Update CI/CD configuration for flattened test structure
tjgreen42 Jun 23, 2025
63bbc69
Tidy up in preparation for review
tjgreen42 Jun 24, 2025
cae0f4c
Simplify Python version matrix in CI workflow
tjgreen42 Jun 24, 2025
627e8bd
More polish
tjgreen42 Jun 24, 2025
0242610
Merge branch 'main' into tj/concurrent_insertion_safety
tjgreen42 Jun 24, 2025
4467eca
Address PR feedback; fix test instability when running locally due to…
tjgreen42 Jun 26, 2025
fd68dfa
Fix shellcheck warnings in Python test script
tjgreen42 Jun 26, 2025
79f99f6
Merge branch 'main' into tj/concurrent_insertion_safety
tjgreen42 Jun 26, 2025
2 changes: 1 addition & 1 deletion .github/workflows/pgrx_test.yaml
@@ -11,7 +11,7 @@ jobs:
       fail-fast: false
       matrix:
         pgvector:
-          - version: 0.7.4
+          - version: 0.8.0
         pg:
           - major: 13
             minor: 16
125 changes: 125 additions & 0 deletions .github/workflows/python_tests.yml
@@ -0,0 +1,125 @@
name: Python Integration Tests
on: [push, pull_request, workflow_dispatch]

permissions:
  contents: read

jobs:
  python-tests:
    runs-on: ubuntu-22.04
    strategy:
      fail-fast: false
      matrix:
        pgvector:
          - version: 0.8.0
        pg:
          - major: 13
            minor: 16
          - major: 14
            minor: 13
          - major: 15
            minor: 7
          - major: 16
            minor: 3
          - major: 17
            minor: 0

    env:
      PG_SRC_DIR: pgbuild
      PG_INSTALL_DIR: postgresql
      MAKE_JOBS: 6
      PG_CONFIG_PATH: postgresql/bin/pg_config
      PGDATA: /tmp/pgdata
      PGPORT: 5432

    steps:
      - name: Checkout pgvectorscale
        uses: actions/checkout@v4

      - name: Set up Python 3.11
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Cache pip dependencies
        uses: actions/cache@v3
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('tests/requirements.txt') }}
          restore-keys: |
            ${{ runner.os }}-pip-

      - name: Install Linux Packages
        uses: ./.github/actions/install-packages

      - name: Install PostgreSQL ${{ matrix.pg.major }}
        uses: ./.github/actions/install-postgres
        with:
          pg-version: ${{ matrix.pg.major }}.${{ matrix.pg.minor }}
          pg-src-dir: ~/${{ env.PG_SRC_DIR }}
          pg-install-dir: ~/${{ env.PG_INSTALL_DIR }}

      - name: Install pgvector ${{ matrix.pgvector.version }}
        uses: ./.github/actions/install-pgvector
        with:
          pgvector-version: ${{ matrix.pgvector.version }}
          pg-install-dir: ~/${{ env.PG_INSTALL_DIR }}

      - name: Install pgrx
        uses: ./.github/actions/install-pgrx
        with:
          pg-install-dir: ~/${{ env.PG_INSTALL_DIR }}
          pgrx-version: 0.12.9

      - name: Build and install pgvectorscale
        run: |
          cd pgvectorscale
          cargo clean
          # Ensure we use the correct PostgreSQL version that was installed
          export PATH=~/${{ env.PG_INSTALL_DIR }}/bin:$PATH
          export PG_CONFIG=~/${{ env.PG_INSTALL_DIR }}/bin/pg_config
          # Reinitialize pgrx to ensure it uses the correct PostgreSQL version
          cargo pgrx init --pg${{ matrix.pg.major }}=$PG_CONFIG
          # Install with explicit version matching
          cargo pgrx install --no-default-features --features pg${{ matrix.pg.major }}

      - name: Install Python dependencies
        run: |
          pip install -r tests/requirements.txt

      - name: Initialize and start PostgreSQL
        run: |
          export PATH=~/${{ env.PG_INSTALL_DIR }}/bin:$PATH
          # Initialize the database with trust authentication for all connections
          initdb -D ${{ env.PGDATA }} --auth-local=trust --auth-host=trust
          # Start PostgreSQL server
          pg_ctl -D ${{ env.PGDATA }} -l /tmp/postgres.log start
          # Wait for PostgreSQL to start
          sleep 5
          # Create test user and database (using current user, no password needed with trust auth)
          createuser -s postgres || true # may already exist
          createdb test_db || true # may already exist

      - name: Setup test database with extensions
        run: |
          export PATH=~/${{ env.PG_INSTALL_DIR }}/bin:$PATH
          # Install extensions in the test database (no -U needed with trust auth)
          psql -h localhost -p 5432 -d test_db -c "CREATE EXTENSION IF NOT EXISTS vector;"
          psql -h localhost -p 5432 -d test_db -c "CREATE EXTENSION IF NOT EXISTS vectorscale;"

      - name: Run Python tests
        env:
          DATABASE_URL: postgresql+asyncpg://postgres@localhost:5432/test_db
        run: |
          export PATH=~/${{ env.PG_INSTALL_DIR }}/bin:$PATH
          pytest tests/ -v --tb=short

      - name: Upload test results
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: python-test-results-pg${{ matrix.pg.major }}
          path: |
            pytest.log
            test-results.xml
          retention-days: 7
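
The "Initialize and start PostgreSQL" step above waits a fixed `sleep 5` before running `createuser`. A fixed sleep can be too short on a slow runner and wasteful on a fast one. One possible alternative, shown here only as a sketch, is to poll the port until it accepts TCP connections. The helper name `wait_for_port` and its default values are hypothetical and not part of this PR. An open port does not by itself guarantee that PostgreSQL is ready to run queries, so a tool such as `pg_isready` is a stricter check.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.2) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout.

    Hypothetical helper for illustration; not part of this PR.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A caller could use `wait_for_port("localhost", 5432)` (5432 matches `PGPORT` in the workflow) and fail the job when it returns `False`.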
14 changes: 14 additions & 0 deletions .gitignore
@@ -14,3 +14,17 @@ Cargo.lock
*.pdb

.idea

# macOS
.DS_Store

# Claude Code workspace
.claude/

# Python cache files
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
.pytest_cache/
88 changes: 88 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,88 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Development Commands

### Building and Installation
- **Build development version**: `cd pgvectorscale && cargo pgrx install --features pg17`
- **Build release version**: `cd pgvectorscale && cargo pgrx install --release --features pg17`
- **Package extension**: `cd pgvectorscale && cargo pgrx package --features pg17`
- **Initialize PGRX environment**: `cd pgvectorscale && cargo pgrx init --pg17 pg_config`

### Testing
- **Run Rust unit tests**: `cd pgvectorscale && cargo test`
- **Run PGRX integration tests**: `cd pgvectorscale && cargo pgrx test pg17`
- **Run specific test**: `cd pgvectorscale && cargo pgrx test pg17 test_name`
- **Run tests for specific PostgreSQL version**: `cd pgvectorscale && cargo pgrx test pg16` (or pg13, pg14, pg15, pg17)

### Code Quality
- **Format code**: `cd pgvectorscale && cargo fmt`
- **Check formatting**: `cd pgvectorscale && cargo fmt --check`
- **Run linter**: `cd pgvectorscale && cargo clippy --all-targets --features pg17`
- **Format shell scripts**: `make shfmt`
- **Check shell scripts**: `make shellcheck`

### Makefile Commands
- **Format Rust code**: `make format`
- **Build debug**: `make build`
- **Install debug**: `make install-debug`
- **Install release**: `make install-release`

## Architecture Overview

pgvectorscale is a PostgreSQL extension written in Rust using the PGRX framework that provides high-performance vector indexing and search capabilities. It builds on pgvector with new index types and compression methods.

### Core Components

**Access Method Implementation** (`src/access_method/`):
- **StreamingDiskANN Index**: Main index algorithm based on Microsoft's DiskANN research
- **SBQ (Statistical Binary Quantization)**: Compression method for memory-efficient storage
- **Plain Storage**: Uncompressed vector storage option
- **Label-based Filtering**: Efficient filtered vector search using smallint arrays

**Key Modules**:
- `access_method/mod.rs`: Main access method registration and interface
- `access_method/build.rs`: Index building logic and construction algorithms
- `access_method/scan.rs`: Query execution and graph traversal during search
- `access_method/sbq/`: Statistical Binary Quantization implementation for compression
- `access_method/plain/`: Plain (uncompressed) storage implementation
- `access_method/labels/`: Label-based filtering system for efficient metadata filtering
- `access_method/distance/`: Optimized distance calculations with SIMD support

**Storage Architecture**:
- Uses PostgreSQL's access method API for integration
- Supports both compressed (SBQ) and uncompressed (plain) storage layouts
- Graph-based index structure stored across PostgreSQL pages
- Label arrays stored as smallint[] for efficient filtering

**Distance Support**:
- Cosine distance (`<=>`) with `vector_cosine_ops`
- L2 distance (`<->`) with `vector_l2_ops`
- Inner product (`<#>`) with `vector_ip_ops`
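
As a rough illustration of what the three operators listed above compute, the sketch below reimplements them in plain Python. The function names are hypothetical and chosen for this example only; the extension's own implementations live in `access_method/distance/` and use SIMD. In pgvector, `<#>` returns the negative inner product, so that smaller values mean closer vectors. The cosine version assumes neither vector is all zeros.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance, the quantity behind `<->`."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Negated dot product, the quantity behind `<#>`."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the quantity behind `<=>` (non-zero vectors assumed)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```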

### Build Configuration

The project uses a Cargo workspace with the main extension in `pgvectorscale/` and its derive macros in `pgvectorscale_derive/`. PostgreSQL version support is controlled via Cargo features (`pg13` through `pg17`). The default feature is `pg17`. To build for another version, disable the default so that only one PostgreSQL feature is active, e.g. `--no-default-features --features pg16`.

**Version Dependencies**:
- PGRX version: 0.12.9 (must match cargo-pgrx version)
- Supports PostgreSQL 13, 14, 15, 16, and 17
- Requires pgvector extension as a dependency

### Testing Strategy

Tests are primarily Rust unit tests within modules and PGRX integration tests. The extension includes benchmark suites for distance calculations and graph operations.

**CI/CD Process**:
- Code formatting is checked via `cargo fmt --check` in CI
- Clippy linting runs on all PostgreSQL versions (pg13-pg17)
- Full test suite runs on both AMD64 and ARM64 platforms
- Tests run against multiple PostgreSQL versions and pgvector 0.8.0

### Development Notes

**Important Limitations**:
- Building on macOS X86 (Intel) is currently not supported (use ARM Mac, Linux, or Docker)
- Index creation on UNLOGGED tables is not yet implemented
- The StreamingDiskANN index uses relaxed ordering (results may be slightly out of order by distance)
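
Because of the relaxed ordering noted in the last bullet, a caller that needs results strictly sorted by distance may have to re-sort them. The following is a minimal client-side sketch, assuming rows arrive as `(id, distance)` pairs. The names `is_sorted_by_distance` and `strictly_ordered` and the row shape are hypothetical. Re-sorting only fixes the order of rows that were already returned.

```python
from typing import List, Tuple

Row = Tuple[int, float]  # (id, distance); an assumed row shape


def is_sorted_by_distance(rows: List[Row]) -> bool:
    """True if every row's distance is <= the next row's distance."""
    return all(a[1] <= b[1] for a, b in zip(rows, rows[1:]))


def strictly_ordered(rows: List[Row]) -> List[Row]:
    """Return a new list sorted by ascending distance (stable for ties)."""
    return sorted(rows, key=lambda row: row[1])
```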
31 changes: 31 additions & 0 deletions Makefile
@@ -83,5 +83,36 @@ shellcheck:
shfmt:
	shfmt -w -i 4 test scripts

# Python test targets
.PHONY: test-python-setup test-python test-concurrency test-integration test-all

# Setup Python test environment
test-python-setup:
	@echo "Setting up Python test environment..."
	python3 -m venv .venv || true
	.venv/bin/pip install -r tests/requirements.txt

# Run Python integration tests
test-python: test-python-setup
	@echo "Running Python tests..."
	./scripts/run-python-tests.sh

# Run specific test categories
test-concurrency: test-python-setup
	@echo "Running concurrency tests..."
	PYTEST_ARGS="-v -m concurrency" ./scripts/run-python-tests.sh

test-integration: test-python-setup
	@echo "Running integration tests..."
	PYTEST_ARGS="-v -m integration" ./scripts/run-python-tests.sh

# Run all tests (existing + Python)
test-all: test test-python
	@echo "All tests completed!"

# Development helper - run tests with database cleanup
test-python-dev: test-python-setup
	@echo "Running Python tests with cleanup..."
	PYTEST_ARGS="-v --tb=short -x" ./scripts/run-python-tests.sh

.PHONY: release rust test prove install clean
2 changes: 1 addition & 1 deletion README.md
@@ -35,7 +35,7 @@ offering the PostgreSQL community a new avenue for contributing to vector suppor
* [Install pgvectorscale](#installation)
* [Get started using pgvectorscale](#get-started-with-pgvectorscale)

-If you **want to contribute** to this extension, see how to [build pgvectorscale from source in a developer environment](./DEVELOPMENT.md).
+If you **want to contribute** to this extension, see how to [build pgvectorscale from source in a developer environment](./DEVELOPMENT.md) and our [testing guide](./TESTING.md).

For production vector workloads, get **private beta access to vector-optimized databases** with pgvector and pgvectorscale on Timescale. [Sign up here for priority access](https://timescale.typeform.com/to/H7lQ10eQ).

46 changes: 46 additions & 0 deletions TESTING.md
@@ -0,0 +1,46 @@
# Testing Guide for pgvectorscale

pgvectorscale has two main types of tests:

1. **Rust Tests** - Using PGRX's `#[pg_test]` framework (can be in any source file)
2. **Python Tests** - Using pytest for multi-process concurrency testing

## Rust Tests

```bash
# Run all Rust tests
cd pgvectorscale && cargo pgrx test pg17

# Run specific test
cd pgvectorscale && cargo pgrx test pg17 test_name
```

## Python Tests

```bash
# Setup (creates .venv virtual environment)
make test-python-setup

# Run all Python tests
make test-python

# Run specific categories
pytest tests/ -m concurrency -v # Multi-process concurrency tests
pytest tests/ -m integration -v # Basic integration tests

# For PGRX development (custom port)
DB_PORT=28817 ./scripts/run-python-tests.sh
```
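
The `DB_PORT` override above suggests that the test scripts assemble connection settings from environment variables. The sketch below shows one way that could work; it is not taken from `scripts/run-python-tests.sh`. `DATABASE_URL` and `DB_PORT` appear in this PR, while `DB_HOST`, `DB_USER`, `DB_NAME` and the function `database_url` are hypothetical names introduced for illustration.

```python
import os
from typing import Mapping, Optional


def database_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Build a connection URL from environment-style settings.

    Hypothetical helper; the real test scripts may work differently.
    """
    env = os.environ if env is None else env
    # An explicit DATABASE_URL takes precedence over individual settings.
    explicit = env.get("DATABASE_URL")
    if explicit:
        return explicit
    host = env.get("DB_HOST", "localhost")  # hypothetical variable
    port = int(env.get("DB_PORT", "5432"))  # DB_PORT is used in TESTING.md
    user = env.get("DB_USER", "postgres")   # hypothetical variable
    name = env.get("DB_NAME", "test_db")    # hypothetical variable
    return f"postgresql+asyncpg://{user}@{host}:{port}/{name}"
```

Under these assumptions, `database_url({"DB_PORT": "28817"})` would point the tests at a PGRX-managed server instead of the default port.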

### Test Markers

- `@pytest.mark.concurrency` - Multi-process concurrency tests
- `@pytest.mark.integration` - Basic integration tests

### Prerequisites

For PGRX development:
```bash
cd pgvectorscale && cargo pgrx start pg17
cargo pgrx install --features pg17
```