A Python script for updating language fields in a Solr index based on a mapping file.
This script connects to a Solr instance and updates language field values according to mappings defined in a language_mapping.txt
file. It's designed to handle bulk updates efficiently with configurable batch sizes.
- Processes language mappings from a simple text file
- Updates documents in configurable batch sizes (default: 1000)
- Only updates documents where the language exactly matches the source value
- Provides progress reporting
- Defers the commit until all updates are complete (always_commit=False)
- Python 3.x
- pysolr library (pip install pysolr)
- Access to a Solr instance
- Edit the Solr connection URL in the script:
solr = pysolr.Solr('http://localhost:8983/solr/biblio', always_commit=False)
- Create a language_mapping.txt file with your language mappings in the format:
old_language_code = new_language_code
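The mapping format above can be read with a few lines of Python. The following is a minimal sketch, not the script's actual parser: the function names `parse_mapping_lines` and `load_language_mapping` are hypothetical, and skipping blank lines and `#` comment lines is an assumption about how the file may be written.

```python
from pathlib import Path


def parse_mapping_lines(lines):
    """Parse 'old = new' lines into a dict.

    Assumption: blank lines and lines starting with '#' are ignored.
    Whitespace around the '=' is optional.
    """
    mapping = {}
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "=" not in line:
            raise ValueError(f"line {line_no}: expected 'old = new', got {line!r}")
        # Split on the first '=' only, so the new value may itself contain '='.
        old, new = (part.strip() for part in line.split("=", 1))
        if not old or not new:
            raise ValueError(f"line {line_no}: empty source or target value")
        mapping[old] = new
    return mapping


def load_language_mapping(path="language_mapping.txt"):
    """Read the mapping file from disk and return it as a dict."""
    with Path(path).open(encoding="utf-8") as handle:
        return parse_mapping_lines(handle)
```

Raising on malformed lines, rather than skipping them silently, makes a typo in the mapping file visible before any document is touched.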
- Install dependencies:
  pip install pysolr
- Prepare your language_mapping.txt file
- Run the script:
  python update_languages.py
The script will print progress information including:
- Number of documents found for each language mapping
- Update progress for each batch
- Total number of updated records
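A loop producing output of this kind might look like the sketch below. It is an illustration under stated assumptions, not the script's actual code: the function names are hypothetical, the unique key is assumed to be `id`, the field is assumed to be a single-valued string field named `language`, and the `solr` argument is assumed to behave like a `pysolr.Solr` client (`search`, `add`, `commit`). Updates are sent as Solr atomic updates (`set`), which require the other fields to be stored or to have docValues.

```python
def collect_ids(solr, query, batch_size):
    """Page through the result set and return the ids of all matching documents."""
    ids, start = [], 0
    while True:
        page = solr.search(query, fl="id", sort="id asc", start=start, rows=batch_size)
        page_ids = [doc["id"] for doc in page.docs]
        if not page_ids:
            return ids
        ids.extend(page_ids)
        start += len(page_ids)


def update_language(solr, old_value, new_value, batch_size=1000):
    """Set language=new_value on documents whose language equals old_value.

    Does not commit; returns the number of documents sent to Solr.
    """
    # Assumption: old_value contains no quotes or backslashes (plain codes like "eng").
    query = f'language:"{old_value}"'
    ids = collect_ids(solr, query, batch_size)
    print(f"{old_value} -> {new_value}: {len(ids)} documents found")

    updated = 0
    for offset in range(0, len(ids), batch_size):
        batch = ids[offset:offset + batch_size]
        docs = [{"id": doc_id, "language": new_value} for doc_id in batch]
        # Atomic update: only the language field is replaced.
        solr.add(docs, fieldUpdates={"language": "set"}, commit=False)
        updated += len(docs)
        print(f"  updated {updated}/{len(ids)}")
    return updated


def apply_mapping(solr, mapping, batch_size=1000):
    """Apply every mapping entry, then commit once at the end."""
    total = 0
    for old_value, new_value in mapping.items():
        total += update_language(solr, old_value, new_value, batch_size)
    solr.commit()
    print(f"Total updated records: {total}")
    return total
```

With pysolr installed, this could be called as `apply_mapping(pysolr.Solr('http://localhost:8983/solr/biblio', always_commit=False), mapping)`. Collecting the ids before sending any update keeps the paging stable even if the server makes changes visible mid-run.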
- The script only updates documents where the language field exactly matches the source value
- By default, commits are deferred until all updates are complete (always_commit=False)
- Make sure to back up your Solr index before running bulk updates
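The exact-match behavior noted above can be expressed as a quoted Solr query. The helper below is a hypothetical sketch, and the field name `language` is an assumption. On a non-tokenized string field a quoted value matches only documents whose value is exactly that string; on a tokenized text field the same query would behave as a phrase match instead.

```python
def exact_language_query(value, field="language"):
    """Build a quoted query such as language:"eng".

    Backslashes and double quotes in the value are escaped so the
    value cannot terminate the quoted string early.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'
```

Running such a query with `rows=0` first and checking the hit count is a cheap way to preview how many documents a mapping entry would touch.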
Example language_mapping.txt:
eng=English
fre=French
ger=German