Matricula Online is a non-profit initiative that digitizes parish records in Central Europe and hosts them for free on its website. matricula-online-scraper is a command-line interface (CLI) tool that enables you to download data directly from it.
This includes the scanned parish registers (scanned books of baptism, marriage, and death records, etc.) as well as metadata about the parishes and their registers. Data can be downloaded as CSV or JSON; images are generally provided as JPEG files.
Make sure you meet the minimum required version of Python. You can install this tool via pip:
$ pip install -U matricula-online-scraper
Or use a container 🐳
For every version, an OCI container image is built and published to GHCR.io (GitHub's own container registry).
This is especially useful if you do not want to deal with Python environments, multiple Python versions and package managers.
You could also use matricula-online-scraper in an automated environment this way.
The image can also be used as a disposable container, leaving no dependencies or build artifacts on your system.
Simply copy and paste the following command into your terminal; it will automatically pull the latest image and run it:
$ docker run --rm -it ghcr.io/lsg551/matricula-online-scraper:latest
This will print the default help message and exit, except that the command runs inside the container while the output is shown in your terminal.
You can append any matricula-online-scraper command to the end to run it directly, e.g. to list parishes:
# docker run --rm -it <IMAGE> <SUBCOMMAND>
$ docker run --rm -it ghcr.io/lsg551/matricula-online-scraper:latest parish list --place Paderborn -h
If you want to scrape data and save it to your local filesystem, you will have to create a bind mount via the -v flag, though.
Otherwise, the data would be saved inside the container and not on your own machine.
Let's say you want to scrape the parish register at the URL below and save it to your current working directory:
$ docker run -v "$(pwd):/data" --rm -it ghcr.io/lsg551/matricula-online-scraper:latest \
parish fetch "https://data.matricula-online.eu/de/deutschland/muenster/anholt-st-pankratius/KB001_1/?pg=1" \
-o /data/matricula
Your current working directory (pwd) is mounted to /data inside the container, so the files are written from the container directly to a subfolder called matricula in your current working directory.
Lastly, you can also get an interactive shell in the container:
$ docker run --rm -it --entrypoint /bin/ash ghcr.io/lsg551/matricula-online-scraper:latest
root@abc123:/app# matricula-online-scraper --version
0.8.0
This will keep the container running until you leave it with exit, so you can run any commands inside as you like.
NOTE: You could also use podman, a drop-in replacement for docker, if you like. The commands are the same.
Or build from source
If you want to get the latest version or just build from source, you can clone the repository and install it manually, preferably via uv:
$ git clone https://github.com/lsg551/matricula-online-scraper.git
$ cd matricula-online-scraper
$ uv venv && uv sync
If you do not have uv installed, you can install the dependencies via pip instead:
$ pip install -r requirements.txt
Once installed, you can append the --help flag to any command to see its usage and options.
$ matricula-online-scraper --help
Usage: matricula-online-scraper [OPTIONS] COMMAND [ARGS]...
Command Line Interface (CLI) for scraping Matricula Online
https://data.matricula-online.eu.
You can use this tool to scrape the three primary entities from Matricula:
1. Scanned parish registers (→ images of baptism, marriage, and death records)
2. A list of all available parishes (→ location metadata)
3. A list for each parish with metadata about its registers, including dates ranges,
type etc.
╭─ Options ────────────────────────────────────────────────────────────────────────────╮
│ --verbose,--debug     -v  Enable verbose logging (DEBUG).                            │
│ --quiet               -q  Suppress all output (CRITICAL).                            │
│ --version                 Show the CLI's version.                                    │
│ --install-completion      Install completion for the current shell.                  │
│ --show-completion         Show completion for the current shell, to copy it or       │
│                           customize the installation.                                │
│ --help                    Show this message and exit.                                │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────────────╮
│ parish    Scrape parish registers (1), a list with all available parishes (2) or a   │
│           list of the available registers in a parish (3).                           │
│ newsfeed  Scrape Matricula Online's Newsfeed.                                        │
╰──────────────────────────────────────────────────────────────────────────────────────╯
Attach the --help flag to any subcommand for further help and to see its options. Press
CTRL+C to exit at any time.
See https://github.com/lsg551/matricula-online-scraper for more information.
(1) Download a scanned parish register (i.e. images)
Imagine you opened a certain parish register on Matricula and want to download the entire book or a single page.
Let's say you want to download the death register of Bautzen, Germany, starting from 1661. Copy the URL of the register while you are in the image viewer; it might look like https://data.matricula-online.eu/en/deutschland/dresden/bautzen/11/?pg=1.
Then run the following command with that URL as its argument:
$ matricula-online-scraper parish fetch "https://data.matricula-online.eu/en/deutschland/dresden/bautzen/11/?pg=1"
Run matricula-online-scraper parish fetch --help to see all available options.
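The register URLs in these examples share a common path layout, so you can, for instance, derive the parish main page (the input for command (3)) from a register URL copied out of the image viewer. The following is a minimal Python sketch that assumes the layout /<language>/<country>/<region>/<parish>/<register>/?pg=<page>, inferred from the example URLs rather than from any documented scheme; RegisterURL and parse_register_url are hypothetical names and not part of matricula-online-scraper.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urlparse


@dataclass
class RegisterURL:
    """Hypothetical container for the parts of a Matricula register URL."""

    host: str
    language: str
    country: str
    region: str
    parish: str
    register: str
    page: int

    @property
    def parish_url(self) -> str:
        # Parish main page: the register URL without its last path segment.
        return f"https://{self.host}/{self.language}/{self.country}/{self.region}/{self.parish}/"


def parse_register_url(url: str) -> RegisterURL:
    """Split a register URL, assuming the path layout seen in the examples."""
    parsed = urlparse(url)
    parts = [segment for segment in parsed.path.split("/") if segment]
    if len(parts) != 5:
        raise ValueError(f"unexpected path layout: {parsed.path!r}")
    language, country, region, parish, register = parts
    # The image viewer stores the current page in ?pg=N; assume page 1 if absent.
    page = int(parse_qs(parsed.query).get("pg", ["1"])[0])
    return RegisterURL(parsed.netloc, language, country, region, parish, register, page)


info = parse_register_url(
    "https://data.matricula-online.eu/en/deutschland/dresden/bautzen/11/?pg=1"
)
print(info.parish_url)  # https://data.matricula-online.eu/en/deutschland/dresden/bautzen/
```

The resulting parish_url has the same form as the URL passed to parish show in (3).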
(2) List all available parishes on Matricula
$ matricula-online-scraper parish list
This command will fetch all parishes from Matricula Online, effectively scraping the entire "Fonds" page. The resulting data looks like:
country,region,name,url,longitude,latitude
Deutschland,"Passau, rk. Bistum",Arbing-bei-Neuoetting,https://data.matricula-online.eu/en/deutschland/passau/arbing-bei-neuoetting/,12.7081934381511,48.32953342002908
Österreich,Oberösterreich: Rk. Diözese Linz,Eberschwang,https://data.matricula-online.eu/en/oesterreich/oberoesterreich/eberschwang/,13.5620985,48.15550995
Polen,"Breslau/Wroclaw, Staatsarchiv",Hermsdorf,https://data.matricula-online.eu/en/polen/breslau/hermsdorf/,15.642741683666767,50.84699257482722
It may take a few minutes to complete and will yield a few thousand rows. Each url value leads to the main page of the parish and can be piped into the next command (3) to fetch metadata about the parish's registers.
Run matricula-online-scraper parish list --help to see all available options.
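Region names such as "Passau, rk. Bistum" contain commas and are therefore quoted, so splitting each line on commas would shift the columns. If you process the list in Python, the standard csv module handles this quoting. A minimal sketch using two of the sample rows above; read_parishes and to_float are hypothetical helpers, and treating empty coordinate cells as None is an assumption, not documented behavior.

```python
import csv
import io

# Two records in the CSV format shown above.
SAMPLE_CSV = """country,region,name,url,longitude,latitude
Deutschland,"Passau, rk. Bistum",Arbing-bei-Neuoetting,https://data.matricula-online.eu/en/deutschland/passau/arbing-bei-neuoetting/,12.7081934381511,48.32953342002908
Polen,"Breslau/Wroclaw, Staatsarchiv",Hermsdorf,https://data.matricula-online.eu/en/polen/breslau/hermsdorf/,15.642741683666767,50.84699257482722
"""


def to_float(value):
    """Convert a coordinate cell to float; empty cells become None (assumption)."""
    return float(value) if value.strip() else None


def read_parishes(csv_text):
    """Parse `parish list` CSV output into dicts with numeric coordinates."""
    parishes = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # DictReader keys each row by the header line and honors quoted fields.
        row["longitude"] = to_float(row["longitude"])
        row["latitude"] = to_float(row["latitude"])
        parishes.append(row)
    return parishes


parishes = read_parishes(SAMPLE_CSV)
for p in parishes:
    print(p["name"], p["region"], p["latitude"], p["longitude"])
```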
NOTE: The data changes only rarely. A GitHub workflow automatically executes this command once a week and pushes the result to the cache/parishes branch.
This lets you download the data without having to run the command and wait for it yourself, and it takes some load off the Matricula servers.
You can download the entire list as a gzip-compressed CSV (parishes.csv.gz) from the cache/parishes branch, e.g. with cURL:
curl -L https://github.com/lsg551/matricula-online-scraper/raw/cache/parishes/parishes.csv.gz | gunzip > parishes.csv
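If cURL or gunzip is not available (e.g. on Windows), the same download can be sketched with the Python standard library. This assumes the file is UTF-8 encoded; decompress_csv, download_parishes and save_parishes are hypothetical helper names.

```python
import gzip
import urllib.request

# Same URL as in the cURL command above.
CACHE_URL = (
    "https://github.com/lsg551/matricula-online-scraper"
    "/raw/cache/parishes/parishes.csv.gz"
)


def decompress_csv(gz_bytes: bytes) -> str:
    """Counterpart of `gunzip`: inflate the payload and decode it (UTF-8 assumed)."""
    return gzip.decompress(gz_bytes).decode("utf-8")


def download_parishes(url: str = CACHE_URL, timeout: float = 60.0) -> str:
    """Counterpart of `curl -L`: urllib follows HTTP redirects by default."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return decompress_csv(response.read())


def save_parishes(path: str = "parishes.csv") -> None:
    """Download the cached list and write it to `path`, like `> parishes.csv`."""
    # newline="" keeps the line endings exactly as in the decompressed data.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(download_parishes())
```

Calling save_parishes() writes parishes.csv to the current working directory.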
(3) List all registers available in a specific parish
This command will download a list of all available registers for a single parish, including metadata such as the type of register, the date range, and the URL of the register itself.
$ matricula-online-scraper parish show https://data.matricula-online.eu/en/deutschland/muenster/muenster-st-martini/
A sample record from the output (JSON Lines, pretty-printed here for readability) looks like this:
{
"name": "Taufen",
"url": "https://data.matricula-online.eu/en/deutschland/muenster/muenster-st-martini/KB001/",
"accession_number": "KB001",
"date": "1715 - 1800",
"register_type": "Taufen",
"date_range_start": "Jan. 1, 1715",
"date_range_end": "Dec. 31, 1800"
}
Run matricula-online-scraper parish show --help to see all available options.
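The date_range_start and date_range_end values are human-readable strings rather than ISO dates. The following minimal Python sketch reads JSON Lines output and turns these values into datetime.date objects, e.g. to find registers that cover a given year. It assumes month names look like "Jan." or "Dec." (or are spelled out, e.g. "March"), so only their first three letters are used; this format is inferred from the sample above and may not cover every record. parse_matricula_date and read_registers are hypothetical names.

```python
import json
from datetime import date

# Month lookup by the first three letters (assumption based on "Jan." / "Dec.").
MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}


def parse_matricula_date(text: str) -> date:
    """Parse strings like 'Jan. 1, 1715' (format inferred from the sample output)."""
    month_part, day_part, year_part = text.replace(",", " ").split()
    month = MONTHS[month_part.rstrip(".")[:3].lower()]
    return date(int(year_part), month, int(day_part))


def read_registers(lines):
    """Yield one dict per non-empty JSON Lines record, with parsed start/end dates."""
    for line in lines:
        if line.strip():
            record = json.loads(line)
            record["start"] = parse_matricula_date(record["date_range_start"])
            record["end"] = parse_matricula_date(record["date_range_end"])
            yield record


# The sample record from above as a single JSON Lines entry.
sample = [
    '{"name": "Taufen", '
    '"url": "https://data.matricula-online.eu/en/deutschland/muenster/muenster-st-martini/KB001/", '
    '"accession_number": "KB001", "date": "1715 - 1800", "register_type": "Taufen", '
    '"date_range_start": "Jan. 1, 1715", "date_range_end": "Dec. 31, 1800"}'
]

# Registers whose date range includes the year 1750.
covering_1750 = [
    r["url"] for r in read_registers(sample) if r["start"].year <= 1750 <= r["end"].year
]
print(covering_1750)
```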
(4) Combine and chain these commands to download all registers within a certain region.
Each of the three examples above highlights a single command for a different type of data. However, this data is interconnected and can be linked together. The CLI is designed with this in mind, so you can easily combine commands, pipe data around, and chain them together to achieve more complex tasks.
For example, after you have obtained a complete list of all parishes (2), you can filter that list to only include parishes within a certain region, such as "Paderborn" in Germany, and then pipe these parish URLs from that list into the next command to download a list for each parish with metadata about its registers (3). Finally, you can pipe the URLs of the registers into the next command to download the images of the registers (1).
The following command downloads the cached list with all parishes (2) (faster than running matricula-online-scraper parish list), filters all parishes within the region "Paderborn", and pipes the parish URLs to matricula-online-scraper parish show to get the metadata about the registers for each parish (3). Then, matricula-online-scraper parish fetch is called for every register of each parish and downloads the register images (1).
curl -sL https://github.com/lsg551/matricula-online-scraper/raw/cache/parishes/parishes.csv.gz \
| gunzip \
| csvgrep -c region -m "Paderborn" \
| csvcut -c url \
| csvformat --skip-header \
| xargs -n 1 -P 4 matricula-online-scraper parish show -o - \
| jq -r ".url // empty" \
| matricula-online-scraper parish fetch
It uses csvkit for processing the CSV data. Make sure to install it via pip install csvkit or your package manager of choice if you want to replicate this example. Also make sure to have jq installed, as it is used to parse and manipulate the JSON output of some commands.
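If you would rather not install csvkit and jq, the two filtering steps of the pipeline can be sketched in Python. This is not part of matricula-online-scraper; parish_urls_in_region, register_urls and show_registers are hypothetical helpers. The first mirrors csvgrep -c region -m ... | csvcut -c url | csvformat --skip-header (csvgrep's -m is a substring match), and the second mirrors jq -r ".url // empty", which drops records whose url is missing, null or false. show_registers runs parish show -o - for one parish, like the xargs step, assuming the CLI is on your PATH.

```python
import csv
import io
import json
import subprocess


def parish_urls_in_region(csv_text, needle):
    """Like csvgrep -c region -m NEEDLE | csvcut -c url | csvformat --skip-header."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if needle in row["region"]:  # -m matches a substring of the cell
            yield row["url"]


def register_urls(jsonl_lines):
    """Like jq -r '.url // empty': emit .url unless it is missing, null or false."""
    for line in jsonl_lines:
        if not line.strip():
            continue
        url = json.loads(line).get("url")
        if url is None or url is False:
            continue
        yield url


def show_registers(parish_url):
    """Run `parish show -o - URL` (as in the xargs step) and return its output lines."""
    result = subprocess.run(
        ["matricula-online-scraper", "parish", "show", "-o", "-", parish_url],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Passing the lines returned by show_registers to register_urls yields the register URLs that the shell pipeline hands to parish fetch. Unlike xargs -P 4, this sketch runs the parishes one after another.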
These examples are obviously not exhaustive, but they should give you an idea of how to use the CLI tool and how to combine commands to achieve more complex tasks. With the data from Matricula Online, matricula-online-scraper and third-party tools like csvkit and jq, you could build geolocation searches from the coordinates provided for each parish, filter the parishes within a certain date range and region, narrow down the registers to a specific type (e.g. only baptism records), regularly back up your most important parish registers, and so on.
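As an example of such a geolocation search, the following Python sketch finds parishes within a given radius of a point using the haversine great-circle formula. The haversine approach and the spherical mean Earth radius of 6371 km are choices made for this sketch, not something the tool provides; the three parishes are the sample rows from (2), with their coordinates as floats, and haversine_km and parishes_near are hypothetical names.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def parishes_near(parishes, lat, lon, radius_km):
    """Return (distance_km, name, url) for parishes within radius_km, nearest first."""
    hits = []
    for p in parishes:
        distance = haversine_km(lat, lon, p["latitude"], p["longitude"])
        if distance <= radius_km:
            hits.append((distance, p["name"], p["url"]))
    return sorted(hits)


# Sample rows from the parish list in (2).
PARISHES = [
    {"name": "Arbing-bei-Neuoetting",
     "url": "https://data.matricula-online.eu/en/deutschland/passau/arbing-bei-neuoetting/",
     "longitude": 12.7081934381511, "latitude": 48.32953342002908},
    {"name": "Eberschwang",
     "url": "https://data.matricula-online.eu/en/oesterreich/oberoesterreich/eberschwang/",
     "longitude": 13.5620985, "latitude": 48.15550995},
    {"name": "Hermsdorf",
     "url": "https://data.matricula-online.eu/en/polen/breslau/hermsdorf/",
     "longitude": 15.642741683666767, "latitude": 50.84699257482722},
]

for distance, name, url in parishes_near(PARISHES, 48.33, 12.71, radius_km=100):
    print(f"{distance:6.1f} km  {name}")
```

For a real search, you would load all rows of parishes.csv instead of the hard-coded sample.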
This project is licensed under the MIT License - see the LICENSE file for details.
You can read more about Matricula Online's terms of use and data licenses on their website, or check out their robots.txt file at data.matricula-online.eu/robots.txt regarding restrictions on the use of automated tools (as of March 2025, there are none).