Add dataset: Berlin_State_Library_extracted_illustrations #87

@davanstrien

Description

A URL for this dataset

https://zenodo.org/record/2602431

Dataset description

The dataset consists of illustrations extracted from 26,233 historical books and other media offered in the Berlin State Library's Digitized Collections. The media objects date from before 1920.

Version 1.0 contains 594,890 extracted illustrations in total.

The extraction of illustrations is driven by the coordinates given by the ABBYY FineReader OCR engine (in ALTO XML). The extracted illustrations have not been resized, but they have been compressed and saved in JPEG format.
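To illustrate what "driven by the coordinates" in ALTO XML can look like, here is a minimal sketch that reads the bounding boxes of `Illustration` elements from an ALTO document. It is not the dataset authors' extraction code. The function name `illustration_boxes` and the sample XML are hypothetical; the sketch assumes the standard ALTO attributes `HPOS`, `VPOS`, `WIDTH` and `HEIGHT`, and it does not convert units (ALTO files declare their own measurement unit, which may not be pixels).

```python
import xml.etree.ElementTree as ET


def illustration_boxes(alto_xml: str):
    """Return (hpos, vpos, width, height) for each Illustration element.

    Hypothetical helper: values are returned in whatever unit the ALTO
    file declares; no conversion to pixels is attempted here.
    """
    root = ET.fromstring(alto_xml)
    boxes = []
    for elem in root.iter():
        # Drop any XML namespace so the sketch works across ALTO schema versions.
        if elem.tag.rsplit("}", 1)[-1] != "Illustration":
            continue
        try:
            box = tuple(
                float(elem.attrib[key])
                for key in ("HPOS", "VPOS", "WIDTH", "HEIGHT")
            )
        except (KeyError, ValueError):
            # Skip elements with missing or non-numeric coordinates.
            continue
        boxes.append(box)
    return boxes


# Made-up sample document, for illustration only.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace>
    <Illustration ID="I1" HPOS="100" VPOS="200" WIDTH="300" HEIGHT="400"/>
    <TextBlock ID="T1" HPOS="0" VPOS="0" WIDTH="10" HEIGHT="10"/>
  </PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    print(illustration_boxes(SAMPLE))
```

Each returned box could then be used to crop the corresponding page scan, once the coordinates have been converted to the pixel grid of the image.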

The extracts for each media object are stored in separate sub-folders and tar files named after the PPN (a unique ID used in the library) to facilitate further processing. Additional metadata can be obtained with the help of the PPN as described here: https://github.com/elektrobohemian/StabiHacks/blob/master/ppn-howto.md
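Because each tar file is named after its PPN, the downloaded archives can be indexed by PPN before any image is unpacked. The following is a rough sketch under stated assumptions: the archives sit in one local directory, each file is named `<PPN>.tar`, and the images end in `.jpg` or `.jpeg`. The function name `index_archives` is hypothetical, and the exact archive layout should be checked against the Zenodo record.

```python
import tarfile
from pathlib import Path


def index_archives(archive_dir):
    """Map each PPN (taken from the tar file name) to its JPEG member names.

    Assumes archives are named <PPN>.tar; this naming detail is an
    assumption based on the dataset description, not a verified layout.
    """
    index = {}
    for tar_path in sorted(Path(archive_dir).glob("*.tar")):
        ppn = tar_path.stem  # file name without the .tar suffix
        with tarfile.open(tar_path) as archive:
            index[ppn] = [
                member.name
                for member in archive.getmembers()
                if member.isfile()
                and member.name.lower().endswith((".jpg", ".jpeg"))
            ]
    return index
```

Calling `index_archives("downloads")` on a directory of such archives would return a dictionary whose keys are PPNs, which can then be used to look up additional metadata as described in the linked how-to.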

Dataset modality

Image

Dataset licence

Creative Commons Attribution 4.0 International

Other licence

No response

How can you access this data

As a download from a repository/website

size of dataset

10GB

Confirm the dataset has an open licence

  • To the best of my knowledge, this dataset is accessible via an open licence

Contact details for data custodian

No response

Metadata

Labels

dataset (Dataset to be added)

Status

Todo