This project requires a Saxon EE license for XML processing. The license file (saxon-license.lic) is not included in this repository for security reasons.
For CI/CD environments, set the SAXON_LICENSE environment variable to the content of the license file. For local development, place your saxon-license.lic file in the lib/ directory.
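A minimal sketch of how both options can be handled before a build, assuming the build only needs the license at lib/saxon-license.lic. The helper ensure_saxon_license is hypothetical and not part of this repository: in CI it writes the content of SAXON_LICENSE to that file, locally it only checks that the file is present.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository). Assumes the build
# reads the Saxon EE license from lib/saxon-license.lic.
ensure_saxon_license() {
  local license_file="lib/saxon-license.lic"

  if [ -n "${SAXON_LICENSE:-}" ]; then
    # CI/CD: materialize the license from the environment variable.
    mkdir -p lib
    printf '%s\n' "$SAXON_LICENSE" > "$license_file"
    # Keep the license readable only by the current user.
    chmod 600 "$license_file"
  elif [ ! -f "$license_file" ]; then
    # Local development: the file must have been placed in lib/ manually.
    echo "error: set SAXON_LICENSE or put saxon-license.lic into lib/" >&2
    return 1
  fi
}

# Usage before building, e.g.:
#   ensure_saxon_license && make -j "$(nproc)" test
```

Writing the file with printf rather than echo avoids echo's handling of backslashes and leading dashes in arbitrary license content.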
Run the tests:
make -j $(nproc) test
Run the tests and build the index:
make -j $(nproc) test index
Start KorAP on the index with the lite profile:
INDEX=./target/dnf.index docker compose -p defako --profile=lite -f korap4dnb-compose.yml up -d
Then open the search interface in the browser:
xdg-open http://localhost:4001/?q=Test
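Because up -d returns before the containers are ready, opening the browser immediately may show an error page. The following sketch (start_korap is a hypothetical helper) refuses to start if the index has not been built and then polls port 4001, assuming the KorAP root URL returns a success status once the service is ready.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): start KorAP only if the index exists, then
# wait until the web interface on port 4001 responds.
start_korap() {
  local index="${1:-./target/dnf.index}"
  local url="http://localhost:4001/"
  local i=0

  if [ ! -e "$index" ]; then
    echo "index $index not found; run: make -j \$(nproc) test index" >&2
    return 1
  fi

  INDEX="$index" docker compose -p defako --profile=lite \
    -f korap4dnb-compose.yml up -d || return 1

  # The containers start in the background (-d); poll until KorAP answers.
  while [ "$i" -lt 60 ]; do
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "KorAP did not respond at $url" >&2
  return 1
}

# Usage: start_korap && xdg-open "http://localhost:4001/?q=Test"
```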
If KorAP runs on korap.dnb.de instead, forward port 4001 via SSH and open the same URL locally:
ssh -L 4001:localhost:4001 korap.dnb.de
xdg-open http://localhost:4001/?q=Test
Stop and remove the containers:
docker compose -p defako down
Converting the PDFs to TEI P5 with GROBID is actually the first step, but it is usually not necessary, because the comparatively expensive-to-generate TEI P5 files in the p5 folder are not deleted by make clean.
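Whether the conversion can be skipped can be checked with a small sketch like the following. The function name p5_needs_conversion is hypothetical, and it only tests whether p5 contains any file at all; it does not verify that every PDF has actually been converted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (returns 0) if the p5 folder is missing or
# contains no regular files, i.e. if the GROBID conversion still has to run.
p5_needs_conversion() {
  local dir="${1:-p5}"
  # find stops at the first regular file (-print -quit); errors for a
  # missing directory are discarded, which yields empty output.
  [ -z "$(find "$dir" -type f -print -quit 2>/dev/null)" ]
}

# Usage:
#   if p5_needs_conversion; then echo "run the GROBID conversion first"; fi
```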
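Start the GROBID server (it runs in the foreground, so use a second terminal for the client):
docker run --rm --init -v ./grobid.yaml:/opt/grobid/grobid-home/config/grobid.yaml --ulimit core=0 -e JAVA_OPTS=-Xmx400g -p 8070:8070 grobid/grobid:0.8.1
Then convert the PDFs into TEI P5 files in p5:
java -jar lib/org.grobid.client-0.5.4-SNAPSHOT.one-jar.jar -n 100 -in /mnt/data/Diss-Sample/PDF -out p5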
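GROBID needs some time to load its models after the container starts. A sketch for waiting until the server is up before starting the client, using GROBID's /api/isalive endpoint; the function name, the default URL and the number of retries are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: poll GROBID's /api/isalive endpoint until the server answers,
# so the client is not started against a server that is still booting.
wait_for_grobid() {
  local base_url="${1:-http://localhost:8070}"
  local max_tries="${2:-60}"
  local i=0

  while [ "$i" -lt "$max_tries" ]; do
    # -f: fail on HTTP errors, -s: silent, --max-time: do not hang forever.
    if curl -fs --max-time 5 -o /dev/null "$base_url/api/isalive"; then
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "GROBID at $base_url did not come up" >&2
  return 1
}

# Usage:
#   wait_for_grobid && java -jar lib/org.grobid.client-0.5.4-SNAPSHOT.one-jar.jar \
#     -n 100 -in /mnt/data/Diss-Sample/PDF -out p5
```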
Configure Apache2 to proxy requests to the local KorAP server:
ProxyPass /defako http://localhost:4001
ProxyPassReverse /defako http://localhost:4001
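On a Debian/Ubuntu-style Apache2 installation (an assumption; other distributions lay out their configuration differently), the two directives can be put into a separate config file. The file name defako.conf and the helper write_defako_proxy_conf are hypothetical; a2enmod, a2enconf and apachectl are the standard Debian tools.

```shell
#!/usr/bin/env bash
# Sketch for a Debian/Ubuntu-style Apache2 layout: write the proxy rules
# to a separate config file (the name defako.conf is hypothetical).
write_defako_proxy_conf() {
  local target="${1:-/etc/apache2/conf-available/defako.conf}"
  cat > "$target" <<'EOF'
ProxyPass /defako http://localhost:4001
ProxyPassReverse /defako http://localhost:4001
EOF
}

# Usage (as root):
#   a2enmod proxy proxy_http    # proxying to http:// needs mod_proxy_http
#   write_defako_proxy_conf
#   a2enconf defako
#   apachectl configtest && systemctl reload apache2
```

Running apachectl configtest before the reload keeps a typo in the file from taking down the running server.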
Kupietz, Marc/Leinen, Peter/Diewald, Nils (2024): Towards a Very Large German Academic Corpus: Step 1: Building and Making Available a Corpus of 10,000 Doctoral Dissertations. Talk given at the Workshop on Comparable and Interoperable Corpora of Academic Texts @CLARIN2024 on 2024-10-18, Barcelona. https://corpora.ids-mannheim.de/slides/2024-10-17-Towards-a-German-Academic-Corpus/#/.