sqrtpapi2001/libgen_librarian

NOTE: requirements.txt is missing for every module. The code is currently written in Python, using ChatGPT.

The main directory contains the GUI for the Libgen Librarian scraper.

The tgram_scraper connects to HegelBot on the ALS T-gram channel. HegelBot leaves a trophy reaction once an image, text message, or audio file has finished scraping.

  • The channel IDs are hardcoded. You'll see it's just a list that we can continue expanding.
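A hardcoded channel list of this kind might look like the minimal sketch below. The IDs, the `CHANNEL_IDS` name, and the `add_channel` helper are hypothetical placeholders for illustration, not values or names taken from tgram_scraper.

```python
# Hypothetical placeholder IDs; the real values are hardcoded in tgram_scraper.
CHANNEL_IDS = [
    -1001234567890,
    -1009876543210,
]


def add_channel(channel_ids, new_id):
    """Return a new list with new_id appended, skipping duplicates."""
    if new_id in channel_ids:
        return list(channel_ids)
    return [*channel_ids, new_id]


if __name__ == "__main__":
    expanded = add_channel(CHANNEL_IDS, -1001111111111)
    print(len(expanded))
```

Expanding the list is then a one-line change, and the duplicate check keeps the same channel from being scraped twice.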

The cover_classifier sorts covers and text-heavy images out of the scrape.

What remains to be done:

  1. extracting text from images and PDFs using DataLab Marker OCR
  2. full translation of PDFs by processing the OCR output through subscription AI
  3. partially formatting the translation, along with images, as best we can before sending it off to Fiverr for manual workup
  4. extraction of bibliographies and citations using subscription AI
  5. connecting the output back into the GUI for manual prioritization sorting

     5a. Separate tabs for separate workflows

     5b. Connection to a remote server; use a web interface instead of a Tkinter interface
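The remaining steps above could be tracked per document with something like the following sketch. None of this exists in the repository yet: the `WorkItem` class, the `STAGES` names, and `sort_by_priority` are all hypothetical, and the stage names simply paraphrase the to-do list.

```python
from dataclasses import dataclass, field

# Stage names paraphrase the to-do list; they are illustrative only.
STAGES = ["ocr", "translation", "formatting", "citations", "prioritization"]


@dataclass
class WorkItem:
    title: str
    priority: int = 0  # higher means more urgent; assumed to be set manually
    done: set = field(default_factory=set)

    def next_stage(self):
        """Return the first stage not yet completed, or None if finished."""
        for stage in STAGES:
            if stage not in self.done:
                return stage
        return None


def sort_by_priority(items):
    """Order items for manual review: highest priority first, then by title."""
    return sorted(items, key=lambda item: (-item.priority, item.title))
```

A structure like this is independent of the front end, so it would work the same whether the list is shown in Tkinter tabs or in a web interface.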

About

Acquisition system for THE AUGUSTE LAURENT SOCIETY
