GitHub - sqrtpapi2001/libgen_librarian: Acquisition system for THE AUGUSTE LAURENT SOCIETY

NOTE: requirements.txt is missing for every module. The project is currently written in Python, with the code generated using ChatGPT.

The main directory contains the GUI for the Libgen Librarian scraper.

The tgram_scraper connects to HegelBot on the ALS T-gram channel. HegelBot leaves a trophy reaction once an image, text message, or audio file has finished scraping.

The channel IDs are hardcoded. You'll see it's just a list, which we can continue expanding.
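As a rough illustration of how a hardcoded ID list could be extended without editing the source each time, here is a minimal sketch. It is not the code in tgram_scraper: the names `DEFAULT_CHANNEL_IDS` and `load_channel_ids`, the placeholder IDs, and the optional JSON file of extra IDs are all assumptions made for the example.

```python
import json
from pathlib import Path

# Hypothetical placeholder IDs; the real hardcoded values live in tgram_scraper.
DEFAULT_CHANNEL_IDS = [-1001111111111, -1002222222222]


def load_channel_ids(extra_path=None):
    """Return the hardcoded IDs plus any extra IDs from an optional JSON list."""
    ids = list(DEFAULT_CHANNEL_IDS)
    if extra_path is not None and Path(extra_path).exists():
        extra = json.loads(Path(extra_path).read_text(encoding="utf-8"))
        for raw in extra:
            cid = int(raw)  # accept IDs stored as numbers or as strings
            if cid not in ids:  # skip duplicates, keep first-seen order
                ids.append(cid)
    return ids
```

Calling `load_channel_ids()` with no argument returns only the hardcoded list, so the current behaviour would be unchanged until an extra file is supplied.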

The cover_classifier separates covers and text-heavy images from the rest of the scrape.
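The sorting step can be pictured with the sketch below. It does not describe how cover_classifier actually decides what an image is: `classify` is a hypothetical callable supplied by the caller, and the labels "cover", "text_heavy", and "other" are assumptions chosen for illustration. The sketch only shows moving each scraped image into a folder named after its label.

```python
import shutil
from pathlib import Path


def sort_images(image_paths, classify, out_dir):
    """Move each image into out_dir/<label>/ and return {label: [new paths]}.

    `classify` is a hypothetical callable (path -> label string); the labels
    "cover", "text_heavy" and "other" are assumptions for illustration.
    """
    out_dir = Path(out_dir)
    sorted_paths = {}
    for path in map(Path, image_paths):
        label = classify(path)
        target_dir = out_dir / label
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / path.name
        shutil.move(str(path), str(target))  # removes the file from the scrape folder
        sorted_paths.setdefault(label, []).append(target)
    return sorted_paths
```

Keeping the classifier behind a plain callable means the file-moving logic can be tested with a stub before any real model is wired in.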

What remains to be done:

1. Extracting text from images and PDFs using DataLab Marker OCR
2. Full translation of PDFs by processing the OCR output through subscription AI
3. Partially formatting the translation, along with images, as best we can before sending it off to Fiverr for manual workup
4. Extraction of bibliographies and citations using subscription AI
5. Connecting the output back into the GUI for manual prioritization sorting

5a. Separate tabs for separate workflows

5b. Connection to remote server; use a web interface instead of a Tkinter interface

Folders and files (12 commits):

cover_classifier
tgram_scraper
README.md
app.py
books.json
last_host.txt
last_json_path.txt
missing.png
queue.json
spinner.gif
