Mind-Ext/PDFoutliner

PDFoutliner

Automatically extract outlines from software-generated PDF documents based on layout and text styles

Work in progress: the algorithm is still being improved, and more test examples are being added.

CLI

```shell
# Requires Node 20+
# Install globally and see options
npm install -g pdfoutliner
pdfoutliner -h
# Alternatively run `npx pdfoutliner -h` without installation

# The outline will be added to a new file, example_outlined.pdf
pdfoutliner example.pdf

# First save the outline to example_outline.txt for manual editing
pdfoutliner example.pdf -o txt
# Then add the outline from the txt file to the PDF
pdfoutliner example.pdf --fromtxt
```

Web

Demo

Motivation

Some scientific papers (particularly preprints) don't include an outline in the PDF, which makes it inconvenient to jump between sections. This tool analyzes the layout of the document and, based on heuristics, extracts certain text (such as section headings) as outline entries. The result may not be perfect, but it can still be useful.

It only works on software-generated PDFs and does not support scanned PDFs. It has primarily been tested on papers (see the example folder for some open-access ones), but it may also work on longer documents such as books.
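
The heuristics mentioned above are not spelled out here. As a rough illustration only, the sketch below shows one common layout-based approach, not necessarily what PDFoutliner does: take the font size covering the most characters as body text, flag short lines set larger than that (or bold lines starting with a section number) as headings, and assign outline levels by font size. It assumes the text has already been extracted from a software-generated PDF into lines with font information; the `TextLine` shape, the 80-character limit, the 0.5 pt tolerance, and all function names are hypothetical.

```typescript
// Hypothetical input: one entry per line of text, as a PDF text
// extractor might report it. Field names are illustrative assumptions.
interface TextLine {
  text: string;
  fontSize: number; // in points
  bold: boolean;
  page: number; // 1-based page number
}

interface OutlineEntry {
  title: string;
  page: number;
  level: number; // 1 = top level
}

// Body text size: the font size that covers the most characters.
function bodyFontSize(lines: TextLine[]): number {
  const chars = new Map<number, number>();
  for (const l of lines) {
    chars.set(l.fontSize, (chars.get(l.fontSize) ?? 0) + l.text.length);
  }
  let best = 0;
  let bestCount = -1;
  chars.forEach((count, size) => {
    if (count > bestCount) {
      best = size;
      bestCount = count;
    }
  });
  return best;
}

// Heading candidate: a non-empty line of at most 80 characters that is
// either more than 0.5 pt larger than body text, or bold and starting
// with a section number such as "2" or "3.1".
function isHeadingCandidate(l: TextLine, body: number): boolean {
  const text = l.text.trim();
  if (text.length === 0 || text.length > 80) return false;
  if (l.fontSize > body + 0.5) return true;
  return l.bold && /^\d+(\.\d+)*\.?\s+\S/.test(text);
}

// Keep candidates in reading order; larger font sizes get smaller
// (higher) outline levels.
function extractOutline(lines: TextLine[]): OutlineEntry[] {
  const body = bodyFontSize(lines);
  const candidates = lines.filter((l) => isHeadingCandidate(l, body));
  const sizes = Array.from(new Set(candidates.map((l) => l.fontSize))).sort((a, b) => b - a);
  return candidates.map((l) => ({
    title: l.text.trim(),
    page: l.page,
    level: sizes.indexOf(l.fontSize) + 1,
  }));
}

// Usage with made-up sample data: a title, a section, body text, a subsection.
const lines: TextLine[] = [
  { text: "A Study of Things", fontSize: 17, bold: true, page: 1 },
  { text: "1 Introduction", fontSize: 12, bold: true, page: 1 },
  { text: "Body text that goes on for a while and sets the body size.", fontSize: 10, bold: false, page: 1 },
  { text: "More body text on the same page, also at ten points.", fontSize: 10, bold: false, page: 1 },
  { text: "1.1 Background", fontSize: 10, bold: true, page: 2 },
  { text: "Even more body text continues here at the body size.", fontSize: 10, bold: false, page: 2 },
];
console.log(extractOutline(lines));
```

With this sample, the title gets level 1, "1 Introduction" level 2, and "1.1 Background" (body-sized but bold and numbered) level 3. Ranking levels purely by font size is a simplification: real documents may reuse a size across levels or number sections inconsistently, which is one reason manual editing of an extracted outline can be worthwhile.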

A Zotero plugin was originally planned, but a similar feature has since been built into Zotero itself.

Other tools with similar functionality

  • Google Scholar PDF Reader (outline not written to the file)
  • Zotero 7 (outline not written to the file)
  • github.com/hueyy/pdf_scout (an inspiration)
  • github.com/cdevereaux/automatic_pdf_outline (semi-automatic)
  • Possibly some PDF suites
