GitHub - Mind-Ext/PDFoutliner: automatically extract outline from PDF

PDFoutliner

Automatically extract outlines from software-generated PDF documents based on layout and text styles

Work in progress: the algorithm is still being improved and more test examples are being added.

CLI

# Requires Node 20+
npm install -g pdfoutliner
pdfoutliner -h
# Install globally and see options
# Alternatively run `npx pdfoutliner -h` without installation

pdfoutliner example.pdf
# outline will be added to new file example_outlined.pdf

pdfoutliner example.pdf -o txt
pdfoutliner example.pdf --fromtxt
# first save outline to example_outline.txt for manual edit
# then add outline from txt file to pdf

Web

Demo

Motivation

Some scientific papers (particularly preprints) don't include an outline in the PDF, which makes it inconvenient to jump between sections. This tool analyzes the layout and text styles of the document and uses heuristics to pick out the text that forms the outline. The result may not be perfect, but it can still be useful.
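
To give a rough idea of what a style-based heuristic can look like, here is a minimal sketch. It is not the algorithm used by PDFoutliner: the `TextLine` shape, the `guessOutline` function, and the "largest share of characters is body text" rule are all assumptions made for illustration. It treats any line set in a font larger than the body font as a heading and ranks heading levels by font size.

```typescript
// Hypothetical input: one entry per text line, already extracted from the PDF.
interface TextLine {
  text: string;
  fontSize: number; // in points
  page: number; // 1-based page number
}

interface OutlineEntry {
  title: string;
  level: number; // 1 = top-level heading
  page: number;
}

// Assumption: the body font size is the size that covers the most characters.
function bodyFontSize(lines: TextLine[]): number {
  const charsBySize: Record<string, number> = {};
  for (const line of lines) {
    const key = String(line.fontSize);
    charsBySize[key] = (charsBySize[key] || 0) + line.text.length;
  }
  let best = 0;
  let bestCount = -1;
  for (const key of Object.keys(charsBySize)) {
    if (charsBySize[key] > bestCount) {
      best = Number(key);
      bestCount = charsBySize[key];
    }
  }
  return best;
}

// Lines larger than the body font become outline entries;
// the largest size maps to level 1, the next largest to level 2, and so on.
function guessOutline(lines: TextLine[], maxTitleLength = 120): OutlineEntry[] {
  const body = bodyFontSize(lines);
  const candidates = lines.filter(
    (l) =>
      l.fontSize > body &&
      l.text.trim().length > 0 &&
      l.text.length <= maxTitleLength
  );
  const sizes = candidates
    .map((l) => l.fontSize)
    .filter((size, index, all) => all.indexOf(size) === index)
    .sort((a, b) => b - a);
  return candidates.map((l) => ({
    title: l.text.trim(),
    level: sizes.indexOf(l.fontSize) + 1,
    page: l.page,
  }));
}

// Made-up sample data, for illustration only.
const sample: TextLine[] = [
  { text: "A Study of Things", fontSize: 18, page: 1 },
  { text: "1 Introduction", fontSize: 14, page: 1 },
  { text: "Body text that goes on for a while in the paragraph.", fontSize: 10, page: 1 },
  { text: "1.1 Background", fontSize: 12, page: 2 },
  { text: "More body text in the usual size, again fairly long.", fontSize: 10, page: 2 },
  { text: "2 Methods", fontSize: 14, page: 3 },
];

const outline = guessOutline(sample);
console.log(outline);
```

A rule this simple will misfire on captions, pull quotes, or headings that differ from body text only by weight, which is one reason a real tool would combine several layout and style signals.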

It only works on software-generated PDFs and does not support scanned PDFs. It is primarily tested on papers (see the example folder for some open-access ones), but may also work on longer documents such as books.

A Zotero plugin was originally planned, but a similar feature has been built into Zotero.

Other tools with similar functionality

Google Scholar PDF Reader (not written to file)
Zotero 7 (not written to file)
github.com/hueyy/pdf_scout (an inspiration)
github.com/cdevereaux/automatic_pdf_outline (semi-automatic)
Possibly some PDF suites

