Commit 6c243e0

fix for a bug introduced in the last commit.
1 parent 08ef6c0 commit 6c243e0

4 files changed: 7 additions and 6 deletions

PDFScraper/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-__version__ = "1.1.7"
+__version__ = "1.1.8"
 
 import logging

PDFScraper/core.py

Lines changed: 4 additions & 3 deletions
@@ -278,19 +278,21 @@ def convert_to_pdf(document: Document, tessdata_location: str, config_options=""
             logger.error(e)
             sys.exit(1)
     pdf_writer = PdfFileWriter()
+    pdf_files = []
     for filename in pdf_pages:
         pdf_file = open(filename, 'rb')
+        pdf_files.append(pdf_file)
         pdf_reader = PdfFileReader(pdf_file)
         for i in range(pdf_reader.numPages):
             page = pdf_reader.getPage(i)
             pdf_writer.addPage(page)
-        pdf_file.close()
     with open(tempfile.gettempdir() + "/PDFScraper" + "/" + document.filename + ".pdf", 'w+b') as out:
         pdf_writer.write(out)
         out.close()
     document.ocr_path = tempfile.gettempdir() + "/PDFScraper" + "/" + document.filename + ".pdf"
     # cleanup temporary files
-    for filename in pdf_pages:
+    for file, filename in zip(pdf_files, pdf_pages):
+        file.close()
         os.remove(filename)
 
 
@@ -497,4 +499,3 @@ def find_words_tables(tables, search_mode, search_words, match_score):
         if found:
             result.append(table)
     return result
-
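
Note on the core.py change: the fix moves `pdf_file.close()` out of the page-copy loop and closes every source file only after `pdf_writer.write(out)` has run. A likely reason, which the commit message does not state, is that the writer still reads page data from the source streams at write time, so closing them early breaks the write. The sketch below shows the same "keep sources open until the merged output is written" pattern with `contextlib.ExitStack`. This is an alternative to the list of handles used in the commit, not the author's code. `merge_then_cleanup` and `concat` are hypothetical names, and `concat` is a plain byte-concatenation stand-in for the PDF merge so the example runs without PyPDF2.

```python
import os
from contextlib import ExitStack


def merge_then_cleanup(page_paths, out_path, merge):
    """Keep every source file open until `merge` has written `out_path`.

    `merge` is a hypothetical callback taking (open_sources, out_file).
    In the commit its role is played by the reader/writer loop followed
    by pdf_writer.write(out).
    """
    with ExitStack() as stack:
        # ExitStack closes every registered file when the block exits,
        # including when opening a later file or the merge itself fails.
        sources = [stack.enter_context(open(p, "rb")) for p in page_paths]
        with open(out_path, "wb") as out:
            merge(sources, out)
    # All source handles are closed here, so the temporary files can be removed.
    for p in page_paths:
        os.remove(p)
    # Returned only so callers can inspect the (now closed) handles.
    return sources


def concat(sources, out):
    # Stand-in for the PDF merge: source data is read only at write time.
    for src in sources:
        out.write(src.read())
```

Compared with collecting handles in a list and closing them in the cleanup loop, this variant also closes the files if an exception is raised before cleanup is reached.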

README.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # PDFScraper
 [![PyPI version](https://badge.fury.io/py/PDFScraper.svg)](https://badge.fury.io/py/PDFScraper)
 
-CLI program for searching text and tables inside of PDF documents and displaying results in HTML. It combines [Pdfminer.six](https://github.com/pdfminer/pdfminer.six), [Camelot](https://github.com/camelot-dev/camelot) and [Tesseract OCR](https://github.com/tesseract-ocr/tesseract) in a single program, which is simple to use.
+CLI program and library for extraction of PDF elements, which implements a search functionality that outputs summary in an HTML format. It combines [Pdfminer.six](https://github.com/pdfminer/pdfminer.six), [Camelot](https://github.com/camelot-dev/camelot) and [Tesseract OCR](https://github.com/tesseract-ocr/tesseract) in a single program, which is simple to use.
 
 # How to use
 ### Install using pip

setup.py

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@
         "yattag==1.14.0",
     ],
     name="PDFScraper",
-    version="1.1.7",
+    version="1.1.8",
     author="Erik Kastelec",
     author_email="erikkastelec@gmail.com",
     description="PDF text and table search",
