xp-forge/pdf-parser

PDF Parser

XP Framework module. BSD licence. Requires PHP 7.4+, supports PHP 8.0+.

Parses PDF files to extract text and images.

Example

Low-level usage:

use com\adobe\pdf\PdfReader;
use util\cmd\Console;
use io\streams\FileInputStream;

$reader= new PdfReader(new FileInputStream($argv[1]));

// Create objects lookup table while streaming
$objects= $trailer= [];
foreach ($reader->objects() as $kind => $value) {
  if ('object' === $kind) {
    $objects[$value['id']->hashCode()]= $value['dict'];
  } else if ('trailer' === $kind) {
    $trailer+= $value;
  }
}

Console::writeLine('Trailer: ', $trailer);

// Optional meta information like author and creation date
if ($info= ($trailer['Info'] ?? null)) {
  Console::writeLine('Info: ', $objects[$info->hashCode()]);
}

// Root catalogue and pages enumeration. The page tree is referenced
// from the catalogue's Pages entry, not from the trailer.
$root= $objects[$trailer['Root']->hashCode()];
Console::writeLine('Root: ', $root);
Console::writeLine('Pages: ', $objects[$root['Pages']->hashCode()]);
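
Enumerating the individual pages means descending from the catalogue's Pages entry through each node's Kids array until leaf Page dictionaries are reached. The following is a minimal sketch of that walk which runs without the library. The `Ref` class, the `pages()` helper and the sample lookup table are hypothetical stand-ins. The sketch also assumes that dictionaries are plain arrays and that names such as `Type` arrive as plain strings, which may differ from what `PdfReader` actually yields.

```php
<?php
// Hypothetical stand-in for the library's object reference. Only hashCode()
// is taken from the example above; the constructor and fields are assumed.
final class Ref {
  private int $number;
  private int $generation;

  public function __construct(int $number, int $generation= 0) {
    $this->number= $number;
    $this->generation= $generation;
  }

  public function hashCode(): string {
    return $this->number.'_'.$this->generation;
  }
}

// Collects leaf Page dictionaries in document order by following Kids
// depth first. Assumes a well-formed tree without cycles.
function pages(array $objects, Ref $node): array {
  $dict= $objects[$node->hashCode()];
  if ('Page' === ($dict['Type'] ?? null)) return [$dict];

  $pages= [];
  foreach ($dict['Kids'] ?? [] as $kid) {
    $pages= array_merge($pages, pages($objects, $kid));
  }
  return $pages;
}

// Sample lookup table shaped like the one built while streaming (assumed data)
$objects= [
  (new Ref(1))->hashCode() => ['Type' => 'Catalog', 'Pages' => new Ref(2)],
  (new Ref(2))->hashCode() => ['Type' => 'Pages', 'Count' => 3, 'Kids' => [new Ref(3), new Ref(4)]],
  (new Ref(3))->hashCode() => ['Type' => 'Page', 'Rotate' => 0],
  (new Ref(4))->hashCode() => ['Type' => 'Pages', 'Count' => 2, 'Kids' => [new Ref(5), new Ref(6)]],
  (new Ref(5))->hashCode() => ['Type' => 'Page', 'Rotate' => 90],
  (new Ref(6))->hashCode() => ['Type' => 'Page', 'Rotate' => 180],
];
$trailer= ['Root' => new Ref(1)];

$root= $objects[$trailer['Root']->hashCode()];
$pages= pages($objects, $root['Pages']);
echo count($pages), " page(s)\n";
```

Intermediate Pages nodes may nest to any depth, so the walk is recursive rather than a single loop over the top-level Kids array.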

See also
