
ElephScraper is a lightweight and PHP-native web scraping toolkit built using Guzzle and Symfony DomCrawler. It provides a clean and powerful interface to extract HTML content, metadata, and structured data from any website.


riodevnet/elephscraper


🐘 ElephScraper


Fast. Clean. Eleph-style scraping. 🐘⚡


🚀 Features

  • ✅ Extract metadata: title, description, keywords, author, charset, canonical, and more
  • ✅ Supports Open Graph, Twitter Card, CSRF tokens, and HTTP-equiv headers
  • ✅ Extract headings, paragraphs, images, lists, and links
  • ✅ Powerful filter() method with support for class/ID/tag-based selectors
  • ✅ Return raw HTML or clean plain text
  • ✅ Clean return types: string, array, or associative array
  • ✅ Built with Guzzle + Symfony DomCrawler + CssSelector

📦 Installation

Install via Composer:

composer require riodevnet/elephscraper

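Requires PHP 7.4 or newer. The filter() examples below use named arguments, which need PHP 8.0 or newer; on PHP 7.4, pass the arguments positionally in the order declared by the method's signature.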


🛠️ Basic Usage

<?php

require_once __DIR__ . '/vendor/autoload.php';

use Riodevnet\Elephscraper\ElephScraper;

$scraper = new ElephScraper("https://example.com");

echo $scraper->title(), PHP_EOL;        // text of the page's <title> tag
echo $scraper->description(), PHP_EOL;  // content of <meta name="description">
print_r($scraper->h1());                // array with the text of each <h1>
print_r($scraper->openGraph());         // all Open Graph (og:*) meta tags

🧪 Available Methods

🔹 Page Metadata

$scraper->title();
$scraper->description();
$scraper->keywords();
$scraper->keywordString();
$scraper->charset();
$scraper->canonical();
$scraper->contentType();
$scraper->author();
$scraper->csrfToken();
$scraper->image();
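The metadata methods read standard tags from the page's <head>. To illustrate the idea, here is a minimal sketch that uses only PHP's built-in DOM extension on an inline HTML string. It is not ElephScraper's implementation, and the helper name extractMetadata() is hypothetical.

```php
<?php
// Hypothetical helper, NOT part of ElephScraper: a stdlib-only sketch of
// reading common <head> metadata with DOMDocument + DOMXPath.
function extractMetadata(string $html): array
{
    libxml_use_internal_errors(true); // silence warnings on imperfect HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xp = new DOMXPath($doc);

    // Return the first node matched by $query, or null if nothing matches.
    $first = function (string $query) use ($xp) {
        $nodes = $xp->query($query);
        return ($nodes !== false && $nodes->length > 0) ? $nodes->item(0) : null;
    };

    $title = $first('//title');
    $desc = $first('//meta[@name="description"]');
    $charset = $first('//meta[@charset]');
    $canonical = $first('//link[@rel="canonical"]');

    return [
        'title' => $title ? trim($title->textContent) : null,
        'description' => $desc ? $desc->getAttribute('content') : null,
        'charset' => $charset ? $charset->getAttribute('charset') : null,
        'canonical' => $canonical ? $canonical->getAttribute('href') : null,
    ];
}

$html = '<html><head><meta charset="utf-8"><title> Demo Page </title>'
    . '<meta name="description" content="A demo page">'
    . '<link rel="canonical" href="https://example.com/demo"></head><body></body></html>';

print_r(extractMetadata($html));
```

Each field falls back to null when its tag is missing, which is one reasonable choice; the library may handle missing tags differently.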

🔹 Open Graph & Twitter Card

$scraper->openGraph();                  // All OG meta tags
$scraper->openGraph("og:title");        // Specific OG tag

$scraper->twitterCard();                // All Twitter Card tags
$scraper->twitterCard("twitter:title"); // Specific Twitter Card tag
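Open Graph tags conventionally carry their key in a property attribute (og:title), while Twitter Card tags usually use name (twitter:title). The hypothetical helper metaByPrefix() below is a stdlib-only sketch, not the library's code, that checks both attributes and returns an associative array keyed by tag name.

```php
<?php
// Hypothetical helper, NOT ElephScraper's implementation: collect meta tags
// whose property or name starts with a prefix such as "og:" or "twitter:".
function metaByPrefix(string $html, string $prefix): array
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $tags = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // Open Graph usually uses property=, Twitter Cards usually use name=.
        // getAttribute() returns "" for a missing attribute.
        $key = $meta->getAttribute('property') ?: $meta->getAttribute('name');
        if ($key !== '' && strpos($key, $prefix) === 0) {
            $tags[$key] = $meta->getAttribute('content');
        }
    }
    return $tags;
}

$html = '<html><head>'
    . '<meta property="og:title" content="Hello">'
    . '<meta property="og:type" content="article">'
    . '<meta name="twitter:card" content="summary">'
    . '<meta name="description" content="ignored">'
    . '</head><body></body></html>';

print_r(metaByPrefix($html, 'og:'));
print_r(metaByPrefix($html, 'twitter:'));
```

Passing a prefix keeps the two families separate; looking up a single tag such as og:title is then just an array access on the result.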

🔹 Headings & Text

$scraper->h1();
$scraper->h2();
$scraper->h3();
$scraper->h4();
$scraper->h5();
$scraper->h6();
$scraper->p();
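The heading and paragraph methods return arrays of text (the usage example above shows h1() returning something like ["Main Title", "News"]). A minimal sketch of that idea with PHP's DOM extension follows; textsByTag() is a hypothetical name, and collapsing whitespace and skipping empty elements are assumptions about what "clean" text means.

```php
<?php
// Hypothetical helper, NOT ElephScraper's code: return the trimmed text of
// every element with the given tag name, in document order, skipping empties.
function textsByTag(string $html, string $tag): array
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $texts = [];
    foreach ($doc->getElementsByTagName($tag) as $el) {
        // Collapse runs of whitespace (including newlines) into single spaces.
        $text = trim(preg_replace('/\s+/', ' ', $el->textContent));
        if ($text !== '') {
            $texts[] = $text;
        }
    }
    return $texts;
}

$html = '<body><h1>Main Title</h1><p>First <b>bold</b> para.</p>'
    . "<h1>News</h1><p>  </p><p>Second\n   para.</p></body>";

print_r(textsByTag($html, 'h1'));
print_r(textsByTag($html, 'p'));
```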

🔹 Lists

$scraper->ul(); // all <ul><li> text
$scraper->ol(); // all <ol><li> text
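The list methods are described as returning the text of <li> items. Assuming a flat array of item texts (the library's exact shape may differ, for example grouped per list), a stdlib sketch could look like this; listItems() is hypothetical.

```php
<?php
// Hypothetical helper, NOT ElephScraper's implementation: collect the text of
// the direct <li> children of every <ul> (or <ol>) as one flat array.
// $listTag is concatenated into XPath, so pass only trusted tag names.
function listItems(string $html, string $listTag): array
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xp = new DOMXPath($doc);
    $items = [];
    foreach ($xp->query('//' . $listTag . '/li') as $li) {
        // textContent of an <li> also includes the text of any nested list.
        $items[] = trim($li->textContent);
    }
    return $items;
}

$html = '<body><ul><li>Apple</li><li>Pear</li></ul>'
    . '<ol><li>Step one</li><li>Step two</li></ol></body>';

print_r(listItems($html, 'ul'));
print_r(listItems($html, 'ol'));
```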

🔹 Images

$scraper->images();         // just src URLs
$scraper->imageDetails();   // src, alt, title
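images() returns only src URLs, while imageDetails() also includes alt and title. The sketch below, built around a hypothetical collectImages() helper, shows how both views can come from a single pass over the <img> elements; it is an illustration, not ElephScraper's code.

```php
<?php
// Hypothetical helper, NOT the library's code: list every <img> with its
// src, alt, and title attributes (missing attributes come back as "").
function collectImages(string $html): array
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $images = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $images[] = [
            'src' => $img->getAttribute('src'),
            'alt' => $img->getAttribute('alt'),
            'title' => $img->getAttribute('title'),
        ];
    }
    return $images;
}

$html = '<body><img src="/logo.png" alt="Logo" title="Home">'
    . '<img src="/banner.jpg"></body>';

$details = collectImages($html);
// A src-only list, comparable to what images() is described as returning.
$srcs = array_column($details, 'src');
```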

🔹 Links

$scraper->links();        // just hrefs
$scraper->linkDetails();  // full detail
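The same pattern works for links. The sketch below (hypothetical collectLinks(), not the library's implementation) keeps the raw href and visible text of each <a> that has an href. Which fields linkDetails() actually returns is not documented here, so treat these keys as an assumption.

```php
<?php
// Hypothetical helper, NOT ElephScraper's implementation: collect every <a>
// that has an href, keeping both the raw href and its visible text.
function collectLinks(string $html): array
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if (!$a->hasAttribute('href')) {
            continue; // named anchors like <a name="top"> have no target
        }
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
    return $links;
}

$html = '<body><a href="/about">About us</a><a name="top">Top</a>'
    . '<a href="https://example.com/">Example</a></body>';

$hrefs = array_column(collectLinks($html), 'href'); // ['/about', 'https://example.com/']
```

Relative hrefs such as /about are returned as written; resolving them against the page URL would be a separate step.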

🔍 Custom DOM Filtering

▸ Example: Filter Single Element

$scraper->filter(
    element: 'div',
    attributes: ['id' => 'main'],
    multiple: false,
    extract: ['.title', '#desc', 'p'],
    returnHtml: false
);

▸ Example: Filter Multiple Elements

$scraper->filter(
    element: 'div',
    attributes: ['class' => 'card'],
    multiple: true,
    extract: ['h2', '.subtitle', '#info'],
    returnHtml: false
);

▸ Example: Return HTML Content

$scraper->filter(
    element: 'section',
    attributes: ['class' => 'hero'],
    returnHtml: true
);
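The returnHtml flag chooses between an element's markup and its plain text. As a rough sketch of that distinction (not ElephScraper's filter(); the helper firstMatch() is hypothetical and compares attribute values exactly, whereas a real implementation would likely match individual class tokens), DOMDocument::saveHTML() with a node argument serializes just that element, while textContent gives its text:

```php
<?php
// Hypothetical sketch, NOT ElephScraper's filter(): find the first <tag> whose
// attributes all match exactly, then return its outer HTML or its plain text.
function firstMatch(string $html, string $tag, array $attributes, bool $returnHtml): ?string
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName($tag) as $el) {
        foreach ($attributes as $name => $value) {
            if ($el->getAttribute($name) !== $value) {
                continue 2; // attribute mismatch: try the next element
            }
        }
        // saveHTML($node) serializes only that node, including its own tag.
        return $returnHtml ? (string) $doc->saveHTML($el) : trim($el->textContent);
    }
    return null; // no element matched
}

$html = '<body><section class="hero"><h1>Hi</h1><p>Welcome</p></section></body>';

echo firstMatch($html, 'section', ['class' => 'hero'], true), "\n";
echo firstMatch($html, 'section', ['class' => 'hero'], false), "\n";
```

Note that textContent joins adjacent text nodes without separators, so the plain-text result here is HiWelcome; a production extractor may insert spaces between block elements.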

The extract option accepts these selector forms:

  • Tag names: h1, p, span, etc.
  • Class: .className
  • ID: #idName

Keys in the output are normalized back to the original selector strings.
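The three selector forms map directly onto XPath. ElephScraper lists Symfony CssSelector as a dependency, which offers a full CSS-to-XPath converter (CssSelectorConverter::toXPath()); the hand-rolled sketch below covers only these three forms to show the idea, and keys each result by the selector string exactly as passed, matching the normalization described above. selectorToXPath() and extractWithin() are hypothetical names, not the library's API.

```php
<?php
// Hypothetical sketch, NOT ElephScraper's code: translate tag, .class, and #id
// selectors into XPath relative to a container node. Selectors are inserted
// into XPath unescaped, so use only trusted, non-empty selector strings.
function selectorToXPath(string $selector): string
{
    if ($selector[0] === '.') {
        // Match the class as a whitespace-separated token, as CSS does.
        $class = substr($selector, 1);
        return ".//*[contains(concat(' ', normalize-space(@class), ' '), ' " . $class . " ')]";
    }
    if ($selector[0] === '#') {
        return ".//*[@id='" . substr($selector, 1) . "']";
    }
    return './/' . $selector; // plain tag name
}

// Collect the trimmed text of every match inside $container, per selector.
function extractWithin(DOMXPath $xp, DOMNode $container, array $selectors): array
{
    $result = [];
    foreach ($selectors as $selector) {
        $texts = [];
        foreach ($xp->query(selectorToXPath($selector), $container) as $node) {
            $texts[] = trim($node->textContent);
        }
        $result[$selector] = $texts; // key stays exactly as the caller wrote it
    }
    return $result;
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<body><div id="main"><h2 class="title big">Hello</h2>'
    . '<span id="desc">Short text</span><p>One</p><p>Two</p></div>'
    . '<p>Outside</p></body>');
libxml_clear_errors();

$xp = new DOMXPath($doc);
$main = $xp->query("//div[@id='main']")->item(0);
print_r(extractWithin($xp, $main, ['.title', '#desc', 'p']));
```

Because the XPath expressions start with ".//", the paragraph outside div#main is not picked up.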

🤝 Contributing

Found a bug? Want to add features? Open an issue or create a pull request!


📄 License

MIT License © 2025 — ElephScraper


🔗 Related Libraries


💡 Why ElephScraper?

ElephScraper is your dependable PHP elephant — strong, smart, and always ready to extract the right data.
