Jumping-Beaver/HTML_Inspector_for_PHP

HTML Inspector for PHP

These are PHP bindings for HTML Inspector.

Example

<?php

function extract_anchors(string $html_utf8, string $document_uri): void
{
    $doc = new HtmlInspector\HtmlDocument($html_utf8);

    // Look up the first <base> element in <head>; iterate() yields -1 if there is none.
    $base_node = $doc->select(0)->child()->name('html')->child()->name('head')->child()
        ->name('base')->iterate();
    $base = HtmlInspector\resolve_iri($doc->get_attribute($base_node, 'href'), $document_uri);
    // Fall back to the document URI when no usable base href was found.
    $base ??= $document_uri;

    // Select every <a> whose href does not start with '#' (skips same-page fragment links).
    $selector = $doc->select(0)->descendant()->name('a')->attribute_starts_with('href', '#')->not();
    while (($node_a = $selector->iterate()) !== -1) {
        $href = $doc->get_attribute($node_a, 'href');
        $uri = HtmlInspector\resolve_iri($href, $base);
        print("$uri\n");
    }
}

Design decisions

PHP iterators are currently not implemented

I went back and forth on whether to implement PHP iterators for looping through nodes. PHP's iterator design is awkward. Firstly, two redundant implementations are needed: one to support looping with foreach and one to implement the Iterator interface. Moreover, the interface splits iteration into two methods, next (which returns nothing) and current, instead of a single one. As a result, we would have to cache both the current value and the validity state of the iterator, and current would conditionally have to perform one implicit iteration. Python handles this more elegantly with a single __next__ method that both advances the iterator and returns the current value.

Another complication is how to encode the non-existence of a node. With PHP iterators, we would need to use the value false, and the get_* methods would need union type hints and a corresponding check to keep the syntax concise. Without iterators, we can use the value -1 and pass it to the C functions without further checks.
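To make the caching issue concrete, here is a minimal userland sketch of what adapting an iterate()-style selector to PHP's Iterator interface could look like. It is not part of these bindings: FakeSelector and SelectorIterator are hypothetical names, and FakeSelector only imitates the "return -1 when exhausted" convention so that the example runs without the extension. The sketch assumes PHP 8.0 or later.

```php
<?php

// Hypothetical stand-in for a selector: iterate() returns the next node
// index, or -1 once the selection is exhausted.
final class FakeSelector
{
    private int $pos = 0;

    /** @param int[] $nodes */
    public function __construct(private array $nodes) {}

    public function iterate(): int
    {
        return $this->nodes[$this->pos++] ?? -1;
    }
}

// Hypothetical adapter that exposes iterate() through the Iterator interface.
// It is forward-only: rewind() cannot restart the underlying selector.
final class SelectorIterator implements Iterator
{
    private int $current = -1;    // cached current node; -1 doubles as "invalid"
    private int $key = -1;        // position of the cached node
    private bool $primed = false; // whether iterate() has been called at least once

    public function __construct(private FakeSelector $selector) {}

    // The implicit first iteration: current(), key() and valid() may all be
    // called before next(), so each of them has to make sure a value is cached.
    private function prime(): void
    {
        if (!$this->primed) {
            $this->next();
        }
    }

    public function rewind(): void
    {
        $this->prime();
    }

    public function valid(): bool
    {
        $this->prime();
        return $this->current !== -1;
    }

    public function current(): mixed
    {
        $this->prime();
        return $this->current;
    }

    public function key(): mixed
    {
        $this->prime();
        return $this->key;
    }

    // next() returns nothing, so the fetched node must be stored for current().
    public function next(): void
    {
        $this->primed = true;
        $this->current = $this->selector->iterate();
        $this->key++;
    }
}

// Usage: foreach drives rewind(), valid(), current(), key() and next().
foreach (new SelectorIterator(new FakeSelector([3, 7, 12])) as $node) {
    print("$node\n");
}
```

Compared with the plain while loop in the example above, the adapter needs three pieces of cached state and a priming step just to satisfy the interface, which is the overhead the paragraph describes.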

About

PHP bindings for HTML Inspector, a fast HTML parser and IRI resolver and normalizer. Mirrored from Codeberg.org.
