HTML Inspector for PHP

These are PHP bindings for HTML Inspector.

Example

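<?php

// Prints the resolved URI of every anchor whose href is not a same-page
// fragment reference (that is, does not start with '#').
function extract_anchors(string $html_utf8, string $document_uri): void
{
    $doc = new HtmlInspector\HtmlDocument($html_utf8);

    // Look up the first html > head > base element. iterate() returns -1 if
    // there is no match.
    $base_node = $doc->select(0)->child()->name('html')->child()->name('head')->child()
        ->name('base')->iterate();

    // Resolve the base href against the document URI; fall back to the
    // document URI itself if the result is null.
    $base = HtmlInspector\resolve_iri($doc->get_attribute($base_node, 'href'), $document_uri);
    $base ??= $document_uri;

    // Select all <a> descendants, excluding those whose href starts with '#'.
    $selector = $doc->select(0)->descendant()->name('a')->attribute_starts_with('href', '#')->not();

    // iterate() returns the next matching node, or -1 once there are no more.
    while (($node_a = $selector->iterate()) !== -1) {
        $href = $doc->get_attribute($node_a, 'href');
        $uri = HtmlInspector\resolve_iri($href, $base);
        print("$uri\n");
    }
}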

Design decisions

PHP iterators are currently not implemented

I have gone back and forth on whether to implement PHP iterators for looping through nodes. The way PHP implements iterators is awkward. Firstly, two redundant implementations are needed: one to support looping with foreach and one to implement the Iterator interface. Moreover, the interface requires two methods, next (which has no return value) and current, instead of just one. We would therefore have to cache both the current value and the validity state of the iterator, and current would conditionally have to perform one implicit iteration. Python is an example where iteration is implemented more elegantly, using a single __next__ method that both advances the iterator and returns the current value.

Another complication is how to encode the non-existence of a node. With PHP iterators, we would need to use the value false, and the get_* methods would need union type hints and a corresponding check to keep the syntax concise. Without iterators, we can use the value -1 and pass it to the C functions without further checks.
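To make the caching problem concrete, here is a minimal userland sketch of what an adapter to PHP's Iterator interface might look like. It is not part of these bindings: FakeSelector and SelectorIterator are hypothetical names, and FakeSelector only imitates the iterate() convention (next node, or -1 when exhausted) so that the example runs without the extension. It covers only the Iterator interface side, not the separate foreach support an extension would need.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a selector: iterate() returns the next node
// index, or -1 once all matches are exhausted.
final class FakeSelector
{
    private int $position = 0;

    /** @param int[] $nodes */
    public function __construct(private array $nodes) {}

    public function iterate(): int
    {
        return $this->nodes[$this->position++] ?? -1;
    }
}

// Hypothetical adapter exposing the selector through PHP's Iterator interface.
final class SelectorIterator implements Iterator
{
    private int $current = -1;      // cached node index (-1 means no node)
    private bool $started = false;  // whether the first iterate() has happened
    private int $key = 0;

    public function __construct(private FakeSelector $selector) {}

    // Performs the implicit first iteration if none has happened yet.
    private function prime(): void
    {
        if (!$this->started) {
            $this->current = $this->selector->iterate();
            $this->started = true;
        }
    }

    // Forward-only: the underlying selector cannot be restarted.
    public function rewind(): void {}

    public function valid(): bool
    {
        $this->prime();
        return $this->current !== -1;
    }

    public function current(): mixed
    {
        $this->prime();
        return $this->current;
    }

    public function key(): mixed
    {
        return $this->key;
    }

    // next() cannot return the node, so the result is cached for current().
    public function next(): void
    {
        $this->prime();
        $this->current = $this->selector->iterate();
        $this->key++;
    }
}

// Collects all nodes via foreach, going through the adapter.
function collect_with_foreach(FakeSelector $selector): array
{
    $nodes = [];
    foreach (new SelectorIterator($selector) as $node) {
        $nodes[] = $node;
    }
    return $nodes;
}

// Collects all nodes with the plain sentinel loop used in the example above.
function collect_with_while(FakeSelector $selector): array
{
    $nodes = [];
    while (($node = $selector->iterate()) !== -1) {
        $nodes[] = $node;
    }
    return $nodes;
}
```

Both helper functions visit the same nodes, but the adapter needs three pieces of state and a priming step to do what the while loop does with a single call per step.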
