
date_published incorrectly uses current date when <abbr class="published"> contains valid datetime #762

@julia2404

Bug: Incorrect date_published when parsing valid <abbr class="published">

Description:

When parsing this page:
👉 https://www.progressive-charlestown.com/2011/04/peeps-wrap-up-for-2011.html

The parser returns:
"date_published": "2025-05-15T00:23:00.000Z"

However, the HTML clearly includes:
<abbr class='published' title='2011-04-25T00:23:00-04:00'>12:23:00 AM</abbr>

This means the correct UTC datetime would be:
"date_published": "2011-04-25T04:23:00.000Z"
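
As a quick sanity check of that conversion, the timestamp from the title attribute can be run through the built-in `Date` in Node.js. This is only an illustration of the expected value, not the parser's own code; the variable names are made up for the example.

```javascript
// The title attribute carries a -04:00 offset, so UTC is four hours later.
const published = new Date('2011-04-25T00:23:00-04:00');
const publishedIso = published.toISOString();

console.log(publishedIso); // 2011-04-25T04:23:00.000Z
```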

It seems the parser extracts the time from the <abbr class="published"> element but incorrectly replaces the date with the current system date.

Expected behavior

The parser should correctly parse both the date and the time from the title attribute of <abbr class="published">, not just the time part.
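
A minimal sketch of the behavior I would expect, assuming the title attribute is preferred over the element's visible text. `extractPublishedDate` is a hypothetical helper written for this report, not a function in the parser, and the regex matching is only meant for this one snippet (a real fix would presumably use the parser's existing DOM handling).

```javascript
// Hypothetical helper, not part of the parser's API.
// Returns the published date as a UTC ISO string, or null if none is found.
function extractPublishedDate(html) {
  // Find the opening <abbr> tag whose class attribute contains "published".
  const tag = html.match(
    /<abbr\b[^>]*\bclass=(['"])[^'"]*\bpublished\b[^'"]*\1[^>]*>/i
  );
  if (!tag) return null;

  // Read the full datetime from the title attribute, not the visible text.
  const title = tag[0].match(/\btitle=(['"])(.*?)\1/i);
  if (!title) return null;

  const date = new Date(title[2]);
  return Number.isNaN(date.getTime()) ? null : date.toISOString();
}

const sample =
  "<abbr class='published' title='2011-04-25T00:23:00-04:00'>12:23:00 AM</abbr>";
console.log(extractPublishedDate(sample));
```

With the markup from the page above, this yields the 2011 date rather than a date built from the visible "12:23:00 AM" text.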

Steps to reproduce

Use the latest version of the parser (npm or hosted) and parse the provided URL.

Environment:

Parser version: latest (GitHub)

Runtime: Node.js

Used via: Node script / Web API
