Add support for CC-NEWS and validation for crawl reference on the CLI interface #12

Merged: 8 commits, Mar 18, 2025

Conversation

pjox
Member

@pjox pjox commented Mar 7, 2025

Description

This PR introduces support for CC-NEWS and adds validation for the crawl or snapshot reference. It also updates some dependencies, bumps the Rust edition to 2024 and the Rust version to the latest release, 1.85, and bumps the library version to 0.6.0.
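
As a rough illustration of what validating a crawl reference with automatic case fixing could look like, here is a minimal, standard-library-only sketch. It is not taken from the PR diff: the names `Collection`, `CrawlRef` and `parse_crawl_ref` are hypothetical, and the accepted formats (`CC-MAIN-YYYY-WW` with a week number, `CC-NEWS-YYYY-MM` with a month) are assumptions.

```rust
use std::fmt;

// Hypothetical types for illustration; the real crate may model this differently.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Collection {
    Main, // assumed format: CC-MAIN-YYYY-WW (year, week number)
    News, // assumed format: CC-NEWS-YYYY-MM (year, month)
}

#[derive(Debug, PartialEq, Eq)]
struct CrawlRef {
    collection: Collection,
    year: u16,
    number: u8,
}

impl fmt::Display for CrawlRef {
    // Prints the canonical, upper-case form of the reference.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self.collection {
            Collection::Main => "MAIN",
            Collection::News => "NEWS",
        };
        write!(f, "CC-{}-{:04}-{:02}", name, self.year, self.number)
    }
}

// True if `s` consists of exactly `len` ASCII digits.
fn all_digits(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| b.is_ascii_digit())
}

// Normalizes the casing, then checks the shape and the numeric ranges.
fn parse_crawl_ref(input: &str) -> Result<CrawlRef, String> {
    let normalized = input.trim().to_ascii_uppercase();
    let parts: Vec<&str> = normalized.split('-').collect();
    if parts.len() != 4 || parts[0] != "CC" {
        return Err(format!(
            "'{input}' is not of the form CC-MAIN-YYYY-WW or CC-NEWS-YYYY-MM"
        ));
    }
    let (collection, max): (Collection, u8) = match parts[1] {
        "MAIN" => (Collection::Main, 53),
        "NEWS" => (Collection::News, 12),
        other => return Err(format!("unknown collection '{other}' in '{input}'")),
    };
    if !all_digits(parts[2], 4) || !all_digits(parts[3], 2) {
        return Err(format!("'{input}' must end in a 4-digit year and a 2-digit number"));
    }
    let year: u16 = parts[2].parse().map_err(|_| format!("bad year in '{input}'"))?;
    let number: u8 = parts[3].parse().map_err(|_| format!("bad number in '{input}'"))?;
    if number == 0 || number > max {
        return Err(format!("'{input}': {number} is outside 1..={max}"));
    }
    Ok(CrawlRef { collection, year, number })
}
```

With this sketch, `parse_crawl_ref("cc-news-2025-01")` succeeds and prints back as `CC-NEWS-2025-01`, while `CC-NEWS-2025-13` is rejected before any request is made.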

Breaking Changes

No breaking changes.

Notes & open questions

This PR closes issues #8 and #10.

pjox added 7 commits March 6, 2025 13:53
…sn't exist and cc-downloader downloaded the body of the response

Now this action will produce an error
…the 4XX error message when downloading paths, adds validation to the cli input for the crawl reference
… automatically fix the casing of the crawl reference
CC-NEWS support and validation for crawl reference
…d files and updated the README.md in order to prepare the next release
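
The first commits above make a failed path download produce an error instead of saving the body of the error response. A minimal sketch of that idea, using only the standard library, follows. The names `FetchError`, `check_status` and `save_body` are hypothetical and the in-memory `Vec<u8>` stands in for the output file; this is not the code from the PR.

```rust
use std::fmt;

// Hypothetical error type for illustration only.
#[derive(Debug, PartialEq, Eq)]
enum FetchError {
    ClientError { status: u16, url: String },
    ServerError { status: u16, url: String },
}

impl fmt::Display for FetchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FetchError::ClientError { status, url } => {
                write!(f, "{url} returned {status}: check the crawl reference")
            }
            FetchError::ServerError { status, url } => {
                write!(f, "{url} returned {status}: server-side failure")
            }
        }
    }
}

// Rejects 4XX and 5XX responses; everything else is treated as success
// in this simplified sketch.
fn check_status(status: u16, url: &str) -> Result<(), FetchError> {
    match status {
        400..=499 => Err(FetchError::ClientError { status, url: url.to_string() }),
        500..=599 => Err(FetchError::ServerError { status, url: url.to_string() }),
        _ => Ok(()),
    }
}

// Writes the body only after the status check passes, so an error page
// is never stored as if it were the requested file.
fn save_body(
    status: u16,
    url: &str,
    body: &[u8],
    out: &mut Vec<u8>,
) -> Result<usize, FetchError> {
    check_status(status, url)?;
    out.extend_from_slice(body);
    Ok(body.len())
}
```

The key design point is ordering: the status is checked before a single byte is written. The reqwest crate offers `Response::error_for_status` for a similar check on real responses; whether this PR uses it is not stated here.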
@pjox pjox added the bug and enhancement labels Mar 7, 2025
@pjox pjox requested a review from thunderpoot March 7, 2025 09:11
@pjox pjox self-assigned this Mar 7, 2025
Member Author

pjox commented Mar 12, 2025

@thunderpoot, please don't accept or review the PR yet. The reqwest crate introduced a regression in its latest point release that breaks reqwest-middleware and, in turn, cc-downloader:

I think it is going to get fixed soon:

But this is indeed a problem. I'm thinking of including the Cargo.lock file as a solution, since I'm converting this crate to a library anyway.

reqwest-middleware is also working on a fix now:

… the reqwest deprecated API

TODO: We need to monitor the open PRs in reqwest-middleware and bump its version here as soon as they are merged
Member Author

pjox commented Mar 13, 2025

@thunderpoot It should be safe to review now. I'll track the PR on reqwest-middleware and apply the long-term fix in a future point release.

Member

@thunderpoot thunderpoot left a comment

This is great. Nice error messaging, and fantastic that it supports my laziness (not typing CC-MAIN or CC-NEWS in all caps). Approved 💯

@pjox pjox merged commit 5fb6ff4 into main Mar 18, 2025
2 checks passed
Labels
bug, enhancement
Development

Successfully merging this pull request may close these issues.

Add support for CC-NEWS
Incorrect Handling of Nonexistent or Mis-cased Paths
2 participants