Description
Here's a minimal-ish repro:
from fsspec.implementations.http import HTTPFileSystem

remote_path = "https://huggingface.co/api/datasets/abisee/cnn_dailymail/parquet/3.0.0/train/0.parquet"
expected_data_size = 256540614

filesystem = HTTPFileSystem()
with filesystem.open(remote_path) as file:
    total_read = 0
    while data := file.read(256 * 1024):
        total_read += len(data)

assert (
    total_read == expected_data_size
), f"Data mismatch: {total_read} != {expected_data_size}"
# AssertionError: Data mismatch: 5767168 != 256540614
This issue causes Ray Data's from_huggingface API to break. See ray-project/ray#54101
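For anyone trying to narrow this down: the reported total, 5767168 bytes, is exactly 22 reads of 256 KiB, so the stream seems to end on a chunk boundary rather than mid-read. Here is a rough sketch of the read loop factored into a helper so it can be checked against a local buffer first and then pointed at the remote file. The name `count_bytes` is hypothetical and not part of fsspec; it only assumes a binary file-like object with a `read(n)` method.

```python
import io


def count_bytes(file, chunk_size=256 * 1024):
    """Read a binary file-like object to EOF in fixed-size chunks.

    Returns the total number of bytes read. `count_bytes` is a
    hypothetical helper for this issue, not an fsspec API.
    """
    total = 0
    # read() returns b"" at EOF, which ends the loop.
    while data := file.read(chunk_size):
        total += len(data)
    return total


# Local sanity check: the loop itself counts an in-memory buffer
# correctly, including a final partial chunk.
payload = b"x" * (1024 * 1024 + 7)
assert count_bytes(io.BytesIO(payload)) == len(payload)
```

Passing the object returned by `filesystem.open(remote_path)` from the repro to `count_bytes`, with a few different `chunk_size` values, should show whether the short total depends on the chunk size or always stops at the same offset.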