Commit 8f48f7d

compact some of the input and output cells into one block
1 parent 9b6f15d commit 8f48f7d

1 file changed, +11 -14 lines changed

parquet-cdc.md

Lines changed: 11 additions & 14 deletions
@@ -624,26 +624,23 @@ df = pd.read_json(src)
 
 
 ```python
-dst = "hf://datasets/kszucs/pq/hermes-2.5-cdc.parquet"
-df.to_parquet(dst, use_content_defined_chunking=True)
+>>> dst = "hf://datasets/kszucs/pq/hermes-2.5-cdc.parquet"
+>>> df.to_parquet(dst, use_content_defined_chunking=True)
+New Data Upload: 100%|███████████████████████████████████████████████| 799MB / 799MB, 197kB/s
+Total Bytes: 799M
+Total Transfer: 799M
 ```
 
-New Data Upload: 100%|███████████████████████████████████████████████| 799MB / 799MB, 197kB/s
-Total Bytes: 799M
-Total Transfer: 799M
-
-
 
 ```python
-short_df = df[[len(c) < 10 for c in df.conversations]]
-short_dst = "hf://datasets/kszucs/pq/hermes-2.5-cdc-short.parquet"
-short_df.to_parquet(short_dst, use_content_defined_chunking=True)
+>>> short_df = df[[len(c) < 10 for c in df.conversations]]
+>>> short_dst = "hf://datasets/kszucs/pq/hermes-2.5-cdc-short.parquet"
+>>> short_df.to_parquet(short_dst, use_content_defined_chunking=True)
+New Data Upload: 100%|███████████████████████████████████████████████| 21.9MB / 21.9MB, 45.4kB/s
+Total Bytes: 801M
+Total Transfer: 21.9M
 ```
 
-New Data Upload: 100%|███████████████████████████████████████████████| 21.9MB / 21.9MB, 45.4kB/s
-Total Bytes: 801M
-Total Transfer: 21.9M
-
 
 
 ```python
