Fix incorrect header size assumption #22

ngicks · 2025-06-26T12:22:34Z

Closes #21

I propose fix for the issue but I'm not sure this should be merged, since it needs copying Go's std library and reconstruction of sparse infomation.
This adds complexity to codebase but I believe the part I've copied from std is stable enough to maintain.

The current implementation assumes headers are always only single block long.

But as per spec and archive/tar implementation, some kind of headers may span multiple blocks.

PAXHeader Records
GNULongName
Sparse headers: see GNU Tar Manual: Storing Sparse Files

All of them may span multiple blocks or modification header which only relevant to a next tar entry.

I think we have roughly 3 options:

1. ultimately go different way. Implement our own tar parser
1. Read entire *tar.Reader after each time *tar.Reader.Next is called, and collect offset of start and end of header.
1. Skip reading *tar.Reader, only collect header end offsets but reconstruct sparse infomation. And then calculate actual body length by subtracting sparse hole sizes from body size.

1. and 2. are not option because they are too large to have or too slow for large tar archives.

I took 3.. This needs copying of Go std library.

I've first added tests where it reads some tar archives from $(go env GOROOT)/src/archive/tar/testdata using both archive/tar and *go-tarfs.Fs, then compare bytes read.

There was only one, but apparantly failing case.

$ /go/pkg/mod/golang.org/toolchain@v0.0.1-go1.17.linux-amd64/bin/go test ./...
--- FAIL: TestFs_testdata_from_stdlib (0.22s)
    fs_std_test.go:131: 
                Error Trace:    fs_std_test.go:131
                Error:          Received unexpected error:
                                archive/tar: invalid tar header
                Test:           TestFs_testdata_from_stdlib
                Messages:       reading "/home/watage/.local/go/src/archive/tar/testdata/pax-nil-sparse-data.tar" with *Fs
    --- FAIL: TestFs_testdata_from_stdlib/pax-nil-sparse-data.tar (0.00s)
        testing.go:1169: test executed panic(nil) or runtime.Goexit: subtest may have called FailNow on a parent test
FAIL
FAIL    github.com/nlepage/go-tarfs     0.263s
ok      github.com/nlepage/go-tarfs/benchmarks  (cached) [no tests to run]
FAIL

After fix, it passes.

$ /go/pkg/mod/golang.org/toolchain@v0.0.1-go1.17.linux-amd64/bin/go test ./...
ok      github.com/nlepage/go-tarfs     0.282s
ok      github.com/nlepage/go-tarfs/benchmarks  1.849s [no tests to run]

What I did is basically same thing I've done in my own tarfs implementation with slight simplification.

ngicks added 2 commits June 26, 2025 21:51

add failing test

48ce39f

fix incorrect header size assumption

89beded

ngicks force-pushed the fix-incorrect-header-size-assumption branch from 6419bab to 89beded Compare June 26, 2025 12:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Fix incorrect header size assumption #22

Fix incorrect header size assumption #22

ngicks commented Jun 26, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Fix incorrect header size assumption #22

Are you sure you want to change the base?

Fix incorrect header size assumption #22

Conversation

ngicks commented Jun 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

ngicks commented Jun 26, 2025 •

edited

Loading