Skip to content

Conversation

ngicks
Copy link

@ngicks ngicks commented Jun 26, 2025

Closes #21

I propose fix for the issue but I'm not sure this should be merged, since it needs copying Go's std library and reconstruction of sparse infomation.
This adds complexity to codebase but I believe the part I've copied from std is stable enough to maintain.

The current implementation assumes headers are always only single block long.

But as per spec and archive/tar implementation, some kind of headers may span multiple blocks.

All of them may span multiple blocks or modification header which only relevant to a next tar entry.

I think we have roughly 3 options:

    1. ultimately go different way. Implement our own tar parser
    1. Read entire *tar.Reader after each time *tar.Reader.Next is called, and collect offset of start and end of header.
    1. Skip reading *tar.Reader, only collect header end offsets but reconstruct sparse infomation. And then calculate actual body length by subtracting sparse hole sizes from body size.

1. and 2. are not option because they are too large to have or too slow for large tar archives.

I took 3.. This needs copying of Go std library.

I've first added tests where it reads some tar archives from $(go env GOROOT)/src/archive/tar/testdata using both archive/tar and *go-tarfs.Fs, then compare bytes read.

There was only one, but apparantly failing case.

$ /go/pkg/mod/golang.org/toolchain@v0.0.1-go1.17.linux-amd64/bin/go test ./...
--- FAIL: TestFs_testdata_from_stdlib (0.22s)
    fs_std_test.go:131: 
                Error Trace:    fs_std_test.go:131
                Error:          Received unexpected error:
                                archive/tar: invalid tar header
                Test:           TestFs_testdata_from_stdlib
                Messages:       reading "/home/watage/.local/go/src/archive/tar/testdata/pax-nil-sparse-data.tar" with *Fs
    --- FAIL: TestFs_testdata_from_stdlib/pax-nil-sparse-data.tar (0.00s)
        testing.go:1169: test executed panic(nil) or runtime.Goexit: subtest may have called FailNow on a parent test
FAIL
FAIL    github.com/nlepage/go-tarfs     0.263s
ok      github.com/nlepage/go-tarfs/benchmarks  (cached) [no tests to run]
FAIL

After fix, it passes.

$ /go/pkg/mod/golang.org/toolchain@v0.0.1-go1.17.linux-amd64/bin/go test ./...
ok      github.com/nlepage/go-tarfs     0.282s
ok      github.com/nlepage/go-tarfs/benchmarks  1.849s [no tests to run]

What I did is basically same thing I've done in my own tarfs implementation with slight simplification.

@ngicks ngicks force-pushed the fix-incorrect-header-size-assumption branch from 6419bab to 89beded Compare June 26, 2025 12:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect assumption for header size
1 participant