Tar file is empty with a size of zero bytes for small tar entry sizes #836

Closed
HofmeisterAn opened this issue Jun 16, 2023 · 4 comments
Labels
tar Related to TAR file format

HofmeisterAn commented Jun 16, 2023

Describe the bug

The attached reproducer shows an issue that occurs when a tar entry is very small, just a few bytes. When creating a tarball from a collection of small files, the resulting tar file is empty, with a size of zero bytes (calling flush etc. does not help). If the size of the tar entry is increased, the problem does not occur and the tar file is created correctly.

Reproduction Code

https://dotnetfiddle.net/QLhzBV

Steps to reproduce

  1. Run the .NET Fiddle reproducer

Expected behavior

The tar file should have a size greater than 0 bytes.

Operating System

Windows, macOS, Linux

Framework Version

.NET 7, .NET 6

Tags

Tar

Additional context

If you change the multiplier in the linked reproducer, for example to 10 (at line 12), the test will execute successfully. Furthermore, if the TarOutputStream does not own the underlying stream, it also works properly:

IsStreamOwner = false;
...
Close();
Assert.True(_stream.Length > 0, "The tar file has a size of zero bytes."); // Runs successfully.
@github-actions github-actions bot added the tar Related to TAR file format label Jun 16, 2023
piksel (Member) commented Jun 16, 2023

The tar data is not fully written until the TarOutputStream is closed (or until the buffer is flushed, which happens when the entry content is large enough). The .Length is the number of bytes that have reached the underlying output stream, which doesn't necessarily match the number of bytes written to the TarOutputStream.

> Furthermore, if the TarOutputStream does not own the underlying stream, it also works properly.

Yes, that is how you should use this with a memory stream.
The underlying stream is normally closed to avoid leaking stream handles, but if you want to opt out of that, you set IsStreamOwner = false and take responsibility for disposing of the stream yourself.
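
A minimal sketch of that usage, for readers who want to see the pattern in one place. It assumes SharpZipLib's `TarOutputStream(Stream, Encoding)` constructor; the entry name and payload are made up for illustration, and this is not the code from the linked fiddles:

```csharp
using System;
using System.IO;
using System.Text;
using ICSharpCode.SharpZipLib.Tar;

// The caller owns this stream and is responsible for disposing of it.
var memoryStream = new MemoryStream();

using (var tarStream = new TarOutputStream(memoryStream, Encoding.UTF8))
{
    // Opt out of closing the underlying stream when the tar stream is closed.
    tarStream.IsStreamOwner = false;

    // Hypothetical tiny payload, a few bytes only.
    byte[] content = Encoding.UTF8.GetBytes("hello");

    var entry = TarEntry.CreateTarEntry("hello.txt");
    entry.Size = content.Length;

    tarStream.PutNextEntry(entry);
    tarStream.Write(content, 0, content.Length);
    tarStream.CloseEntry();
} // Disposing closes the tar stream: EOF blocks are added and the buffer is written out.

// The MemoryStream is still open here and now holds the complete archive.
memoryStream.Position = 0;
Console.WriteLine(memoryStream.Length > 0);
```

Checking the length only after the `using` block ends is the point of the example: before the tar stream is closed, a small entry may still sit in the internal buffer. Rewinding with `Position = 0` leaves the stream ready to be read by whatever consumes it next.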

@piksel piksel removed the bug label Jun 16, 2023
HofmeisterAn (Author) commented

> The tar data is not fully written until the TarOutputStream is closed (or when the buffer is flushed, which happens when the entry content is large enough).

I cannot use a closed stream anymore (and for the open stream the data is not flushed). This becomes very difficult and inconvenient when dealing with small tar entries. Using my own stream (where the TarOutputStream is not the stream owner) works as a workaround, but it is inconvenient for developers, and they may not even be aware of this requirement in the first place.

piksel (Member) commented Jun 16, 2023

> I can not use a closed stream anymore (and for the open stream the data is not flushed).

I don't know what you are trying to do, but you cannot create a valid tar file without closing it, since it needs to add the EOF blocks to the end. If you are extending the TarOutputStream like you are doing in the reproduction code example, then why don't you use a TarOutputStream directly instead?

I also don't understand how adding IsStreamOwner = false and then using the MemoryStream would be any less convenient. It doesn't matter how large the entries are: after the EOF blocks, no new entries can be added, which is why the stream is closed.

Updated .NETFiddle with suggested usage:
https://dotnetfiddle.net/cAH5F2

HofmeisterAn (Author) commented

> since it needs to add the EOF blocks to the end.

Yes, you are correct. There is something I overlooked. For some reason, I thought that closing the stream would only add the EOF marker. However, since I need to send the stream to an HTTP endpoint, I cannot simply close it and be finished like I would usually do with a file stream.

> I also don't understand how adding IsStreamOwner = false and then using the MemoryStream would be any less convenient.

By finalizing the tar stream (similar to CloseEntry) while keeping it open, I could save some lines of code. I may have been distracted by noticing that a few bytes were neither written nor flushed (without closing it). By the way, that is exactly what I currently do. Thank you for your response and clarification.
