
Streaming to unseekable stream produces invalid archive #545

Closed
VanCoding opened this issue Dec 4, 2020 · 11 comments

Comments

VanCoding commented Dec 4, 2020

There seems to be a regression in SharpZipLib when upgrading from netcoreapp2 to net5.
On net5, creating a ZipArchive on the fly, targeting an unseekable stream, produces an invalid archive.
On netcoreapp2, this is not the case.

I've documented this in the following repo:
https://github.com/VanCoding/SharpZipLib-unseekable-bug

This should make it as convenient as possible to reproduce it.
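For reference, a minimal sketch of the setup being described (assumed from the issue text, not taken from the linked repo verbatim): force `System.IO.Compression.ZipArchive` onto its unseekable code path by wrapping the output in a stream that reports `CanSeek == false`, then inspect the general-purpose flag of the first local file header.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

class Repro
{
    // Forward-only wrapper: reports CanSeek = false so that ZipArchive
    // takes its unseekable (data-descriptor) code path. Position is
    // tracked locally in case the writer reads it.
    sealed class UnseekableStream : Stream
    {
        private readonly Stream inner;
        private long position;
        public UnseekableStream(Stream inner) { this.inner = inner; }
        public override bool CanRead => false;
        public override bool CanSeek => false;
        public override bool CanWrite => true;
        public override long Length => throw new NotSupportedException();
        public override long Position
        {
            get => position;
            set => throw new NotSupportedException();
        }
        public override void Flush() => inner.Flush();
        public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
        public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
        public override void SetLength(long value) => throw new NotSupportedException();
        public override void Write(byte[] buffer, int offset, int count)
        {
            inner.Write(buffer, offset, count);
            position += count;
        }
    }

    public static byte[] WriteArchive()
    {
        var backing = new MemoryStream();
        using (var zip = new ZipArchive(new UnseekableStream(backing),
                                        ZipArchiveMode.Create, leaveOpen: true))
        using (var entry = zip.CreateEntry("hello.txt").Open())
        {
            byte[] text = Encoding.UTF8.GetBytes("hello world");
            entry.Write(text, 0, text.Length);
        }
        return backing.ToArray();
    }

    public static void Main()
    {
        byte[] raw = WriteArchive();
        // The general-purpose flag sits at offset 6 of the local file
        // header; bit 3 means CRC and sizes follow the data in a descriptor.
        ushort flags = (ushort)(raw[6] | (raw[7] << 8));
        Console.WriteLine((flags & 0x0008) != 0
            ? "data descriptor in use (unseekable path)"
            : "sizes written in header");
    }
}
```

Whether the entry ends up Stored or Deflated depends on the runtime's heuristics, which is exactly the behavior difference discussed below; the descriptor flag is set either way on an unseekable stream.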


Numpsy commented Dec 5, 2020

Looks to me like the .NET Core 2.0 version of ZipArchive actually creates the entry with Deflate compression rather than storing it, whereas the Core 3.1 and .NET 5 versions do store it.

With .NET Core 2:

[screenshot: the entry is compressed with Deflate]

With .NET 5:

[screenshot: the entry is Stored (uncompressed)]

ZipInputStream can't extract stored entries that have a descriptor (#517), so an error would be expected in this case.

As to why the .NET versions might behave differently - dotnet/corefx#33345 perhaps?


piksel commented Dec 6, 2020

Yeah, as @Numpsy pointed out, that is probably the reason they differ.
When you are using an unseekable stream, the Descriptor flag is set and the sizes of the entries are added after the entry data. This is because it's not possible to seek back to before the data and write the length after you are done writing.
Before the change in dotnet/runtime, this was handled by always using a deflate compression stream, which itself contains enough information to find the Descriptor.

with this change the content is no longer wrapped in a useless deflate stream

...unless you're reading the archive as a stream :/
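To illustrate why a stored entry with a descriptor is hard to read as a stream: the real sizes only appear after the data, and the optional descriptor signature can also occur by chance inside the payload, so scanning for it is only a heuristic. A sketch of such a scan (illustrative, not SharpZipLib's actual code):

```csharp
using System.Text;

static class DescriptorScan
{
    // Look for the optional data-descriptor signature "PK\x07\x08"
    // (0x08074B50 little-endian) starting at a given offset. Returns
    // the offset of the first match, or -1 if none is found; a match
    // is not guaranteed to be a real descriptor, since the payload
    // bytes themselves may contain this sequence.
    public static int FindDescriptorSignature(byte[] raw, int from)
    {
        for (int i = from; i + 4 <= raw.Length; i++)
        {
            if (raw[i] == 0x50 && raw[i + 1] == 0x4B &&
                raw[i + 2] == 0x07 && raw[i + 3] == 0x08)
                return i;
        }
        return -1;
    }
}
```

A deflate stream avoids this problem entirely: its own framing tells the reader where the compressed data ends, so the descriptor is found deterministically.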

I am a bit confused though, you create the zip file with System.IO.Compression.ZipArchive and read it with ICSharpCode.SharpZipLib.Zip.ZipInputStream, but you say that we have a regression in creating them?


Numpsy commented Dec 6, 2020

fwiw, I tried building the SharpZipLib unit tests as .NET 5, and they all passed.


piksel commented Dec 6, 2020

Yeah, I have been running almost exclusively .NET 5 since first RC.

There is a fix for this, but it needs to be done in System.IO.Compression.ZipArchive; and that is to write the correct sizes in the header for Store entries. Normally that is not possible for unseekable streams, but if there is no compression, that means that Compressed Size and Uncompressed Size are the same (and you can write the size before writing the data).

This is what they output now: .NET 5 unseekable.zip
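The observation above can be made concrete: for a Store entry both size fields are equal and knowable up front, and if the producer also computes the CRC-32 before writing the data, nothing in the header needs to be deferred to a descriptor. A minimal CRC-32 (the ZIP polynomial, bit-reflected) for illustration:

```csharp
static class StoredCrc
{
    // CRC-32 as used by ZIP (polynomial 0xEDB88320 reflected,
    // initial value and final XOR of 0xFFFFFFFF). A real producer
    // would use a table-driven version; this loop is for clarity.
    public static uint Crc32(byte[] data)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in data)
        {
            crc ^= b;
            for (int k = 0; k < 8; k++)
                crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
        return crc ^ 0xFFFFFFFFu;
    }
}
```

With the CRC and the (identical) sizes in hand before the first payload byte is written, a Stored entry's header can be emitted complete even on a forward-only stream.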


VanCoding commented Dec 6, 2020

I am a bit confused though, you create the zip file with System.IO.Compression.ZipArchive and read it with ICSharpCode.SharpZipLib.Zip.ZipInputStream, but you say that we have a regression in creating them?

Aaahh, yes, I didn't notice this. Does this mean I could do the same using SharpZipLib and it'd work?

But yeah, seems to be a bug in System.IO.Compression.ZipArchive then. Feel free to close this.


piksel commented Dec 7, 2020

If I recall correctly, yes, but I'd have to check the source since I am not sure what has been merged and not. I believe we still do the deflate "trick" for non-compressed data.


Numpsy commented Dec 7, 2020

I think that's the bit that #407 would have changed, before it was dropped for this sort of reason?


piksel commented Dec 7, 2020

Yeah, I mean, #407 would've done the same as the change in System.IO.Compression. And it was dropped for this reason, the archives could not be read by ZipInputStream anymore.
We could implement the "correct" fix in ZipFile by having a special case where we pre-calculate the CRC and add the correct sizes to the entry. Then if those conditions are met, we can skip the Descriptor tag and just write the header normally.
This won't work for ZipOutputStream though, as the data is just written into it and it has no chance of calculating the CRC.
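At the byte level, "skip the Descriptor tag and just write the header normally" amounts to emitting a complete Stored local file header on a forward-only stream, with bit 3 left clear. An illustrative sketch (field layout per APPNOTE.TXT; this is not SharpZipLib's actual implementation, and time/date fields are zeroed for brevity):

```csharp
using System.IO;
using System.Text;

static class StoredHeader
{
    // Write a Stored entry (local header + payload) with CRC and sizes
    // known up front, so no data descriptor is needed.
    public static void WriteStoredEntry(Stream output, string name, byte[] data, uint crc)
    {
        void U16(ushort v) { output.WriteByte((byte)v); output.WriteByte((byte)(v >> 8)); }
        void U32(uint v) { U16((ushort)v); U16((ushort)(v >> 16)); }

        byte[] nameBytes = Encoding.ASCII.GetBytes(name);
        U32(0x04034B50);              // local file header signature "PK\3\4"
        U16(10);                      // version needed to extract (1.0, Stored)
        U16(0);                       // general-purpose flags: bit 3 NOT set
        U16(0);                       // compression method 0 = Stored
        U16(0); U16(0);               // DOS mod time / date (zeroed here)
        U32(crc);                     // CRC-32, pre-calculated
        U32((uint)data.Length);       // compressed size...
        U32((uint)data.Length);       // ...equals uncompressed size for Stored
        U16((ushort)nameBytes.Length);
        U16(0);                       // extra field length
        output.Write(nameBytes, 0, nameBytes.Length);
        output.Write(data, 0, data.Length);
    }
}
```

Nothing here requires seeking, which is why the special case works for ZipFile (where the data is available in advance) but not for ZipOutputStream's write-as-you-go model.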


Numpsy commented Dec 11, 2020

This won't work for ZipOutputStream though, as the data is just written into it and it has no chance of calculating the CRC.

You could maybe have a variant of PutNextEntry that takes the details of an entry and the data, compresses the data in memory, computes the CRC and such, and writes everything to the output stream in a one-shot operation? (Not sure that quite fits with the architecture, though, and it could need an arbitrary amount of memory just to compress in memory.)


piksel commented Dec 12, 2020

@Numpsy, sure, but you could also do that today by just writing to a MemoryStream instead and then flushing it to the output stream after each added entry. That way we don't need to add extra complexity to the internals either. It would also be more transparent to the consumer that it's buffering gigabytes of data (if that is the case), instead of it just crashing when using a set of options that you can't correlate to the data being buffered.
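A whole-archive variant of this buffering idea, using only System.IO.Compression (the per-entry version described above follows the same principle): build the zip in a seekable MemoryStream, so the writer can seek back and patch real sizes into the headers, then copy the finished bytes to the unseekable output.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

static class BufferedZip
{
    public static void WriteBuffered(Stream unseekableOutput)
    {
        using var buffer = new MemoryStream();
        using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
        using (var entry = zip.CreateEntry("hello.txt").Open())
        {
            byte[] text = Encoding.UTF8.GetBytes("hello world");
            entry.Write(text, 0, text.Length);
        }
        // The archive is now complete and its local headers carry real
        // sizes (no data descriptors); stream it to the real output.
        buffer.Position = 0;
        buffer.CopyTo(unseekableOutput);
    }
}
```

The trade-off is exactly the one noted above: the whole archive (or, in the per-entry variant, the largest entry) is held in memory before any byte reaches the output.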


piksel commented Feb 6, 2021

Closing this, as it is due to changes in System.IO.Compression that make it produce non-standard files that need the Central Directory in order to be decompressed. Support for handling uncompressed data could still be improved, but this issue is probably not the place to track it.

@piksel piksel closed this as completed Feb 6, 2021