
Streaming to unseekable stream produces invalid archive #545

Closed
VanCoding opened this issue Dec 4, 2020 · 11 comments

Comments

VanCoding commented Dec 4, 2020

There seems to be a regression in SharpZipLib when upgrading from netcoreapp2 to net5.
On net5, creating a ZipArchive on the fly, targeting an unseekable stream, produces an invalid archive.
On netcoreapp2, this is not the case.

I've documented this in the following repo:
https://github.com/VanCoding/SharpZipLib-unseekable-bug

This should make it as convenient as possible to reproduce it.
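For reference, a minimal sketch of the setup being described (assumed from the issue text, not taken from the linked repo verbatim): force `System.IO.Compression.ZipArchive` onto its unseekable code path by wrapping the output in a stream that reports `CanSeek == false`, then inspect the general-purpose flag of the first local file header.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

class Repro
{
    // Forward-only wrapper: reports CanSeek = false so that ZipArchive
    // takes its unseekable (data-descriptor) code path. Position is
    // tracked locally in case the writer reads it.
    sealed class UnseekableStream : Stream
    {
        private readonly Stream inner;
        private long position;
        public UnseekableStream(Stream inner) { this.inner = inner; }
        public override bool CanRead => false;
        public override bool CanSeek => false;
        public override bool CanWrite => true;
        public override long Length => throw new NotSupportedException();
        public override long Position
        {
            get => position;
            set => throw new NotSupportedException();
        }
        public override void Flush() => inner.Flush();
        public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
        public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
        public override void SetLength(long value) => throw new NotSupportedException();
        public override void Write(byte[] buffer, int offset, int count)
        {
            inner.Write(buffer, offset, count);
            position += count;
        }
    }

    public static byte[] WriteArchive()
    {
        var backing = new MemoryStream();
        using (var zip = new ZipArchive(new UnseekableStream(backing),
                                        ZipArchiveMode.Create, leaveOpen: true))
        using (var entry = zip.CreateEntry("hello.txt").Open())
        {
            byte[] text = Encoding.UTF8.GetBytes("hello world");
            entry.Write(text, 0, text.Length);
        }
        return backing.ToArray();
    }

    public static void Main()
    {
        byte[] raw = WriteArchive();
        // The general-purpose flag sits at offset 6 of the local file
        // header; bit 3 means CRC and sizes follow the data in a descriptor.
        ushort flags = (ushort)(raw[6] | (raw[7] << 8));
        Console.WriteLine((flags & 0x0008) != 0
            ? "data descriptor in use (unseekable path)"
            : "sizes written in header");
    }
}
```

Whether the entry ends up Stored or Deflated depends on the runtime's heuristics, which is exactly the behavior difference discussed below; the descriptor flag is set either way on an unseekable stream.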


Numpsy commented Dec 5, 2020

Looks to me like the .NET Core 2.0 version of ZipArchive actually creates the entry with Deflate compression rather than storing it, whereas the Core 3.1 and .NET 5 versions do store it.

With .NET Core 2:

[screenshot: the entry is compressed with Deflate]

With .NET 5:

[screenshot: the entry is Stored (uncompressed)]

ZipInputStream can't extract stored entries that have a descriptor (#517), so an error would be expected in this case.

As to why the .NET versions might behave differently - dotnet/corefx#33345 perhaps?


piksel commented Dec 6, 2020

Yeah, as @Numpsy pointed out, that is probably the reason they differ.
When you are using an unseekable stream, the Descriptor flag is set and the sizes of the entries are added after the entry data. This is because it's not possible to seek back to before the data and write the length after you are done writing.
Before the change in dotnet/runtime, this was handled by always using a deflate compression stream, which itself contains enough information to find the Descriptor.

with this change the content is no longer wrapped in a useless deflate stream

...unless you're reading the archive as a stream :/
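To illustrate why a stored entry with a descriptor is hard to read as a stream: the real sizes only appear after the data, and the optional descriptor signature can also occur by chance inside the payload, so scanning for it is only a heuristic. A sketch of such a scan (illustrative, not SharpZipLib's actual code):

```csharp
using System.Text;

static class DescriptorScan
{
    // Look for the optional data-descriptor signature "PK\x07\x08"
    // (0x08074B50 little-endian) starting at a given offset. Returns
    // the offset of the first match, or -1 if none is found; a match
    // is not guaranteed to be a real descriptor, since the payload
    // bytes themselves may contain this sequence.
    public static int FindDescriptorSignature(byte[] raw, int from)
    {
        for (int i = from; i + 4 <= raw.Length; i++)
        {
            if (raw[i] == 0x50 && raw[i + 1] == 0x4B &&
                raw[i + 2] == 0x07 && raw[i + 3] == 0x08)
                return i;
        }
        return -1;
    }
}
```

A deflate stream avoids this problem entirely: its own framing tells the reader where the compressed data ends, so the descriptor is found deterministically.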

I am a bit confused though, you create the zip file with System.IO.Compression.ZipArchive and read it with ICSharpCode.SharpZipLib.Zip.ZipInputStream, but you say that we have a regression in creating them?


Numpsy commented Dec 6, 2020

fwiw, I tried building the SharpZipLib unit tests as .NET 5, and they all passed.


piksel commented Dec 6, 2020

Yeah, I have been running almost exclusively .NET 5 since first RC.

There is a fix for this, but it needs to be done in System.IO.Compression.ZipArchive; and that is to write the correct sizes in the header for Store entries. Normally that is not possible for unseekable streams, but if there is no compression, that means that Compressed Size and Uncompressed Size are the same (and you can write the size before writing the data).

This is what they output now: .NET 5 unseekable.zip
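The observation above can be made concrete: for a Store entry both size fields are equal and knowable up front, and if the producer also computes the CRC-32 before writing the data, nothing in the header needs to be deferred to a descriptor. A minimal CRC-32 (the ZIP polynomial, bit-reflected) for illustration:

```csharp
static class StoredCrc
{
    // CRC-32 as used by ZIP (polynomial 0xEDB88320 reflected,
    // initial value and final XOR of 0xFFFFFFFF). A real producer
    // would use a table-driven version; this loop is for clarity.
    public static uint Crc32(byte[] data)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in data)
        {
            crc ^= b;
            for (int k = 0; k < 8; k++)
                crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
        return crc ^ 0xFFFFFFFFu;
    }
}
```

With the CRC and the (identical) sizes in hand before the first payload byte is written, a Stored entry's header can be emitted complete even on a forward-only stream.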


VanCoding commented Dec 6, 2020

I am a bit confused though, you create the zip file with System.IO.Compression.ZipArchive and read it with ICSharpCode.SharpZipLib.Zip.ZipInputStream, but you say that we have a regression in creating them?

Aaahh, yes, I didn't notice this. Does this mean I could do the same using SharpZipLib and it'd work?

But yeah, seems to be a bug in System.IO.Compression.ZipArchive then. Feel free to close this.


piksel commented Dec 7, 2020

If I recall correctly, yes, but I'd have to check the source since I am not sure what has been merged and not. I believe we still do the deflate "trick" for non-compressed data.


Numpsy commented Dec 7, 2020

I think that's the bit that #407 would have changed, before it was dropped for this sort of reason?


piksel commented Dec 7, 2020

Yeah, I mean, #407 would've done the same as the change in System.IO.Compression. And it was dropped for this reason, the archives could not be read by ZipInputStream anymore.
We could implement the "correct" fix in ZipFile by having a special case where we pre-calculate the CRC and add the correct sizes to the entry. Then if those conditions are met, we can skip the Descriptor tag and just write the header normally.
This won't work for ZipOutputStream though, as the data is just written into it and it has no chance of calculating the CRC.
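At the byte level, "skip the Descriptor tag and just write the header normally" amounts to emitting a complete Stored local file header on a forward-only stream, with bit 3 left clear. An illustrative sketch (field layout per APPNOTE.TXT; this is not SharpZipLib's actual implementation, and time/date fields are zeroed for brevity):

```csharp
using System.IO;
using System.Text;

static class StoredHeader
{
    // Write a Stored entry (local header + payload) with CRC and sizes
    // known up front, so no data descriptor is needed.
    public static void WriteStoredEntry(Stream output, string name, byte[] data, uint crc)
    {
        void U16(ushort v) { output.WriteByte((byte)v); output.WriteByte((byte)(v >> 8)); }
        void U32(uint v) { U16((ushort)v); U16((ushort)(v >> 16)); }

        byte[] nameBytes = Encoding.ASCII.GetBytes(name);
        U32(0x04034B50);              // local file header signature "PK\3\4"
        U16(10);                      // version needed to extract (1.0, Stored)
        U16(0);                       // general-purpose flags: bit 3 NOT set
        U16(0);                       // compression method 0 = Stored
        U16(0); U16(0);               // DOS mod time / date (zeroed here)
        U32(crc);                     // CRC-32, pre-calculated
        U32((uint)data.Length);       // compressed size...
        U32((uint)data.Length);       // ...equals uncompressed size for Stored
        U16((ushort)nameBytes.Length);
        U16(0);                       // extra field length
        output.Write(nameBytes, 0, nameBytes.Length);
        output.Write(data, 0, data.Length);
    }
}
```

Nothing here requires seeking, which is why the special case works for ZipFile (where the data is available in advance) but not for ZipOutputStream's write-as-you-go model.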


Numpsy commented Dec 11, 2020

This won't work for ZipOutputStream though, as the data is just written into it and it has no chance of calculating the CRC.

You could maybe have a variant of PutNextEntry that takes the details of an entry and the data, compresses the data in memory, computes the CRC and such, and writes everything to the output stream in a one-shot operation? (Not sure that quite fits with the architecture, though, and it could need an arbitrary amount of memory just to compress in memory.)


piksel commented Dec 12, 2020

@Numpsy, sure, but you could also do that today by just writing to a MemoryStream instead and then flushing it to the output stream after each added entry. That way we don't need to add extra complexity to the internals either. It would also be more transparent to the consumer that it's buffering gigabytes of data (if that is the case), instead of it just crashing when using a set of options that you can't correlate to the data being buffered.
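A whole-archive variant of this buffering idea, using only System.IO.Compression (the per-entry version described above follows the same principle): build the zip in a seekable MemoryStream, so the writer can seek back and patch real sizes into the headers, then copy the finished bytes to the unseekable output.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

static class BufferedZip
{
    public static void WriteBuffered(Stream unseekableOutput)
    {
        using var buffer = new MemoryStream();
        using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
        using (var entry = zip.CreateEntry("hello.txt").Open())
        {
            byte[] text = Encoding.UTF8.GetBytes("hello world");
            entry.Write(text, 0, text.Length);
        }
        // The archive is now complete and its local headers carry real
        // sizes (no data descriptors); stream it to the real output.
        buffer.Position = 0;
        buffer.CopyTo(unseekableOutput);
    }
}
```

The trade-off is exactly the one noted above: the whole archive (or, in the per-entry variant, the largest entry) is held in memory before any byte reaches the output.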


piksel commented Feb 6, 2021

Closing this, as it is due to changes in System.IO.Compression that make it produce non-standard files that need the Central Directory in order to be decompressed. Support for handling uncompressed data could still be improved, but this issue is probably not the place to track it.

@piksel piksel closed this as completed Feb 6, 2021