Problem reading archives containing Zip64 files #698

Open
MatthewSteeples opened this issue Dec 16, 2021 · 9 comments

@MatthewSteeples

Steps to reproduce

  1. Please see the test located at MatthewSteeples@c76de92

Expected behavior

File should extract normally and read 1 byte from each file
(we're experiencing this problem even when reading to the end of the Stream, this is just for illustrative purposes)

Actual behavior

When seeking to the end of file 2 (the large file) the following exception is thrown
ICSharpCode.SharpZipLib.Zip.ZipException : Data descriptor signature not found

Version of SharpZipLib

1.3.3 but also verified against master

Obtained from (only keep the relevant lines)

  • Compiled from source, commit: ff64d0a
  • Package installed using NuGet (1.3.3)

I'm afraid I can't spot anything obvious about what it might be. 7Zip happily opens the generated file and marks the 2 small files as version 20, with the large file being a version 45 and having a Zip64 descriptor (in Characteristics)

Hope that's enough information, but please let me know if there's anything else I can provide

Please note that this test will spit out 50mb tmp files that you'll need to clean up afterwards

@piksel
Member

piksel commented Dec 17, 2021

Could you upload the generated zip file to https://archivediag.azurewebsites.net? I could generate it from the test myself, but I don't think I'll have the time to do that for a while, whereas with the report I can take a look. Unfortunately, the page still says that the blob could not be found instead of "waiting for azure function to pick up the job". It shouldn't take more than 5 min though.

@MatthewSteeples
Author

Hi @piksel,

It won't upload to there as the file is too large (50mb)

Failed to load resource: the server responded with a status of 413 (Request Entity Too Large)

@piksel
Member

piksel commented Dec 18, 2021

Aight. I'll take a look.

@piksel
Member

piksel commented Dec 18, 2021

Well, here is the report:
https://pub.p1k.se/sharpziplib/archivediag/issue-698.zip.html

The local header for the large entry has bit 3 of the general purpose flags (the data descriptor bit) set, which means that the actual sizes and CRC follow the compressed data in a data descriptor. But there is no such descriptor following it. Instead, the sizes and CRC are only written to the "Central Header" (which is like a look-up directory for the files in the archive). This means that the zip file is corrupt (or rather, out of spec) and cannot be read in a streaming manner. If it is accessed in a random-access way instead, it's technically possible to read it (which is why 7z for example can read it, since it only works with random-access files, not streams).
Actually, 7zip does show the file as having an error:
[screenshot: 7-Zip reporting the error]
and running "Test" fails.
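
The check described above can be sketched in a few lines. This is only an illustration in Python (not SharpZipLib or archivediag code), and the name `inspect_local_headers` is made up for the example. For each entry it reads the local header, tests bit 3 of the general purpose flags, and looks for the data descriptor signature right after the compressed data. It assumes the central directory sizes are correct, and note that the ZIP spec makes the signature itself optional, so a missing signature is a hint rather than proof.

```python
import struct
import zipfile

LOCAL_HEADER_SIG = b"PK\x03\x04"
DATA_DESCRIPTOR_SIG = b"PK\x07\x08"   # optional signature per the ZIP spec
FLAG_DATA_DESCRIPTOR = 0x0008         # bit 3 of the general purpose bit flag
LOCAL_HEADER_FMT = "<4sHHHHHIIIHH"    # fixed 30-byte part of a local file header


def inspect_local_headers(fileobj):
    """Report, per entry, whether bit 3 is set and a descriptor signature follows the data."""
    # Use the central directory only to learn offsets and compressed sizes.
    with zipfile.ZipFile(fileobj) as zf:
        entries = zf.infolist()

    report = []
    header_size = struct.calcsize(LOCAL_HEADER_FMT)
    for info in entries:
        fileobj.seek(info.header_offset)
        fields = struct.unpack(LOCAL_HEADER_FMT, fileobj.read(header_size))
        signature, flags = fields[0], fields[2]
        name_len, extra_len = fields[9], fields[10]
        if signature != LOCAL_HEADER_SIG:
            raise ValueError("no local header at offset %d" % info.header_offset)

        # The compressed data ends here; a descriptor, if present, starts here.
        data_end = info.header_offset + header_size + name_len + extra_len + info.compress_size
        fileobj.seek(data_end)
        report.append({
            "name": info.filename,
            "descriptor_flag": bool(flags & FLAG_DATA_DESCRIPTOR),
            "descriptor_signature": fileobj.read(4) == DATA_DESCRIPTOR_SIG,
        })
    return report
```

An entry reported with `descriptor_flag` set but `descriptor_signature` unset would match the situation described in this comment: a streaming reader is told to expect a descriptor that it then cannot find.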

@piksel
Member

piksel commented Dec 18, 2021

I'm not sure exactly what System.IO.Compression.ZipArchive does here, but it seems like a bug on their end. In any case, if you use ICSharpCode.SharpZipLib.Zip.ZipFile instead of ZipInputStream, it will use the central headers instead of the local ones (and it managed to extract the file perfectly fine when testing just now).
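
To illustrate the random-access approach in general terms, here is a small Python sketch. It uses Python's standard `zipfile` module as a stand-in, not the SharpZipLib `ZipFile` API, and the function name `read_first_bytes` is hypothetical. Like the original test, it reads one byte from each entry; `zipfile` takes entry sizes and offsets from the central directory, so it does not depend on a data descriptor following the compressed data.

```python
import zipfile


def read_first_bytes(fileobj):
    """Read one byte from every entry, locating the data via the central directory."""
    first_bytes = {}
    with zipfile.ZipFile(fileobj) as zf:
        for info in zf.infolist():
            # zf.open() seeks to the entry using the central directory record.
            with zf.open(info) as entry:
                first_bytes[info.filename] = entry.read(1)
    return first_bytes
```

This needs a seekable file, which is the trade-off mentioned above: a purely streaming reader cannot jump to the central directory at the end of the archive.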

@piksel
Member

piksel commented Dec 18, 2021

I altered your test code to use ZipOutputStream to generate the zip file, and it actually compressed it better (~25MiB vs ~50MiB), but slower (we are fully managed after all).
Now, the resulting file was actually also not possible to read using ZipInputStream (the last test), so there might be some bug here in any case...
Here is the report for that file, which shows the descriptor sections that are missing from the ZipArchive version of the file: https://pub.p1k.se/sharpziplib/archivediag/issue-698-expected.zip.html

@MatthewSteeples
Author

@piksel Thanks for taking the time to have a look. I can't get 7zip to show me the same screen that you've got there. The file has a CRC, none of the files (large or small) have local in the characteristics, and running "Test" in 7zip reports that there are no errors. If I can reproduce what you're seeing then I'll happily take it to Microsoft. Are you sure the file had flushed by the time you're loading it?

@geracosta

@MatthewSteeples Hello, did you fix this problem? I'm having the same issue and I can't find the cause. I think that the problem is with the size of the file I'm trying to compress...

@piksel
Member

piksel commented Nov 1, 2023

@geracosta Could you upload a file that shows the problem to https://archivediag.piksel.se/ ?
