
SharpZipLib FastZip.ExtractZip outputs a partial result for a zip file with 2 or more big data files (~1.5GB, not themselves zips) #729

Open
WaseemK88 opened this issue Feb 21, 2022 · 11 comments

Comments

@WaseemK88

Steps to reproduce

1. Create a ZIP with 3 folders.
2. Put two or more big (1.5GB) data files in one of the folders (txt or any other type, but not a zip).
3. Extract the zip using the FastZip.ExtractZip method.

Expected behavior

All the files that were in the zip are extracted.

Actual behavior

Only one of the big files is extracted; the others do not appear.

Version of SharpZipLib: 1.3.1

@piksel
Member

piksel commented Feb 22, 2022

Is this a single archive, or can you reproduce it with different content?

@WaseemK88
Author

I can reproduce it with different content.
Try the steps to reproduce.

@piksel
Member

piksel commented Feb 23, 2022

How are you creating the file? Using SharpZipLib or something else?

@piksel
Member

piksel commented Feb 23, 2022

Tried to reproduce using these steps:

$ mkdir -p /tmp/006/foo
$ mkdir -p /tmp/006/bar
$ mkdir -p /tmp/006/baz

$ dd if=/dev/urandom of=/tmp/006/bar/file1 bs=1M count=1500
1500+0 records in
1500+0 records out
1572864000 bytes (1.6 GB, 1.5 GiB) copied, 27.5715 s, 57.0 MB/s

$ dd if=/dev/urandom of=/tmp/006/bar/file2 bs=1M count=1500
1500+0 records in
1500+0 records out
1572864000 bytes (1.6 GB, 1.5 GiB) copied, 28.0475 s, 56.1 MB/s

$ cd /tmp/006

$ zip -rv 006.zip foo/ bar/ baz/
  adding: foo/    (in=0) (out=0) (stored 0%)
  adding: bar/    (in=0) (out=0) (stored 0%)
  adding: bar/file1 ......................................................................................................................................................     (in=1572864000) (out=1573118152) (deflated 0%)
  adding: bar/file2 ......................................................................................................................................................     (in=1572864000) (out=1573118278) (deflated 0%)
  adding: baz/    (in=0) (out=0) (stored 0%)
total bytes=3145728000, compressed=3146236430 -> 0% savings
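For comparison with the `zip` run above, a similar archive could be built from C# with SharpZipLib's own `FastZip`, which is the create-and-extract path the reporter later says works fine. This is a minimal sketch assuming SharpZipLib 1.3.x; `ReproArchive` and `Create` are hypothetical names, and the handling of empty folders via `CreateEmptyDirectories` is an assumption worth checking against the version in use.

```csharp
using ICSharpCode.SharpZipLib.Zip;

public static class ReproArchive
{
    // Hypothetical helper (not part of SharpZipLib): zips everything under
    // sourceDir, recursively, into zipPath, roughly like `zip -r` above.
    public static void Create(string zipPath, string sourceDir)
    {
        var fastZip = new FastZip
        {
            // Intended to also add entries for empty folders such as foo/ and baz/.
            CreateEmptyDirectories = true
        };

        // recurse: true walks sub-folders; a null file filter includes every file.
        fastZip.CreateZip(zipPath, sourceDir, true, null);
    }
}
```

For example, `ReproArchive.Create("/tmp/006.zip", "/tmp/006");` writes the archive outside the source folder, so the zip does not end up trying to include itself.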

Analyzing that file using ArchiveDiag produces the following report:
https://pub.p1k.se/sharpziplib/archivediag/issue729.html

Everything seems to be read fine, and all files are listed.

@WaseemK88
Author

WaseemK88 commented Feb 23, 2022

@piksel, I don't see the "Extract the Zip using FastZip.ExtractZip method" step in your attempt.
The problem is that when I extract the files using FastZip.ExtractZip, some of the files are not extracted.

@piksel
Member

piksel commented Feb 23, 2022

PS> dotnet new console
The template "Console App" was created successfully.

Processing post-creation actions...
Running 'dotnet restore' on C:\wrk\006\tester\tester.csproj...
  Determining projects to restore...
  Restored C:\wrk\006\tester\tester.csproj (in 63 ms).
Restore succeeded.

PS C:\wrk\006\tester> dotnet add package sharpziplib
  Determining projects to restore...
  Writing C:\Users\nilma.CONFIGURA\AppData\Local\Temp\tmpB871.tmp
info : Adding PackageReference for package 'sharpziplib' into project 'C:\wrk\006\tester\tester.csproj'.
info :   GET https://api.nuget.org/v3/registration5-gz-semver2/sharpziplib/index.json
info :   OK https://api.nuget.org/v3/registration5-gz-semver2/sharpziplib/index.json 139ms
info : Restoring packages for C:\wrk\006\tester\tester.csproj...
info : Package 'sharpziplib' is compatible with all the specified frameworks in project 'C:\wrk\006\tester\tester.csproj'.
info : PackageReference for package 'sharpziplib' version '1.3.3' added to file 'C:\wrk\006\tester\tester.csproj'.
info : Writing assets file to disk. Path: C:\wrk\006\tester\obj\project.assets.json
log  : Restored C:\wrk\006\tester\tester.csproj (in 64 ms).

Program.cs:

new ICSharpCode.SharpZipLib.Zip.FastZip().ExtractZip("../006.zip", "output", "");
PS> dotnet run
PS> ls .\output\bar\
    Directory: C:\wrk\006\tester\output\bar

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---          2022-02-23    13:32     1572864000 file1
-a---          2022-02-23    13:33     1572864000 file2
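To tell a partial result apart from a complete one without eyeballing `ls`, one option is to compare every file entry in the archive with what landed on disk. This is a minimal sketch assuming SharpZipLib 1.3.x; `ExtractionCheck` and `FindMissingOrTruncated` are hypothetical names, not library API.

```csharp
using System.Collections.Generic;
using System.IO;
using ICSharpCode.SharpZipLib.Zip;

public static class ExtractionCheck
{
    // Hypothetical helper (not part of SharpZipLib): returns the names of file
    // entries that are missing from outputDir or whose length on disk differs
    // from the uncompressed size recorded in the archive.
    public static List<string> FindMissingOrTruncated(string zipPath, string outputDir)
    {
        var problems = new List<string>();
        using (var zip = new ZipFile(zipPath))
        {
            foreach (ZipEntry entry in zip)
            {
                // Directory entries have no content to compare.
                if (!entry.IsFile)
                    continue;

                // Entry names use '/' separators, which FileInfo accepts on Windows too.
                var target = new FileInfo(Path.Combine(outputDir, entry.Name));
                if (!target.Exists || target.Length != entry.Size)
                    problems.Add(entry.Name);
            }
        }
        return problems;
    }
}
```

For example, after the `ExtractZip("../006.zip", "output", "")` call, `ExtractionCheck.FindMissingOrTruncated("../006.zip", "output")` should return an empty list; a non-empty list names the entries that did not come out intact.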

@WaseemK88
Author

WaseemK88 commented Feb 24, 2022

Thanks for the quick response, @piksel.
My tests were also OK when I used FastZip both to create and to extract the zip file, regardless of the data size or the file structure.

But I had issues when I tried to extract, using the FastZip.ExtractZip method of course, a zip file that was created by Windows Compression (right-click the files/folders you want to zip -> Send to -> Compressed (zipped) folder).
Note:
When the files are small, FastZip extracts them without issues.

Here is the code I used to create the data files:

using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

namespace DevPlayground
{
    /// <summary>
    /// This will be used by developers "ONLY" in order to play with the code during development.
    /// </summary>
    public class Program
    {
        static void Main()
        {
            // Sizes (in MB) of the random data files to generate.
            List<int> fileSizesInMB = new List<int> { 1000, 2000, 1000, 100, 50 };
            string folderPath = "PATH_IN_YOUR_DRIVE";
            CreateRandomFiles(fileSizesInMB, folderPath);
        }

        private static void CreateRandomFiles(IEnumerable<int> filesSize, string baseFolder)
        {
            Directory.CreateDirectory(baseFolder);
            var filesAndSizeDic = new Dictionary<string, int>();
            foreach (var size in filesSize)
            {
                // A GUID file name keeps every generated file unique.
                string fileName = Guid.NewGuid().ToString();
                string filePath = Path.Combine(baseFolder, fileName);
                filesAndSizeDic[filePath] = size;
            }

            CreateRandomFiles(filesAndSizeDic);
        }

        private static void CreateRandomFiles(Dictionary<string, int> filesAndSizeInMB)
        {
            const int blockSize = 1024 * 8;
            const int blocksPerMb = (1024 * 1024) / blockSize;

            byte[] data = new byte[blockSize];
            // RandomNumberGenerator.Create() replaces the obsolete RNGCryptoServiceProvider.
            using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
            {
                foreach (var fileAndSizeInMB in filesAndSizeInMB)
                {
                    // File.Create truncates an existing file; File.OpenWrite would not.
                    using (FileStream stream = File.Create(fileAndSizeInMB.Key))
                    {
                        // Write random (incompressible) data one 8 KB block at a time.
                        for (int i = 0; i < fileAndSizeInMB.Value * blocksPerMb; i++)
                        {
                            rng.GetBytes(data);
                            stream.Write(data, 0, data.Length);
                        }
                    }
                }
            }
        }
    }
}

@piksel
Member

piksel commented Feb 25, 2022

Windows zip support is awful. I couldn't even create an archive: [screenshot]

Even if I were able to create one, it might be some other specific thing about the .zip that causes the issue. You can generate a report of your file using the ArchiveDiag tool and put the result in a gist (it will create a .zip.html of the output). That way I can take a look at the structure of your file, and perhaps determine what the problem is.

@WaseemK88
Copy link
Author

WaseemK88 commented Feb 28, 2022

Thanks @piksel. I ran the tool on a 5GB+3GB zip file that was created using the "Windows Compression" feature.
See the attached output files: ArchiveDiagOut.zip

@piksel
Member

piksel commented Feb 28, 2022

Yeah, those files use Deflate64, which is a proprietary format that SharpZipLib does not support.
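Because the unsupported entries simply did not appear in the output, one way to avoid a silent partial extraction is to check the archive before calling `FastZip.ExtractZip`. Deflate64 is compression method 9 in the ZIP specification, and SharpZipLib's `ZipEntry.CanDecompress` reports whether the library can handle an entry's version and method. This is a minimal sketch assuming SharpZipLib 1.3.x; `UnsupportedEntryCheck` and `FindUnsupportedEntries` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using ICSharpCode.SharpZipLib.Zip;

public static class UnsupportedEntryCheck
{
    // Hypothetical helper (not part of SharpZipLib): lists file entries that
    // SharpZipLib reports it cannot decompress, e.g. Deflate64 (method 9).
    public static List<string> FindUnsupportedEntries(Stream zipStream)
    {
        var unsupported = new List<string>();
        // IsStreamOwner = false leaves the caller's stream open after disposal.
        using (var zip = new ZipFile(zipStream) { IsStreamOwner = false })
        {
            foreach (ZipEntry entry in zip)
            {
                if (entry.IsFile && !entry.CanDecompress)
                    unsupported.Add($"{entry.Name} (compression method {(int)entry.CompressionMethod})");
            }
        }
        return unsupported;
    }
}
```

For example, open the archive with `File.OpenRead`, pass the stream to the helper, and only go on to `FastZip.ExtractZip` when the list is empty; otherwise the archive has to be re-created with a compressor that uses plain Deflate.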

...also, what do you mean by "Windows Encryption"? That the files existed on a BitLocker-encrypted NTFS volume? That has no impact on the zip file, as the files are transparently decrypted when they are read.

@WaseemK88
Author

@piksel, thank you!
I meant to say "Windows Compression".
