Why is Sharpcompress slow to compress large files #818

Open
oufu99 opened this issue Mar 20, 2024 · 7 comments

@oufu99

oufu99 commented Mar 20, 2024

I used SharpCompress to compress a folder into a zip file. The folder contains only one file, a 1.8 GB mkv file. It took 2 minutes to compress. Why does it take so long? Is there a problem with my code or environment?
SharpCompress version: 0.36.0
Code:
using SharpCompress.Archives;
using SharpCompress.Archives.Zip;
using SharpCompress.Common;

using (var archive = ZipArchive.Create())
{
    // Verbatim strings (@"...") keep the backslashes in Windows paths literal.
    archive.AddAllFromDirectory(@"E:\test\");
    archive.SaveTo(@"E:\temp.zip", CompressionType.Deflate);
}

@btomblinson
Contributor

btomblinson commented Mar 22, 2024

I don’t care what file type it is, 1.8 GB is massive. I’ve compressed much smaller files with similar times using 7-Zip or the native Windows zip utility. Or try using VLC to convert the file to .mp4 and time that. With a file that size and any CPU-intensive task, hardware is the most important factor.

If you upload the file I can try to benchmark it, but unless @adamhathcock disagrees, you need to provide more evidence that this library compresses that much slower.

@adamhathcock
Owner

I agree: a 1.8 GB file can take a long time. I'm not saying this library will be the fastest, but any compression of a file that size will take a long time.

This library allows more fine-grained control and forward-only access, which matters because you don't have to buffer files in memory. It's not going to be the fastest in raw compression time, especially compared with a C++ implementation.

7zip as a format and library might be better for you, as it allows multi-threaded compression, but at the cost that you cannot access the format in a forward-only manner.

@btomblinson
Contributor

An alternative would be to wrap the SharpCompress logic in a method and use Task and asynchronous programming in your app, if compressing the file is blocking it. SharpCompress itself does not have asynchronous methods, but its calls can be wrapped inside one.

@adamhathcock
Owner

I've been reluctant to try async methods because Streams often don't really implement async, and compression is CPU-bound, so async doesn't help.

Putting the work on its own thread can help with perceived performance if you don't want to lock your UI or something.
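
The idea discussed above, moving the blocking call onto its own thread, could look roughly like the sketch below. `RunInBackground` is a hypothetical helper and not part of SharpCompress, and the `compress` action is only a stand-in for the synchronous `archive.SaveTo(...)` call from the question. This does not make compression itself faster; it only keeps the calling thread free.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of SharpCompress): runs a blocking,
// CPU-bound action on a thread-pool thread so the caller can await it.
static Task RunInBackground(Action blockingWork) => Task.Run(blockingWork);

int callerThread = Environment.CurrentManagedThreadId;
int workerThread = -1;

// Stand-in for the synchronous ZipArchive / SaveTo block from the question.
Action compress = () =>
{
    workerThread = Environment.CurrentManagedThreadId;
    Thread.Sleep(50); // simulated compression work
};

// The caller stays responsive while the work runs elsewhere.
await RunInBackground(compress);

Console.WriteLine(workerThread != callerThread
    ? "work ran off the calling thread"
    : "work ran on the calling thread");
```

In a UI app the `await` would sit inside an event handler, so the UI thread keeps pumping messages while the archive is written.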

@oufu99
Author

oufu99 commented Mar 27, 2024

> I don’t care what file type it is, 1.8 GB is massive. I’ve compressed much smaller files with similar times using 7-Zip or the native Windows zip utility. Or try using VLC to convert the file to .mp4 and time that. With a file that size and any CPU-intensive task, hardware is the most important factor.
>
> If you upload the file I can try to benchmark it, but unless @adamhathcock disagrees, you need to provide more evidence that this library compresses that much slower.

What confuses me is that, with the same compression method, the time taken seems to increase exponentially with the size of the data. For example, compressing a 100 MB file takes only 1 second, a 500 MB file takes 8 seconds, a 1 GB file takes 30 seconds, and a 2 GB file takes about 2 minutes.
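
To make the scaling in these numbers easier to see, here is a small sketch that turns the reported timings into throughput. The timings are the ones quoted above; treating 1 GB as 1024 MB is an assumption, and `ThroughputMbPerSec` is just an illustrative helper.

```csharp
using System;

// Timings as reported in the comment above.
// Assumption: sizes are in MB, with 1 GB taken as 1024 MB.
var samples = new (double SizeMb, double Seconds)[]
{
    (100, 1), (500, 8), (1024, 30), (2048, 120),
};

// Illustrative helper: average throughput for one run.
static double ThroughputMbPerSec(double mb, double sec) => mb / sec;

foreach (var (sizeMb, seconds) in samples)
{
    double rate = ThroughputMbPerSec(sizeMb, seconds);
    Console.WriteLine($"{sizeMb,6} MB in {seconds,4} s = {rate:F1} MB/s");
}
```

With linear scaling the throughput would stay roughly constant. With the numbers above it falls from 100 MB/s for the 100 MB file to about 17 MB/s for the 2 GB file.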

@adamhathcock
Owner

In that case, there's probably something holding onto memory when it shouldn't be. Pooling might fix it.

Using dotMemory or a similar memory profiler can reveal it.

@abelbraaksma

abelbraaksma commented May 5, 2024

@oufu99, you may be hitting paging issues (responding to your "exponential" comment). But there are so many variables at play with performance that it is impossible to say unless you help us help you.

Please post a minimal repro here, in code, that shows what methods you're using for the timings, and give your hardware setup. If we cannot repro it, it is very hard to give any but the most general advice, as I'm sure you'll understand.
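
As a starting point for such a repro, a timing harness might look like the sketch below. `Measure` is a hypothetical helper, and the `workload` placeholder would be replaced by the actual `ZipArchive.Create` / `AddAllFromDirectory` / `SaveTo` block from the question. The environment lines only cover what the runtime itself can report; disk type and amount of RAM would still need to be stated by hand.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical helper: times a single run of a blocking action.
static TimeSpan Measure(Action work)
{
    var stopwatch = Stopwatch.StartNew();
    work();
    stopwatch.Stop();
    return stopwatch.Elapsed;
}

// Placeholder workload; swap in the real compression code here.
Action workload = () => Thread.Sleep(20);

TimeSpan elapsed = Measure(workload);
Console.WriteLine($"Elapsed: {elapsed.TotalSeconds:F3} s");

// Basic environment details to post alongside the timings.
Console.WriteLine($"Logical processors: {Environment.ProcessorCount}");
Console.WriteLine($"64-bit process: {Environment.Is64BitProcess}");
Console.WriteLine($"Runtime: {RuntimeInformation.FrameworkDescription}");
Console.WriteLine($"OS: {RuntimeInformation.OSDescription}");
```

Running the same harness once per input size (for example 100 MB, 500 MB, 1 GB, 2 GB) gives timings that others can compare directly against their own machines.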
