Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve zip file iteration performance #843

Merged
merged 3 commits into from Aug 12, 2023

Conversation

lahma
Copy link
Contributor

@lahma lahma commented Aug 10, 2023

When traversing large zip files I noticed that SharpZipLib was allocating quite a lot of memory. So I found out that by pooling of costly Inflator instances a lot of that can be rectified. There's now new PooledInflator type that can be used by the library itself and is known to be safe to pool (it's rented from pool or is otherwise instance created by the library itself).

I also changed ZipFile to follow the common public struct enumerator pattern found in BCL data structures that removes unnecessary allocations

Benchmark test case updated and now targets oldest supported runtimes (.NET 4.6.2 and .NET 6.0). Unsupported runtimes are not installed by default.

Results


BenchmarkDotNet v0.13.7, Windows 11 (10.0.23516.1000)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 7.0.400
  [Host]     : .NET 6.0.21 (6.0.2123.36311), X64 RyuJIT AVX2
  Job-ERDPDJ : .NET 6.0.21 (6.0.2123.36311), X64 RyuJIT AVX2
  Job-HFJNUX : .NET Framework 4.8.1 (4.8.9179.0), X64 RyuJIT VectorSize=256


ICSharpCode.SharpZipLib.Benchmark.Zip.ZipInputStream

Diff Method Toolchain Mean Error Allocated
Old ReadZipInputStream .NET 6.0 151.4 ms 0.60 ms 104.54 KB
New .NET 6.0 151.7 ms (0%) 0.84 ms 72.15 KB (-31%)
Old ReadZipInputStream .NET Framework 4.6.2 150.2 ms 0.50 ms 106.05 KB
New .NET Framework 4.6.2 150.8 ms (0%) 0.70 ms 74 KB (-30%)
Old ReadLargeZipFile .NET 6.0 1,337.3 ms 8.18 ms 2412174.27 KB
New .NET 6.0 1,292.3 ms (-3%) 9.08 ms 548513.15 KB (-77%)
Old ReadLargeZipFile .NET Framework 4.6.2 1,499.4 ms 11.41 ms 2421273.81 KB
New .NET Framework 4.6.2 1,452.4 ms (-3%) 3.35 ms 556703.89 KB (-77%)

I certify that I own, and have sufficient rights to contribute, all source code and related material intended to be compiled or integrated with the source code for the SharpZipLib open source product (the "Contribution"). My Contribution is licensed under the MIT License.

* change ZipFile enumeration to use common public struct pattern to virtual dispatch
Copy link
Member

@piksel piksel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the pooling looks sound in general, but I think it should be opt-in due to how it might affect more resource constrained setups.

@lahma
Copy link
Contributor Author

lahma commented Aug 12, 2023

Thank you for the review. As I mentioned in comment, the ZipFile scenario is hard to use when you cannot control to internal input stream creation. I created a fix commit that proposes to have InflaterPool.Size which is only public member and allows to disable the pooling by supplying a value of 0 or less.

What do you think about the approach? I would also gladly any commits from you into this part shaping the solution towards your liking, but also hoping that the benchmark's ZipFile API usage would be covered.

@piksel
Copy link
Member

piksel commented Aug 12, 2023

I think that would be a good compromise. I am not sure about making the pool public though... Perhaps there should be a public singleton that maps to the internals of the pool.

That way we have a clear consumer API for global library tweaks... Or perhaps I am just overthinking it.

The initial value would need to be 0 though, as the goal is to not accidently break anything (but again, maybe I am making this into a larger issue than it needs to be..)

* Make InflaterPool size configurable via SharpZipLibOptions.InflaterPoolSize
@lahma
Copy link
Contributor Author

lahma commented Aug 12, 2023

It's always good to keep downstream consumers happy! 😉 I've force pushed a proposal of adding root level static class for global options, SharpZipLibOptions, which has now sole property of InflaterPoolSize which defaults to 0. Benchmark updated to set the value to 4 (allocations still down 70%).

Copy link
Member

@piksel piksel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! The performance improvements are nice and this should not interfere with any existing usage (afaict).

@piksel piksel merged commit ff2d7c3 into icsharpcode:master Aug 12, 2023
15 checks passed
@lahma lahma deleted the pooled-inflator branch August 12, 2023 19:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants