[Question] LZMA2 Compression? / LZMA Compression level? #773

Open
Shivansps opened this issue Oct 6, 2023 · 8 comments

Comments

@Shivansps

Hi, first off I wanted to thank everyone involved in this project for all the amazing work. I've been using this lib for a while and it's great.

- First, I wanted to ask about the status of LZMA2 compression. To my knowledge there is only LZMA compression support, though it may already be in and I might not know about it.

- Second, is there any LZMA compression level option? Compared to the same file compressed with 7z or the LZMA C# SDK at max level, the files produced by SharpCompress are considerably bigger.

Thank you.

@adamhathcock
Owner

LZMA2 could be added. It would need some source and a PR.
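For context on what LZMA2 adds over LZMA: it is a chunked container around LZMA data, and one practical difference is that an LZMA2 encoder can store a chunk uncompressed when compressing it would make it bigger. The sketch below shows that effect on random data using Python's standard `lzma` module, which wraps liblzma from XZ Utils; it is not SharpCompress code, and the 256 KiB input size is an arbitrary choice.

```python
import lzma
import random

# Assumption: liblzma via Python's lzma module stands in for any LZMA/LZMA2 encoder.
# Incompressible input: 256 KiB of seeded pseudo-random bytes.
data = random.Random(42).randbytes(256 * 1024)

# LZMA (LZMA1) in the legacy .lzma container: every byte goes through the range coder.
lzma1 = lzma.compress(data, format=lzma.FORMAT_ALONE)

# LZMA2 in an .xz container: chunks that do not shrink can be stored uncompressed.
lzma2 = lzma.compress(data, format=lzma.FORMAT_XZ,
                      filters=[{"id": lzma.FILTER_LZMA2, "preset": 6}])

print("input :", len(data))
print("LZMA  :", len(lzma1))
print("LZMA2 :", len(lzma2))

# Both round-trip; the default FORMAT_AUTO detects either container.
assert lzma.decompress(lzma1) == data
assert lzma.decompress(lzma2) == data
```

On random input the LZMA output ends up slightly larger than the input, while the LZMA2 output only adds a few bytes of chunk and container headers.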

I think the compression level is settable directly on LZMAStream but not exposed outwardly. Needs a PR.
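As an illustration of what exposing a "level" can look like in another LZMA library: liblzma (used here through Python's `lzma` module, not SharpCompress's API) treats a preset as a bundle of defaults, and individual options can override it. The sketch starts from preset 9 and overrides the dictionary size and `nice_len`, which is roughly liblzma's counterpart of fast bytes. The values 1<<23 and 128 are simply the ones mentioned later in this thread.

```python
import lzma

data = b"SharpCompress LZMA level test. " * 8192

# Assumption: liblzma's preset/override model, shown only as an example of a level API.
# Unspecified fields keep preset 9's values; the explicit keys override them.
filters = [{
    "id": lzma.FILTER_LZMA1,
    "preset": 9,
    "dict_size": 1 << 23,  # 8 MiB dictionary
    "nice_len": 128,       # roughly liblzma's analogue of "fast bytes" (max 273)
}]
packed = lzma.compress(data, format=lzma.FORMAT_ALONE, filters=filters)

# The .lzma header stores the dictionary size as a little-endian uint32 at bytes 1..4.
header_dict = int.from_bytes(packed[1:5], "little")
print(len(data), "->", len(packed), "bytes, dictionary", header_dict)
assert lzma.decompress(packed) == data
```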

@GamingCity

GamingCity commented Oct 11, 2023

I took a look here

new LzmaEncoderProperties(!originalStream.CanSeek),

The default LZMA settings are 32 fast bytes and a 1MB dictionary (1<<20).
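One way to check which dictionary a given LZMA stream was written with is to read its 5-byte properties block: one byte packing lc/lp/pb, followed by the dictionary size as a little-endian 32-bit integer. Fast bytes is an encoder-only setting and is not recorded there. The sketch below produces a stream with the defaults described above (1<<20 dictionary, with liblzma's `nice_len` standing in for 32 fast bytes) through Python's `lzma` module, then decodes the header. `parse_lzma_props` is a hypothetical helper, not part of any library.

```python
import lzma

def parse_lzma_props(props: bytes) -> dict:
    """Decode the 5-byte LZMA properties block (hypothetical helper)."""
    d = props[0]
    if d >= 9 * 5 * 5:
        raise ValueError("invalid LZMA properties byte")
    lc = d % 9          # literal context bits
    lp = (d // 9) % 5   # literal position bits
    pb = d // 45        # position bits
    dict_size = int.from_bytes(props[1:5], "little")
    return {"lc": lc, "lp": lp, "pb": pb, "dict_size": dict_size}

# Assumption: liblzma via Python stands in for the encoder; nice_len ~ fast bytes.
filters = [{"id": lzma.FILTER_LZMA1, "dict_size": 1 << 20, "nice_len": 32}]
stream = lzma.compress(b"hello " * 1000, format=lzma.FORMAT_ALONE, filters=filters)

info = parse_lzma_props(stream[:5])
print(stream[:5].hex(), info)  # prints 5d00001000 {'lc': 3, 'lp': 0, 'pb': 2, 'dict_size': 1048576}
```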

I tested compressing a single 26MB file:
SharpCompress
5.30MB 32 fastbytes / 1MB dictionary
5.29MB 32 fastbytes / 8MB dictionary
2.92MB 32 fastbytes / 16MB dictionary
2.79MB 64 fastbytes / 16MB dictionary
5.13MB 128 fastbytes / 1MB dictionary
5.13MB 128 fastbytes / 8MB dictionary
2.80MB 128 fastbytes / 16MB dictionary
2.79MB 160 fastbytes / 16MB dictionary
2.79MB 192 fastbytes / 16MB dictionary
2.79MB 256 fastbytes / 16MB dictionary

7z(22.01)
2.82MB 48 wordsize / Ultra / LZMA / 64MB dictionary
2.79MB 64 wordsize / Ultra / LZMA / 16MB dictionary
2.79MB 64 wordsize / Ultra / LZMA / 64MB dictionary
2.79MB 64 wordsize / Ultra / LZMA2 / 64MB dictionary
2.78MB 128 wordsize / Ultra / LZMA / 64MB dictionary

So it's not only the compression level (fast bytes) but the dictionary size as well. Be warned that compressing with a 16MB dictionary uses a lot more RAM in SharpCompress.
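The big drop between the 8MB and 16MB dictionaries is consistent with that file containing repeated content more than 8MB apart, since LZMA can only reference data inside its dictionary window. The post does not confirm this about the file. The sketch below reproduces the mechanism on synthetic data with Python's `lzma` module (liblzma, not SharpCompress): a random 256 KiB block stored twice compresses to about half its size only when the dictionary is larger than the 256 KiB repeat distance. `alone_size` and the sizes are illustrative choices.

```python
import lzma
import random

def alone_size(data: bytes, dict_size: int) -> int:
    """Compressed size as LZMA1 in the .lzma container.

    Assumption: liblzma via Python's lzma module, not SharpCompress.
    """
    filters = [{"id": lzma.FILTER_LZMA1, "dict_size": dict_size, "nice_len": 32}]
    packed = lzma.compress(data, format=lzma.FORMAT_ALONE, filters=filters)
    assert lzma.decompress(packed) == data
    return len(packed)

# A random 256 KiB block stored twice: the only redundancy is 256 KiB back.
block = random.Random(0).randbytes(256 * 1024)
data = block + block

small = alone_size(data, dict_size=64 * 1024)    # window shorter than the repeat distance
large = alone_size(data, dict_size=1024 * 1024)  # window reaches the first copy
print(f"64 KiB dictionary: {small} bytes, 1 MiB dictionary: {large} bytes")
```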

But that was a single file, so I went ahead and compressed the whole 940MB folder:
2:38 - 152MB 7z(22.01) LZMA Max 16MB Dictionary 64 words
2:58 - 148MB 7z(22.01) LZMA Ultra 16MB Dictionary 64 words
2:03 - 143MB 7z(22.01) LZMA2 Ultra 64MB Dictionary 64 words

16:23 - 184MB Sharpcompress default
25:43 - 153MB Sharpcompress 64 fastbytes / 16MB dictionary

35m - 152MB Sharpcompress 128 fastbytes / 16MB dictionary

And I've been unable to get it down to 148MB, like 7z Ultra does, by increasing the fast bytes further; not sure why.

Yeah, compression speed is what has me worried here, because it is unusable. Not sure what kind of magic 7z does, but this is way too slow, and the C# LZMA SDK is just as slow.
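To put the folder timings above in throughput terms, the arithmetic divides the 940MB input by each reported wall-clock time, reading "35m" as 35:00. The post does not say how many threads 7z used, so treat the ratios as a rough comparison rather than a per-core benchmark.

```python
INPUT_MB = 940  # folder size as reported above

# (label, minutes, seconds) from the timings above; "35m" is read as 35:00
runs = [
    ("7z LZMA Max, 16MB dict, 64 words", 2, 38),
    ("7z LZMA Ultra, 16MB dict, 64 words", 2, 58),
    ("7z LZMA2 Ultra, 64MB dict, 64 words", 2, 3),
    ("SharpCompress default", 16, 23),
    ("SharpCompress 64 fastbytes / 16MB dict", 25, 43),
    ("SharpCompress 128 fastbytes / 16MB dict", 35, 0),
]

def throughput_mb_s(minutes: int, seconds: int, size_mb: float = INPUT_MB) -> float:
    """Average compression speed in MB/s for one run."""
    return size_mb / (minutes * 60 + seconds)

for label, m, s in runs:
    print(f"{label:42s} {throughput_mb_s(m, s):5.2f} MB/s")
```

This works out to roughly 5.3 to 7.6 MB/s for 7z versus 0.45 to 0.96 MB/s for SharpCompress. At the roughly comparable 16MB / 64 setting, 7z Ultra (about 5.28 MB/s) is close to 9 times faster than SharpCompress (about 0.61 MB/s).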

@TimLCondor

TimLCondor commented Oct 11, 2023

Note that with the fastBytes parameter, higher does not automatically mean better compression. The default in LZMA is 128 (out of a maximum of 273), and in my case values from 100 to 128 often gave slightly better compression than the highest value (though mostly the same). Together with a dictionary size of 1<<23, I get nearly the same compression as 7-Zip.
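To check whether fast bytes behaves monotonically on your own data, a sweep like the following can help. It uses Python's `lzma` module, where `nice_len` (valid up to 273) plays roughly the role of fast bytes. The word-salad input and the `alone_size` helper are made up for illustration, so the printed sizes say nothing about real files, and no particular ordering is assumed.

```python
import lzma
import random

def alone_size(data: bytes, nice_len: int, dict_size: int = 1 << 23) -> int:
    """Compressed .lzma size for one nice_len value.

    Assumption: liblzma via Python's lzma module, not SharpCompress.
    """
    filters = [{"id": lzma.FILTER_LZMA1, "dict_size": dict_size, "nice_len": nice_len}]
    packed = lzma.compress(data, format=lzma.FORMAT_ALONE, filters=filters)
    assert lzma.decompress(packed) == data
    return len(packed)

# Synthetic, repetitive test input; replace with a real file to get meaningful numbers.
rng = random.Random(7)
vocab = [b"lzma", b"dictionary", b"fast", b"bytes", b"sharp", b"compress", b"stream", b"archive"]
data = b" ".join(rng.choice(vocab) for _ in range(100_000))

sizes = {n: alone_size(data, n) for n in (32, 64, 100, 128, 192, 273)}
for n, size in sizes.items():
    print(f"nice_len={n:3d}: {size} bytes")
```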

SharpCompress is slow because it uses the official LZMA C# SDK, which is an unoptimized, never-updated, sloppy translation from 2013. 7-Zip uses the algorithm as written and optimized in ANSI C.

@GamingCity

A 1<<24 dictionary size seems to give me the best results, along with at least 64 fast bytes (the default is 32).
7z has another setting called "word size" that seems to have a huge impact on the final size, but I'm not seeing it in the LZMA SDK.

I guess I will have to profile this to see where it is wasting so much time.

@Shivansps
Author

Shivansps commented Oct 13, 2023

I've been looking at the methods where the LZMA SDK spends so much time, and I only managed to get marginal speed gains. Maybe someone with more C# experience than me can figure something out.

I can't believe the state of the LZMA SDK for C#: no LZMA2, no XZ, and LZMA compression is just unusable, and from what I'm seeing it was like this 10 years ago. C# is slower than C++, but come on, you can't take over 35 minutes to compress a 900MB folder.
Sorry for the little rant, but I just can't believe this.

@adamhathcock
Owner

If you find an implementation that is faster, let me know.

@Shivansps
Author

Shivansps commented Oct 13, 2023

I've looked, but I don't think there is one. I'm not giving up on trying to fix it myself, though; it is one method that seems to be causing all the problems:

public uint GetMatches(uint[] distances)

This is one of those places where SIMD extensions might be applied, but I'm not experienced in any of that.
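For readers unfamiliar with what a match finder does: for the current position it walks earlier candidate positions and reports (length, distance) pairs of increasing length, and much of the work is comparing bytes. The sketch below is a simplified, hash-chain-style version in Python. It is not the SDK's binary-tree `GetMatches`, and `get_matches` and `common_prefix_len` are hypothetical names. The comparison reads 8 bytes at a time and finds the first differing byte from the XOR's lowest set bit. That is the word-at-a-time idea which wider SIMD compares generalize.

```python
def common_prefix_len(buf: bytes, a: int, b: int, limit: int) -> int:
    """Length of the common prefix of buf[a:] and buf[b:], capped at limit.

    Compares 8 bytes per step; the caller must ensure both ranges have
    at least `limit` bytes available.
    """
    n = 0
    while n + 8 <= limit:
        x = int.from_bytes(buf[a + n:a + n + 8], "little")
        y = int.from_bytes(buf[b + n:b + n + 8], "little")
        diff = x ^ y
        if diff:
            # Lowest set bit marks the first differing byte (little-endian load).
            return n + ((diff & -diff).bit_length() - 1) // 8
        n += 8
    # Byte-by-byte tail for the last < 8 bytes.
    while n < limit and buf[a + n] == buf[b + n]:
        n += 1
    return n


def get_matches(buf: bytes, pos: int, candidates: list, max_len: int = 273) -> list:
    """Return (length, distance) pairs with strictly increasing length.

    Hypothetical helper: `candidates` holds earlier positions, nearest last,
    as a hash chain would supply them.
    """
    limit = min(max_len, len(buf) - pos)
    matches, best = [], 1
    for cand in reversed(candidates):
        length = common_prefix_len(buf, cand, pos, limit)
        if length > best:
            best = length
            matches.append((length, pos - cand))
            if length == limit:  # cannot do better; stop searching
                break
    return matches


buf = b"abcdefgXabcdYabcZabcdefg"
print(get_matches(buf, 17, [0, 8, 13]))  # prints [(3, 4), (4, 9), (7, 17)]
```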

@adamhathcock
Owner

Good luck. I've never looked at compression algorithms myself. I was more concerned about the interface and maybe some archive formats.

Development

No branches or pull requests

4 participants