zstd: Improve best compression's match selection #705

Merged
merged 1 commit into from Dec 2, 2022

Commits on Dec 2, 2022

  1. zstd: Improve best compression's match selection

    The best encoder selects matches based on the criterion
    
    	a.est+(a.s-b.s)*bitsPerByte>>10 < b.est+(b.s-a.s)*bitsPerByte>>10
    
    If this were computed on the reals, it would be equivalent to
    a.est < b.est, so the added terms only capture round-off error
    (this is also why CSE doesn't eliminate them).
    
    Changing the formula to
    
    	a.est-b.est+(a.s-b.s)*bitsPerByte>>10 < 0
    
    captures the intention better, I think, and improves compression
    on most inputs (mr, osdb and sao get slightly larger):
    
    	file              old size  new size    delta
    	enwik9           260989017 259699309 -0.4942%
    	silesia/dickens    3233958   3222189 -0.3639%
    	silesia/mozilla   16980973  16912341 -0.4042%
    	silesia/mr         3505223   3505553  0.0094%
    	silesia/nci        2313702   2289871 -1.0300%
    	silesia/ooffice    2915199   2896410 -0.6445%
    	silesia/osdb       3364752   3390871  0.7763%
    	silesia/reymont    1658404   1656006 -0.1446%
    	silesia/samba      4330660   4326783 -0.0895%
    	silesia/sao        5399736   5416932  0.3185%
    	silesia/webster    9987784   9966351 -0.2146%
    	silesia/xml         542081    538378 -0.6831%
    	silesia/x-ray      5756210   5733061 -0.4022%
    
    ... as well as throughput:
    
    	name                              old speed      new speed      delta
    	Encoder_EncodeAllSimple/best-8    12.1MB/s ± 1%  12.2MB/s ± 1%  +1.17%  (p=0.000 n=18+20)
    	Encoder_EncodeAllSimple4K/best-8  10.4MB/s ± 1%  10.5MB/s ± 1%  +0.82%  (p=0.000 n=20+20)
    greatroar committed Dec 2, 2022
    2b47341
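
For readers who want to try the two criteria from the commit message side by side, here is a minimal sketch. The `match` struct, the `int32` field types, the helper names `oldBetter` and `newBetter`, and the sample value of `bitsPerByte` are assumptions made for illustration; only the two comparison expressions are taken from the commit message. Note that in Go `>>` binds tighter than `+`, so the shift applies only to the `(a.s-b.s)*bitsPerByte` product, and `>>` on a signed integer is an arithmetic shift that rounds toward negative infinity.

```go
package main

import "fmt"

// match is a hypothetical, stripped-down stand-in for the encoder's match
// record: est is the estimated cost in bits, s is the start position.
type match struct {
	est int32
	s   int32
}

// oldBetter reports whether a wins under the criterion quoted first in the
// commit message.
func oldBetter(a, b match, bitsPerByte int32) bool {
	return a.est+(a.s-b.s)*bitsPerByte>>10 < b.est+(b.s-a.s)*bitsPerByte>>10
}

// newBetter reports whether a wins under the criterion the commit switches to.
func newBetter(a, b match, bitsPerByte int32) bool {
	return a.est-b.est+(a.s-b.s)*bitsPerByte>>10 < 0
}

func main() {
	// Assumed sample input: a is 4 bits cheaper but starts one byte later.
	a := match{est: 100, s: 11}
	b := match{est: 104, s: 10}

	// Assumed value: 3 bits per byte, scaled by 1<<10 to match the >>10.
	const bitsPerByte = 3 << 10

	// old: 100+3 < 104-3 is false; new: 100-104+3 < 0 is true.
	fmt.Println(oldBetter(a, b, bitsPerByte), newBetter(a, b, bitsPerByte))
	// prints: false true
}
```

On this input the two criteria pick different matches, which shows that the change can alter match selection; it says nothing about which choice compresses better on real data, for which the figures in the commit message are the evidence.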