benchmark comparisons against ISA-L #1478

Open
danielhrisca opened this issue Apr 28, 2023 · 16 comments

@danielhrisca

Hello all,

do you have comparison benchmarks against https://github.com/intel/isa-l ?

@KungFuJesus
Contributor

Issues probably isn't the correct place to put this question. I have run benchmarks against their adler32 checksum implementation, and there at least, we are several times faster. This is only a small portion of the deflate/inflate algorithm, though. Does ISA-L provide a zlib-compatible interface to test against?
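For anyone who wants to reproduce a quick checksum comparison without the C harness, here's a minimal sketch using Python's stdlib zlib module. Note the assumption: this exercises whichever libz the interpreter is linked (or LD_PRELOAD'd) against, not the benchmark harness discussed in this thread.

```python
# Hedged sketch: time zlib's adler32 and crc32 on a 64 KiB buffer.
# Which implementation this measures depends on the zlib shared
# library the Python interpreter actually loaded.
import timeit
import zlib

data = bytes(range(256)) * 256  # 64 KiB of repeating bytes

for name, fn in (("adler32", zlib.adler32), ("crc32", zlib.crc32)):
    # Best of 5 repeats of 1000 calls each, reported per call.
    best = min(timeit.repeat(lambda: fn(data), number=1000, repeat=5))
    print(f"{name}: {best / 1000 * 1e6:.2f} us per 64 KiB call")
```

Swapping in zlib-ng via LD_PRELOAD and re-running gives a rough A/B comparison on the same machine.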

@danielhrisca
Author

https://github.com/pycompression/Python-isal provides zlib functions

@KungFuJesus
Contributor

We'd need C bindings to directly plug this into our existing benchmarks. That said, it's definitely possible to leverage python's zlib interfaces with zlib-ng for a direct comparison. That, however, would not be MR-worthy for the repo, unless you're suggesting we add benchmark data into the repo.

@KungFuJesus
Contributor

KungFuJesus commented Apr 28, 2023

For your edification, I managed to build this project (though it seems to rely on Makefiles that aren't present, so I had to switch it to point to my distro's package of isa-l). Here's their benchmark output with --all when zlib-ng is injected instead of zlib for the comparison:

adam@eggsbenedict ~/scratch/python-isal/benchmark_scripts $ python ./benchmark.py --all
CRC32
name	isal	zlib	ratio
0b	0.03	0.04	0.92
8b	0.05	0.05	1.03
128b	0.06	0.07	0.93
1kb	0.08	0.1	0.78
8kb	0.33	0.41	0.8
16kb	0.58	0.72	0.81
32kb	1.94	1.38	1.41
64kb	5.87	2.63	2.23
Adler32
name	isal	zlib	ratio
0b	0.04	0.04	1.03
8b	0.05	0.06	0.88
128b	0.06	0.06	1.06
1kb	0.1	0.07	1.4
8kb	0.47	0.21	2.26
16kb	0.84	0.31	2.7
32kb	1.62	0.62	2.61
64kb	3.13	1.01	3.09
zlib compression
name	isal	zlib	ratio
0b	1.75	6.9	0.25
8b	2.1	6.68	0.31
128b	3.36	6.74	0.5
1kb	6.35	10.67	0.59
8kb	23.9	43.8	0.55
16kb	42.64	84.23	0.51
32kb	83.99	169.72	0.49
64kb	179.51	364.98	0.49
zlib decompression
name	isal	zlib	ratio
0b	1.04	0.24	4.27
8b	0.97	0.34	2.84
128b	1.72	2.01	0.85
1kb	3.7	4.53	0.82
8kb	13.6	19.47	0.7
16kb	24.77	35.96	0.69
32kb	47.17	84.41	0.56
64kb	100.46	175.69	0.57
gzip compression
name	isal	zlib	ratio
0b	2.75	7.7	0.36
8b	3.07	7.51	0.41
128b	4.48	7.61	0.59
1kb	7.35	10.96	0.67
8kb	24.58	45.29	0.54
16kb	43.75	86.28	0.51
32kb	84.87	173.04	0.49
64kb	180.26	370.43	0.49
gzip decompression
name	isal	zlib	ratio
0b	2.48	1.98	1.25
8b	2.66	2.25	1.18
128b	3.57	3.72	0.96
1kb	5.66	6.0	0.94
8kb	15.83	20.73	0.76
16kb	29.04	41.06	0.71
32kb	52.48	79.28	0.66
64kb	105.28	182.12	0.58
zlib Compress instantiation
name	isal	zlib	ratio
	0.19	6.46	0.03
zlib Decompress instantiation
name	isal	zlib	ratio
	0.14	0.13	1.13
Gzip Writer instantiation
name	isal	zlib	ratio
	14.8	11.67	1.27
Gzip Reader instantiation
name	isal	zlib	ratio
	9.62	4.83	1.99
zlib sizes
name	-1	0	1	2	3	4	5	6	7	8	9
0b	8.0	11.0	8.0	8.0	8.0	8.0	8.0	8.0	8.0	8.0	8.0
8b	2.0	2.375	2.0	2.0	2.0	2.0	2.0	2.0	2.0	2.0	2.0
128b	0.68	1.086	0.805	0.688	0.68	0.68	0.68	0.68	0.656	0.656	0.672
1kb	0.522	1.011	0.785	0.53	0.522	0.522	0.522	0.522	0.52	0.52	0.522
8kb	0.476	1.001	0.745	0.489	0.481	0.478	0.476	0.476	0.472	0.472	0.492
16kb	0.47	1.001	0.734	0.486	0.476	0.471	0.471	0.47	0.466	0.466	0.489
32kb	0.467	1.0	0.722	0.487	0.476	0.469	0.468	0.467	0.464	0.463	0.488
64kb	0.461	1.0	0.715	0.487	0.474	0.466	0.464	0.461	0.457	0.457	0.482
isal sizes
name	0	1	2	3
0b	119.0	8.0	8.0	8.0
8b	15.875	2.0	2.0	2.0
128b	1.695	0.703	0.703	0.68
1kb	0.886	0.531	0.531	0.543
8kb	0.721	0.492	0.493	0.493
16kb	0.692	0.488	0.488	0.486
32kb	0.67	0.489	0.487	0.487
64kb	0.657	0.487	0.485	0.486

So, what's apparent is that we're winning handily on the CRC and Adler checksums (which is funny, because I think a lot of distributions use ISA-L as a "fast" implementation for things like block devices, specifically for its CRC checksum implementation).

Now, they seem to be winning on the compression and decompression tests. However, I did not write these benchmarks, nor do I know what type of data they are compressing. Compression and decompression speeds are very data dependent, so it's not exactly a definitive test. There's also the caveat that isa-l shrinks 9 levels down to only 3 and, it would seem, doesn't offer anywhere near the same compression ratios. Given how the encoding works, it wouldn't be surprising that isa-l is faster with poorer compression ratios.
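The level-count caveat is easy to see with stdlib zlib alone. A hedged illustration (this uses whatever zlib Python links against, and synthetic highly-compressible data, so the absolute numbers mean little):

```python
# Sketch: deflate trades speed for ratio across levels 1..9.
# ISA-L collapses this range into roughly 4 levels (0-3), so a
# level-for-level comparison against zlib can mislead.
import time
import zlib

data = b"the quick brown fox jumps over the lazy dog\n" * 2048  # ~88 KiB

for level in (1, 6, 9):
    start = time.perf_counter()
    out = zlib.compress(data, level)
    ms = (time.perf_counter() - start) * 1e3
    ratio = len(out) / len(data)
    print(f"level {level}: {len(out)} bytes ({ratio:.3f} ratio) in {ms:.2f} ms")
```

On real corpora like Silesia the spread between levels is much wider than on this repetitive sample.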

@KungFuJesus
Contributor

Another interesting observation: if I substitute their genome-based test dataset with the one we commonly use (the large Silesia test corpus), we start winning a few of the decompression benchmarks:

CRC32
name	isal	zlib	ratio
0b	0.04	0.04	0.96
8b	0.05	0.06	0.86
128b	0.06	0.06	0.95
1kb	0.08	0.09	0.82
8kb	0.33	0.41	0.81
16kb	0.65	0.72	0.91
32kb	4.48	1.37	3.27
64kb	10.29	2.62	3.92
128kb	19.89	5.14	3.87
512kb	78.47	20.22	3.88
Adler32
name	isal	zlib	ratio
0b	0.04	0.04	0.97
8b	0.05	0.06	0.86
128b	0.07	0.06	1.06
1kb	0.1	0.07	1.43
8kb	0.47	0.2	2.29
16kb	0.84	0.3	2.77
32kb	1.61	0.62	2.62
64kb	3.12	1.14	2.74
128kb	6.16	2.16	2.85
512kb	24.27	8.21	2.95
zlib compression
name	isal	zlib	ratio
0b	1.81	7.22	0.25
8b	2.15	6.43	0.33
128b	2.49	6.3	0.4
1kb	4.37	6.51	0.67
8kb	19.22	22.91	0.84
16kb	35.38	60.46	0.59
32kb	76.0	141.14	0.54
64kb	182.99	380.05	0.48
128kb	372.82	852.71	0.44
512kb	1568.38	3779.49	0.41
zlib decompression
name	isal	zlib	ratio
0b	1.09	0.28	3.86
8b	1.06	0.39	2.74
128b	0.98	0.5	1.97
1kb	2.26	2.34	0.97
8kb	16.17	12.84	1.26
16kb	29.47	29.24	1.01
32kb	56.14	72.1	0.78
64kb	128.79	180.63	0.71
128kb	272.76	396.72	0.69
512kb	1127.0	1671.24	0.67
gzip compression
name	isal	zlib	ratio
0b	3.1	7.3	0.42
8b	3.13	7.07	0.44
128b	3.53	7.3	0.48
1kb	5.43	7.21	0.75
8kb	19.97	23.93	0.83
16kb	36.07	61.47	0.59
32kb	76.93	142.36	0.54
64kb	183.56	382.6	0.48
128kb	373.01	856.26	0.44
512kb	1576.21	3822.68	0.41
gzip decompression
name	isal	zlib	ratio
0b	2.53	1.98	1.28
8b	2.61	2.2	1.19
128b	2.72	2.36	1.15
1kb	4.23	4.07	1.04
8kb	15.54	13.48	1.15
16kb	29.71	28.35	1.05
32kb	58.57	66.17	0.89
64kb	135.74	181.28	0.75
128kb	283.65	382.42	0.74
512kb	1199.43	1578.74	0.76
zlib Compress instantiation
name	isal	zlib	ratio
	0.2	7.01	0.03
zlib Decompress instantiation
name	isal	zlib	ratio
	0.16	0.14	1.12
Gzip Writer instantiation
name	isal	zlib	ratio
	15.47	11.99	1.29
Gzip Reader instantiation
name	isal	zlib	ratio
	9.78	4.87	2.01
zlib sizes
name	-1	0	1	2	3	4	5	6	7	8	9
0b	8.0	11.0	8.0	8.0	8.0	8.0	8.0	8.0	8.0	8.0	8.0
8b	2.0	2.375	2.0	2.0	2.0	2.0	2.0	2.0	2.0	2.0	2.0
128b	0.266	1.086	0.281	0.266	0.266	0.266	0.266	0.266	0.266	0.266	0.258
1kb	0.122	1.011	0.145	0.127	0.124	0.126	0.121	0.122	0.123	0.117	0.118
8kb	0.292	1.001	0.334	0.297	0.294	0.293	0.292	0.292	0.292	0.292	0.283
16kb	0.384	1.001	0.452	0.393	0.387	0.385	0.384	0.384	0.382	0.382	0.369
32kb	0.434	1.0	0.512	0.447	0.438	0.436	0.434	0.434	0.431	0.431	0.418
64kb	0.5	1.0	0.598	0.516	0.505	0.503	0.5	0.5	0.497	0.497	0.485
128kb	0.531	1.0	0.64	0.549	0.537	0.534	0.531	0.531	0.528	0.527	0.515
512kb	0.566	1.0	0.686	0.585	0.572	0.569	0.566	0.566	0.562	0.562	0.549
isal sizes
name	0	1	2	3
0b	119.0	8.0	8.0	8.0
8b	15.75	2.0	2.0	2.0
128b	1.156	0.281	0.281	0.281
1kb	0.251	0.133	0.133	0.13
8kb	0.353	0.299	0.299	0.302
16kb	0.468	0.393	0.393	0.396
32kb	0.521	0.449	0.447	0.447
64kb	0.603	0.525	0.52	0.517
128kb	0.647	0.563	0.554	0.551
512kb	0.696	0.603	0.59	0.587

This is, perhaps, a slightly more compressible dataset? In the past I've noticed that we do pretty well in sequences that aren't mostly encoding literals.

@KungFuJesus
Contributor

Honestly, I think the testing methodology of calling through python's interfaces is perhaps flawed as well, but I can't really assess that fairly without understanding the user's actual use case. Here's pigz invoked with 8 threads (well, processes), compression level 4, with zlib-ng injected:

adam@eggsbenedict ~/scratch/python-isal/tests/data $ time pigz -4  -k -p 8 test.fastq.copy 

real	0m0.382s
user	0m2.430s
sys	0m0.129s

Here's igzip, level 3, with 8 threads:

adam@eggsbenedict ~/scratch/python-isal/tests/data $ time igzip -T 8 -3 -k -z test.fastq.copy 

real	0m0.700s
user	0m0.573s
sys	0m0.073s

Granted, there's also a lot of other overhead in process launch, so that anecdote should be taken with a grain of salt, but I don't think that isa-l walks away with that many wins. And in all fairness, I'm testing their latest release code against our develop branch (which is about to become the release soon, though). There might be quite a few improvements in isa-l as well.

I think ISA-L's approach is similar to IPP's, no? Custom deflate dictionaries, if I'm reading this correctly (with some SIMD acceleration for some of the hot spots). That said, their adler32 code hasn't improved and, from what I remember, we were significantly winning on the internal benchmark distributed with ISA-L for that one. Performance wins were limited by total memory bandwidth, but when doing the "hot in cache" test, we were winning by significant margins. It's also interesting to see their "fast zero" code use AVX512, when in most of our measurements the clock penalty made switching to 512-bit operations more expensive even for short zeroing sequences (or anything too compute-lite).

@Dead2 Dead2 changed the title benckmarks benchmark comparisons against ISA-L May 9, 2023
@rhpvorderman

@KungFuJesus For an entirely fair comparison ISA-L level 1 should be compared with zlib-ng level 2 or ISA-L level 0 with zlib-ng level 1. The compression ratio at ISA-L level 1 is much better than that of zlib-ng level 1.

The python-isal benchmark suite tests ISA-L against zlib; this is mostly a test to see if the bindings are any good. ISA-L should of course be faster than zlib, but there can be initialisation overhead as well as overhead from the bindings themselves. If I had messed up the bindings, the overhead would be so much that zlib would actually win for most of the small data sizes. So I have this benchmark to make sure the bindings are really capable of being a drop-in replacement. The python zlib-ng bindings are also written by me and are basically a copy of the python-isal bindings, so that is actually more of an apples-to-apples comparison.

The size to look at is 128KB; these are the chunk sizes that are typically (de)compressed (used internally in pigz). I see ISA-L still wins there. This is not really surprising, given the amount of custom assembly that was written.
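That chunked access pattern is straightforward to sketch with stdlib zlib's streaming objects. A minimal illustration, assuming 128 KiB chunks as described above (the sample payload and level are arbitrary choices of mine):

```python
# Sketch: stream data through deflate in 128 KiB chunks, the rough
# granularity pigz feeds the library, then round-trip it back.
import zlib

CHUNK = 128 * 1024
payload = bytes(range(256)) * (4 * CHUNK // 256)  # 512 KiB of sample data

comp = zlib.compressobj(level=6)
compressed = b"".join(
    comp.compress(payload[i:i + CHUNK]) for i in range(0, len(payload), CHUNK)
) + comp.flush()

decomp = zlib.decompressobj()
restored = decomp.decompress(compressed) + decomp.flush()
assert restored == payload
print(f"{len(payload)} -> {len(compressed)} bytes")
```

Timing the `comp.compress` calls in a loop like this approximates the 128kb rows in the tables above.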

@powturbo

powturbo commented Jul 20, 2023

Extended benchmark TurboBench: Dynamic/Static web content compression benchmark including zstd and memory usage.
zlib-ng memory allocation must be revised to allocate only the minimum necessary!

@ghuls

ghuls commented Jul 28, 2023

@KungFuJesus At least for me, decompressing big gzip files with igzip is a lot faster than with zlib-ng, although the 2.1.3 release got twice as fast as 2.0.7 when using minigzip (with pigz the speed difference is smaller):

# Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz = 814M
time igzip -c -d Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz > /dev/null

real    0m4.745s
user    0m4.516s
sys     0m0.206s


❯  module load pigz

❯  module load zlib-ng/2.0.7-GCCcore-10.3.0

# pigz with zlib-ng 2.0.7:
time pigz -c -d Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz > /dev/null

real    0m14.172s
user    0m17.501s
sys     0m1.189s

# minigzip with zlib-ng 2.0.7:
time minigzip -c -d Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz > /dev/null

real    0m14.076s
user    0m13.721s
sys     0m0.320s


❯  module load zlib-ng/2.1.3-GCCcore-10.3.0

# pigz with zlib-ng 2.1.3:
time pigz -c -d Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz > /dev/null

real    0m10.280s
user    0m10.573s
sys     0m1.202s

# minigzip with zlib-ng 2.1.3:
time minigzip -c -d Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz > /dev/null

real    0m7.172s
user    0m6.915s
sys     0m0.234s


❯  module unload zlib-ng
❯  module load zlib/1.2.13

# pigz with just zlib 1.2.13:
time pigz -c -d Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz > /dev/null

real    0m12.016s
user    0m13.206s
sys     0m1.186s

igzip is by default compiled without pthreads support, unless you use make -f Makefile.unx: intel/isa-l#250

# igzip compiled with pthreads support (when running make -f Makefile.unx):
time programs/igzip -T 8 -3 -k -z hs.fa

real    0m2.086s
user    0m9.264s
sys     0m1.403s

# igzip compiled without pthreads support (default if running just make):
time igzip -T 8 -3 -k -z hs.fa

real    0m10.289s
user    0m8.890s
sys     0m1.354s

❯  time pigz -3 -k -p 8 hs.fa                                                                                               

real    0m8.491s
user    0m48.517s
sys     0m1.736s

❯  time pigz -2 -k -p 8 hs.fa                                                                                               

real    0m5.076s
user    0m38.331s
sys     0m1.885s

# File sizes                                                                                                              
3151425857  hs.fa 
927950066   hs.pigz_l3.fa.gz
998514930   hs.pigz_l2.fa.gz
977666017   hs.igzip_l3.fa.gz

@pablodelara

For your edification, I managed to build this project […] Here's their benchmark output with --all when zlib-ng is injected instead of zlib for the comparison: […]

Could you share which architecture this benchmark ran on? Also, how did you plug in zlib-ng instead of zlib?

Thanks!

@KungFuJesus
Contributor

Could you share which architecture this benchmark ran on? Also, how did you plug in zlib-ng instead of zlib?

Thanks!

Intel(R) Core(TM) i9-10940X CPU @ 3.30GHz. I simply LD_PRELOAD'd zlib-ng into python so that it used zlib-ng's symbols instead of zlib.
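One way to confirm the LD_PRELOAD actually took effect is to compare the zlib version CPython was compiled against with the one it is running with. A small sketch (the exact version string zlib-ng reports in zlib-compat mode is build-dependent, so treat the suffix as an assumption):

```python
# Sketch: check which zlib the interpreter actually loaded.
# zlib-ng's compat build typically appends a ".zlib-ng"-style marker
# to the runtime version string, but this varies by build.
import zlib

print("compiled against:", zlib.ZLIB_VERSION)
print("running with:", zlib.ZLIB_RUNTIME_VERSION)
```

If the two strings differ after an `LD_PRELOAD=/path/to/libz-ng.so python ...` invocation, the preload is being picked up.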

@pablodelara

Thanks for the quick response. Did you have to make any code changes? The benchmark script uses libz.so, so how do you get it to use zlib-ng's library instead?

@KungFuJesus
Contributor

The same way you make any project use zlib-ng in place of zlib:

https://github.com/zlib-ng/zlib-ng#install

@pablodelara

Thanks a lot for sharing. I confirm the Adler32 performance difference, but not the CRC32. Do you have turbo boost enabled? Also, it would be a good idea to increase the number of iterations in benchmark.py to get more stable results.

@KungFuJesus
Contributor

Yes. Then again, my isa-l was provided by the distro, so it could have been a version or two behind. As far as I'm aware, ISA-L leverages (v)pclmulqdq in an extremely similar manner to us, so I wouldn't be surprised if those were basically on par.

Their adler32 implementation is pretty poor for a number of reasons, though.
