
Slowdown due to high SYS time during vips image compression when using LD_PRELOAD to force zlib-ng usage #947

Open
AndKe opened this issue May 5, 2021 · 30 comments


@AndKe

AndKe commented May 5, 2021

This is the difference between standard zlib and current zlib-ng master on an RPi 4 running at 2 GHz:

pi@pfd:/run/user/1000 $ time vips copy out--12.png  x.png

real	0m0.130s
user	0m0.104s
sys	0m0.048s

pi@pfd:/run/user/1000 $ time LD_PRELOAD=/home/pi/zlib-ng/libz.so.1.2.11.zlib-ng vips copy out--12.png  x.png

real	0m0.364s
user	0m0.073s
sys	0m0.306s

This is the test image:
test12.zip

@mtl1979
Collaborator

mtl1979 commented May 5, 2021

zlib-ng is optimized for larger files. Anything that takes less than 1 second can't be reliably timed with the time command.

@AndKe
Author

AndKe commented May 5, 2021

Well, I get reliable, averaged values for batches of filter and compression settings just fine when optimizing filters for an average time of 10 ms, but OK,
here is a bigger batch:
time LD_PRELOAD=/home/pi/zlib-ng/libz.so.1.2.11.zlib-ng ./do.sh
real 0m17.408s
user 0m9.413s
sys 0m8.228s

time ./do.sh
real 0m11.254s
user 0m8.619s
sys 0m2.842s

You can say it is inefficient on small files, but the time it takes is real.
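One way to get per-image numbers that do not hinge on a single sub-second time sample is to run the same command many times and read the kernel's accounting for finished child processes, which splits user and sys time the same way time does. Below is a minimal C sketch for Linux/POSIX; time_command and struct run_times are hypothetical names (not part of vips or zlib-ng), and it only counts runs that exit with status 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Totals over all runs, in seconds (hypothetical helper type). */
struct run_times {
    double real;
    double user;
    double sys;
};

static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Run argv (NULL-terminated, argv[0] looked up in PATH) `runs` times and
 * accumulate wall-clock time plus the children's user/sys CPU time, taken
 * from getrusage(RUSAGE_CHILDREN) before and after.
 * Returns 0 on success, -1 if a run could not start or exited non-zero. */
static int time_command(char *const argv[], int runs, struct run_times *out)
{
    struct rusage before, after;
    struct timespec t0, t1;

    if (getrusage(RUSAGE_CHILDREN, &before) != 0)
        return -1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (int i = 0; i < runs; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            /* Child: inherits our environment, including any LD_PRELOAD. */
            execvp(argv[0], argv);
            _exit(127); /* only reached if exec failed */
        }
        int status;
        if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
            WEXITSTATUS(status) != 0)
            return -1;
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (getrusage(RUSAGE_CHILDREN, &after) != 0)
        return -1;

    out->real = (double)(t1.tv_sec - t0.tv_sec) +
                (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    out->user = tv_seconds(after.ru_utime) - tv_seconds(before.ru_utime);
    out->sys  = tv_seconds(after.ru_stime) - tv_seconds(before.ru_stime);
    return 0;
}
```

For example, calling time_command with {"vips", "copy", "out--12.png", "x.png", NULL} and runs = 100, once normally and once with LD_PRELOAD set in the harness's environment, and dividing the totals by 100 gives average real/user/sys time per image. Because every run is a separate fork and exec, the dynamic loader processes LD_PRELOAD again for each run, exactly as it does for each vips invocation in a shell script.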

@nmoinvaz
Member

nmoinvaz commented May 5, 2021

These are the benchmarks which we have done. We don't use LD_PRELOAD when benchmarking. There is also a deflatebench repository which we use specifically for benchmarking.

Have you tried building minigzip in both zlib and zlib-ng and comparing the results?

@AndKe
Author

AndKe commented May 7, 2021

@nmoinvaz - no I have not tried to involve minigzip.
Of course, my finding may have something to do with the low color count and low resolution of my jobs; it could be as simple as the (synthetic?) benchmarks favoring bigger or more colorful data.

@Dead2
Member

Dead2 commented May 7, 2021

I would be interested in seeing how the output of cmake/configure looks when you compiled this, just to rule out an error in configuration detection.

It is also hard for us to reproduce this since you use a third-party program, so if you could try to reproduce it with minigzip that would be great (this also helps narrow down the problem).

@AndKe
Author

AndKe commented May 7, 2021

@Dead2 It's nice that you want to look into it; that I can help with.
The full output is here: https://gist.github.com/AndKe/72b82592d7af1e2823d0d47bbc027788

I am also attaching my test-dataset:
472x288.zip

In case you find that zlib-ng can increase the performance of my RPi 4, please tell me how.
For the record: my RPi 4 runs a current official distro (with no GUI), is updated (using apt), and is overclocked and cooled (2 GHz). And no, it is not being throttled due to heat.

@Dead2
Member

Dead2 commented May 7, 2021

@AndKe Unfortunately you only posted the output of make, not configure or cmake, so while it answers some questions it does not contain all the info I hoped to see.

I did some testing myself; this is on an RPi 3, not overclocked, running Raspbian 64-bit.
Cloned develop from both madler/zlib and zlib-ng.
Compiled zlib using configure without flags, compiled zlib-ng using cmake without flags.
Benchmarked using deflatebench with silesia-small and 6 runs, trim 2 worst.

zlib git-develop aarch64 1.2.11 cacf7f1

 Tool: minigzip   Levels: 1-9
 Runs: 6          Trim worst: 2

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     44.805%      1.205/1.215/1.235/0.014        0.252/0.269/0.280/0.012        7,050,618
 2     43.764%      1.325/1.342/1.357/0.014        0.263/0.275/0.285/0.012        6,886,856
 3     42.712%      1.675/1.691/1.709/0.016        0.259/0.266/0.276/0.007        6,721,288
 4     42.062%      1.854/1.860/1.876/0.011        0.238/0.257/0.269/0.014        6,619,035
 5     41.235%      2.564/2.580/2.589/0.012        0.247/0.253/0.259/0.005        6,488,854
 6     40.734%      3.678/3.693/3.707/0.012        0.244/0.256/0.270/0.011        6,409,974
 7     40.628%      4.438/4.452/4.463/0.011        0.244/0.254/0.268/0.010        6,393,327
 8     40.505%      6.654/6.659/6.667/0.006        0.246/0.259/0.275/0.012        6,374,005
 9     40.490%      7.207/7.210/7.215/0.003        0.254/0.261/0.270/0.007        6,371,682

 avg1  41.882%                        3.411                          0.261
 tot                                122.808                          9.401       59,315,639

zlib-ng git-develop aarch64 81f1c8a

 Tool: minigzip   Levels: 1-9
 Runs: 6          Trim worst: 2

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     57.792%      0.733/0.748/0.755/0.010        0.237/0.249/0.257/0.010        9,094,303
 2     43.871%      1.078/1.091/1.105/0.012        0.244/0.252/0.258/0.007        6,903,702
 3     42.488%      1.305/1.321/1.344/0.017        0.244/0.249/0.253/0.004        6,686,118
 4     41.463%      1.587/1.600/1.623/0.016        0.222/0.237/0.247/0.011        6,524,775
 5     41.218%      1.706/1.738/1.753/0.022        0.224/0.235/0.244/0.009        6,486,136
 6     41.039%      1.990/1.997/2.004/0.006        0.238/0.247/0.255/0.009        6,457,951
 7     40.778%      2.561/2.570/2.577/0.007        0.203/0.226/0.243/0.018        6,416,941
 8     40.704%      3.119/3.142/3.159/0.017        0.204/0.222/0.239/0.016        6,405,249
 9     40.696%      3.364/3.393/3.411/0.021        0.210/0.225/0.239/0.014        6,404,085

 avg1  43.339%                        1.956                          0.238
 tot                                 70.403                          8.570       61,379,260

As you can see, zlib-ng is a lot faster than zlib in these benchmarks (total compression time 70.403 s vs 122.808 s, about 1.7x), so I am not sure what is happening on your end.

Something I notice, though, is that when you run the zlib-ng tests, most of the extra time is spent in sys (system, so kernel-side). In your first benchmark, user (your application) uses less time with zlib-ng. So somehow the kernel spends more time when you use zlib-ng. I am not sure whether that could be caused by LD_PRELOAD.

Also, zlib-ng in compat mode is not really ABI compatible, only API compatible. What that means is the application should be re-compiled using zlib-ng, meaning that LD_PRELOAD might not work correctly. Whether this is a symptom of that or not is unknown.
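Given that ABI caveat, a first sanity check is to confirm which libz file a process actually mapped. On Linux, /proc/<pid>/maps lists every file mapped into a process, with the path as the last column. A minimal C sketch, assuming Linux procfs; find_mapped_path and read_self_maps are hypothetical helpers.

```c
#include <stdio.h>
#include <string.h>

/* Scan text in /proc/<pid>/maps format and copy into `out` the path of the
 * first mapping whose path contains `needle` (e.g. "libz").
 * Paths longer than outlen - 1 are skipped. Returns 1 if found, else 0. */
static int find_mapped_path(const char *maps, const char *needle,
                            char *out, size_t outlen)
{
    const char *line = maps;
    while (line != NULL && *line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        /* The earlier columns (address, perms, offset, dev, inode) contain
         * no '/', so the first '/' starts the path. Pseudo-entries such as
         * [heap] have no '/' and are ignored. */
        const char *path = memchr(line, '/', len);
        if (path != NULL) {
            size_t plen = len - (size_t)(path - line);
            if (plen < outlen) {
                memcpy(out, path, plen);
                out[plen] = '\0';
                if (strstr(out, needle) != NULL)
                    return 1;
            }
        }
        line = end ? end + 1 : NULL;
    }
    if (outlen > 0)
        out[0] = '\0';
    return 0;
}

/* Read this process's own maps into buf (truncated if buf is too small). */
static int read_self_maps(char *buf, size_t buflen)
{
    if (buflen == 0)
        return -1;
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, buflen - 1, f);
    buf[n] = '\0';
    fclose(f);
    return 0;
}
```

For a running vips process, the same parser can be fed /proc/<pid>/maps instead; a path ending in libz.so.1.2.11.zlib-ng would indicate the preloaded library is the one in use. The shell equivalent is grep libz /proc/<pid>/maps.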

@AndKe
Author

AndKe commented May 7, 2021

@Dead2 My reasoning was that the sys time increased per picture processed, while I assumed the preload was done only once before running do.sh.
You are basically suggesting that I do "make install", which is kind of warned against in https://github.com/zlib-ng/zlib-ng#install

  • Given that I do not boot to a desktop, it most likely won't make the system unbootable.
  • But if I get the same performance drop, how do I revert the make install and go back to the default zlib?

@Dead2
Member

Dead2 commented May 7, 2021

@AndKe No, not at all. Do NOT run make install!

./configure or cmake . is run before make, and make test and/or make install would come after that (but don't run make install).

It is impossible to run make in zlib-ng without first running configure or cmake on a clean checkout of the repo:

root@rpi64:/usr/src/github/zlib-ng# make
make: *** No targets specified and no makefile found.  Stop.

@AndKe
Author

AndKe commented May 8, 2021

@Dead2 Good morning. Yes, I did run ./configure before (but did not post the results); here I run it again:

$ ./configure 
Checking for compiler... gcc
Checking for shared library support... Building shared library libz-ng.so.2.0.2 with gcc.
Checking for off64_t... Yes.
Checking for fseeko... Yes.
Checking for strerror... Yes.
Checking for unistd.h... Yes.
Checking for ptrdiff_t... Yes.
Checking for ANSI C compliant compiler...  Yes.
Checking for -fno-semantic-interposition... Yes.
Checking for -fno-lto... Yes.
Checking for attribute(visibility(hidden)) support... Yes.
Checking for attribute(visibility(internal)) support... Yes.
Checking for __builtin_ctz ... Yes.
Checking for __builtin_ctzll ... Yes.
Checking for SSSE3 intrinsics ... No.
Check whether -mfpu=neon is available ... Yes.
Checking for sys/sdt.h ... No.
HWCAP2_CRC32 not present in sys/auxv.h; cannot detect support at runtime.
HWCAP_NEON not present in sys/auxv.h; cannot detect support at runtime.
ARM floating point arch: -mfloat-abi=hard
ARCH: armv8-a+crc
Using arch directory: arch/arm
pi@pfd:~/zlib-ng $ 

make test reveals some problems:

pi@pfd:~/zlib-ng $ make test
make -C test
make[1]: Entering directory '/home/pi/zlib-ng/test'
hello world
zlib version 1.2.11.zlib-ng = 0x12bf, compile flags = 0x55
uncompress(): hello, hello!
gzread(): hello, hello!
gzgets() after gzseek:  hello!
gzgets(): hello, hello!
inflate(): hello, hello!
large_inflate(): OK
after inflateSync(): hello, hello!
inflate with dictionary: hello, hello!
deflateBound(): OK
deflate_copy(): OK
deflateGetDictionary(): hello, hello!
deflateSetHeader(): OK
deflateTune(): OK
deflatePending(): OK
deflatePrime(): OK
		*** zlib test OK ***
hello world
zlib version 1.2.11.zlib-ng = 0x12bf, compile flags = 0x55
uncompress(): hello, hello!
gzread(): hello, hello!
gzgets() after gzseek:  hello!
gzgets(): hello, hello!
inflate(): hello, hello!
large_inflate(): OK
after inflateSync(): hello, hello!
inflate with dictionary: hello, hello!
deflateBound(): OK
deflate_copy(): OK
deflateGetDictionary(): hello, hello!
deflateSetHeader(): OK
deflateTune(): OK
deflatePending(): OK
deflatePrime(): OK
		*** zlib shared test OK ***
../minigzip: <fd:0>: invalid literal/lengths set
          --- zlib not vulnerable to CVE-2002-0059 ---
../minigzip: <fd:0>: invalid bit length repeat
          --- zlib not vulnerable to CVE-2004-0797 ---
../minigzip: <fd:0>: invalid code -- missing end-of-block
          --- zlib not vulnerable to CVE-2005-1849 ---
../minigzip: <fd:0>: invalid code -- missing end-of-block
          --- zlib not vulnerable to CVE-2005-2096 ---
../minigzip -4 </home/pi/zlib-ng/test/GH-361/test.txt >/dev/null
gcc -O2  -std=c99 -Wall -D_LARGEFILE64_SOURCE=1 -DWITH_GZFILEOP -DHAVE_VISIBILITY_HIDDEN -DHAVE_VISIBILITY_INTERNAL -DHAVE_BUILTIN_CTZ -DHAVE_BUILTIN_CTZLL -DARM_FEATURES -mfloat-abi=hard -DUNALIGNED_OK -DUNALIGNED64_OK -DARM_ACLE_CRC_HASH -DARM_NEON_ADLER32 -DARM_NEON_CHUNKSET -DARM_NEON_SLIDEHASH -I.. -I/home/pi/zlib-ng -o switchlevels /home/pi/zlib-ng/test/switchlevels.c -L.. ../libz-ng.a
gcc: error: ../libz-ng.a: No such file or directory
make[1]: *** [Makefile:101: switchlevels] Error 1
make[1]: Leaving directory '/home/pi/zlib-ng/test'
make: *** [Makefile:161: test] Error 2
pi@pfd:~/zlib-ng $ 

The vulnerabilities are not an issue for this application, because the data I process is 100% self-generated and not user-provided.

How should I proceed?

@Dead2
Member

Dead2 commented May 8, 2021

It is hard to help when we don't know anything about the application you use and you don't perform tests with known tools (minigzip and/or deflatebench).

  • If the application you are using is your own, you should try porting it to native zlib-ng.
  • If the application is not your own, but you have the source, you could try porting it (best option) or linking it with zlib-ng in compat mode statically (that might be easier or harder, depending on the build system of the application).
  • If it is closed source, then LD_PRELOAD or asking them to add support for zlib-ng would unfortunately be the only alternatives, but LD_PRELOAD is not really a good solution because zlib and zlib-ng-compat are not 100% ABI compatible with each other.

Unlikely to make any difference at all, but you could test compiling with cmake instead of configure, there are slight differences in detection and configuration between the two.

@AndKe
Author

AndKe commented May 8, 2021

@Dead2 The application I use is a Python script that uses vips/pyvips to convert the SVG it generates (previously attached) to PNG.
So basically, it is about speeding up vips.

@mtl1979
Collaborator

mtl1979 commented May 8, 2021

HWCAP2_CRC32 not present in sys/auxv.h; cannot detect support at runtime.
HWCAP_NEON not present in sys/auxv.h; cannot detect support at runtime.

These two lines mean zlib-ng can't do run-time detection of optimizations to speed up the code... There is an issue with the libc in use.

@AndKe
Author

AndKe commented May 8, 2021

@mtl1979 Can you please suggest what to do with the libc?
In case you mean libc.conf:

$ cat /etc/ld.so.conf.d/libc.conf 
# libc default configuration
/usr/local/lib
/usr/lib 

I've added the line with "local" because that's where vips compiles/installs to.

@mtl1979
Collaborator

mtl1979 commented May 8, 2021

@AndKe Obviously I want to see what is defined in sys/auxv.h or any file it includes if it exists.

@AndKe
Author

AndKe commented May 8, 2021

@mtl1979 This is the output of the only sys/auxv.h on the system:

$ cat /usr/include/arm-linux-gnueabihf/sys/auxv.h
/* Access to the auxiliary vector.
   Copyright (C) 2012-2018 Free Software Foundation, Inc.
   This file is part of the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, see
   <http://www.gnu.org/licenses/>.  */

#ifndef _SYS_AUXV_H
#define _SYS_AUXV_H 1

#include <elf.h>
#include <bits/auxv.h>
#include <sys/cdefs.h>
#include <bits/hwcap.h>

__BEGIN_DECLS

/* Return the value associated with an Elf*_auxv_t type from the auxv list
   passed to the program on startup.  If TYPE was not present in the auxv
   list, returns zero and sets errno to ENOENT.  */
extern unsigned long int getauxval (unsigned long int __type)
  __THROW;

__END_DECLS

#endif /* sys/auxv.h */

@mtl1979
Collaborator

mtl1979 commented May 9, 2021

@AndKe In my Linux, bits/hwcap.h has

#define HWCAP_ARM_SWP           1
#define HWCAP_ARM_HALF          2
#define HWCAP_ARM_THUMB         4
#define HWCAP_ARM_26BIT         8
#define HWCAP_ARM_FAST_MULT     16
#define HWCAP_ARM_FPA           32
#define HWCAP_ARM_VFP           64
#define HWCAP_ARM_EDSP          128
#define HWCAP_ARM_JAVA          256
#define HWCAP_ARM_IWMMXT        512
#define HWCAP_ARM_CRUNCH        1024
#define HWCAP_ARM_THUMBEE       2048
#define HWCAP_ARM_NEON          4096
#define HWCAP_ARM_VFPv3         8192
#define HWCAP_ARM_VFPv3D16      16384
#define HWCAP_ARM_TLS           32768
#define HWCAP_ARM_VFPv4         65536
#define HWCAP_ARM_IDIVA         131072
#define HWCAP_ARM_IDIVT         262144
#define HWCAP_ARM_VFPD32        524288
#define HWCAP_ARM_LPAE          1048576
#define HWCAP_ARM_EVTSTRM       2097152

@AndKe
Author

AndKe commented May 9, 2021

@mtl1979 OK, that file looks the same:

$ cat /usr/include/arm-linux-gnueabihf/bits/hwcap.h 
/* Defines for bits in AT_HWCAP.  ARM Linux version.
   Copyright (C) 2012-2018 Free Software Foundation, Inc.
   This file is part of the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, see
   <http://www.gnu.org/licenses/>.  */

#if !defined (_SYS_AUXV_H) && !defined (_LINUX_ARM_SYSDEP_H)
# error "Never include <bits/hwcap.h> directly; use <sys/auxv.h> instead."
#endif

/* The following must match the kernel's <asm/hwcap.h>.  */
#define HWCAP_ARM_SWP		1
#define HWCAP_ARM_HALF		2
#define HWCAP_ARM_THUMB		4
#define HWCAP_ARM_26BIT		8
#define HWCAP_ARM_FAST_MULT	16
#define HWCAP_ARM_FPA		32
#define HWCAP_ARM_VFP		64
#define HWCAP_ARM_EDSP		128
#define HWCAP_ARM_JAVA		256
#define HWCAP_ARM_IWMMXT	512
#define HWCAP_ARM_CRUNCH	1024
#define HWCAP_ARM_THUMBEE	2048
#define HWCAP_ARM_NEON		4096
#define HWCAP_ARM_VFPv3		8192
#define HWCAP_ARM_VFPv3D16	16384
#define HWCAP_ARM_TLS		32768
#define HWCAP_ARM_VFPv4		65536
#define HWCAP_ARM_IDIVA		131072
#define HWCAP_ARM_IDIVT		262144
#define HWCAP_ARM_VFPD32	524288
#define HWCAP_ARM_LPAE		1048576
#define HWCAP_ARM_EVTSTRM	2097152
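This header names the NEON bit HWCAP_ARM_NEON (4096), while the configure messages earlier in the thread refer to HWCAP_NEON and HWCAP2_CRC32, neither of which appears in the listing. A minimal C sketch of runtime feature detection on 32-bit ARM Linux with getauxval(), assuming the CRC32 bit is 1 << 4 in AT_HWCAP2 as in the 32-bit ARM kernel's asm/hwcap.h. ARM32_HWCAP2_CRC32, struct arm_features and both functions are hypothetical names, not zlib-ng's actual code.

```c
#if defined(__linux__)
#include <sys/auxv.h>
#endif

/* Value from the bits/hwcap.h listing; only defined here if the host's
 * headers lack it (e.g. when compiling on x86). */
#ifndef HWCAP_ARM_NEON
#define HWCAP_ARM_NEON 4096
#endif
/* Assumption: HWCAP2_CRC32 is (1 << 4) in AT_HWCAP2 on 32-bit ARM. */
#define ARM32_HWCAP2_CRC32 (1UL << 4)

struct arm_features {
    int neon;
    int crc32;
};

/* Pure decoding of the two auxiliary-vector words, testable on any host. */
static struct arm_features decode_arm32_hwcaps(unsigned long hwcap,
                                               unsigned long hwcap2)
{
    struct arm_features f;
    f.neon = (hwcap & HWCAP_ARM_NEON) != 0;
    f.crc32 = (hwcap2 & ARM32_HWCAP2_CRC32) != 0;
    return f;
}

/* Ask the running kernel. Only meaningful for 32-bit ARM binaries on Linux;
 * everywhere else (including aarch64, whose bit layout differs) it reports
 * no features rather than misreading unrelated bits. */
static struct arm_features detect_arm32_features(void)
{
#if defined(__linux__) && defined(__arm__)
    return decode_arm32_hwcaps(getauxval(AT_HWCAP), getauxval(AT_HWCAP2));
#else
    struct arm_features none = { 0, 0 };
    return none;
#endif
}
```

Keeping the bit decoding separate from the getauxval() call makes the logic checkable on any machine; on 64-bit ARM the kernel reports these features differently (HWCAP_ASIMD and HWCAP_CRC32 in AT_HWCAP), which is why the sketch is limited to __arm__.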

@mtl1979
Collaborator

mtl1979 commented May 9, 2021

@AndKe Then #952 should make zlib-ng somewhat faster... It tests for feature bits more aggressively...

@AndKe
Author

AndKe commented May 9, 2021

@mtl1979
Please see the attached log.
It shows the speed with zlib, then unmodified zlib-ng, then patched zlib-ng (insignificant difference).
It also contains all configure and make output.
zlibpatch.log

@mtl1979
Collaborator

mtl1979 commented May 9, 2021

@AndKe The configure errors are gone, but that doesn't mean run-time detection of NEON works, or if NEON optimizations are faster than non-optimized version of the equivalent functions.

@Dead2
Member

Dead2 commented May 9, 2021

I don't think this problem is related to NEON detection really, since most of the extra CPU-time is in SYS. So this is probably related to LD_PRELOAD.

I know Python has performance problems related to LD_PRELOAD if it was not compiled with -fno-semantic-interposition, so this might be just that. Several distros started to compile Python with that flag last year, I think, but I am not sure how widespread it is yet.

You could perhaps try installing PyPy and using it to run vips (if possible) instead of Python; that might play nicer with LD_PRELOAD. PyPy is a JIT implementation of Python.

@AndKe
Author

AndKe commented May 9, 2021

@Dead2 Please note that while my main app is Python and uses pyvips, all these performance tests were done with vips alone, with no Python involved.

I was thinking about LD_PRELOAD too. What does not make sense to me is that the cost of LD_PRELOAD should be paid once, not for every image in the shell script.

Finally, my application will still need pyvips (I assume some performance loss if I stop working with buffers in Python and need to involve the file system and vips).

@mtl1979
Collaborator

mtl1979 commented May 9, 2021

@Dead2 I only noticed that both ACLE and NEON tests in configure failed without the patch... Whole issue might be combination of several factors... As we fix or rule out them one by one, zlib-ng should eventually be faster than zlib.

@AndKe
Author

AndKe commented May 9, 2021

Did anyone try to run my SVG conversion script on some ARM machine and compare zlib vs zlib-ng?
Maybe small PNGs with a low colour count are somehow zlib-ng's kryptonite? :)

@Dead2
Member

Dead2 commented May 9, 2021

Filetype/contents would not explain the increase in SYS time, that would have been USER time.
I am unable to reproduce the problem with minigzip on your svg file on an ARM machine.

Dead2 changed the title from "zlib-ng badly outperformed by the old zlib" to "Slowdown due to high SYS time during vips image compression when using LD_PRELOAD to force zlib-ng usage" on May 9, 2021.
@AndKe
Author

AndKe commented May 9, 2021

@Dead2 Where does minigzip come into the picture in this case? Is it an alternative that vips can use, or can minigzip replace pyvips?

@Dead2
Member

Dead2 commented May 9, 2021

@AndKe minigzip is an application we use extensively to test and benchmark zlib-ng.

We are not able to install, learn and test every application that uses zlib/zlib-ng in order to test/debug them, and most of the time it is irrelevant because a bug in zlib-ng will usually also be triggered by other programs, like minigzip.

@nmoinvaz
Member

nmoinvaz commented May 9, 2021

@AndKe Can you attach your configure.log?

@AndKe
Author

AndKe commented May 9, 2021

@nmoinvaz please see here:
configure.log
