
minigzip compatibility with gzip #908

Open
ghuls opened this issue Mar 22, 2021 · 11 comments


@ghuls

ghuls commented Mar 22, 2021

Can some options of gzip be supported?

gzip supports combining short options (e.g. `-cd` as well as `-c -d`), and it has options that minigzip lacks or interprets differently:

-f overwrites the output file (in minigzip, -f means "compress with Z_FILTERED")
-q suppresses warnings
-- ends option processing
...

  -c, --stdout      write on standard output, keep original files unchanged
  -d, --decompress  decompress
  -f, --force       force overwrite of output file and compress links
  -h, --help        give this help
  -l, --list        list compressed file contents
  -L, --license     display software license
  -n, --no-name     do not save or restore the original name and time stamp
  -N, --name        save or restore the original name and time stamp
  -q, --quiet       suppress all warnings
  -r, --recursive   operate recursively on directories
  -S, --suffix=SUF  use suffix SUF on compressed files
  -t, --test        test compressed file integrity
  -v, --verbose     verbose mode
  -V, --version     display version number
  -1, --fast        compress faster
  -9, --best        compress better
      --rsyncable   Make rsync-friendly archive

zcat- and gunzip-like wrappers would be nice too.
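One common way to provide zcat- and gunzip-like wrappers without shipping separate programs is to install them as symlinks to a single binary and pick the default mode from the name it was invoked as. A minimal sketch of that dispatch, assuming POSIX `/` path separators; `enum tool_mode` and `mode_from_argv0()` are hypothetical names for illustration, not existing minigzip code:

```c
#include <string.h>

/* Hypothetical modes, for illustration only. */
enum tool_mode {
    MODE_COMPRESS, /* default: behave like gzip */
    MODE_GUNZIP,   /* invoked as "gunzip": imply -d */
    MODE_ZCAT      /* invoked as "zcat": imply -d -c */
};

/* Chooses the default mode from argv[0], ignoring any leading directory,
   so that symlinks named gunzip or zcat can share one binary. */
static enum tool_mode mode_from_argv0(const char *argv0)
{
    const char *base = strrchr(argv0, '/');

    base = (base != NULL) ? base + 1 : argv0;
    if (strcmp(base, "gunzip") == 0)
        return MODE_GUNZIP;
    if (strcmp(base, "zcat") == 0)
        return MODE_ZCAT;
    return MODE_COMPRESS;
}
```

Calling `mode_from_argv0(argv[0])` at the top of `main()` would set these defaults before the regular option parsing runs, so explicit flags can still override them.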

@mtl1979
Collaborator

mtl1979 commented Mar 22, 2021

minigzip doesn't try to be a replacement for gzip... There were talks a few days ago about making a full replacement that has all the features of gzip but internally uses zlib-ng, but whether we will do it or not depends on how many people would actually use it.

@ghuls
Author

ghuls commented Mar 22, 2021

At least good to know that it might be supported in the future.
Supporting -- as an option to stop processing arguments would still be nice, though.
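For reference, supporting both `-cd` and `--` mostly comes down to how argv is scanned: each argument starting with `-` is walked character by character, and a bare `--` stops option processing. A minimal sketch under those assumptions; `struct opts` and `parse_opts()` are hypothetical, cover only the options mentioned above, and are not minigzip's actual parser:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical option set for a minigzip-like tool (illustrative only). */
struct opts {
    int to_stdout;  /* -c: write to standard output */
    int decompress; /* -d: decompress */
    int force;      /* -f: gzip meaning, overwrite existing output */
    int quiet;      /* -q: suppress warnings */
    int level;      /* -1 .. -9: compression level */
};

/*
 * Scans argv[1..argc-1] for options. Short flags may be clustered, so
 * "-cd" is treated like "-c -d". A bare "--" ends option processing, and
 * scanning also stops at the first argument that is not an option ("-" on
 * its own counts as an operand). Returns the index of the first operand,
 * or -1 on an unknown option.
 */
static int parse_opts(int argc, char **argv, struct opts *o)
{
    int i;

    memset(o, 0, sizeof *o);
    o->level = 6; /* assumed default, same as gzip's default level */

    for (i = 1; i < argc; i++) {
        const char *arg = argv[i];

        if (strcmp(arg, "--") == 0)
            return i + 1; /* everything after "--" is an operand */
        if (arg[0] != '-' || arg[1] == '\0')
            return i;     /* first operand reached */

        for (const char *p = arg + 1; *p != '\0'; p++) {
            switch (*p) {
            case 'c': o->to_stdout = 1; break;
            case 'd': o->decompress = 1; break;
            case 'f': o->force = 1; break;
            case 'q': o->quiet = 1; break;
            default:
                if (*p >= '1' && *p <= '9') {
                    o->level = *p - '0';
                    break;
                }
                fprintf(stderr, "unknown option -%c\n", *p);
                return -1;
            }
        }
    }
    return i; /* no operands: i == argc */
}
```

Like POSIX `getopt()`, this stops at the first non-option argument instead of permuting arguments the way GNU `getopt()` does by default; with `--`, a file literally named `-weird-name` can still be passed as an operand.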

@mtl1979
Collaborator

mtl1979 commented Mar 22, 2021

I'm not saying yet when we will add new features to minigzip, as our main concern now is to have a stable release with as few bugs as possible... After all the bugs have been found and fixed, we will start adding new features.

@ghuls
Author

ghuls commented Mar 22, 2021

I found a workaround in the meantime.

Because pigz links against zlib, I was able to use it with zlib-ng instead, and pigz should support all gzip options. Even with pigz running multiple threads, compression with zlib-ng instead of zlib is still about 3.6 times faster (0m16.7s vs 0m59.7s real time with 8 threads) :-).

# gzip.
$ time gzip -c -6 test.txt > test.txt.gz
real	7m12.472s
user	7m8.307s
sys	0m1.821s

# minigzip
$ time minigzip -c -6 test.txt > test.txt.gz
real	1m46.187s
user	1m43.936s
sys	0m1.301s


# pigz with 8 threads and zlib.
$ time pigz -p 8 -6 -c test.txt > test.txt.gz
real	0m59.679s
user	7m57.845s
sys	0m3.842s

# With zlib-ng

# pigz with 2 threads and zlib-ng
$ time pigz -p 2 -6 -c test.txt > test.txt.gz
real	1m4.082s
user	2m11.145s
sys	0m4.082s

# pigz with 4 threads and zlib-ng.
$ time pigz -p 4 -6 -c test.txt > test.txt.gz
real	0m32.674s
user	2m12.010s
sys	0m2.810s

# pigz with 6 threads and zlib-ng.
$ time pigz -p 6 -6 -c test.txt > test.txt.gz
real	0m22.818s
user	2m15.296s
sys	0m2.442s

# pigz with 8 threads and zlib-ng.
$ time pigz -p 8 -c test.txt > test.txt.gz
real	0m16.715s
user	2m14.397s
sys	0m2.336s

https://github.com/madler/pigz
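To put numbers on these comparisons, the `real` field of the `time` output above can be converted to seconds and divided. A small helper sketch, assuming bash's `<min>m<sec>s` time format; `time_field_to_seconds()` and `speedup()` are hypothetical names:

```c
#include <stdio.h>

/* Converts a bash `time` field such as "7m12.472s" into seconds.
   Returns -1.0 if the string is not in that "<min>m<sec>s" shape. */
static double time_field_to_seconds(const char *field)
{
    int minutes;
    double seconds;

    if (sscanf(field, "%dm%lfs", &minutes, &seconds) != 2)
        return -1.0;
    return minutes * 60.0 + seconds;
}

/* How many times faster the second run is than the first, by real
   (wall-clock) time. Returns -1.0 if either field cannot be parsed. */
static double speedup(const char *base_real, const char *fast_real)
{
    double base = time_field_to_seconds(base_real);
    double fast = time_field_to_seconds(fast_real);

    if (base < 0.0 || fast <= 0.0)
        return -1.0; /* unparsable or zero time */
    return base / fast;
}
```

With the figures above, `speedup("0m59.679s", "0m16.715s")` is about 3.57 (pigz with 8 threads, zlib vs zlib-ng), and `speedup("7m12.472s", "1m46.187s")` is about 4.07 (gzip vs minigzip).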

@nmoinvaz
Member

nmoinvaz commented Mar 22, 2021

We also have a pigzbench repository that can be used to build pigz against different zlib forks and benchmark them.

@paravoid

> minigzip doesn't try to be a replacement for gzip... There were talks a few days ago about making a full replacement that has all the features of gzip but internally uses zlib-ng, but whether we will do it or not depends on how many people would actually use it.

It may be useful to note that NetBSD's gzip is actually a frontend to zlib (and FreeBSD's gzip is a fork of NetBSD's as well). It doesn't have all the options that GNU gzip has (e.g. --rsyncable), but it has most, and I'd expect at least the bulk of automated uses in Makefiles and build systems etc. to use options that exist in the BSDs. The code itself will probably have some BSD-specific quirks, but I'm sure it can be made portable (with or without the use of libbsd). So perhaps porting a widely deployed/battle-tested permissively licensed gzip implementation to zlib-ng's native API makes more sense than reimplementing gzip from scratch, or even with minigzip as the starting point.

Side-note: back in 2007, I forked NetBSD's code and heavily modified it, to create "zgz", part of Debian's pristine-tar package. Given how long ago this was, I don't remember what the porting effort entailed; it was also a fork of NetBSD's 20060927 revision, so it's dubious whether any lessons learned from back then would be relevant today anyway. Several others (Joey Hess, Josh Triplett etc.) have also modified the code since. The code is probably unrecognizable compared to the original NetBSD version, and it is also quite domain-specific, as its purpose is to have all kinds of "expert" flags, to be used to simulate various archivers and recreate archives found in the wild. I'm not sure if that's of any use here, but I'm mentioning it for posterity and in case it gives you a bit of insight into the variety of gzip archives that exist out there.

@nmoinvaz
Member

I think one challenge with existing gzip source code is that it is not zlib licensed. In this repository we only allow zlib licensed source code.

@paravoid

paravoid commented Jan 9, 2022

> I think one challenge with existing gzip source code is that it is not zlib licensed. In this repository we only allow zlib licensed source code.

GNU gzip is under GPLv3, and including code under a strong copyleft license would indeed be a pretty big departure from the permissiveness of the zlib license that this project uses - I concur.

The NetBSD/FreeBSD gzip code that I mentioned above, on the other hand, is under the 2-clause BSD license. It's not the same as the zlib license, but it's not very different either.

@mtl1979
Collaborator

mtl1979 commented Jan 9, 2022

It would be best if the *BSD version were dual-licensed or an "official" zlib-ng adaptation were released... Might be worth discussing with the maintainers...

@paravoid

> It would be best if the *BSD version were dual-licensed or an "official" zlib-ng adaptation were released... Might be worth discussing with the maintainers...

(A year and a half later, reviving this)

I don't have any objections per se to this (especially given I'm not the one to be doing the work ;), but I am also curious what problem it would solve. The 2-clause BSD license is a fine license, about as permissive as the zlib license is, OSI-approved, FSF-approved, and popular enough to have been vetted already by corporate legal departments. Is there a perceived scenario where one would be OK with the zlib license but not the 2-clause BSD for their project?

Don't get me wrong, I appreciate homogeneity and consistency! But this would come at the significant expense of relicensing, potentially dragging dozens of contributors into the conversation, so I believe it's worthwhile to ask about the benefits...

@paravoid

For what it's worth, I tried compiling FreeBSD main's (freebsd/freebsd-src@0f8b2ba) gzip (usr.bin/gzip) on Debian unstable, with zlib-ng.

I had to:

  1. Add this at the top (all BSD-isms):
#define nitems(x) (sizeof((x)) / sizeof((x)[0]))
#define __unused __attribute__((__unused__))
#define SIGINFO SIGUSR1
#define EFTYPE EINVAL
#define __COPYRIGHT(_s) static const char copyright[] __unused = _s
  2. Comment out the single line that checks sb.st_flags and calls fchflags.

  3. Build with:
    gcc -Wall -isystem /usr/include/bsd -DLIBBSD_OVERLAY -DNO_BZIP2_SUPPORT -DNO_XZ_SUPPORT -DNO_LZ_SUPPORT -DNO_ZSTD_SUPPORT -o gzip gzip.c -lz -lbsd

After that, the resulting binary, linked against zlib-ng, just works in my (limited) testing.

To productionize this, one could:
a) Use unifdef to strip the bzip2/xz/LZMA/zstd code from the source.
b) Add the necessary configure options and/or ifdefs so that the two modifications above are conditional on Linux/glibc/etc.
c) Integrate into the build system.
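For step b), one option is to collect the definitions from step 1 in a small header and guard each one with `#ifndef`, so platforms that already provide them keep their native versions. A sketch, assuming a hypothetical file name `bsd_compat.h`, GCC or Clang (for `__attribute__`), and that it is included after the system headers:

```c
/* bsd_compat.h (hypothetical): the BSD-isms from the recipe above,
   each defined only if the platform does not already provide it. */
#ifndef BSD_COMPAT_H
#define BSD_COMPAT_H

#include <errno.h>
#include <signal.h>

/* Number of elements in a fixed-size array. */
#ifndef nitems
#define nitems(x) (sizeof((x)) / sizeof((x)[0]))
#endif

/* Marks intentionally unused objects (GCC/Clang attribute). */
#ifndef __unused
#define __unused __attribute__((__unused__))
#endif

/* Linux has no SIGINFO; reuse SIGUSR1, as in the recipe above. */
#ifndef SIGINFO
#define SIGINFO SIGUSR1
#endif

/* EFTYPE is BSD-only; fall back to EINVAL. */
#ifndef EFTYPE
#define EFTYPE EINVAL
#endif

/* Embeds a copyright string without triggering unused warnings. */
#ifndef __COPYRIGHT
#define __COPYRIGHT(_s) static const char copyright[] __unused = _s
#endif

#endif /* BSD_COMPAT_H */
```

Guarding on the individual symbols rather than on `__linux__` keeps the header harmless on the BSDs themselves.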

The libbsd dependency could be reduced further if one were to inline a handful of functions (getprogname, le32dec, strlcpy etc.), but I would not recommend it.


4 participants