Regression when compressing a binary (mode=1) image #5188

Closed
j-towns opened this issue Jan 6, 2021 · 3 comments
j-towns commented Jan 6, 2021

  • OS: MacOS 11.1
  • Python: 3.8.6
  • Pillow: 8.1.0 vs 5.4.1

I recently had to re-run a benchmark that I originally ran in late 2018 and found a significant performance regression. The code below compresses a set of 10,000 1-bit (mode 1) 28x28 images using PNG and WebP and reports the average compression rate in bits per pixel. A rate below 1 means the images have been compressed; a rate above 1 means the compression has failed for some reason, producing a longer byte string than the raw images.

When running this with Pillow 8.1.0 I get

$ python benchmark_compressors.py
Dataset: binarized mnist. Compressor: PNG. Rate: 1.37 bits per pixel.
Dataset: binarized mnist. Compressor: WebP. Rate: 1.01 bits per pixel.

In Pillow 5.4.1 (the version I used back in 2018) I get

$ python benchmark_compressors.py
Dataset: binarized mnist. Compressor: PNG. Rate: 0.78 bits per pixel.
Dataset: binarized mnist. Compressor: WebP. Rate: 0.44 bits per pixel.

Please let me know if there's any more information that I can provide.

To run the code below you will need NumPy and torchvision (part of PyTorch). Torchvision is used to obtain the benchmark image dataset and can be installed with `pip install torchvision`.

import io
import numpy as np

from torchvision import datasets, transforms
import PIL.Image as pimg


def mnist_raw():
    # Load the 10,000 MNIST test images as a (10000, 28, 28) uint8 array.
    mnist = datasets.MNIST(
        'data/mnist', train=False, download=True,
        transform=transforms.Compose([transforms.ToTensor()]))
    return mnist.data.numpy()

def mnist_binarized(rng):
    # Stochastically binarize pixel intensities, giving boolean (mode 1) images.
    raw_probs = mnist_raw() / 255
    return rng.random_sample(np.shape(raw_probs)) < raw_probs

def bench_compressor(compress_fun, compressor_name, images, images_name):
    # Report the total compressed size in bits per pixel.
    byts = compress_fun(images)
    n_bits = len(byts) * 8
    bits_per_pixel = n_bits / np.size(images)
    print("Dataset: {}. Compressor: {}. Rate: {:.2f} bits per pixel.".
          format(images_name, compressor_name, bits_per_pixel))

def pimg_compress(format='PNG', **params):
    # Return a function that compresses each image individually and
    # concatenates the encoded bytes.
    def compress_fun(images):
        compressed_data = bytearray()
        for image in images:
            image = pimg.fromarray(image)
            img_bytes = io.BytesIO()
            image.save(img_bytes, format=format, **params)
            compressed_data.extend(img_bytes.getvalue())
        return compressed_data
    return compress_fun


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    images = mnist_binarized(rng)
    bench_compressor(
        pimg_compress("PNG", optimize=True), "PNG", images, 'binarized mnist')
    bench_compressor(
        pimg_compress('WebP', lossless=True, quality=100), "WebP", images, 'binarized mnist')
radarhere (Member) commented

Testing, I found that the difference for PNG, and most of the difference for WebP, is due to #3790.

radarhere (Member) commented

So this was a bug fix. You're checking these images for size, not correctness. I think if you were to eyeball the output with 5.4.1, you would find that it is incorrect.
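
Rather than eyeballing the output, one way to check correctness programmatically is to round-trip a single mode 1 image through the codec and compare pixels. A minimal sketch, using a random 28x28 boolean array as a hypothetical stand-in for a binarized MNIST digit:

import io
import numpy as np
import PIL.Image as pimg

# Hypothetical stand-in for one binarized MNIST digit (28x28 boolean array).
rng = np.random.RandomState(0)
original = rng.random_sample((28, 28)) < 0.5

# Encode as PNG and decode again.
buf = io.BytesIO()
pimg.fromarray(original).save(buf, format='PNG')
decoded = np.array(pimg.open(io.BytesIO(buf.getvalue())))

# If the codec path is lossless, the round trip preserves every pixel.
print("Lossless round trip:", np.array_equal(original, decoded))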

radarhere (Member) commented

I don't think there's any reason why the fixed version should be larger than the 5.4.1 version, apart from the fact that the image content becomes different, which makes it unfair to compare the resulting sizes.

If you have any additional thoughts this can be re-opened.
