Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added image.entropy() method #3530

Closed
wants to merge 3 commits into from

Conversation

fish2000
Copy link
Contributor

@fish2000 fish2000 commented Jan 2, 2019

This calculates the entropy for the image, based on the histogram.

Because this uses image histogram data directly, the existing C function underpinning the image.histogram() method was abstracted into a macro, and a new C function was added that uses this macro.

The new image.entropy() method is based on image.histogram(), and will accept the same arguments to calculate the histogram data it will use to assess the entropy of the image.

The algorithm and methodology is based on existing Python code:

... A test case in the Tests/ directory, and doctest lines in selftest.py, have both been added and checked.

Changes proposed in this pull request:

  • The addition of an image.entropy() method,
  • The abstraction of the prologue of the C function _histogram into a macro, and
  • The use of that macro in both _histogram and _entropy.
  • Minor documentation addenda in the docstrings for both image.entropy() and image.histogram()

@fish2000
Copy link
Contributor Author

fish2000 commented Jan 2, 2019

The tests are failing, due to float-comparison issues… I will rewrite the test code to deal with that and re-push.

@fish2000 fish2000 force-pushed the image-entropy-method branch 2 times, most recently from 5300972 to a3b6bd4 Compare January 2, 2019 22:32
@fish2000
Copy link
Contributor Author

fish2000 commented Jan 2, 2019

I’m on Mac OS X, and I have no idea how to fix the Appveyor build scripts – can someone who knows windows help me out, or point me in the right direction?

Evidently it needs to link the C math library when linking the result of compiling _imaging.c in order to find the symbol for the _log2 function call.

This calculates the entropy for the image, based on the histogram.
Because this uses image histogram data directly, the existing C
function underpinning the `image.histogram()` method was abstracted
into a macro, and a new C function was added that uses this macro.
The new `image.entropy()` method is based on `image.histogram()`,
and will accept the same arguments to calculate the histogram
data it will use to assess the entropy of the image.
The algorithm and methodology is based on existing Python code:
    https://git.io/fhmIU
... A test case in the `Tests/` directory, and doctest lines in
`selftest.py`, have both been added and checked.

Subsequent commits:
* Using assertAlmostEqual() in entropy tests
* Added description of `extrema` arguments.
* Only test seven digits of float returned by im.entropy()
@radarhere
Copy link
Member

I have created fish2000#1

@homm
Copy link
Member

homm commented Jan 3, 2019

Any reason to implement it in C rather than in Python? I'm comparing with this implementation:

def entr(hist):
    from math import log
    fs = 1.0 / sum(hist)
    fentropy = 0.0
    for h in hist:
        if h != 0:
            h = h * fs
            fentropy += h * log(h, 2)
    return -fentropy

For 1024 × 640 image I get:

In: %timeit im.histogram()
100 loops, best of 3: 2.04 ms per loop

In: %timeit im.entropy()
100 loops, best of 3: 2.07 ms per loop

In: %timeit entr(im.histogram())
100 loops, best of 3: 2.31 ms per loop

The bigger image, the smaller the difference between Python and C implementation. Does it worth it?

src/_imaging.c Show resolved Hide resolved
@fish2000
Copy link
Contributor Author

fish2000 commented Jan 3, 2019

The bigger image, the smaller the difference between Python and C implementation. Does it worth it?

Short answer: yes, I believe so.

Many histogram-entropy functions are implemented in Python, using the PIL/Pillow histogram() method data – e.g.

One of PIL/Pillow’s primary uses is in web applications – processing images uploaded by users, avatar images, favicon.ico source material, minute changes to a minuscule images’ scale calculated on-the-fly, and so forth. In-memory, ad-hoc performance with smaller images is important in these cases.

Here’s the long answer: the case I would make for moving this algorithm into Pillow’s _imaging.c, where it can take full advantage of Pillow’s internal structures, is that the version I have submitted heap-allocates only one PyObject per invocation. Comparable Python implementations need to create at least one Python object for each entry in the histogram – 768 for an RGB-mode image – and add at least one frame to the interpreter stack, in the best possible scenario.

Using a C function, utilizing the Pillow internal structures (e.g. ImagingHistogramInstance) in an application where the function is iteratively called across 1-dimensional image slices, makes a big difference in the context of a web application, with their attendant expectations of efficient concurrency. That’s my take on it, at any rate, and my original motivation, basically. Yes!

@fish2000
Copy link
Contributor Author

This PR has been superseded by #3608, which uses @homm's static extrema-parsing function code instead of a prologue macro.

@fish2000 fish2000 closed this Jan 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants