Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid unstable nature of qsort in Quant.c #5367

Merged
merged 5 commits into from Jun 20, 2021
Merged

Conversation

radarhere
Copy link
Member

@radarhere radarhere commented Mar 30, 2021

Resolves #5263

https://en.wikipedia.org/wiki/Quicksort

Efficient implementations of Quicksort are not a stable sort, meaning that the relative order of equal sort items is not preserved.

In other words, if qsort is given two equal items to sort, it doesn't have to return them in the same order on one platform vs another. This has lead to #5263.

#5264 solves this by adding a custom stable sort instead of qsort.

This PR is an alternative (and feel free to favour #5264 instead if you so choose). It instead keeps qsort but ensures that no two items are equal, by adding an index to the items, so that even if one dimension is equal, the second one will not be. It also reverts #5363 in the process.

Both PRs have the same code added to the test suite.

@raygard
Copy link
Contributor

raygard commented Mar 31, 2021

I have looked over the code changes to Quant.c. Looks correct to me. I have made sure it builds in Windows. I assume it's been built in Linux and assume it has been tested.

A small quibble: the Shell sort in #5264 is not stable. The object was to get the same sort order results on any platform, and incorporating the sort in Quant.c accomplishes that (assuming a deterministic sort!). Making the sort stable also makes it consistent, so that also works.

The Shell sort is only a little slower than Quicksort until the array gets fairly large (and can be faster, e.g. if the data are already nearly in order). Generally the compare function contributes heavily, so a more complex compare (as here) can slow things down. I have not tried to profile or time these two approaches so cannot say which will be faster in practice.

I'm the new guy here and @radarhere is a longtime member, so I feel I have to tread lightly. But I am curious as to the motivation. This seems to add more complexity than just using an embedded sort, and you've gone to rather a lot of effort to do it. Is there a concern about the performance, or maybe about the correctness of my Shell sort?

This approach is a sort of use of the decorate-sort-undecorate idiom. You need to allocate a new array of structs. I don't know how much memory this might take in the worst case, but based on the if (nPaletteEntries > UINT32_MAX / nPaletteEntries) { check on lines 1182 and 1389, it appears there can be up to 2**16 palette entries, so up to UINT32_MAX elements of DistanceWithIndex allocated. That's about 32GB! I don't know what other limits there are on nPaletteEntries so that may not be a problem in practice.

But I see that there are only nEntries elements sorted at a time, and that you "undecorate" those as you sort each batch, so I suspect there is a way to only "decorate" nEntries at a time as well, just before each call to qsort(), so you'd need a lot less of dwi.

I also don't know if the added time to "decorate" and "undecorate" the DistanceWithIndex elements is of any significance. Probably minuscule compared with the rest of the work in quantization.

The name _sort_ulong_ptr_keys() is now no longer appropriate for the compare function.

Your construction of the DistanceWithIndex elements uses compound literals, which I think came in C99. Aside from // -style comments, are there other non-C90 features used in PIL? Does it matter?

You've not made quantize() and quantize2() static, as I had, but maybe that's best. They are not currently called outside Quant.c but maybe they could be in the future.

I'm sure you'll choose one or the other of these approaches, and I am fine with either, but if you go with keeping the system qsort(), I think it would be good to see if you can maybe build the sort keys inside the loop that contains the call to qsort() and need only nEntries elements of dwi.

@raygard
Copy link
Contributor

raygard commented Mar 31, 2021

I have definitely not thought this through, but would this suffice? Build avgDistSortKey as was done originally in build_distance_tables(), then:

    for (i = 0; i < nEntries; i++) {
        for (j = 0; j < nEntries; j++) {
            dwi[j] = (DistanceWithIndex){
                avgDistSortKey[i * nEntries + j],
                j
            };
        }    // ; -- removed superfluous semicolon! rdg 20210402
        qsort(
            dwi,
            nEntries,
            sizeof(DistanceWithIndex),
            _sort_ulong_ptr_keys);
        for (j = 0; j < nEntries; j++) {
            avgDistSortKey[i * nEntries + j] = dwi[j].distance;
        }
    }

Only need nEntries elements in dwi. This does build in Windows and seems to work in a quick test.

@radarhere
Copy link
Member Author

You've not made quantize() and quantize2() static, as I had, but maybe that's best. They are not currently called outside Quant.c but maybe they could be in the future.

I've created #5374, to separate out that change from this discussion.

@radarhere
Copy link
Member Author

My hope with this PR is that is adds less complexity - the nuances of a sort function seem more complicated to me than temporarily replacing a single variable with a struct. If you count it up, fewer lines are modified. And if our compare function is leaving room for interpretation, then I'd rather clarify that to address what I think of as the root of the problem.

I can appreciate that your opinion is probably that inserting a custom sort function is simpler from a big picture perspective. I guess my other concern is that we're potentially re-inventing the wheel by doing so.

Don't feel you have to tread lightly (I mean, politeness is still good... you know what I mean). I don't have strong opinions here. If your PR is merged instead, then onto the next issue.

You make a good point about the memory allocation. I've pushed a commit for that and to rename the function.

@raygard
Copy link
Contributor

raygard commented Apr 2, 2021

@radarhere With your change to reduce memory (which reduces complexity as well), I have withdrawn my PR. Your newer version generates exactly the same results as my similar suggestion above; I tested. So I feel good about this version.

I do get your concerns about the sort function. My Shell sort was adapted for use as qsort() in the uClibc stdlib (https://git.uclibc.org/uClibc/tree/libc/stdlib/stdlib.c or https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/libc/stdlib/stdlib.c near line 700) and various other places, and has been used for over 30 years, so I'm pretty confident in it :). Use it wherever you need a compact qsort().

@radarhere radarhere added the Platform A catchall for platform-related label Apr 9, 2021
@raygard
Copy link
Contributor

raygard commented Jun 15, 2021

@radarhere , I would like to see this get merged in the upcoming release. Then I will not have to make my own Windows build anymore. Are there any blockers keeping this from getting merged?

@radarhere
Copy link
Member Author

Just a case of waiting for someone else from the Pillow team to review this.

@hugovk
Copy link
Member

hugovk commented Jun 20, 2021

Are there any discernible performance impacts from this, for example resizing large images?

@radarhere
Copy link
Member Author

Using https://photojournal.jpl.nasa.gov/jpeg/PIA24472.jpg (large enough that it triggers a DecompressionBombWarning), and

import timeit
from PIL import Image

def test():
	im = Image.open("PIA24472.jpg")
	im.resize((int(im.width/2), int(im.height/2)))
	im.close()

print(timeit.timeit(test, number=100))

With this PR, I get 108.989, 108.575 and 108.075
With master, I get 109.324, 109.241 and 107.471

So no obvious differences in terms of speed.

@hugovk
Copy link
Member

hugovk commented Jun 20, 2021

How about resizing to something small, say (10, 10)?

@radarhere
Copy link
Member Author

On master, 62.347, 61.433 and 61.384.
With this PR, 62.190, 61.543 and 61.649.

So still no obvious difference.

@hugovk
Copy link
Member

hugovk commented Jun 20, 2021

Thanks for checking!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Platform A catchall for platform-related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Image.resize() can give different results in Windows than in Linux
3 participants