Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnecessary check in tight loop in ImagingFill2 #6649

Closed
time4tea opened this issue Oct 8, 2022 · 1 comment · Fixed by #6650
Closed

Unnecessary check in tight loop in ImagingFill2 #6649

time4tea opened this issue Oct 8, 2022 · 1 comment · Fixed by #6650

Comments

@time4tea
Copy link

time4tea commented Oct 8, 2022

While looking at the performance of font rendering, and investigating the possible use of the FreeType cache code - something caught my eye in ImagingFill2 - fill_mask_L - A loop-invariant is calculated in each loop.

https://github.com/python-pillow/Pillow/blob/main/src/libImaging/Paste.c#L441

When pasting into an RGBA buffer, this code appears to call strcmp() 5 times for each pixel.

Looking at callgrind I get:

3,338,238,474 (26.97%)  ./string/../sysdeps/x86_64/multiarch/strcmp-avx2.S:__strcmp_avx2 [/lib/x86_64-linux-gnu/libc.so.6]
2,880,392,864 (23.27%)  /tmp/pip-install-qq5i177l/pillow_285763a52f8c48c0885d3a94de619b1e/src/libImaging/Paste.c:ImagingFill2 [/home/richja/dev/gopro-graphics/venv/lib/python3.10/site-packages/PIL/_imaging.cpython-310-x86_64-linux-gnu.so]

(you don't see fill_mask_L in this output as is is inline )

Something like this might be more efficient:

  int special_mode = (strcmp(imOut->mode, "RGBa") == 0 || strcmp(imOut->mode, "RGBA") == 0 || strcmp(imOut->mode, "La") == 0 || strcmp(imOut->mode, "LA") == 0 || strcmp(imOut->mode, "PA") == 0);

        for (y = 0; y < ysize; y++) {
            UINT8 *out = (UINT8 *)imOut->image[y + dy] + dx * pixelsize;
            UINT8 *mask = (UINT8 *)imMask->image[y + sy] + sx;
            for (x = 0; x < xsize; x++) {
                for (i = 0; i < pixelsize; i++) {
                    UINT8 channel_mask = *mask;
                    if ( special_mode && i != 3 && channel_mask != 0) {
                        channel_mask = 255 - (255 - channel_mask) * (1 - (255 - out[3]) / 255);
                    }
                    out[i] = BLEND(channel_mask, out[i], ink[i], tmp1);
                }
                out += pixelsize;
                mask++;
            }
        }
@radarhere
Copy link
Member

Thanks. I've created PR #6650 to resolve this.

@radarhere radarhere changed the title Performance: unnecessary check in tight loop in ImagingFill2 Unnecessary check in tight loop in ImagingFill2 Oct 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants