Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance: Use subclass for parsing EXIF data #3663

Closed
wants to merge 6 commits into from

Conversation

Glandos
Copy link
Contributor

@Glandos Glandos commented Feb 18, 2019

The old method (using dict() and _fixup_dict) create clones from IFD_v1, but it read all items when iterating, losing all benefits from the lazyness of IFD.
The new method use a custom subclass that try to prevent all iterations as much as possible, keeping all tags undecoded.

The return value isn't a dict anymore, but a subclass of ImageFileDirectory_v1 that doesn't support tov2(). All decoded tags are formatted as before.

With a test image, I'm going from:

%timeit Image.open("Tests/images/exif-dpi-zerodivision.jpg") 1000 loops, best of 5: 485 µs per loop

to

%timeit Image.open("Tests/images/exif-dpi-zerodivision.jpg") 1000 loops, best of 5: 257 µs per loop

With a more common JPEG image from my Panasonic TZ10 camera, I'm going from:

%timeit Image.open("/home/photos/P1040526.JPG") 1000 loops, best of 5: 3.24 ms per loop

to

%timeit Image.open("/home/photos/P1040526.JPG") 1000 loops, best of 5: 871 µs per loop

I hope that even if the return type of _getexif is not the same, its compatibility with the previous format will make it merged.

@radarhere
Copy link
Member

Would you be able to add tests?

@Glandos
Copy link
Contributor Author

Glandos commented Feb 18, 2019

Since no external interfaces were modified, I don't see the point to add any tests.

Of course, we could imagine tests for this new class, but they should already be covered by the existing EXIF tests. In fact, existing tests helped me a lot to avoid making useless mistakes.

Could you be more precise on which part of the code could be tested?

@radarhere
Copy link
Member

radarhere commented Feb 19, 2019

I would think that adding a new class constitutes a new interface, and so I would suggest adding independent tests for it. Feel free to disagree, but on the other hand, if you think that this class should not be used by end users, then why have the ExifImageFileDirectory.to_v2 method?

@Glandos
Copy link
Contributor Author

Glandos commented Feb 20, 2019

My goal was to return lazy version of the current dict in _getexif, with some optimized version of ImageFileDirectory_v1 that is currently used.

So normally, the output class should be transparent. However, the to_v2 override is here to prevent misuse. I can also simply delete the method, so that it is impossible to call.

I don't know if it's possible (or even desirable) to hide this class declaration.

In case you really want tests for that new class, I will need some help, because there is currently no test for ImageFileDirectory_v1, there is only tests for _getexif that uses this class.

Glandos and others added 5 commits March 29, 2019 21:38
The old method (using dict() and _fixup_dict) create clones from IFD_v1,
but it read all items when iterating, losing all benefits from the lazyness
of IFD.
The new method use a custom subclass that try to prevent all iterations
as much as possible, keeping all tags undecoded.

The return value isn't a dict anymore, but a subclass of ImageFileDirectory_v1
that doesn't support tov2(). All decoded tags are formatted as before.
The _fixup does exactly the opposite
Co-Authored-By: Glandos <bugs-github@antipoul.fr>
Co-Authored-By: Glandos <bugs-github@antipoul.fr>
src/PIL/JpegImagePlugin.py Outdated Show resolved Hide resolved
Co-Authored-By: Glandos <bugs-github@antipoul.fr>
@Glandos
Copy link
Contributor Author

Glandos commented Apr 18, 2019

It seems that the merge of #3625 make this request invalid, but not irrelevant.

I still have the following performance, when opening 1000 times the same JPEG image with a lot of EXIF tags produced by a camera:

Test

It seems that I can adapt this request to the new layout. Do you have any objections on that before I spend some of my spare time on it?

@radarhere
Copy link
Member

The image link from your last comment no longer works.

I've created PR #4031 as a new version of this PR - see what you think.

@Glandos
Copy link
Contributor Author

Glandos commented Aug 20, 2019

The image link from your last comment no longer works.

I really dislike Github for their obscure retention policy.

I've created PR #4031 as a new version of this PR - see what you think.

Many, many thanks for taking care of that. It is really different from mine, but the code has changed too, and I think it does the job.

@Glandos Glandos closed this Aug 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants