Performance: Use subclass for parsing EXIF data #3663

Glandos · 2019-02-18T15:12:18Z

The old method (using dict() and _fixup_dict) creates clones from IFD_v1, but it reads all items when iterating, losing all the benefits of the IFD's laziness.
The new method uses a custom subclass that tries to avoid iteration as much as possible, keeping all tags undecoded.

The return value isn't a dict anymore, but a subclass of ImageFileDirectory_v1 that doesn't support to_v2(). All decoded tags are formatted as before.
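
To illustrate why copying a lazy directory into a dict defeats the laziness, here is a minimal sketch. It is not Pillow's implementation: `LazyTags`, its `decode_calls` counter, and the sample tag values are hypothetical stand-ins, and "decoding" is reduced to an ASCII decode.

```python
from collections.abc import Mapping


class LazyTags(Mapping):
    """Toy stand-in for a lazily decoded tag directory (not Pillow's class)."""

    def __init__(self, raw):
        self._raw = dict(raw)      # tag -> undecoded bytes
        self._decoded = {}         # tag -> decoded value, filled on demand
        self.decode_calls = 0      # counts how many values have been decoded

    def __getitem__(self, tag):
        if tag not in self._decoded:
            self.decode_calls += 1
            self._decoded[tag] = self._raw[tag].decode("ascii")
        return self._decoded[tag]

    def __iter__(self):
        # Iterating yields tag numbers only; nothing is decoded here
        return iter(self._raw)

    def __len__(self):
        return len(self._raw)


# Made-up sample data keyed by the EXIF tags Make (271), Model (272), Software (305)
raw = {271: b"Panasonic", 272: b"DMC-TZ10", 305: b"Ver.1.0"}

lazy = LazyTags(raw)
model = lazy[272]                      # decodes one tag only
calls_after_lookup = lazy.decode_calls

eager = LazyTags(raw)
as_dict = dict(eager)                  # copies every item, so every tag is decoded
calls_after_copy = eager.decode_calls
```

In this sketch a single lookup triggers one decode, while `dict(eager)` triggers one decode per tag, which is the cost the description above attributes to the old method.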

With a test image, I go from:

%timeit Image.open("Tests/images/exif-dpi-zerodivision.jpg")
1000 loops, best of 5: 485 µs per loop

to

%timeit Image.open("Tests/images/exif-dpi-zerodivision.jpg")
1000 loops, best of 5: 257 µs per loop

With a more typical JPEG image from my Panasonic TZ10 camera, I go from:

%timeit Image.open("/home/photos/P1040526.JPG")
1000 loops, best of 5: 3.24 ms per loop

to

%timeit Image.open("/home/photos/P1040526.JPG")
1000 loops, best of 5: 871 µs per loop
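
For anyone wanting to repeat this kind of measurement outside IPython, a rough equivalent can be sketched with the standard `timeit` module. The helper name `best_per_loop` and the placeholder workload are assumptions made for illustration; they are not part of this PR.

```python
import timeit


def best_per_loop(func, number=1000, repeat=5):
    """Return the best average time per call, in seconds."""
    # timeit.repeat returns one total time per round of `number` calls
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number


def sample_workload():
    # Placeholder workload so the sketch runs without Pillow or an image file
    return sum(range(100))


seconds = best_per_loop(sample_workload)
print(f"1000 loops, best of 5: {seconds * 1e6:.3g} µs per loop")
```

To time image opening instead, one would pass something like `lambda: Image.open(path)` after `from PIL import Image`, with `path` pointing at a local file.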

I hope that, even though the return type of _getexif has changed, its compatibility with the previous format will allow this to be merged.

radarhere · 2019-02-18T19:10:38Z

Would you be able to add tests?

Glandos · 2019-02-18T20:25:12Z

Since no external interfaces were modified, I don't see the point of adding any tests.

Of course, we could imagine tests for this new class, but it should already be covered by the existing EXIF tests. In fact, the existing tests helped me a lot in avoiding needless mistakes.

Could you be more precise about which part of the code could be tested?

radarhere · 2019-02-19T08:30:02Z

I would think that adding a new class constitutes a new interface, and so I would suggest adding independent tests for it. Feel free to disagree, but on the other hand, if you think that this class should not be used by end users, then why have the ExifImageFileDirectory.to_v2 method?

Glandos · 2019-02-20T09:24:58Z

My goal was to return a lazy version of the current dict in _getexif, using an optimized version of the ImageFileDirectory_v1 that is currently used.

So normally, the output class should be transparent. However, the to_v2 override is here to prevent misuse. I can also simply delete the method, so that it is impossible to call.

I don't know if it's possible (or even desirable) to hide this class declaration.

In case you really want tests for that new class, I will need some help, because there are currently no tests for ImageFileDirectory_v1; there are only tests for _getexif, which uses this class.
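
As a rough idea of what independent tests for such a class might check, here is a sketch written against a hypothetical stand-in. `LazyDirectory`, its `decoded_tags` list, the sample data, and the choice of raising `NotImplementedError` from `to_v2` are all assumptions; a real test would import the class from the PR instead.

```python
import unittest


class LazyDirectory(dict):
    """Hypothetical stand-in: decodes a raw value the first time it is read."""

    def __init__(self, raw):
        super().__init__()
        self.raw = dict(raw)
        self.decoded_tags = []          # order in which tags were decoded

    def __missing__(self, tag):
        # dict.__getitem__ calls this only when the tag is not cached yet
        self.decoded_tags.append(tag)
        value = self[tag] = self.raw[tag].decode("ascii")
        return value

    def to_v2(self):
        # Assumed behaviour: refuse the conversion rather than return bad data
        raise NotImplementedError("conversion is not supported by this stand-in")


# Made-up sample data keyed by the EXIF tags Make (271) and Model (272)
RAW = {271: b"Panasonic", 272: b"DMC-TZ10"}


class LazyDirectoryTest(unittest.TestCase):
    def test_single_lookup_decodes_single_tag(self):
        directory = LazyDirectory(RAW)
        self.assertEqual(directory[272], "DMC-TZ10")
        self.assertEqual(directory.decoded_tags, [272])

    def test_repeated_lookup_uses_cache(self):
        directory = LazyDirectory(RAW)
        directory[271]
        directory[271]
        self.assertEqual(directory.decoded_tags, [271])

    def test_to_v2_is_rejected(self):
        with self.assertRaises(NotImplementedError):
            LazyDirectory(RAW).to_v2()


def run_tests():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LazyDirectoryTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The first two tests pin down the laziness itself (one decode per requested tag, none on repeat access), which the existing _getexif tests would not necessarily detect if it regressed.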

Glandos · 2019-04-18T20:10:36Z

It seems that the merge of #3625 makes this request invalid, but not irrelevant.

I still see the following performance when opening the same JPEG image, which has a lot of EXIF tags produced by a camera, 1000 times:

It seems that I can adapt this request to the new layout. Do you have any objections to that before I spend some of my spare time on it?

radarhere · 2019-08-18T14:01:44Z

The image link from your last comment no longer works.

I've created PR #4031 as a new version of this PR - see what you think.

Glandos · 2019-08-20T13:25:45Z

> The image link from your last comment no longer works.

I really dislike GitHub for their obscure retention policy.

> I've created PR #4031 as a new version of this PR - see what you think.

Many, many thanks for taking care of that. It is really different from mine, but the code has changed too, and I think it does the job.

Glandos mentioned this pull request Feb 18, 2019

Performance: Do not store attribute in TIFF new API saimn/sigal#365

Closed

radarhere added the Exif label Feb 18, 2019

radarhere added the Performance label Feb 19, 2019

radarhere reviewed Mar 23, 2019

radarhere reviewed Mar 23, 2019

Glandos and others added 5 commits March 29, 2019 21:38

py2 compatibility

60c5614

remove conversion to tuple

475b417

The _fixup does exactly the opposite

Update src/PIL/JpegImagePlugin.py

56dca01

Co-Authored-By: Glandos <bugs-github@antipoul.fr>

Update src/PIL/JpegImagePlugin.py

d48b98a

Co-Authored-By: Glandos <bugs-github@antipoul.fr>

radarhere force-pushed the jpeg_exif_lazy_ifd branch from b0355d7 to d48b98a March 29, 2019 10:39

hugovk reviewed Mar 30, 2019

Fix typo

c20d301

Co-Authored-By: Glandos <bugs-github@antipoul.fr>

radarhere added the Needs Rebase label May 18, 2019

radarhere mentioned this pull request Aug 18, 2019

Lazily use ImageFileDirectory_v1 values from Exif #4031

Merged

Glandos closed this Aug 20, 2019
