
enable JPEG2000 format support #410

Merged: 1 commit merged into madmaze:master on Feb 2, 2022

@caerulescens (Contributor) commented on Feb 1, 2022

Summary

When a JPEG2000 image is loaded with Pillow and passed to pytesseract, an exception is raised: `TypeError: Unsupported image format/type`. Both Pillow and Tesseract support JPEG2000 images, so pytesseract, which relies on both, should accept them too. This PR enables JPEG2000 support for Pillow images by adding `JPEG2000` to `SUPPORTED_FORMATS` in pytesseract.
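To illustrate why adding one entry fixes the error, here is a minimal sketch of a format whitelist check. It assumes pytesseract compares Pillow's `Image.format` string against `SUPPORTED_FORMATS` and raises `TypeError` for anything missing; the names `check_format`, `FORMATS_BEFORE`, and `FORMATS_AFTER` are hypothetical, and the sets are illustrative rather than pytesseract's actual list.

```python
# Hypothetical sketch: these format sets are illustrative and NOT copied from
# pytesseract's source; only the added 'JPEG2000' entry reflects this PR.
FORMATS_BEFORE = frozenset({'PNG', 'JPEG', 'TIFF'})
FORMATS_AFTER = FORMATS_BEFORE | {'JPEG2000'}


def check_format(image_format, supported):
    """Reject any Pillow format name that is not in the whitelist."""
    if image_format not in supported:
        # Same error message quoted in this PR's summary
        raise TypeError('Unsupported image format/type')
    return image_format


# Pillow reports JPEG 2000 files with the format name 'JPEG2000'
print(check_format('JPEG2000', FORMATS_AFTER))  # JPEG2000
try:
    check_format('JPEG2000', FORMATS_BEFORE)
except TypeError as exc:
    print(exc)  # Unsupported image format/type
```

A whitelist keeps the failure explicit: formats that have not been checked against both Pillow and Tesseract are rejected up front instead of failing later inside the OCR call.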


I added an image, test.jpeg2000, for a JPEG2000 test. It was created by converting the existing test.png in tests/data to JPEG2000 format with the following script:

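```python
import io

from PIL import Image

# Read the existing PNG test image into memory
with open('test.png', 'rb') as f:
    image_data = f.read()

# Open it with Pillow and re-encode it as JPEG2000
# (writing JPEG2000 requires Pillow built with OpenJPEG support)
buffer = io.BytesIO(image_data)
image = Image.open(buffer)
image.save('test.jpeg2000', 'JPEG2000')
```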

@caerulescens changed the title from "feat: enable JPEG2000 support" to "enable JPEG2000 format support" on Feb 1, 2022
@bozhodimitrov merged commit d32bbb5 into madmaze:master on Feb 2, 2022
@bozhodimitrov (Collaborator) commented:

Thanks for your contribution @caerulescens

@caerulescens deleted the enable-jpeg2000-support branch on Feb 3, 2022 at 18:41
@caerulescens restored the enable-jpeg2000-support branch on Feb 3, 2022 at 18:42
@caerulescens deleted the enable-jpeg2000-support branch on Feb 3, 2022 at 18:43
Successfully merging this pull request may close this issue: 'JPEG2000' images are supported by PIL and Tesseract-OCR, but not pytesseract