Thanks for the reply. Nice suggestion, that actually helps me get around this in the meantime.
When I pass the path to the image, image_to_osd returns the proper string, but when output_type is changed to dict, the result only contains the last frame. This made me realise that the osd_to_dict function needs to change as well, returning either a bigger dict with page_number as the primary key or a list of the current dicts. However, both approaches break existing code that uses the library because the structure changes, unless the function returns different structures per case (single-page/multi-page). What do you think?
To confirm that tesseract supports multiframe images, I've attached a sample OSD output generated from a 9-frame TIFF: test_osd.txt
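To make the two return shapes discussed above concrete, here is a rough sketch rather than the library's actual osd_to_dict. The function names parse_osd_pages and osd_result are hypothetical, and the sketch assumes the OSD text consists of "Key: value" lines where each page starts with a "Page number" line. Keys and values are kept as raw strings.

```python
def parse_osd_pages(osd_text):
    """Split multi-page OSD text into a list with one dict per page.

    Hypothetical helper: assumes each page block begins with a
    "Page number: N" line and every other line is "Key: value".
    """
    pages = []
    for line in osd_text.splitlines():
        if ":" not in line:
            continue  # skip blank or unrecognised lines
        key, value = (part.strip() for part in line.split(":", 1))
        # Start a new page dict on every "Page number" line,
        # or on the first key if the text has no such line.
        if key == "Page number" or not pages:
            pages.append({})
        pages[-1][key] = value
    return pages


def osd_result(osd_text):
    """Backward-compatible variant: a plain dict for a single page,
    a list of dicts for several pages."""
    pages = parse_osd_pages(osd_text)
    return pages[0] if len(pages) == 1 else pages
```

Returning a list in every case keeps the type predictable, while the osd_result variant keeps single-page callers working at the cost of a return type that depends on the input.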
Hello,
I ran one complete TIFF file through image_to_data. For almost all pages it runs fine and returns OCR results, but for a few pages image_to_data returns a blank dictionary.
The output has no levels, blocks, or text; it is completely blank.
What could be the possible reason?
Any help or hint will be appreciated.
Reproduce:
pytesseract.image_to_osd
Whereas calling the tesseract process on the image will generate the correct output containing each page.
Source of the bug:
When calling save on the in-memory data, pillow requires the save_all=True parameter (pillow docs) to save multiframe images on the disk. The parameter is not specified, thus the image gets truncated to the first frame.
pytesseract/pytesseract/pytesseract.py
Line 201 in 45fe798
Possible solution
Check Image.n_frames before saving and set the save_all parameter accordingly.
I can create a PR with the changes if the solution sounds good enough.
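A minimal sketch of the proposed check, not the actual patch: the helper names save_kwargs_for and save_frames_aware are hypothetical, and the code assumes that an image object without an n_frames attribute should be treated as a single frame. It also assumes the target format can store multiple frames (TIFF, for example).

```python
def save_kwargs_for(image):
    """Return extra keyword arguments for image.save based on frame count.

    Hypothetical helper: images lacking an n_frames attribute are
    assumed to be single-frame.
    """
    n_frames = getattr(image, "n_frames", 1)
    return {"save_all": True} if n_frames > 1 else {}


def save_frames_aware(image, path, **params):
    """Save the image, adding save_all=True only for multiframe images.

    Explicit caller params take precedence over the computed default.
    """
    kwargs = {**save_kwargs_for(image), **params}
    image.save(path, **kwargs)
```

With a Pillow image opened from a multiframe TIFF, a call such as save_frames_aware(image, "out.tiff", format="TIFF") would then pass save_all=True, while single-frame images are saved exactly as before.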