Running this script

    import fitz
    print(fitz.__doc__)
    doc = fitz.open('2-p1.pdf')
    for page in doc:
        allSpans = page.get_texttrace()
        print(f"{page.number}, # of spans={len(allSpans)}")

on some PDF files, for example 2-p1.pdf, causes the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcb in position 0: invalid continuation byte
The above exception was the direct cause of the following exception:
SystemError: <class 'UnicodeDecodeError'> returned a result with an error set
<the same messages repeat many times>
File "D:\Program Files\Python\Python37\lib\site-packages\fitz\fitz.py", line 6278, in get_texttrace
val = _fitz.Page_get_texttrace(self)
SystemError: <built-in function Page_get_texttrace> returned a result with an error set
My system
PyMuPDF 1.21.0: Python bindings for the MuPDF 1.21.0 library.
Version date: 2022-11-08 00:00:01.
Built for Python 3.7 on win32 (64-bit).
I also tried 1.19.6 and 1.20.2. Both give the same error.
This is caused by a font name in the file that cannot be interpreted as UTF-8. A fallback to escape decoding must therefore be used, which already happens in (hopefully) all other places where font names are extracted.
This occurrence was previously undetected, but it is an easy change.
The Python C function `Py_BuildValue("s", fontname)` fails if fontname is not UTF-8 encoded.
Use the PyUnicodeRawEscape function for font names instead, like everywhere else in PyMuPDF.
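
To illustrate the failure mode and the fallback described above, here is a rough pure-Python analogue. It is only a sketch: `decode_font_name` is a hypothetical helper, not part of PyMuPDF, and the sample bytes are made up to reproduce the same kind of error as in the traceback (a 0xcb byte that is not followed by a valid continuation byte). The actual fix lives in the C code of the binding.

```python
def decode_font_name(raw: bytes) -> str:
    """Hypothetical helper: decode a font name taken from a PDF.

    Tries strict UTF-8 first. If that fails, falls back to raw-unicode-escape
    decoding, which maps each byte to the code point of the same value
    instead of raising.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # errors="replace" guards against malformed backslash-u sequences.
        return raw.decode("raw_unicode_escape", errors="replace")


# A plain ASCII name decodes as usual.
print(decode_font_name(b"Arial"))

# Made-up name starting with 0xcb: strict UTF-8 decoding would raise
# UnicodeDecodeError here, while the fallback returns a usable string.
print(repr(decode_font_name(b"\xcbABC")))
```

With a fallback like this, an oddly encoded font name produces a slightly strange but valid string instead of aborting the whole `get_texttrace()` call.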