SystemError: <built-in function Page_get_texttrace> returned a result with an error set #2045

shiu886 · 2022-11-14T10:24:36Z

Running this script

import fitz
print(fitz.__doc__)
doc = fitz.open('2-p1.pdf')
for page in doc:
    allSpans = page.get_texttrace()
    print(f"{page.number}, # of spans={len(allSpans)}")

on some PDF files (for example, 2-p1.pdf) raises

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcb in position 0: invalid continuation byte

The above exception was the direct cause of the following exception:

SystemError: <class 'UnicodeDecodeError'> returned a result with an error set

<the same messages repeat many times>

  File "D:\Program Files\Python\Python37\lib\site-packages\fitz\fitz.py", line 6278, in get_texttrace
    val = _fitz.Page_get_texttrace(self)
SystemError: <built-in function Page_get_texttrace> returned a result with an error set

My system

PyMuPDF 1.21.0: Python bindings for the MuPDF 1.21.0 library.
Version date: 2022-11-08 00:00:01.
Built for Python 3.7 on win32 (64-bit).

I also tried 1.19.6 and 1.20.2. Both give the same error.


julian-smith-artifex-com · 2022-11-14T11:20:18Z

Thanks for reporting this. I've reproduced it, will investigate some more later today.

JorjMcKie · 2022-11-14T14:42:43Z

This is caused by a font name in the file that cannot be interpreted as UTF-8. A fallback to escape decoding must be used, which already happens in (hopefully) all other places where font names are extracted.
This call site had gone undetected until now, but it is an easy change.

pprint(doc.get_page_fonts(0))
[(6578, 'ttf', 'TrueType', 'ABCDEE+ËÎÌå', 'F1', 'WinAnsiEncoding'),  # this one!
 (6580, 'ttf', 'Type0', 'ABCDEE+ËÎÌå', 'F2', 'Identity-H'),  # this one!
 (4, 'ttf', 'TrueType', 'ABCDEE+Calibri', 'F6', 'WinAnsiEncoding')]

The Python C API call `Py_BuildValue("s", fontname)` fails if fontname is not valid UTF-8. Use the PyUnicodeRawEscape function for font names instead, like everywhere else in PyMuPDF.
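
The same failure and fallback can be reproduced in pure Python. This is only a sketch: the byte string is an assumption reconstructed from the font name shown above and the 0xcb byte in the traceback, and `decode_font_name` is a hypothetical helper, not a PyMuPDF function.

```python
# Assumed raw bytes of the font name, reconstructed from the listing above.
raw = b"ABCDEE+\xcb\xce\xcc\xe5"


def decode_font_name(data: bytes) -> str:
    """Hypothetical helper: try UTF-8 first, then fall back to escape decoding."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # raw_unicode_escape maps each non-escape byte to the code point
        # of the same value, so arbitrary bytes still yield a string.
        return data.decode("raw_unicode_escape")


name = decode_font_name(raw)
print(name)

# Guess, not confirmed in this thread: the four odd bytes look like a
# GBK-encoded Chinese font name.
guess = raw[7:].decode("gbk")
print(guess)
```

The fallback does not recover the intended name; it only guarantees that a string comes back instead of an exception, which is all a caller of get_texttrace needs here.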

julian-smith-artifex-com · 2022-12-13T14:33:38Z

Fixed in PyMuPDF-1.21.1.
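
For scripts that must run on mixed installations, a version check along these lines can tell whether the fix is present. It is a minimal sketch: `has_texttrace_fix` is a hypothetical helper, it assumes a plain "major.minor.patch" version string, and reading that string from `fitz.VersionBind` is an assumption about the installed binding.

```python
FIXED_IN = (1, 21, 1)  # release named in this thread as containing the fix


def parse_version(text: str) -> tuple:
    """Turn 'major.minor.patch' into a tuple of ints (no rc/dev suffixes)."""
    return tuple(int(part) for part in text.split(".")[:3])


def has_texttrace_fix(version_text: str) -> bool:
    """Hypothetical check: True if the version is at least 1.21.1."""
    return parse_version(version_text) >= FIXED_IN


# Assumed usage with PyMuPDF installed:
#     import fitz
#     if not has_texttrace_fix(fitz.VersionBind):
#         print("get_texttrace() may fail on non-UTF-8 font names")
print(has_texttrace_fix("1.21.0"))
```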

JorjMcKie added the bug and Fixed in next release labels Nov 14, 2022

This was referenced Nov 14, 2022

Fixes for #2013 and #2045 #2046

Closed

Fixes #2035 #2047

Closed

julian-smith-artifex-com removed the Fixed in next release label Dec 13, 2022

julian-smith-artifex-com closed this as completed Dec 13, 2022
