Segfault in fitz.py #2076

Closed
rikfaith opened this issue Nov 21, 2022 · 6 comments
Labels
upstream bug (bug outside this package)

Comments

@rikfaith

Describe the bug (mandatory)

When iterating over fitz.Document(), a segfault occurs:

Fatal Python error: Segmentation fault

Current thread 0x00007ff9fbecd040 (most recent call first):
File "/usr/lib/python3/dist-packages/fitz/fitz.py", line 4241 in page_count
File "/usr/lib/python3/dist-packages/fitz/fitz.py", line 5418 in __contains__
File "/usr/lib/python3/dist-packages/fitz/fitz.py", line 5440 in __getitem__
File "/home/faith/git/lustro/lustro/epub.py", line 51 in _extract_text
File "/home/faith/git/lustro/lustro/epub.py", line 167 in read
File "/home/faith/git/lustro/lustro/identify.py", line 161 in identify
File "/home/faith/git/lustro/lustro/main.py", line 135 in main
File "/home/faith/git/lustro/bin/lustro", line 11 in <module>

Extension modules: fitz._fitz, unrardll.unrar, PIL._imaging, tesserocr, psutil._psutil_linux, psutil._psutil_posix (total: 6)
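
For readers hitting a similar crash: a segfault inside an extension module kills the whole interpreter, so it cannot be caught with try/except. One possible mitigation, sketched below, is to run the extraction in a child interpreter and check its exit status. The helper name `run_isolated` and the `EXTRACT` snippet are illustrative and not part of PyMuPDF; the child snippet assumes PyMuPDF is importable as `fitz` in the child's environment.

```python
import subprocess
import sys


def run_isolated(code, *args, timeout=60):
    """Run a Python snippet in a child interpreter (hypothetical helper).

    Returns (ok, stdout). ok is False when the child exits abnormally,
    for example when it is killed by a segmentation fault.
    """
    proc = subprocess.run(
        # -X faulthandler makes the child print a traceback on fatal signals.
        [sys.executable, "-X", "faulthandler", "-c", code, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode == 0, proc.stdout


# Child-side snippet: with "-c", the first extra argument is sys.argv[1].
# Assumes PyMuPDF is installed; iterates pages as in the report above.
EXTRACT = """
import sys
import fitz
doc = fitz.open(sys.argv[1])
for page in doc:
    sys.stdout.write(page.get_text())
"""
```

A caller could then use `ok, text = run_isolated(EXTRACT, "book.epub")` (the file name is a placeholder) and skip the file when `ok` is false, instead of losing the whole run.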

To Reproduce (mandatory)

Try to read an epub that triggers the bug reported at https://bugs.ghostscript.com/show_bug.cgi?id=706093

Sorry, I cannot provide the epub.

Expected behavior (optional)

No segfault.

Screenshots (optional)

None.

Your configuration (mandatory)

python 3.11.0+ (main, Nov 4 2022, 09:23:33) [GCC 12.2.0]
PyMuPDF 1.21.0 (20221108000001), from debian package "python3-fitz"
Linux Linux-5.19.0-2-amd64-x86_64-with-glibc2.36
Debian GNU/Linux bookworm/sid

For example, the output of print(sys.version, "\n", sys.platform, "\n", fitz.__doc__) would be sufficient (for the first two bullets).

3.10.8 (main, Nov 4 2022, 09:21:25) [GCC 12.2.0]
linux
PyMuPDF 1.21.0: Python bindings for the MuPDF 1.21.0 library.
Version date: 2022-11-08 00:00:01.
Built for Python 3.10 on linux (64-bit).

Additional context (optional)

Mostly I wanted to call your attention to https://bugs.ghostscript.com/show_bug.cgi?id=706093

When PyMuPDF builds, does it depend on a fully built mupdf tree to access the libmupdf.a static library? That is, as opposed to using dynamic linkage, which would let me swap in a patched library until upstream fixes the problem.

@julian-smith-artifex-com
Collaborator

Default builds of PyMuPDF (e.g. the wheels on pypi.org) statically link with an internally-built libmupdf.a library. So you can't use dynamic linkage to use a patched MuPDF.

Instead, you will need to build PyMuPDF yourself, setting PYMUPDF_SETUP_MUPDF_BUILD to the path of your patched mupdf/ directory. Also see: https://pymupdf.readthedocs.io/en/latest/installation.html#install-from-source-without-using-an-sdist
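
As a rough sketch of the step described above, the build can be driven from Python by setting PYMUPDF_SETUP_MUPDF_BUILD in the child environment and running setup.py in the PyMuPDF checkout. The function names and directory arguments below are illustrative assumptions; only the environment variable and the `setup.py build` command come from this thread.

```python
import os
import subprocess
import sys


def build_command(mupdf_dir, base_env=None):
    """Return (cmd, env) for building PyMuPDF against a local MuPDF tree.

    mupdf_dir is the path to the patched mupdf/ checkout (caller-supplied).
    """
    env = dict(os.environ if base_env is None else base_env)
    # Use an absolute path because the build runs from a different directory.
    env["PYMUPDF_SETUP_MUPDF_BUILD"] = os.path.abspath(mupdf_dir)
    cmd = [sys.executable, "setup.py", "build"]
    return cmd, env


def build_pymupdf(pymupdf_dir, mupdf_dir):
    """Run the build inside the PyMuPDF source directory (hypothetical helper)."""
    cmd, env = build_command(mupdf_dir)
    subprocess.run(cmd, cwd=pymupdf_dir, env=env, check=True)
```

For example, `build_pymupdf("PyMuPDF", "mupdf")` would assume both checkouts sit side by side in the current directory.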

@JorjMcKie
Collaborator

JorjMcKie commented Nov 22, 2022

@rikfaith - it looks like you already have identified this as a MuPDF problem, haven't you?
At least I couldn't see what went wrong in PyMuPDF specifically.
If this is true, we would like to flag this issue appropriately as an "upstream" problem.

@rikfaith
Author

rikfaith commented Nov 22, 2022

@JorjMcKie Yes, I believe this is an upstream problem, and can be flagged as such.

@JorjMcKie added the "upstream bug" label (bug outside this package) on Nov 22, 2022
@rikfaith
Author

@julian-smith-artifex-com Thanks for the name of the environment variable and the pointer to the build instructions. I did a "make shared" in the patched mupdf tree, then a "python3 setup.py build" in the PyMuPDF tree, and successfully extracted text from the epub without getting a segfault.

@julian-smith-artifex-com
Collaborator

Great, I'm glad it worked for you.

[You didn't actually need to do make shared in mupdf - PyMuPDF/setup.py does its own mupdf build, typically in mupdf/build/pymupdf-x86_64-release.]

@julian-smith-artifex-com
Collaborator

Fixed in PyMuPDF-1.21.1.
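
A minimal sketch for checking whether an installed copy is at least that release. It assumes the distribution is registered under the name "PyMuPDF" and that its version string starts with dotted integers; a distribution package (such as Debian's python3-fitz) may carry backported patches that this comparison cannot see. The helper names are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (1, 21, 1)  # release named in this thread


def parse_version(text):
    """Turn '1.21.1' (or '1.21.1rc1') into a tuple of three integers."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_fix(version_text):
    """True if the given version is at or past the fixed release."""
    return parse_version(version_text) >= FIXED_IN


def installed_has_fix():
    """Check the installed package; returns None if it is not installed."""
    try:
        return has_fix(version("PyMuPDF"))
    except PackageNotFoundError:
        return None
```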
