Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coredump when importing pyhanko in pdfarranger #13

Closed
plenaerts opened this issue Apr 14, 2021 · 16 comments
Closed

Coredump when importing pyhanko in pdfarranger #13

plenaerts opened this issue Apr 14, 2021 · 16 comments
Labels
bug Something isn't working

Comments

@plenaerts
Copy link

I'm trying to include signing features in pdfarranger using pyhanko, but on my first import of pyhanko modules the application coredumps. This is the very specific line which causes the coredump: https://github.com/plenaerts/pdfarranger/blob/a96db16096ca80e2af01be4f151c5e41fc58b1a6/pdfarranger/signer.py#L104

I tested using the pyhanko API in this script but once I try reusing this code in the gtk application I don't get to import pyhanko modules.

I don't have a clue where to start looking what causes this. Anyone else?

@plenaerts
Copy link
Author

I tried using gdb to debug the python process. I got that idea here: https://stackoverflow.com/a/46228627
So, I started gdb using gdb python3 in my activated venv and loaded the GTK application.
At importing pyhanko I get this:

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x00007fffafab98ea in load_fribidi () from /home/pieter/src/pdfarranger/venv/lib/python3.8/site-packages/PIL/_imagingft.cpython-38-x86_64-linux-gnu.so

So, I'm guessing PIL is where I should look. Potentially PIL getting imported twice?

@MatthiasValvekens
Copy link
Owner

Interesting... I'm also assuming that PIL is the culprit here. I have no idea how PIL / Pillow is structured internally, but there was a relatively recent change to PIL that changed the way FriBiDi is loaded, which may or may not be related: python-pillow/Pillow#5062.

Which version of PIL are you running? My dev environment is on 8.1.0, for what it's worth.

Second question: do you actually need bitmap image support? Because if not, I could try to lazify the PIL import so that at least it won't cause segfaults in unrelated code. PyHanko only depends on PIL for importing background images, basically.

@MatthiasValvekens
Copy link
Owner

I updated Pillow to 8.2.0, but I'm not seeing the behaviour you're reporting on my end. According to my logs, the last CI run also used 8.2.0, and so far I'm not seeing anything there.

I'm just guessing here, but it could be that the GTK version you use depends on a version of FriBiDi that's binary incompatible with the one that this version of Pillow expects, probably because the latter is more recent. Perhaps combing through the dependencies with ldd would shed some light here? I haven't tested pyHanko with pre-8.x versions of Pillow, but there's a good chance that pyHanko doesn't care if you downgrade Pillow.

Again, if you don't need the image support, I'm happy to restructure my imports to make sure that PIL only gets imported on-demand as opposed to automatically :)

MatthiasValvekens added a commit that referenced this issue Apr 14, 2021
The pyhanko.stamp module now defers importing the modules for barcode
and image support until it needs them. See issue #13.
@MatthiasValvekens
Copy link
Owner

Could you check if the commit I just pushed fixes your issue? Or at least delays the segfault until you try to use an image / QR code ;)

Thanks!

@MatthiasValvekens MatthiasValvekens added the bug Something isn't working label Apr 14, 2021
@plenaerts
Copy link
Author

plenaerts commented Apr 14, 2021 via email

@MatthiasValvekens
Copy link
Owner

I agree, but if I'm not mistaken, the fix I just pushed should at least make it so that importing pyHanko doesn't crash the interpreter.

Could you paste your Python version and the output of pip freeze here? Then I could also try my hand at reproducing the issue you're seeing :)

@plenaerts
Copy link
Author

plenaerts commented Apr 14, 2021

Sorry, last comments crossed :-)
Your commit delaying the import of PIL seems to fix my issue at first sight, but :-)
There may be scenario's where we still run into the same?

I had to fiddle a bit to get a clean requirements list because of pdfarranger depencies on DistUtilsExtra, gi and cairo, which I've not installed via pip but used three respective variants of cp -rs /usr/lib/python3/dist-packages/gi. (I don't know if pip can work for these libs?)

But, I have:

$ dpkg -l python3-gi python3-gi-cairo python3-distutils-extra 
Gevraagd=(U)onbekend/(I)nstalleren/ve(R)wijderen/(P)wissen/(H)ouden
| Status=Niet/Inst/Conf/Uitgep/halF-geconf/Halfgeïnst/verWacht-trig/Trig-bezig
|/ Fout?=(geen)/heRinst. nodig/ (Status,Fout: hoofdletter=ernstig)
||/ Naam                    Versie       Architectuur Omschrijving
+++-=======================-============-============-=====================================================
ii  python3-distutils-extra 2.45         all          enhancements to the Python3 build system
ii  python3-gi              3.38.0-1     amd64        Python 3 bindings for gobject-introspection libraries
ii  python3-gi-cairo        3.38.0-1     amd64        Python 3 Cairo bindings for the GObject library

and

$ pip freeze
asn1crypto==1.4.0
cached-property==1.5.2
certifi==2020.12.5
cffi==1.14.5
chardet==4.0.0
click==7.1.2
cryptography==3.4.7
fonttools==4.22.0
idna==2.10
lxml==4.6.3
oscrypto==1.2.1
pdfarranger==1.7.1
pikepdf==2.11.1
Pillow==8.2.0
pycparser==2.20
pyHanko==0.6.0.dev1
pyhanko-certvalidator==0.14.1
python-barcode==0.13.1
python-dateutil==2.8.1
python-pkcs11==0.7.0
pytz==2021.1
PyYAML==5.4.1
qrcode==6.1
requests==2.25.1
six==1.15.0
tzlocal==2.1
urllib3==1.26.4

So, I think there may be scenario's where the loading of PIL first by pdfarranger vs first by pyhanko may be an issue?

@plenaerts
Copy link
Author

My code works now in plenaerts/pdfarranger@7aa54a0 . Very rudimentary at the moment, but I've got a signature 👍

I bet if we add an image to a pdf first, which should trigger loading PIL in pdfarranger first, we'll get to the same segfault...

@MatthiasValvekens
Copy link
Owner

OK, great! I can probably find the time to do some digging tomorrow, but I think you're on the right track here. Especially gi-cairo looks suspicious... The immediate cause of the crash was PIL's loading of FriBiDi (an implementation of the Unicode bidirectional algorithm), and I'm fairly sure that a graphics rendering lib like cairo also has that as a dependency. Given that you installed PIL/Pillow through pip and python3-gi-cairo through your system package manager, I wouldn't be surprised if there's a binary compatibility issue with the FriBiDi library there.

Let me know if you encounter any other clues.

@plenaerts
Copy link
Author

I've got my very basic flow working in pdfarranger since plenaerts/pdfarranger@34e57ed ; without segfault. So. I've no clue if that means the beid signer with stamp is simply not using PIL?

Not sure how I can check if we're safe from segfaults :-) Something tells me endusers will run into them unless they use their package managers.

@MatthiasValvekens
Copy link
Owner

MatthiasValvekens commented Apr 15, 2021

I've no clue if that means the beid signer with stamp is simply not using PIL?

That would be correct. I forgot to mention it yesterday, but the PDF graphics operators to render the default background are actually hardcoded in the source:

pyHanko/pyhanko/stamp.py

Lines 649 to 675 in 95d9cad

STAMP_ART_CONTENT = RawContent(
box=BoxConstraints(width=100, height=100),
data=b'''
q 1 0 0 -1 0 100 cm
0.603922 0.345098 0.54902 rg
3.699 65.215 m 3.699 65.215 2.375 57.277 7.668 51.984 c 12.957 46.695 27.512
49.34 39.418 41.402 c 39.418 41.402 31.48 40.078 32.801 33.465 c 34.125
26.852 39.418 28.172 39.418 24.203 c 39.418 20.234 30.156 17.59 30.156
14.945 c 30.156 12.297 28.465 1.715 50 1.715 c 71.535 1.715 69.844 12.297
69.844 14.945 c 69.844 17.59 60.582 20.234 60.582 24.203 c 60.582 28.172
65.875 26.852 67.199 33.465 c 68.52 40.078 60.582 41.402 60.582 41.402
c 72.488 49.34 87.043 46.695 92.332 51.984 c 97.625 57.277 96.301 65.215
96.301 65.215 c h f
3.801 68.734 92.398 7.391 re f
3.801 79.512 92.398 7.391 re f
3.801 90.289 92.398 7.391 re f
Q
''')
"""
Hardcoded stamp background that will render a stylised image of a stamp using
PDF graphics operators (see below).
.. image:: images/stamp-background.svg
:alt: Standard stamp background
:align: center
"""

You'll only run into issues with PIL in the following situations:

  • Using a bitmap image in a signature appearance. Here, pyHanko directly uses PIL to read image data and dump it in a format that PDF viewers expect.
  • Using barcodes and/or QR codes in signatures. I actually also implemented all of that functionality using "raw" PDF drawing operations that don't require PIL, but the libraries I import to do the actual barcode-related calculations apparently always import PIL in the background. Unfortunately there's no easy way out of that :/

So, bottom line, you should be safe as long as you don't explicitly use either of the above two features.

EDIT: ... and don't expose pyHanko's config loading to user input, because loading images from the config would of course trigger a PIL import as well.

@plenaerts
Copy link
Author

I played with a couple of options but they all come to issues on Ubuntu.

1. Import early

If I move the import of the pyHanko modules to the top of all code like here in this line in this commit, then all goes well. But then again, this can result in package developers all staking their claim on the first lines for their imports.

2. Import late

This code does the import late, when we first use pyHanko logic.

2.1 Avoid system site packages

Trying to install as much as possible of the dependencies in a venv still results in the segfault. Per instructions on https://pygobject.readthedocs.io/en/latest/getting_started.html#ubuntu-getting-started we still need package manager installed cairo and gtk stuff. That may deliver the issue.

2.2 Maximize system site packages

Trying to install as much as possible using system site packages from ubuntu packages still installs Pillow locally as ubuntu provides python3-pil at 7.2.0 but pyHanko wants at least 8.0.1. Keeping the version requirement at 8.0.1 results in the segfault.

If I remove the pyHanko dependency on Pillow 8.0.1 and reuse the system site package at 7.2.0, then I don't get the segfault. Not sure if you really need PIL 8.0.1? This might be the "best" solution for ubuntu users assuming you don't really need features in 8.0.1. I understand that is not an optimal condition.

Conclusion based on my limited views

I propose you explain to re-users of pyHanko what the catch 22 is, how they should install and how they should import.

Thanks a lot for your help! I learned a lot 👍

I'm not going to spend significantly more time on this as my problem at this moment is fixed: the code to put the sig today does not use PIL. I don't think I'll want to build fancy stamp configurations soon. I'll now focus on getting the user to choose stamp coordinates in an acceptable way in pdfarranger.

plenaerts added a commit to plenaerts/pdfarranger that referenced this issue Apr 17, 2021
@MatthiasValvekens
Copy link
Owner

Thanks for the in-depth report! I'll look into downgrading the Pillow dependency then, since as you say, it's highly likely that I don't actually need any of the new 8.x features, so if there are no breaking API changes in the parts of Pillow that I'm using, we can relax that requirement.

In the end, it probably boils down to a binary incompatibility issue, and those are very hard to avoid when combining multiple package management systems (pip and the system package manager in this case). Ubuntu/debian often lag behind package versions on PyPI (sometimes with good reason), so I'm sure that you won't be the last person to run into a problem of this nature :). I'll make a note in the documentation somewhere.

If you don't mind, I'll close this issue now. Thanks again for the investigation!

@plenaerts
Copy link
Author

👍 My pleasure! Thanks for pyHanko ;-)

MatthiasValvekens added a commit that referenced this issue Apr 17, 2021
Some of our binary dependencies are quite large and/or likely to
conflict with libraries installed on the local system (see #13).
To avoid that, this commit makes two non-core dependencies optional.

 - python-pkcs11 is now an optional dependency, and the CLI takes that
   into account.
 - Image support is now also optional, to avoid a mandatory PIL/Pillow
   dependency. Since the barcode module imports PIL in the background,
   the code responsible for generating QR codes has been factored out
   into a separate module (which is fully independent of PIL).
   One-dimensional barcodes and background images are now the only
   features that require PIL to use.
 - The minimal Pillow version has been decreased to support distros relying
   on older system binaries.
@nulano
Copy link

nulano commented Sep 18, 2021

FYI, Pillow 8.3.2 from two weeks ago should fix this issue: python-pillow/Pillow#5637 (comment)

@MatthiasValvekens
Copy link
Owner

Thanks! I'm going to keep the lower bound on Pillow as-is for now (since it doesn't really matter for the features that we need here), but it's good to know that this issue has been fixed upstream.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants