Coredump when importing pyhanko in pdfarranger #13
Comments
I tried using gdb to debug the python process. I got that idea here: https://stackoverflow.com/a/46228627
So, I'm guessing PIL is where I should look. Potentially PIL getting imported twice? |
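For anyone retracing this step without gdb: the standard library's faulthandler module can print the Python-level traceback when the interpreter receives a fatal signal such as SIGSEGV. The sketch below is only an illustration; the helper name enable_crash_tracebacks is made up for this example and appears nowhere in pyHanko or pdfarranger.

```python
import faulthandler
import sys


def enable_crash_tracebacks(stream=None):
    """Dump Python tracebacks for all threads when a fatal signal arrives.

    Hypothetical helper for illustration only. `stream` must be a file
    object backed by a real file descriptor; it defaults to sys.stderr.
    """
    target = sys.stderr if stream is None else stream
    # faulthandler installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
    faulthandler.enable(file=target, all_threads=True)
    return faulthandler.is_enabled()
```

Calling enable_crash_tracebacks() before the suspect import (or starting the interpreter with `python -X faulthandler`) should show which Python import statement was executing at the time of the crash. It only reports Python frames, so gdb is still needed to see the native stack.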
Interesting... I'm also assuming that PIL is the culprit here. I have no idea how PIL / Pillow is structured internally, but there was a relatively recent change to PIL that changed the way FriBiDi is loaded, which may or may not be related: python-pillow/Pillow#5062. Which version of PIL are you running? My dev environment is on 8.1.0, for what it's worth. Second question: do you actually need bitmap image support? Because if not, I could try to lazify the PIL import so that at least it won't cause segfaults in unrelated code. PyHanko only depends on PIL for importing background images, basically. |
I updated Pillow to 8.2.0, but I'm not seeing the behaviour you're reporting on my end. According to my logs, the last CI run also used 8.2.0, and so far I'm not seeing anything there. I'm just guessing here, but it could be that the GTK version you use depends on a version of FriBiDi that's binary incompatible with the one that this version of Pillow expects, probably because the latter is more recent. Perhaps combing through the dependencies with ldd would shed some light here? I haven't tested pyHanko with pre-8.x versions of Pillow, but there's a good chance that pyHanko doesn't care if you downgrade Pillow. Again, if you don't need the image support, I'm happy to restructure my imports to make sure that PIL only gets imported on-demand as opposed to automatically :) |
The pyhanko.stamp module now defers importing the modules for barcode and image support until it needs them. See issue #13.
Could you check if the commit I just pushed fixes your issue? Or at least delays the segfault until you try to use an image / QR code ;) Thanks! |
It's in the imports! I put the same import line at the very beginning of the
pdfarranger module and then I did not get the segfault.
I will get back with something working, but I think it's important to find
out why we see this. You don't want N dependencies all claiming they want to
be imported first...
On Wed, 14 Apr 2021 at 18:21, Matthias Valvekens <
***@***.***> wrote:
… I updated Pillow to 8.2.0, but I'm not seeing the behaviour you're
reporting on my end. According to my logs, the last CI run also used 8.2.0,
and so far I'm not seeing anything there.
I'm just guessing here, but it could be that the GTK version you use
depends on a version of FriBiDi that's binary incompatible with the one
that this version of Pillow expects, probably because the latter is more
recent. Perhaps combing through the dependencies with ldd would shed some
light here? I haven't tested pyHanko with pre-8.x versions of Pillow, but
there's a good chance that pyHanko doesn't care if you downgrade Pillow.
Again, if you don't need the image support, I'm happy to restructure my
imports to make sure that PIL only gets imported on-demand as opposed to
automatically :)
|
I agree, but if I'm not mistaken, the fix I just pushed should at least make it so that importing pyHanko doesn't crash the interpreter. Could you paste your Python version and the output of |
Sorry, last comments crossed :-) I had to fiddle a bit to get a clean requirements list because of pdfarranger dependencies on DistUtilsExtra, gi and cairo, which I've not installed via pip but used three respective variants of But, I have:
and
So, I think there may be scenarios where the loading of PIL first by pdfarranger vs first by pyhanko may be an issue? |
My code works now in plenaerts/pdfarranger@7aa54a0 . Very rudimentary at the moment, but I've got a signature 👍 I bet if we add an image to a pdf first, which should trigger loading PIL in pdfarranger first, we'll get to the same segfault... |
OK, great! I can probably find the time to do some digging tomorrow, but I think you're on the right track here. Especially gi-cairo looks suspicious... The immediate cause of the crash was PIL's loading of FriBiDi (an implementation of the Unicode bidirectional algorithm), and I'm fairly sure that a graphics rendering lib like cairo also has that as a dependency. Given that you installed PIL/Pillow through pip and Let me know if you encounter any other clues. |
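One way to look for clues on Linux is to check which FriBiDi files are actually mapped into the running process once both GTK/cairo and PIL have been imported. The sketch below parses the text of /proc/self/maps. The function names are made up for this example, and it only shows which files are loaded; it cannot prove on its own that two copies are binary incompatible.

```python
import re


def loaded_libraries(pattern, maps_text):
    """Return the sorted, unique file paths in /proc/<pid>/maps text whose
    file name matches `pattern` (a regular expression)."""
    regex = re.compile(pattern)
    paths = set()
    for line in maps_text.splitlines():
        # Columns: address range, perms, offset, device, inode, optional path.
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].startswith("/"):
            path = fields[5]
            if regex.search(path.rsplit("/", 1)[-1]):
                paths.add(path)
    return sorted(paths)


def loaded_libraries_in_this_process(pattern):
    """Linux only: inspect the mappings of the current interpreter."""
    with open("/proc/self/maps") as handle:
        return loaded_libraries(pattern, handle.read())
```

Calling loaded_libraries_in_this_process("fribidi") right after the imports would list every mapped file with "fribidi" in its name. More than one distinct path in the result would be consistent with the two-copies hypothesis discussed above.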
I've got my very basic flow working in pdfarranger since plenaerts/pdfarranger@34e57ed ; without segfault. So, I've no clue if that means the beid signer with stamp is simply not using PIL? Not sure how I can check if we're safe from segfaults :-) Something tells me end users will run into them unless they use their package managers. |
That would be correct. I forgot to mention it yesterday, but the PDF graphics operators to render the default background are actually hardcoded in the source: Lines 649 to 675 in 95d9cad
You'll only run into issues with PIL in the following situations:
So, bottom line, you should be safe as long as you don't explicitly use either of the above two features. EDIT: ... and don't expose pyHanko's config loading to user input, because loading images from the config would of course trigger a PIL import as well. |
I played with a couple of options but they all come to issues on Ubuntu.
1. Import early
If I move the import of the pyHanko modules to the top of all code like here in this line in this commit, then all goes well. But then again, this can result in package developers all staking their claim on the first lines for their imports.
2. Import late
This code does the import late, when we first use pyHanko logic.
2.1 Avoid system site packages
Trying to install as much as possible of the dependencies in a venv still results in the segfault. Per instructions on https://pygobject.readthedocs.io/en/latest/getting_started.html#ubuntu-getting-started we still need package manager installed cairo and gtk stuff. That may deliver the issue.
2.2 Maximize system site packages
Trying to install as much as possible using system site packages from ubuntu packages still installs Pillow locally as ubuntu provides python3-pil at 7.2.0 but pyHanko wants at least 8.0.1. Keeping the version requirement at 8.0.1 results in the segfault. If I remove the pyHanko dependency on Pillow 8.0.1 and reuse the system site package at 7.2.0, then I don't get the segfault. Not sure if you really need PIL 8.0.1? This might be the "best" solution for ubuntu users assuming you don't really need features in 8.0.1. I understand that is not an optimal condition.
Conclusion based on my limited views
I propose you explain to re-users of pyHanko what the catch 22 is, how they should install and how they should import. Thanks a lot for your help! I learned a lot 👍 I'm not going to spend significantly more time on this as my problem at this moment is fixed: the code to put the sig today does not use PIL. I don't think I'll want to build fancy stamp configurations soon. I'll now focus on getting the user to choose stamp coordinates in an acceptable way in pdfarranger. |
Venv scenario's in MatthiasValvekens/pyHanko#13
Thanks for the in-depth report! I'll look into downgrading the Pillow dependency then, since as you say, it's highly likely that I don't actually need any of the new 8.x features, so if there are no breaking API changes in the parts of Pillow that I'm using, we can relax that requirement. In the end, it probably boils down to a binary incompatibility issue, and those are very hard to avoid when combining multiple package management systems ( If you don't mind, I'll close this issue now. Thanks again for the investigation! |
👍 My pleasure! Thanks for pyHanko ;-) |
Some of our binary dependencies are quite large and/or likely to conflict with libraries installed on the local system (see #13). To avoid that, this commit makes two non-core dependencies optional.
- python-pkcs11 is now an optional dependency, and the CLI takes that into account.
- Image support is now also optional, to avoid a mandatory PIL/Pillow dependency. Since the barcode module imports PIL in the background, the code responsible for generating QR codes has been factored out into a separate module (which is fully independent of PIL). One-dimensional barcodes and background images are now the only features that require PIL to use.
- The minimal Pillow version has been decreased to support distros relying on older system binaries.
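For readers wondering what an optional dependency looks like from the calling side, here is a minimal sketch of one common way to handle it. This is an assumed pattern, not the code from the commit; require_optional and its parameters are hypothetical.

```python
import importlib


def require_optional(module_name, feature, hint=None):
    """Import an optional module, or fail with a message naming the feature.

    Hypothetical helper for illustration; `hint` can carry install advice.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        message = f"{feature} requires the optional module '{module_name}'"
        if hint:
            message += f"; {hint}"
        raise RuntimeError(message) from exc
```

A feature that needs images could then call require_optional("PIL.Image", "Image support") at the point of use. Installations without Pillow keep working for everything else and get a readable error only when that feature is requested.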
FYI, Pillow 8.3.2 from two weeks ago should fix this issue: python-pillow/Pillow#5637 (comment) |
Thanks! I'm going to keep the lower bound on Pillow as-is for now (since it doesn't really matter for the features that we need here), but it's good to know that this issue has been fixed upstream. |
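If an application wants to know whether the installed Pillow is at or above the release mentioned here, a small check along these lines might do. It is a sketch under assumptions: the function names are made up, and the version parser is deliberately simplified (it only reads leading numeric components and ignores pre-release markers).

```python
from importlib import metadata


def parse_version(text):
    """Turn '8.3.2' into (8, 3, 2); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def pillow_at_least(minimum="8.3.2", installed=None):
    """Return True/False, or None when Pillow is not installed at all."""
    if installed is None:
        try:
            installed = metadata.version("Pillow")
        except metadata.PackageNotFoundError:
            return None
    return parse_version(installed) >= parse_version(minimum)
```

A False result does not by itself mean a crash will happen. Earlier in this thread, the system Pillow 7.2.0 was reported to work without a segfault, so the check is only a hint about whether the upstream fix is present.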
I'm trying to include signing features in pdfarranger using pyhanko, but on my first import of pyhanko modules the application coredumps. This is the very specific line which causes the coredump: https://github.com/plenaerts/pdfarranger/blob/a96db16096ca80e2af01be4f151c5e41fc58b1a6/pdfarranger/signer.py#L104
I tested using the pyhanko API in this script but once I try reusing this code in the gtk application I don't get to import pyhanko modules.
I don't have a clue where to start looking what causes this. Anyone else?