Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows: fix DLL not found error in Python 3.8+ #305

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

danyeaw
Copy link
Contributor

@danyeaw danyeaw commented Dec 19, 2022

This was also attempted in #248 and #277.

In Python 3.8, Python changed the behavior to no longer automatically import DLLs from the Path. Unfortunately, Pycairo is effectively broken in Windows when building from source.

As a user, I should be able to follow any Pycairo example in Windows, and it should just work.

Right now, that is not possible for users to run import cairo without calling os.add_dll_directory first.

It also isn't possible to package an app using PyInstaller in Windows with Pycairo built from source because the hooks perform their analysis in an isolated subprocess, and there's no way to propagate the DLL search paths. Reference issue: pyinstaller/pyinstaller#7333

Using the wheels built for Pycairo isn't possible because they are statically linked, and I need a dynamic linking to use GTK and Cairo together. This was a clever idea though for a casual user just trying to make a quick app using pycairo standalone.

In #277, I thought this change wasn't needed, but it was only because I was using a patched version of PyGObject that was already adding the DLL directory.

This upstreams a patch from gvsbuild. In Python 3.8, a new
os.add_dll_directory method was added to make loading DLLs in Windows
more consistent. The impact of this is that even if I compile GTK
successfully with MSVC, and add it to the Path, INCLUDE, and LIB, I am
still not able to pip install pycairo in Windows. This finds the first
GTK DLL location from the Windows PATH variable.
Comment on lines 6 to 19
if sys.platform == "win32" and sys.version_info >= (3, 8):
env_path = os.environ.get("PATH", "").split(os.pathsep)
first_gtk_dll_path = next(
filter(
lambda path: path is not None
and Path.is_file(Path(path) / "cairo.dll"),
env_path,
),
None,
)
if first_gtk_dll_path:
with os.add_dll_directory(first_gtk_dll_path):
from ._cairo import * # noqa: F401,F403

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were to do this, how about moving this code into a file called cairo/_distributor_init.py, following how numpy does it, https://github.com/numpy/numpy/blob/1d5b3397dc7beaf9ca7b2326127e74f55aecd162/numpy/_distributor_init.py? Also, I would prefer that file should by default be empty and only be added by the build system when appropriate, for example I don't want this to happen on the wheels distributed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'll have to think about possible options so that if someone gets the pycairo source code and runs pip install . or does pip install --no-binary :all: pycairo it gets the DLL code, but it doesn't with the statically built wheel.

Would you be open to the CI deleting some code during build?

Maybe we could detect that the .lib files exist during runtime, and if they don't, search for DLLs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also catch a DLL exception using a try-except, and only try to search for the DLLs if the user is about to get an error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. Anything should work but having it separated into a separate file like that is better. @lazka may have a better opinion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up separating the DLL search logic in to a separate windows_init module. However, we will break the imports need to be at the module level rule if we move all the logic.

@lazka
Copy link
Member

lazka commented Dec 19, 2022

I'm fully aware that this is frustrating and packag devs and users have to deal with this somehow without any proper guidance from CPython upstream, so the following is a bit of a rant, not directed at you or this PR...

In Python 3.8, Python changed the behavior to no longer automatically import DLLs from the Path. Unfortunately, Pycairo is effectively broken in Windows when building from source.

Like every other package linking against external software

As a user, I should be able to follow any Pycairo example in Windows, and it should just work.

Right now, that is not possible for users to run import cairo without calling os.add_dll_directory first.

As it is intended and recommended by CPython developers, for security reasons.

It also isn't possible to package an app using PyInstaller in Windows with Pycairo built from source because the hooks perform their analysis in an isolated subprocess, and there's no way to propagate the DLL search paths. Reference issue: pyinstaller/pyinstaller#7333

Why doesn't it allow setting a dll path in that process? How does it handler other Python packages linking against external libraries?

What if we check a PYCAIRO_DLL_DIRECTORY env var instead? to make sure this is only active during build time, and not when run on some random Windows machine?

Using the wheels built for Pycairo isn't possible because they are statically linked, and I need a dynamic linking to use GTK and Cairo together. This was a clever idea though for a casual user just trying to make a quick app using pycairo standalone.

Yeah, that was one of the reasons why I didn't want to provide wheels initially, but it helped more users that it hurt in the end.

In #277, I thought this change wasn't needed, but it was only because I was using a patched version of PyGObject that was already adding the DLL directory.

@danyeaw
Copy link
Contributor Author

danyeaw commented Dec 20, 2022

I'm fully aware that this is frustrating and packag devs and users have to deal with this somehow without any proper guidance from CPython upstream, so the following is a bit of a rant, not directed at you or this PR...

No offense taken 😉. I want pycairo to be awesome as well, so I'm here to help us find a good solution.

Like every other package linking against external software

We don't expect Linux and macOS users to explicitly say where their .so files are located before they run pycairo. I feel like Windows users should have a similar experience.

As it is intended and recommended by CPython developers, for security reasons.

Correct, although I think the threat is pretty minimal if we are only adding very specific DLLs and not all DLLs found on the Path.

They were definitely also protecting for the use case that libraries bundle their own libs instead of only relying on end users to explicitly declare a search path. In the Enable better DLL resolution issue, the solution they picked which zooba called number 1 was specifically to allow for libraries to package their own libs. This is a little different from that I agree though, we aren't talking about packaging our own libs but linking to them.

Why doesn't it allow setting a dll path in that process? How does it handler other Python packages linking against external libraries?

It does allow setting the DLL path if a dependency library uses add_dll_directory, for example PyInstaller tracks calls to add_dll_directory during package initialization.

What if we check a PYCAIRO_DLL_DIRECTORY env var instead? to make sure this is only active during build time, and not when run on some random Windows machine?

This might be a good balance to do something like this. Wouldn't we need it set at runtime to find the DLL though? I can build pycairo fine, it is running pycairo where I get errors.

@lazka
Copy link
Member

lazka commented Dec 20, 2022

Like every other package linking against external software

We don't expect Linux and macOS users to explicitly say where their .so files are located before they run pycairo. I feel like Windows users should have a similar experience.

I should have said "Like every other package linking against external software, on Windows"

As it is intended and recommended by CPython developers, for security reasons.

Correct, although I think the threat is pretty minimal if we are only adding very specific DLLs and not all DLLs found on the Path.

Our extension only looks for cairo.dll, or libcairo-2.dll on the PATH with older CPython, just like your code. So there is no difference from what I see.

They were definitely also protecting for the use case that libraries bundle their own libs instead of only relying on end users to explicitly declare a search path. In the Enable better DLL resolution issue, the solution they picked which zooba called number 1 was specifically to allow for libraries to package their own libs. This is a little different from that I agree though, we aren't talking about packaging our own libs but linking to them.

Why doesn't it allow setting a dll path in that process? How does it handler other Python packages linking against external libraries?

It does allow setting the DLL path if a dependency library uses add_dll_directory, for example PyInstaller tracks calls to add_dll_directory during package initialization.

What if we check a PYCAIRO_DLL_DIRECTORY env var instead? to make sure this is only active during build time, and not when run on some random Windows machine?

This might be a good balance to do something like this. Wouldn't we need it set at runtime to find the DLL though? I can build pycairo fine, it is running pycairo where I get errors.

By build time I meant the "PyInstaller" part. What does PyInstaller do with the found DLLs via add_dll_directory? Copy them somewhere?

@danyeaw
Copy link
Contributor Author

danyeaw commented Dec 20, 2022

Our extension only looks for cairo.dll, or libcairo-2.dll on the PATH with older CPython, just like your code. So there is no difference from what I see.

I didn't understand this, are you saying that Pycairo is already trying to automatically load the cairo DLLs in Windows with the latest Python?

By build time I meant the "PyInstaller" part. What does PyInstaller do with the found DLLs via add_dll_directory? Copy them somewhere?

Yes, it packages them with the app.

Add support for an environmental variable PY_DLL_DIR to explicitly set a
DLL directory in Windows. Only search for DLLs in the path if PY_DLL_DIR
isn't set, and we get an ImportError.
@danyeaw
Copy link
Contributor Author

danyeaw commented Dec 21, 2022

I made some improvements today to the PR:

  1. Reduced the import of libraries to just os try to help ensure that this minimally impacts load up speed of pycairo
  2. Enabled a PY_DLL_DIR environmental variable for users to explicitly specify where to load DLLs
  3. Only search for DLLs in the path if there is an ImportError
  4. Switched from using filter to a generator to make things more pythonic
  5. Removed the extra load code from the test initialization
  6. I offloaded as much as I could in to a separate module, while keeping module level imports

I also did a bit of research to make sure that this approach is consistent with other similar libraries. PySide2 has a similar DLL search routine in its __init__.py file.

@danyeaw
Copy link
Contributor Author

danyeaw commented Aug 20, 2023

I did some more research about this today, and another option to solve this may be to bundle the DLL.
Other projects like numpy are using delvewheel to mangle the DLL names and copy them in to the wheel itself. With this approach, then there is no need to add a DLL search path.

I ran delvewheel repair .\pycairo-1.24.1-cp311-cp311-win_amd64.whl and it added the following files to a pycairo.libs folder in the wheel:

-----          2023-08-19    21:45        1314816 cairo-2-911d9408ba893bb7dfb6fc000f437c51.dll
-----          2023-08-19    21:45         325632 fontconfig-1-5a3e967bac0b518a42681fe96da1254a.dll
-----          2023-08-19    21:45         764416 freetype-6-4d157d0af160bf79091a3941d344146b.dll
-----          2023-08-18    23:04         205312 libexpat-32a51a377aab8070e54717fafbb53c7a.dll
-----          2023-08-19    21:45         261632 libpng16-8a1769d45baac4bc8e07d0fc48ae8b9d.dll
-----          2023-08-18    23:05         715264 pixman-1-0-f14bd97a45972824ebde5a8f4de5b44b.dll
-----          2023-08-18    22:52          90112 zlib1-ea5571e53626d4c436f912f06a19b918.dll

I noticed in the setup.py, we already have the cairo.dll as part of the package data, but I am not sure if that is still being used still. Also, maybe this approach overlaps with the cairo static build approach that the current wheels are using?

I also saw some projects using the cairo/_distributor_init.py file that @naveen521kk suggested above, and then in the main __init__.py, doing a star import.

Any thoughts on either of these two approaches?

@naveen521kk
Copy link
Contributor

I noticed in the setup.py, we already have the cairo.dll as part of the package data, but I am not sure if that is still being used still.

Yeah previously cairo.dll was included in the wheel, but now it's not being used.

Also, maybe this approach overlaps with the cairo static build approach that the current wheels are using?

Could be. If we were to let the build-system/packager add cairo/_distributor_init.py and leave it empty otherwise then this approach should work.

@danyeaw
Copy link
Contributor Author

danyeaw commented Aug 20, 2023

@naveen521kk maybe we could define who this build-system/packager is since the packaging ecosystem in Python is really complex, and I'm not sure I understand. I'm also not in any way blaming Pycairo for this, I think this change in Python 3.8 left lots of libraries on the hook to individually find solutions which isn't great.

  1. Packaging Python libraries and tools - this is where pycairo fits, it is a library aimed at Python developers in order to build Python applications on top of what Cairo provides. Packaging in this category consists of Modules, Python Source Distributions, and Python Binary Distributions, and pycairo publishes a sdist and wheels to PyPI for this use case. For the current binaries provided, they are dynamically linked for Linux and macOS, and statically linked for Windows. Some libraries build on top of other libraries, like mathplotlib's mplcairo makes use of pycairo.

  2. Packaging Python applications - applications that make use of pycairo fit here like Gaphor as an example. There are various ways to package applications including PEX, anaconda, freezers, images, containers, VMs, and even hardware with software loaded. Developers of these applications make use of the library packaging above in order to do development of their applications.

  3. Operating System packages - this is a whole other category for when operating systems have their own software distribution system. Linux distribution packages and MSYS2 fits in to this category. It does have some overlap with the packaging above because some developers set up their development environments using these packages and some users install applications using them as well. However, this is only about a 1/3 of Python developers, and many developers use Python.org downloaded versions, anaconda, pyenv, or other ways to install their development environment (source).

Right now, the workflow to build pycairo (1) is broken on Windows because I have to patch it manually, and I am not able to pip install the package from PyPI. This impacts the ability for me to package an application (2) using a freezer because it doesn't know where to find the DLLs.

So when you say build-system/packager, do you mean operating system packagers (3) should be adding the cairo/_distributor_init.py? If so, that doesn't really fix the use cases for 1 and 2. What are the concerns about including this logic in Pycairo?

@naveen521kk
Copy link
Contributor

Sorry for not being clear. By "build-system/packager" I meant anyone who builds wheels or packages the application, not just OS packagers. Only they would know the exact path where to find the libcairo* DLL file and add the necessary code for it. Sometimes the build-system may know the path to find them and could add the necessary code.

I know that this doesn't actually fix the issue, where, one can build pycairo from the source and expect it to run on Windows. That's the reason we provide wheels on that Platform. For application developers, I think they could build their own wheels and patch it using delvewheel or something similar and use cairo/_distributor_init.py as necessary. Or we could teach pyinstaller (or similar) to work around that as necessary.

I'll elaborate on a use case for cairo/_distributor_init.py. Let's say that we are building a wheel and using delvewheel for patching the wheel, then they would as well add the necessary code to cairo/_distributor_init.py such that it adds the directory pycairo.libs to the search path. Or it would be even better if delvewheel directly added those DLLs in cairo/ adjacent to the _cairo module. In that case, it would just work as it's adjacent to the module requiring it.

What are the concerns about including this logic in Pycairo?

Which logic do you mean? Is it about using cairo/_distributor_init.py for searching the PATH? In which case I don't like the idea of searching the PATH for finding the DLL and adding it to add_dll_directory being the default way (as it is against what is recommended by CPython developers).

@danyeaw
Copy link
Contributor Author

danyeaw commented Aug 21, 2023

Hi @naveen521kk, thanks for the response!

Sorry for not being clear. By "build-system/packager" I meant anyone who builds wheels or packages the application, not just OS packagers. Only they would know the exact path where to find the libcairo* DLL file and add the necessary code for it. Sometimes the build-system may know the path to find them and could add the necessary code.

In the Python ecosystem, the project that develops Python libraries and tools are the same people who distribute the sdist and wheel on PyPI. This is the 1st use case.

I know that this doesn't actually fix the issue, where, one can build pycairo from the source and expect it to run on Windows. That's the reason we provide wheels on that Platform.

But because these wheels are statically built, they only work for a very small set of use cases that pycairo is used for. They won't work for PyGObject development.

For application developers, I think they could build their own wheels and patch it using delvewheel or something similar and use cairo/_distributor_init.py as necessary.

I don't know of any other Python libraries that require developers to patch the library and rebuild it in order to do a pip install. This isn't a good developer experience.

Which logic do you mean? Is it about using cairo/_distributor_init.py for searching the PATH?

Yup, the logic in this PR.

In which case I don't like the idea of searching the PATH for finding the DLL and adding it to add_dll_directory being the default way (as it is against what is recommended by CPython developers).

What do you see as the recommendation by the CPython developers? The release notes for Python 3.8 specifically talk about adding this add dll function for this purpose. Would it be acceptable if we didn't search on the PATH at all and only looked for another environmental variable?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants