Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide hooks freezing with for PyInstaller? #17184

Closed
htgoebel opened this issue Aug 29, 2020 · 27 comments · Fixed by #20745
Closed

Provide hooks freezing with for PyInstaller? #17184

htgoebel opened this issue Aug 29, 2020 · 27 comments · Fixed by #20745

Comments

@htgoebel
Copy link

PyInstaller maintainer here.

PyInstaller includes a "hook" for adapting it to the special needs and methods of numpy. This works quite well, anyway if numpy changes, it might take some time until PyInstaller catches up.

PyInstaller 4.0 (released a few weeks ago) added support for package developers to provide hooks with their packages. Thus numpy could deliver hooks including or excluding modules and files as needed. If you are interested in providing PyInstaller hooks with numpy, please have a look at package developers to provide hooks with their packages.

@rgommers
Copy link
Member

Thanks @htgoebel. Providing a hook seems like a good thing for numpy to do. Things that would make sense to exclude I can think of:

  • pytest
  • nose
  • all tests/ dirs
  • _struct_ufunc_tests.so and other _xxx_tests.so files
  • numy/core/include/
  • distutils
  • f2py
  • *.pxd

I can imagine there's a use case somewhere for all of those, but for normal freezing I doubt any of that is useful. So providing an easy config for PyInstaller users to get a slimmed down package may make a lot of sense.

@htgoebel
Copy link
Author

@rgommers Sound good. Please let me know if you need assistance.

One point the documentation might not be verbose enough: hooks are handled per module, so you could e.g. exclude foreignModule from numpy.somemodule which still having it as a (requried) dependency for numpy.othermodule. For this, just create a hook hook-numpy.somemodule.py

@rgommers rgommers added this to the 1.20.0 release milestone Aug 29, 2020
@rossbar
Copy link
Contributor

rossbar commented Aug 29, 2020

Related discussion on PyInstaller hooks in #16163

@Legorooj
Copy link

I'm already on this - I'm going to migrate our current suite to numpy.

@seberg
Copy link
Member

seberg commented Nov 13, 2020

Removing the milestone. Please correct me if I am wrong that the hooks are developed/maintained in PyInstaller, so that they are unrelated to our release cycle?

If you need input from us, please don't hesitate to ping.

@Legorooj
Copy link

Legorooj commented Nov 14, 2020

@seberg we're waiting on one PR (pyinstaller/pyinstaller#5168) to be finished and merged before @bwoodsend sends a PR with our entire numpy compatability suite.

Please correct me if I am wrong that the hooks are developed/maintained in PyInstaller

Currently that is the case, yes. However the idea is to move the hooks into numpy for multiple reasons, the main one being that PyInstaller has a much slower release cycle than numpy. If the hooks are in NumPy, any changes required in the hooks for a specific version of numpy will go live when that version of numpy is released, instead of a few months afterwards. And on top of that, the hooks only have to support one version of NumPy - the version they're bundled with.

@bwoodsend
Copy link
Contributor

TBH I'm not so sure that this is necessary.

I stopped working on PR (pyinstaller/pyinstaller#5168) in favour of my Conda support PR (still a work in progress) which will fix our issues with Conda+numpy in a more general way. Conda's numpy was our main incentive to transfer responsibility for this hook to someone who would know which DLLs numpy needs. Official PyPI (and even unofficial mkl) wheels are trivial for us to support. Because all DLLs numpy uses are kept in numpy's installation in site-packages, they're easy for PyInstaller to identify. Once my Conda PR is able to take care of Conda's weirdness implicitly, the numpy hook would reduce down to something like:

from PyInstaller.utils.hooks import collect_dynamic_libs

# Collect all DLLs inside numpy's installation folder, dump them into built app's root.
binaries = collect_dynamic_libs("numpy", ".")

# Submodules PyInstaller can't detect (probably because they're only imported by extension modules).
hiddenimports = ['numpy.core._dtype_ctypes']

# Remove testing and building code and packages which are referenced but aren't really dependencies. 
excludedimports = [
    "scipy", "pytest", "nose", "distutils", "f2py", "setuptools", "numpy.f2py",
    "numpy.distutils",
]

This is:

  • Not too complex.
  • Unlikely to change much as numpy evolves. Possibly a new hidden import could slip in but they're the easiest thing to fix.

So you can have the hook if you want it, but it's not such a big deal if this doesn't happen. Certainly, I'd agree that you keep this off any milestones. I'll submit a PR to numpy with the finalised hook once I get round to finishing up on PyInstaller's end (which will probably take me ages).

@seberg seberg modified the milestone: 1.21 release Nov 14, 2020
@htgoebel
Copy link
Author

@bwoodsend Even if the hook you propose looks trivial, it is not. See e.g. the list excluded imports, which needs to be maintained. And as @Legorooj wrote, PyInstaller has a much slower release cycle than numpy. Thus IMHO the best place to maintain the hook is the numpy project.

@seberg
Copy link
Member

seberg commented Nov 15, 2020

I think we are happy to include it inside NumPy. Just to note that the 1.20 release is imminent, so if you would like a hook included it would be good to have a PR by next weekend. (That timeline is not quite set in stone, and I guess we may be able to do a bit later, as this probably touches no other code, so backporting is likely fine even after an rc is released. But it is not ideal.)

@bwoodsend
Copy link
Contributor

Alright then. Including it in NumPy it is then. But it won't happen imminently so certainly don't wait for us.

We will keep a copy of the hook we send to you for a period of time. You're copy will have precedence over ours so older NumPy versions (which don't contain the hook) will use our copy (which we know is compatible) but if later on you change NumPy and your copy of the hook then updated NumPys (which will contain the hook) will correctly use your updated copy of the hook. So in theory, PyInstaller will always choose the hook which matches the version of NumPy installed without the need to backport this into older versions.

@seberg
Copy link
Member

seberg commented Nov 15, 2020

I am a bit curious whether a pluggy based system (what e.g. pytest uses), might be a simple long term solution allowing to develop hooks also as third party libraries.

@bwoodsend
Copy link
Contributor

Hooks are declared with a pkg_resources entry point called pyinstaller40 so that hooks can be provided for any package by any package - even if the packages don't match. We've recently used this to move most of our hooks into a dedicated hook package pyinstaller-hooks-contrib which gets released more regularly. Is that what you mean by a pluggy based system?

@seberg
Copy link
Member

seberg commented Nov 15, 2020

Oh, yeah, that is what I was wondering. So you could already provide a hook with a faster release cadence. But I guess it doesn't really matter too much. It is probably also fine to include it into NumPy ( at least assuming there are not recurring issues around including conda hacks).

@bwoodsend
Copy link
Contributor

Thinking about testing the hook ... I don't suppose that there are any plans to move NumPy's test suite out of the numpy package itself? There's a really nippy way of running PyInstaller on pytest.main() then pointing the resultant executable at your test suite which means you can run your entire test suite under PyInstaller-ified conditions with all the options and nicities that pytest brings (test filtering, nice tracebacks, pdb) but it can't work if the test code is intermingled with the source code.

@mattip
Copy link
Member

mattip commented Jan 3, 2022

We tend to like the current structure, since the tests related to random are in numpy/random, the tests related to linalg are in numpy/linalg, etc. If we did refactor the tests to move to tests/random, tests/linalg, ... it would mean rebasing many of the open PRs.

On the other hand, breaking out the tests might make the lives of projects like Numba and CuPy (and PyInstaller) easier.

@rgommers
Copy link
Member

rgommers commented Jan 3, 2022

If you remove numpy/conftest.py and all the tests from an install, you may be able run the tests from a repo clone against the installed numpy. I don't see why that wouldn't work.

We tend to like the current structure, since the tests related to random are in numpy/random, the tests related to linalg are in numpy/linalg, etc. If we did refactor the tests to move to tests/random, tests/linalg, ... it would mean rebasing many of the open PRs.

It also loses the ability for testing on end user installs, where we often need it to detect issues with (e.g.) building against a different BLAS library. I believe that's the original reason for everything being in-tree. If we move it out, we'd need to install it as a separate package with matching version numbering to the main package.

On the other hand, breaking out the tests might make the lives of projects like Numba and CuPy (and PyInstaller) easier.

Only if we restructure the tests and make it easier to import overrides I think. For reasonable reusability we'd need some configurability like https://github.com/data-apis/array-api-tests/.

@bwoodsend
Copy link
Contributor

If you remove numpy/conftest.py and all the tests from an install, you may be able run the tests from a repo clone against the installed numpy. I don't see why that wouldn't work.

Sumodules of a package have to be imported from the same package. If I have two copies of NumPy findable in sys.path, the first without the tests and the second with them, then import numpy resolves to the first and, on encountering an import to a test submodule (e.g. import numpy.lib.tests), Python will only look for that submodule in the first copy of numpy - the one without numpy.lib.tests. If you swap the sys.path order around, then you're only testing the copy with the tests which, in the context of testing with PyInstaller, means that you're not testing anything at all.

If we did refactor the tests to move to tests/random, tests/linalg, ... it would mean rebasing many of the open PRs.

Don't quote me on this but last time I did a similar refactoring, I found that if I did it as two commits, the first which uses only git mv and the second which fixes all the import statements to match the new file structure, then the history is preserved sufficiently that on rebasing other changes, git is able straddle the git mv commit no problem. The fixing imports commit will likely still lead to conflicts wherever a PR also touches imports but it's a lot less mess than you'd expect.

It also loses the ability for testing on end user installs, where we often need it to detect issues with (e.g.) building against a different BLAS library. I believe that's the original reason for everything being in-tree. If we move it out, we'd need to install it as a separate package with matching version numbering to the main package.

Asking users to substitute their version into and download https://codeload.github.com/numpy/numpy/zip/refs/tags/v1.22.0 then pull out the tests directory from the .zip file doesn't sound like too much of a step up? You could script it under a numpy.setup_testsuite() or something if it is.


But I don't really expect you to rip up the floorboards for PyInstaller's sake. I've got my answer. AFAIK, numpy doesn't do much lazy/delayed importing so a simple test will probably do.

@rgommers
Copy link
Member

rgommers commented Jan 3, 2022

Sumodules of a package have to be imported from the same package.

Ugh, I never liked that we changed to adding __init__.py files to tests, and this is a good reason why.

@Legorooj
Copy link

Legorooj commented Jan 4, 2022

Another option may be to seperate the tests out from the main numpy package, into an external package which you could install via numpy[test] or something? It would still allow the tests to be installed & run easily, but it wouldn't install them by default.

@bwoodsend
Copy link
Contributor

This will make PyInstaller a test dependency. Shall I make it a hard requirement and put it in test_requirements.txt or would you rather I use pytest.importorskip("PyInstaller") so that the test suite will simply skip the PyInstaller if PyInstaller isn't there?

@mattip
Copy link
Member

mattip commented Jan 4, 2022

I would prefer a soft pytest.importorskip("PyInstaller"), and in the PR that adds the test, make sure to install pyinstaller for at least some of the CI runs. If the test is slow, it should also get a slow mark.

@rgommers
Copy link
Member

rgommers commented Jan 5, 2022

make sure to install pyinstaller for at least some of the CI runs

some -> one?

It'd be great for this kind of test to be orthogonal in test job definition, so if it fails then it fails in a single place.

@rgommers
Copy link
Member

rgommers commented Jan 5, 2022

Another option may be to seperate the tests out from the main numpy package, into an external package which you could install via numpy[test] or something?

I'd rather have a separate package (e.g., numpy-testsuite) then, the extra syntax is weird and non-portable.

We may do this at some point, maybe, but I think we need to have better reasons for doing it than just for PyInstaller's benefit. Making the test suite portable to other array libraries would be one such reason. Or binary/wheel size - if it'd give a significant size reduction (not sure it will), then that matters. Size matters quite a bit actually - a 1 MB wheel size reduction save ~1 PB/yr bandwidth usage on PyPI alone.

@charris
Copy link
Member

charris commented Jan 5, 2022

I'd rather have a separate package (e.g., numpy-testsuite) then

IIRC, we discussed doing this when making the transition from nose to pytest.

@bwoodsend
Copy link
Contributor

Or binary/wheel size - if it'd give a significant size reduction (not sure it will), then that matters. Size matters quite a bit actually - a 1 MB wheel size reduction save ~1 PB/yr bandwidth usage on PyPI alone.

You must have had this suggestion before but just running strip **.so before setup.py bdist_wheel knocks the wheel size from 22MB to 7.9MB. Surely you don't expect users to routinely gdb their NumPy binaries?

bwoodsend added a commit to bwoodsend/numpy that referenced this issue Jan 6, 2022
Adding this special hook file tells PyInstaller what files a self contained
NumPy application needs to run. A test is encluded to verify that the hook still
finds all necessary files.

Closes numpy#17184.
@mattip
Copy link
Member

mattip commented Jan 6, 2022

We do try to strip wheels we ship, see MacPython/numpy-wheels#19. Has that gone missing again?

@Legorooj
Copy link

Legorooj commented Jan 6, 2022

I'd rather have a separate package (e.g., numpy-testsuite) then, the extra syntax is weird and non-portable.

That's actually kind of what I mean, apologies; I assumed you'd be familiar with the concept of setuptool's extra's.

What I was suggesting was splitting out the tests into a different python package, which would be an extra dependency of numpy. You'd still be able to install it via pip install numpy-testsuite or whatever, but numpy[test] would work too.

An example of use for this is PyInstaller's encryption feature - it needs an extra dependency so we recommend people install pyinstaller[encryption] if they want the feature, but pip install tinyaes would have the same effect.

bwoodsend added a commit to bwoodsend/numpy that referenced this issue Jan 6, 2022
Adding this special hook file tells PyInstaller what files a self contained
NumPy application needs to run. A test is included to verify that the hook still
finds all necessary files.

Closes numpy#17184.
bwoodsend added a commit to bwoodsend/numpy that referenced this issue Jan 8, 2022
Adding this special hook file tells PyInstaller what files a self contained
NumPy application needs to run. A test is included to verify that the hook still
finds all necessary files.

Closes numpy#17184.
bwoodsend added a commit to bwoodsend/numpy that referenced this issue Jan 8, 2022
Adding this special hook file tells PyInstaller what files a self contained
NumPy application needs to run. A test is included to verify that the hook still
finds all necessary files.

Closes numpy#17184.
bwoodsend added a commit to bwoodsend/numpy that referenced this issue Jan 8, 2022
Adding this special hook file tells PyInstaller what files a self contained
NumPy application needs to run. A test is included to verify that the hook still
finds all necessary files.

Closes numpy#17184.
bwoodsend added a commit to bwoodsend/numpy that referenced this issue Jan 8, 2022
Adding this special hook file tells PyInstaller what files a self contained
NumPy application needs to run. A test is included to verify that the hook still
finds all necessary files.

Closes numpy#17184.
bwoodsend added a commit to bwoodsend/numpy that referenced this issue Jan 12, 2022
Adding this special hook file tells PyInstaller what files a self contained
NumPy application needs to run. A test is included to verify that the hook still
finds all necessary files.

Closes numpy#17184.
bwoodsend added a commit to bwoodsend/numpy that referenced this issue Jan 20, 2022
Adding this special hook file tells PyInstaller what files a self contained
NumPy application needs to run. A test is included to verify that the hook still
finds all necessary files.

Closes numpy#17184.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants