
mpi4torch may be built against a different pytorch version than present in the venv #7

Open
d1saster opened this issue Sep 3, 2022 · 2 comments

Comments


d1saster commented Sep 3, 2022

Related to #5: pip appears to build mpi4torch against its own, separately fetched copy of pytorch, regardless of which version is present in the virtual environment.

This has the side effect that installing heat==1.2.0 (which requires torch<=1.11.0) and then installing mpi4torch leads to unresolved symbols as soon as one tries to import mpi4torch, since pip pulls a newer version of torch to build it.

Steps to reproduce in a fresh virtual environment:

pip install torch==1.11.0
pip install -v mpi4torch # this will at the moment also fetch an instance of torch==1.12.1 and use that to build mpi4torch
python -c 'import mpi4torch'

The last step fails (with torch==1.12.1 having been used for the build) with:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/xyz/Test/2022-09-03_mpi4torch_bug/venv/lib/python3.9/site-packages/mpi4torch/__init__.py", line 2, in <module>
    from ._mpi import *
ImportError: /home/xyz/Test/2022-09-03_mpi4torch_bug/venv/lib/python3.9/site-packages/mpi4torch/_mpi.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZN3c1022getCustomClassTypeImplERKSt10type_index

which is plausible, since this symbol changed between torch 1.11 and 1.12.
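A workaround that avoids the mismatch (a sketch, assuming MPI, a C++ toolchain, and the build requirements such as setuptools and wheel are already present in the environment) is to disable pip's build isolation, so that the torch already installed in the venv is the one used for the build:

```shell
# Build mpi4torch against the torch already installed in the venv by
# turning off build isolation. Assumes the build dependencies
# (setuptools, wheel, an MPI compiler wrapper) are installed locally.
pip install torch==1.11.0
pip install -v --no-build-isolation mpi4torch
```

With `--no-build-isolation` pip does not create a separate build environment, so it cannot fetch a second, newer torch behind the scenes.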

d1saster pushed a commit that referenced this issue Sep 3, 2022
Although mpi4torch is not distributed as binary wheel files, pip's
behavior regarding build isolation and caching can lead to situations
in which mpi4torch is built against a different pytorch version than
the one present at installation time.

This partially addresses issue #7.

d1saster commented Sep 3, 2022

Commit 505ddd9 only partially fixes this issue. The commit pins the build-time torch version as a runtime requirement of the to-be-installed binary wheel, thus removing the ABI incompatibility.

This prevents the import failure, but it can still lead to version conflicts during install. And this is not hypothetical.
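The pinning idea can be sketched roughly as follows (the helper name is illustrative, not the actual code from 505ddd9): the torch version observed at build time, minus any local build suffix, becomes an exact runtime requirement of the wheel.

```python
def pinned_torch_requirement(torch_version: str) -> str:
    """Turn the torch version seen at build time into an exact
    requirement string, stripping local suffixes like '+cu113'."""
    return "torch==" + torch_version.split("+")[0]

# At build time one would pass torch.__version__, e.g.:
print(pinned_torch_requirement("1.12.1+cu113"))  # torch==1.12.1
```

The exact pin is what surfaces the conflict below: the wheel now hard-requires whatever torch happened to be present in the build environment.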

pip still falls short of the ideal, since

pip install -v torch==1.11.0 .

can, due to a combination of build isolation and unfortunate caching, still lead to version conflicts:

INFO: pip is looking at multiple versions of <Python from Requires-Python> to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of torch to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install mpi4torch==0.1.0 and torch==1.11.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested torch==1.11.0
    mpi4torch 0.1.0 depends on torch==1.12.1

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

Ideally, pip would recognize, prior to building mpi4torch, that installing torch==1.11.0 in the isolated build environment would suffice.

In theory, PEP 517 allows for custom in-tree build backends, which can be used to generate the version requirements of the build dependencies dynamically, as illustrated here. However, it is unclear to me how this is best used.

To be continued ...


d1saster commented Sep 4, 2022

As far as I can tell this is pypa/pip#9542. Scipy devs face a similar situation with ABI compatibility against numpy, and they crafted the oldest-supported-numpy meta-package to get at least some ABI compatibility across different numpy versions. But I guess this is not an option for torch, given its C++ ABI and the generally high rate of change.
