build_ext --inplace overwrites data files if they have been installed via MANIFEST.in #886
Comments
It looks like specifying scikit-build's approach to hooking setuptools for autogenerating a MANIFEST also seems to work. Since that manifest generation is a setuptools hook, it happens inside the |
The underlying problem is coming from this code block populating the |
… for wheels (#1233)

Using MANIFEST.in currently runs into a pretty nasty scikit-build bug (scikit-build/scikit-build#886) that results in any file included by the manifest being copied from the install tree back into the source tree whenever an in place build occurs after an install, overwriting any local changes. We need an alternative approach to ensure that all necessary files are included in built packages. There are two types:
- sdists: scikit-build automatically generates a manifest during sdist generation if we don't provide one, and that manifest is reliably complete. It contains all files needed for a source build up to the rmm C++ code (which has always been true and is something we can come back to improving later if desired).
- wheels: The autogenerated manifest is not used during wheel generation because the manifest generation hook is not invoked during wheel builds, so to include data in the wheels we must provide the `package_data` argument to `setup`. In this case we do not need to include CMake or pyx files because the result does not need to be possible to build from, it just needs pxd files for other packages to cimport if desired.

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Bradley Dice (https://github.com/bdice)

URL: #1233
… for wheels (#1348)

Using MANIFEST.in currently runs into a pretty nasty scikit-build bug (scikit-build/scikit-build#886) that results in any file included by the manifest being copied from the install tree back into the source tree whenever an in place build occurs after an install, overwriting any local changes. We need an alternative approach to ensure that all necessary files are included in built packages. There are two types:
- sdists: scikit-build automatically generates a manifest during sdist generation if we don't provide one, and that manifest is reliably complete. It contains all files needed for a source build up to the raft C++ code (which has always been true and is something we can come back to improving later if desired).
- wheels: The autogenerated manifest is not used during wheel generation because the manifest generation hook is not invoked during wheel builds, so to include data in the wheels we must provide the `package_data` argument to `setup`. In this case we do not need to include CMake or pyx files because the result does not need to be possible to build from, it just needs pxd files for other packages to cimport if desired.

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Ben Frederickson (https://github.com/benfred)

URL: #1348
… for wheels (#12960)

Using MANIFEST.in currently runs into a pretty nasty scikit-build bug (scikit-build/scikit-build#886) that results in any file included by the manifest being copied from the install tree back into the source tree whenever an in place build occurs after an install, overwriting any local changes. We need an alternative approach to ensure that all necessary files are included in built packages. There are two types:
- sdists: scikit-build automatically generates a manifest during sdist generation if we don't provide one, and that manifest is reliably complete. It contains all files needed for a source build up to the cudf C++ code (which has always been true and is something we can come back to improving later if desired).
- wheels: The autogenerated manifest is not used during wheel generation because the manifest generation hook is not invoked during wheel builds, so to include data in the wheels we must provide the `package_data` argument to `setup`. In this case we do not need to include CMake or pyx files because the result does not need to be possible to build from, it just needs pxd files for other packages to cimport if desired.

I also reverted #12945, which was a stopgap solution to avoid this underlying problem. That change would have caused import issues inside the python/cudf directory when installing (the lack of an inplace build would have made the source tree unimportable) so this fix removes that minor limitation introduced in that PR.

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Bradley Dice (https://github.com/bdice)

URL: #12960
… for wheels (#5278)

Using MANIFEST.in currently runs into a pretty nasty scikit-build bug (scikit-build/scikit-build#886) that results in any file included by the manifest being copied from the install tree back into the source tree whenever an in place build occurs after an install, overwriting any local changes. We need an alternative approach to ensure that all necessary files are included in built packages. There are two types:
- sdists: scikit-build automatically generates a manifest during sdist generation if we don't provide one, and that manifest is reliably complete. It contains all files needed for a source build up to the cuml C++ code (which has always been true and is something we can come back to improving later if desired).
- wheels: The autogenerated manifest is not used during wheel generation because the manifest generation hook is not invoked during wheel builds, so to include data in the wheels we must provide the `package_data` argument to `setup`. In this case we do not need to include CMake or pyx files because the result does not need to be possible to build from, it just needs pxd files for other packages to cimport if desired.

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Robert Maynard (https://github.com/robertmaynard)
- Dante Gama Dessavre (https://github.com/dantegd)

URL: #5278
… for wheels (#3342)

Using MANIFEST.in currently runs into a pretty nasty scikit-build bug (scikit-build/scikit-build#886) that results in any file included by the manifest being copied from the install tree back into the source tree whenever an in place build occurs after an install, overwriting any local changes. We need an alternative approach to ensure that all necessary files are included in built packages. There are two types:
- sdists: scikit-build automatically generates a manifest during sdist generation if we don't provide one, and that manifest is reliably complete. It contains all files needed for a source build up to the cugraph C++ code (which has always been true and is something we can come back to improving later if desired).
- wheels: The autogenerated manifest is not used during wheel generation because the manifest generation hook is not invoked during wheel builds, so to include data in the wheels we must provide the `package_data` argument to `setup`. In this case we do not need to include CMake or pyx files because the result does not need to be possible to build from, it just needs pxd files for other packages to cimport if desired.

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Rick Ratzel (https://github.com/rlratzel)

URL: #3342
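
For readers who want to apply the `package_data` workaround described in the commits above, here is a minimal sketch of how the mapping might be built. The helper name `pxd_package_data` and the package name `mypkg` are hypothetical and do not come from any of the referenced PRs; the sketch also assumes that every directory containing an `__init__.py` is a package and that only `*.pxd` files need to ship in the wheel.

```python
from pathlib import Path


def pxd_package_data(root, top_level):
    """Map every package under root/top_level to the glob pattern ["*.pxd"].

    Hypothetical helper for illustration; a directory counts as a package
    only if it contains an __init__.py file.
    """
    root = Path(root)
    package_data = {}
    for init in sorted((root / top_level).rglob("__init__.py")):
        # Turn a path such as mypkg/sub into the dotted name "mypkg.sub".
        package = ".".join(init.parent.relative_to(root).parts)
        package_data[package] = ["*.pxd"]
    return package_data


# In setup.py the result would be handed to setup, roughly like this
# (assumed usage, shown as a comment so the sketch runs without scikit-build):
#
#   from skbuild import setup
#   setup(..., package_data=pxd_package_data(".", "mypkg"))
```

Because no MANIFEST.in is involved, this approach should sidestep the copy-back behavior reported in this issue, at the cost of listing wheel data explicitly.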
I want to work toward making scikit-build behave as close as possible to the way setuptools works by default. I'm mostly planning on doing this via scikit-build-core's setuptools support which will replace scikit-build's code in the future, but happy to slowly work toward improving this too. Packages and data files are really tricky in scikit-build.
Totally understand the challenges here, and I agree with the strategy. IMHO this particular bug seems worth prioritizing a bit higher than waiting on scikit-build-core though. It's a recipe for potentially significant losses of local work for developers. I didn't manage to find a sufficient root cause to determine an optimal solution, but I suspect that this particular case may have potential patches. Do files from MANIFEST.in actually ever need to be copied back to the source tree? I don't recall if that was being done by setuptools or scikit-build, if the latter maybe that could simply be disabled? Alternatively, perhaps If there isn't an easy workaround, perhaps certain code paths should simply be disabled and throw errors. Overwriting local changes seems far worse to me than simply having scikit-build throw an error saying "build_ext is not supported after install when using MANIFEST.in" (assuming that situation can be robustly detected). |
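
As a rough illustration of the "throw an error instead of overwriting" idea in the comment above, a guard around the copy-back step might look like the sketch below. Everything here is hypothetical: `guarded_copy_back` and `WouldOverwriteLocalChanges` are not part of scikit-build, and using a byte-for-byte content comparison as the detection mechanism is an assumption, not something the thread has established as robust.

```python
import filecmp
import shutil
from pathlib import Path


class WouldOverwriteLocalChanges(RuntimeError):
    """Raised when a copy-back would replace a differing source file."""


def guarded_copy_back(installed, source):
    """Copy installed -> source, refusing if source exists with other content.

    Hypothetical guard for illustration only; not scikit-build's code.
    """
    installed, source = Path(installed), Path(source)
    # shallow=False forces a content comparison rather than a stat check.
    if source.exists() and not filecmp.cmp(installed, source, shallow=False):
        raise WouldOverwriteLocalChanges(
            f"{source} differs from {installed}; refusing to overwrite"
        )
    source.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(installed, source)
```

A guard like this trades silent data loss for a loud failure. It would also refuse legitimate updates whenever the two copies differ, so it is best read as a stopgap rather than a fix for the root cause.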
This bug is a fairly edge case scenario, but it's nasty enough that I think it would be worthwhile to fix if at all possible since the results are potentially quite bad and involve file corruption. Now that I have an MRE I am attempting to zero in on a root cause, but would appreciate insights if someone else knows what to do without further inspection.

tl;dr `setup.py build_ext --inplace` appears to be unsafe for use after `setup.py install` if a MANIFEST.in file is present and points to any file that may be modified.

Here is a gist with a project that demonstrates the basic problem (note that the `_hello.pyx` and `__init__.py` files should be placed in a `hello` subdirectory). First run `python setup.py install`. Now, make any edit to `_hello.pyx` (I include `x=1` twice, simplest change is to comment out one line) and run `python setup.py build_ext --inplace`. The edit that was just made will vanish.

The problem appears to be that any file that is included by the MANIFEST.in is being copied into scikit-build's `cmake-install` directory by a `setup.py install` command, while `build_ext --inplace` copies files from this install tree back into the source tree, again respecting MANIFEST.in. Only `setup.py install` actually copies the current version of the file into `cmake-install`. If no `install` command is ever run, then there is nothing to copy and everything seems to work fine i.e. it is completely fine if you only ever use `build_ext --inplace`. However, once `install` is run even once, the files listed in the manifest exist in the install tree and are only updated by subsequent install commands. Meanwhile, every subsequent `build_ext --inplace` copies the files from the install tree back into the source directory. The result is that once `install` has been run, `build_ext --inplace` is no longer safe to use because it will overwrite all local changes with the last state in which the file was installed.
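
To make the sequence described above concrete, here is a small toy model in Python. It does not call scikit-build at all: `fake_install` and `fake_build_ext_inplace` are hypothetical stand-ins that only mimic the two copy directions as this report understands them, and the directory and file names are chosen to mirror the example project.

```python
import shutil
import tempfile
from pathlib import Path

# Toy model only: these functions imitate the copy steps described in this
# issue and are NOT scikit-build's real implementation.


def fake_install(source_dir, install_dir, manifest):
    """Copy each manifest-listed file from the source tree to the install tree."""
    for rel in manifest:
        dest = install_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source_dir / rel, dest)


def fake_build_ext_inplace(source_dir, install_dir, manifest):
    """Copy manifest-listed files present in the install tree back to the source tree."""
    for rel in manifest:
        installed = install_dir / rel
        if installed.exists():
            shutil.copyfile(installed, source_dir / rel)


def demo():
    """Run install, edit, then an in-place build; return the final source text."""
    manifest = ["hello/_hello.pyx"]
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "src"
        install = Path(tmp) / "cmake-install"
        pyx = source / "hello" / "_hello.pyx"
        pyx.parent.mkdir(parents=True)
        pyx.write_text("x = 1\nx = 1\n")

        fake_install(source, install, manifest)          # snapshot taken here
        pyx.write_text("x = 1\n# x = 1\n")               # local edit
        fake_build_ext_inplace(source, install, manifest)  # stale copy returns
        return pyx.read_text()


if __name__ == "__main__":
    print(demo())
```

In this model the local edit is lost because the install tree still holds the snapshot from the last install. If `fake_install` is never called, `fake_build_ext_inplace` finds nothing to copy and the edit survives, which matches the behavior reported above.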