Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Empty dirs change introduced is incompatible with distlib library #294

Closed
logachev opened this issue May 10, 2019 · 33 comments
Closed

Empty dirs change introduced is incompatible with distlib library #294

logachev opened this issue May 10, 2019 · 33 comments

Comments

@logachev
Copy link

logachev commented May 10, 2019

It looks like 0.33.2 change is incompatible with latest distlib package.

If I try to do distlib.wheel.install() it throws an error because folder ZipInfo object filename is not in the records:
https://bitbucket.org/pypa/distlib/src/3b0fd333c8fb15bc04e570a23ee4836caefb7951/distlib/wheel.py#lines-521

I don't see any details about Folders in PEP (https://www.python.org/dev/peps/pep-0427/), so not clear if this is something that should be addressed by wheel or distlib package.

@lmazuel
Copy link

lmazuel commented May 10, 2019

Easy way to reproduce:
https://files.pythonhosted.org/packages/ff/1d/faec767bcb7b21156f6225d77da8cbf48dcb1f8e12dbc65663649073433a/azure_mgmt_iothub-0.8.0-py2.py3-none-any.whl

In [1]: import distlib.wheel

In [3]: w=distlib.wheel.Wheel("D:\\azure_mgmt_iothub-0.8.0-py2.py3-none-any.whl")

In [4]: w.verify()
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-4-00bb035fd43f> in <module>()
----> 1 w.verify()

D:\VEnvs\testiot\Lib\site-packages\distlib\wheel.py in verify(self)
    795                 if u_arcname.endswith('/RECORD.jws'):
    796                     continue
--> 797                 row = records[u_arcname]
    798                 if row[2] and str(zinfo.file_size) != row[2]:
    799                     raise DistlibException('size mismatch for '

KeyError: 'azure/'

@stefangordon
Copy link

Tagging related issue/pr #287 / #289

@agronholm
Copy link
Contributor

What do you suggest as a fix? If I simply revert the change, then those empty PEP 420 namespace packages will continue to be left out of wheels.
Is distlib.wheel a vendored version of the wheel package?

@agronholm
Copy link
Contributor

Is distlib.wheel a vendored version of the wheel package?

Answering my own question: it's not.

Perhaps the solution is to:

  • Only add a directory entry if the directory is empty
  • Add a RECORD entry for those directories, but leave the hash empty

@agronholm
Copy link
Contributor

I've created a possible solution for the problem in the directoryfix branch. Please let me know if it works for you.

@jaraco
Copy link
Member

jaraco commented May 11, 2019

not clear if this is something that should be addressed by wheel or distlib package.

I have a fairly strong preference that distlib should support wheels that have directory entries (not just empty ones) represented by the very common trailing /, given that most tools that construct zip files from a directory tree will produce directory entries (including Python's zipfile module).

@jaraco
Copy link
Member

jaraco commented May 11, 2019

Add a RECORD entry for those directories, but leave the hash empty

I'm -1 on changing the format of the RECORD.

@agronholm
Copy link
Contributor

I don't think anybody is suggesting a format change for RECORD. But I'm beginning to think that maybe a PEP amendment is in order to address the issue of empty directories in a wheel, one way or another. In the meantime, how do you feel about my proposed fix? It would restore old behavior for 99.99% of projects.

@agronholm
Copy link
Contributor

By the way, can you point out a project that is suffering from the lack of empty directories in wheels? I've been unable to produce even an sdist of such a thing.

@jaraco
Copy link
Member

jaraco commented May 11, 2019

Here's the example, built on pypa/sample-namespace-packages:

sample-namespace-packages master $ cd native/pkg_a                                                                                                                                   
pkg_a master $ pip-run wheel==0.33.2 -- setup.py bdist_wheel                                                                                                                   
Collecting wheel==0.33.2
  Using cached https://files.pythonhosted.org/packages/2c/03/64c2d74bc381f9b8529151e7dda63921d3209be7b447bf876d96c115e7d3/wheel-0.33.2-py2.py3-none-any.whl
Installing collected packages: wheel
Successfully installed wheel-0.33.2
running bdist_wheel
running build
running build_py
creating build
creating build/lib
creating build/lib/example_pkg
creating build/lib/example_pkg/a
copying example_pkg/a/__init__.py -> build/lib/example_pkg/a
installing to build/bdist.macosx-10.9-x86_64/wheel
running install
running install_lib
creating build/bdist.macosx-10.9-x86_64
creating build/bdist.macosx-10.9-x86_64/wheel
creating build/bdist.macosx-10.9-x86_64/wheel/example_pkg
creating build/bdist.macosx-10.9-x86_64/wheel/example_pkg/a
copying build/lib/example_pkg/a/__init__.py -> build/bdist.macosx-10.9-x86_64/wheel/example_pkg/a
running install_egg_info
running egg_info
creating example_pkg_a.egg-info
writing example_pkg_a.egg-info/PKG-INFO
writing dependency_links to example_pkg_a.egg-info/dependency_links.txt
writing top-level names to example_pkg_a.egg-info/top_level.txt
writing manifest file 'example_pkg_a.egg-info/SOURCES.txt'
reading manifest file 'example_pkg_a.egg-info/SOURCES.txt'
writing manifest file 'example_pkg_a.egg-info/SOURCES.txt'
Copying example_pkg_a.egg-info to build/bdist.macosx-10.9-x86_64/wheel/example_pkg_a-1-py3.7.egg-info
running install_scripts
creating build/bdist.macosx-10.9-x86_64/wheel/example_pkg_a-1.dist-info/WHEEL
creating 'dist/example_pkg_a-1-py3-none-any.whl' and adding 'build/bdist.macosx-10.9-x86_64/wheel' to it
adding 'example_pkg/'
adding 'example_pkg_a-1.dist-info/'
adding 'example_pkg/a/'
adding 'example_pkg/a/__init__.py'
adding 'example_pkg_a-1.dist-info/METADATA'
adding 'example_pkg_a-1.dist-info/WHEEL'
adding 'example_pkg_a-1.dist-info/top_level.txt'
adding 'example_pkg_a-1.dist-info/RECORD'
removing build/bdist.macosx-10.9-x86_64/wheel
pkg_a master $ cd dist                                                                                                                                                         
dist master $ $PYTHONPATH='example_pkg_a-1-py3-none-any.whl'                                                                                                                   
dist master $ python -c "import example_pkg.a" && echo 'worked!'                                                                                                               
worked!
dist master $ cd ..                                                                                                                                                            
pkg_a master $ rm -r dist                                                                                                                                                      
pkg_a master $ pip-run wheel==0.33.1 -- setup.py bdist_wheel                                                                                                                   
running bdist_wheel
running build
running build_py
installing to build/bdist.macosx-10.9-x86_64/wheel
running install
running install_lib
creating build/bdist.macosx-10.9-x86_64/wheel
creating build/bdist.macosx-10.9-x86_64/wheel/example_pkg
creating build/bdist.macosx-10.9-x86_64/wheel/example_pkg/a
copying build/lib/example_pkg/a/__init__.py -> build/bdist.macosx-10.9-x86_64/wheel/example_pkg/a
running install_egg_info
running egg_info
writing example_pkg_a.egg-info/PKG-INFO
writing dependency_links to example_pkg_a.egg-info/dependency_links.txt
writing top-level names to example_pkg_a.egg-info/top_level.txt
reading manifest file 'example_pkg_a.egg-info/SOURCES.txt'
writing manifest file 'example_pkg_a.egg-info/SOURCES.txt'
Copying example_pkg_a.egg-info to build/bdist.macosx-10.9-x86_64/wheel/example_pkg_a-1-py3.7.egg-info
running install_scripts
creating build/bdist.macosx-10.9-x86_64/wheel/example_pkg_a-1.dist-info/WHEEL
creating 'dist/example_pkg_a-1-py3-none-any.whl' and adding 'build/bdist.macosx-10.9-x86_64/wheel' to it
adding 'example_pkg/a/__init__.py'
adding 'example_pkg_a-1.dist-info/METADATA'
adding 'example_pkg_a-1.dist-info/WHEEL'
adding 'example_pkg_a-1.dist-info/top_level.txt'
adding 'example_pkg_a-1.dist-info/RECORD'
removing build/bdist.macosx-10.9-x86_64/wheel
pkg_a master $ cd dist                                                                                                                                                         
dist master $ python -c "import example_pkg.a"                                                                                                                                 
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'example_pkg'

As you can see, building the native namespace package example_pkg.a on wheel 0.33.2 enables the wheel to be put on sys.path and be importable. When built on wheel 0.33.1, the namespace package fails to import.

@pfmoore
Copy link
Member

pfmoore commented May 11, 2019

But I'm beginning to think that maybe a PEP amendment is in order

I think this is reasonable. Tools obviously have different expectations, and the only real reason that I'm aware of for this change is to work around the Python issue that zipimport doesn't handle namespace imports if the directory entry isn't there.

I don't think there's any real issue over what the spec specifies - requiring directory entries seems fine to me if that helps some people. But IMO the "zipfile tools all do this" argument is fairly weak - after all Python's own zipfile module doesn't, or this issue would never have arisen.

Let's either lock down a specific behaviour in the spec, or accept that wheels without directory entries are entirely valid.

@jaraco
Copy link
Member

jaraco commented May 11, 2019

Python's own zipfile module doesn't, or this issue would never have arisen.

Python's zipfile module does do this when building a zipfile from a directory tree:

draft $ mkdir foo                                                                                                                                                              
draft $ python -m zipfile -c out.zip foo                                                                                                                                       
draft $ rmdir foo                                                                                                                                                              
draft $ unzip out.zip                                                                                                                                                          
Archive:  out.zip
   creating: foo/
draft $ python -m zipfile -l out.zip                                                                                                                                           
File Name                                             Modified             Size
foo/                                           2019-05-11 13:58:06            0

The reason this issue arose was because wheel isn't utilizing the tools typically used to create zipfiles (Info-ZIP, Windows Explorer, zipfile -c, but is instead wrote its own logic for constructing zip files. It does look like bdist_egg might be subject to the same limitation.

@pfmoore
Copy link
Member

pfmoore commented May 11, 2019

Python's zipfile module does do this when building a zipfile from a directory tree:

But not when adding files individually, "by hand". Honestly, though, the "why" of this isn't important. The reality is that the wheel spec doesn't require compatible wheels to include directory entries. It would be pretty simple to change the spec, at which point fixing the tools is non-controversial. Why waste energy debating what's "right" when just proposing a spec change will likely go through with little or no problem?

@agronholm
Copy link
Contributor

@jaraco Seems that the example you provided is not valid due to:

adding 'example_pkg/a/__init__.py'

This indicates that a is a normal package, not a (PEP 420) namespace package, so there is no empty directory to worry about. Also, the link you provided gives me a 404.

@agronholm
Copy link
Contributor

agronholm commented May 11, 2019

I see two immediate ways forward:

  1. Entirely revert the patch that broke other tools
  2. Apply my revised patch from the directoryfix branch which only creates directory entries for empty directories (which should do what you wanted for the use case you described)

I am also curious about what prompted the patch in the first place. Was there a real world project out there which was having problems?

@jaraco
Copy link
Member

jaraco commented May 11, 2019

the example you provided is not valid due to adding 'example_pkg/a/__init__.py'.

In that example, example_pkg is the PEP-420 namespace package and example_pkg.a is a normal package under example_pkg.

the link you provided gives me a 404

Thanks - corrected.

Was there a real world project out there which was having problems?

This issue stems from pypa/packaging-problems#212.

I see two immediate ways forward:

To consider, neither option (1) nor option (2) will help the situation for wheels that were released (or end up being released) with 0.33.2.

Option (2), the directoryfix branch, will stop the bleeding for packages that don't have empty directories. It will continue to cause failures where there are empty directories. I also don't think it will correct the issue for the canonical example above (the presence of example_pkg/a will prevent the creation of an example_pkg/ entry in the zip file, so the namespace package will still be unreachable (though I haven't tested).

My preference would be:

  1. Update distlib to honor the presence of directory entries in wheels. This solution not only fixes issues for packages produced by wheel 0.33.2, but also wheels that could be produced by extracting and re-zipping an existing wheel using other tools (that add those entries). Additionally, the spec should be updated to specify that including directory entries SHOULD be present.

@agronholm
Copy link
Contributor

It will continue to cause failures where there are empty directories. I also don't think it will correct the issue for the canonical example above (the presence of example_pkg/a will prevent the creation of an example_pkg/ entry in the zip file, so the namespace package will still be unreachable (though I haven't tested).

I am thoroughly confused. The issue here was about empty directories, correct? Since the example package contains a subpackage (a), it's not empty, and thus not a problem (I have published several libraries with a similar layout and they all work fine). So this leaves me wondering: what is the actual problem here being fixed?

@agronholm
Copy link
Contributor

I took a more thorough look at the example and I see the problem now: the wheels are used directly without being unpacked first, which causes the problem.

That said, until we get distlib sorted out, it would probably be prudent to retract this change since it's breaking things for people right now.

@jaraco
Copy link
Member

jaraco commented May 11, 2019

it would probably be prudent to retract this change since it's breaking things for people right now.

I agree. Does that mean to pull 0.33.2 also?

@agronholm
Copy link
Contributor

I'm not sure which is better: yanking the affected releases from PyPI or releasing a 0.33.4. People who have 0.33.2/3 installed right now might continue to have a bad time unless a new release is made. Or is this what you mean for me to do anyway?

@jaraco
Copy link
Member

jaraco commented May 11, 2019

It's your call. I might try just pulling 0.33.2/3 and see if that's sufficient. If workflows have cached/saved versions, an 0.33.4 might be called-for, and you may want to do that in advance just to head off that possibility.

@agronholm
Copy link
Contributor

I've yanked both 0.33.2 and 0.33.3. I'm releasing 0.33.4 to replace them.

jcfr added a commit to scikit-build/scikit-build that referenced this issue May 11, 2019
jcfr added a commit to scikit-build/scikit-build that referenced this issue May 11, 2019
@logachev
Copy link
Author

Thanks @agronholm for addressing this issue!

@dholth
Copy link
Member

dholth commented May 14, 2019

I remember making the decision to leave out directories. It is easier. If it was git or mercurial you'd have to put a .keep file in there, would that work?

@jaraco
Copy link
Member

jaraco commented May 14, 2019

@dholth It wouldn't. The issue isn't that directories aren't indicated. It's that they're implicit. You have an entry like this:

example_pkg/a/__init__.py

But you don't have any entries for example_pkg/ nor example_pkg/a/. The lack of of example_pkg/ is what leads to bpo-14905.

But if you were to unzip and rezip any wheel using popular zip tools, it no longer has the issue.

@agronholm
Copy link
Contributor

An updated distlib was released a few hours ago which addresses the directory entries handling issue. The next question is, when will it be sufficiently widespread to allow me to reintroduce the patch?

@dholth
Copy link
Member

dholth commented May 14, 2019 via email

@jaraco
Copy link
Member

jaraco commented May 14, 2019

What does older distlib think about the directory entry plus a hash for the empty string?

It may not like it or it may ignore it. I would recommend against having directory entries in RECORD or a hash of the empty string anywhere.

@dholth
Copy link
Member

dholth commented May 16, 2019

Inside the wheel RECORD is something you can hash to check the entire wheel. If you leave out directory entries then two wheels with the same RECORD hash could install a different set of namespace packages. To preserve that feature the directory entries must be included in RECORD, but the hash of the empty string would not be necessary.

@agronholm
Copy link
Contributor

You're right, although I fail to see any potential attack vector for malicious packages here.

@jaraco
Copy link
Member

jaraco commented May 16, 2019

Inside the wheel RECORD is something you can hash to check the entire wheel. If you leave out directory entries then two wheels with the same RECORD hash could install a different set of namespace packages.

Good point. I hadn't thought of that. But it only affects completely empty directories, which I don't think are at stake here.

Another way we may want to think about it: perhaps wheels could disallow empty directories or only include RECORD for completely empty directories. There's no need for a RECORD of directories that are there for the purpose of containing other modules and packages, which is what namespace packages typically do, as in that case, the RECORD already implicitly defines the namespace package.

In other words, the presence of example_pkg/a/__init__.py or example_pkg/a.py (in the zip and in the RECORD) implies the existence of example_pkg/ as a directory. Only in the case where there is a completely empty directory would there be an opportunity for a completely empty package to be present (or not present) despite identical RECORDs.

Of course, that is a seemingly valid use-case outside of zip files:

$ mkdir foo
$ python -c "import foo; print(foo.__spec__.loader)"                                                                                                               
<_frozen_importlib_external._NamespaceLoader object at 0x10e2c0860>

But not in zip files, even with directory entries:

$ zip -c foo.zip foo                                                                                                                                               
  adding: foo/ (stored 0%)
Enter comment for foo/:

$ python -m zipfile -l foo.zip                                                                                                                                     
File Name                                             Modified             Size
foo/                                           2019-05-16 11:40:08            0
$ rmdir foo                                                                                                                                                        
$ env PYTHONPATH=foo.zip python -c "import foo"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named foo

That seems like an unnecessary variance, but also probably ignorable for the more common case of namespace packages containing anything.

@dholth
Copy link
Member

dholth commented May 16, 2019

Unexpected present or missing directories wouldn't be a very good attack, no. Maybe it is better to think about it as an "is this the same?" question instead of an "is this malicious?" question.
Each RECORD entry would need to correspond to a ZipInfo, you wouldn't want to force the installer to figure out if a ZipInfo directory was also implied by other entries in the file. So a simple rule would be if you need an empty directory or want to have a separate entry for an implied directory then mention it in RECORD.

@pfmoore
Copy link
Member

pfmoore commented May 16, 2019

These details are getting fiddly enough that I think it's essential that whatever gets finally agreed is recorded in the wheel spec.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants