[BUG] Wheel naming is not following PEP 491 convention #3777

Open
bastienqb opened this issue Jan 19, 2023 · 10 comments
Labels
bug, help wanted, Needs Discussion (issues where the implementation still needs to be discussed), Needs Investigation (issues which are likely in scope but need investigation to figure out the cause)

Comments

bastienqb commented Jan 19, 2023

setuptools version

setuptools==66.0.0

Python version

Python 3.8

OS

macOS

Additional environment information

No response

Description

I am building a wheel for a Python package using setuptools, and the name of the resulting wheel file does not seem to follow the PEP 491 convention.

For external reasons, I need to name my package using the structure {namespace}.{package-name}. Following the convention, I would expect my wheel file to be named namespace_package_name-0.1.0-py2.py3-none-any.whl.

However, I get this name for my wheel instead: namespace.package_name-0.1.0-py2.py3-none-any.whl, which does not follow the convention.

Expected behavior

I expect the "." in the package name to be replaced with a "_" in the wheel name.

How to Reproduce

  1. Create a hello_world package structure:
    hello_world
    |- my_package
    |    |- __init__.py
    |    |- hello_world.py
    |- pyproject.toml
    |- setup.cfg
    
  2. In setup.cfg, write:
    [metadata]
    name = namespace.my-package
    version = 0.1.0
    
    [options]
    zip_safe = False
    include_package_data = True
    packages = find:
    
  3. In pyproject.toml, write:
    [build-system]
    requires = ["setuptools==66.0.0", "wheel"]
    build-backend = "setuptools.build_meta"
    
  4. Run pipx run build at the root of your hello_world package.
  5. Inspect the dist directory that was created.

Output

In the dist folder, you find:

  • namespace.my_package-0.1.0-py3-none-any.whl

bastienqb added the bug and Needs Triage labels on Jan 19, 2023
jaraco (Member) commented Jan 20, 2023

As much as I dislike unnecessary name mangling, this report does appear to be correct. I'm unsure if Setuptools is responsible for the naming or if the wheel package is. Regardless, it probably should be fixed.

@dholth is there any chance the PEP could be updated to allow . characters in the wheel name? What was the motivation for mangling them? They're an intrinsic, important character in the Python package name.

jaraco added the help wanted, Needs Discussion and Needs Investigation labels and removed the Needs Triage label on Jan 20, 2023
abravalheri (Contributor) commented Jan 21, 2023

@jaraco I believe that the PEP originally allowed for ., but the living spec was changed as a result of a discussion:

pypa/packaging.python.org#844

(Although, after a quick look at the Discourse thread, it seems to me that the . character specifically was not really debated, and its handling ended up changing accidentally.)

mgorny (Contributor) commented Feb 10, 2023

The current spec seems to say that full name normalization should happen (i.e. lowercasing, plus replacing runs of special characters with an underscore), and from my quick test, "newer" backends all follow that.

From a distribution viewpoint, we'd also prefer setuptools to follow it, as otherwise we end up with unpredictable filenames (with some backends producing normalized names and others not).
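A minimal Python sketch of the rule described above, assuming the filename form of normalization is "lowercase, then collapse every run of -, _ and . into a single _". The function name filename_normalize is hypothetical, not an API of setuptools or wheel.

```python
import re


def filename_normalize(name: str) -> str:
    """Lowercase the name and collapse runs of '-', '_' and '.' into one '_'.

    Hypothetical helper illustrating the normalization described above;
    not an API of setuptools or wheel.
    """
    return re.sub(r"[-_.]+", "_", name).lower()


for raw in ["namespace.my-package", "zope.interface", "Foo__Bar-baz"]:
    print(f"{raw!r} -> {filename_normalize(raw)!r}")
```

Under this rule the reporter's project name would map to namespace_my_package, which is the filename stem the issue expects.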

mgorny (Contributor) commented Feb 10, 2023

> I'm unsure if Setuptools is responsible for the naming or if the wheel package is.

Apparently wheel is, via the wheel_dist_name() function in bdist_wheel.py. There's pypa/wheel#440, which seems to tackle this, though its title talks about .dist-info naming.
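To see why the reporter's wheel keeps the ".", here is a sketch of the older escaping behaviour. It assumes the legacy logic works like pkg_resources.safe_name() followed by replacing "-" with "_"; this is a reconstruction for illustration, not a copy of bdist_wheel.py, and legacy_wheel_escape and wheel_filename are hypothetical names. Only characters outside [A-Za-z0-9.] are escaped, so dots and case survive.

```python
import re


def legacy_wheel_escape(name: str) -> str:
    """Approximate the legacy escaping (hypothetical reconstruction).

    Step 1, like pkg_resources.safe_name(): replace each run of characters
    other than ASCII letters, digits and '.' with a single '-'.
    Step 2: replace '-' with '_' so the name cannot clash with the '-'
    separators in the wheel filename. Case and '.' are left untouched.
    """
    safe = re.sub(r"[^A-Za-z0-9.]+", "-", name)
    return safe.replace("-", "_")


def wheel_filename(name: str, version: str, tag: str = "py3-none-any") -> str:
    # Hypothetical helper: {distribution}-{version}-{python}-{abi}-{platform}.whl
    return f"{legacy_wheel_escape(name)}-{version}-{tag}.whl"


print(wheel_filename("namespace.my-package", "0.1.0"))
# namespace.my_package-0.1.0-py3-none-any.whl
```

This reproduces the filename reported in the issue: the "-" is escaped, but the "." is not.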

abravalheri (Contributor) commented

Hi @jaraco, there was a recent discussion on the Python Discourse about the normalisation of distribution file names: https://discuss.python.org/t/change-in-pypi-upload-behavior-intentional-accidental-pebkac/27707. I will try to summarise the key takeaways I found on why most of the community seems to be in favour of normalisation. Hopefully this answers the question "What was the motivation for mangling them?":

  1. PyPI (as a public package index) has very strong reasons for enforcing strict uniqueness checks (security reasons, competition between publishers that might confuse users, etc…). Therefore it is not viable to differentiate between distributions named after “normal packages” and namespace packages on PyPI.
  2. pip, whose primary use case is to download from PyPI, prefers to rule out the possibility of treating distributions named after namespace packages and “normal packages” as two different distributions. This is compatible with PyPI and also helps users to fix unintentional typing errors and avoid downloading wrong/malicious distributions.
  3. Members of the community argue that having one normalisation rule applied everywhere would be simpler.
  4. There is some advantage in normalising the .dist-info/.egg-info directory names (faster lookup), and if I understood correctly, this would also help to optimise the checks for conflicting distributions that are already installed (since .dist-info serves as a database).
  5. Private indexes have to follow PEP 503 and perform name normalisation, so it is not possible for distributions named a.b and a_b to coexist in the same private index.
  6. There is some level of agreement that Name in the PKG-INFO/METADATA files should not be normalised and should reflect the user's input.
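Point 5 follows from the PEP 503 normalization rule, which lowercases the name and replaces runs of -, _ and . with a single -. A short sketch (find_collisions is a hypothetical helper, e.g. for an index deciding whether to reject an upload) shows that a.b and a_b land on the same normalized name and therefore the same project page:

```python
from __future__ import annotations

import re
from collections import defaultdict


def pep503_normalize(name: str) -> str:
    # Normalization rule from PEP 503: lowercase, runs of -_. become "-".
    return re.sub(r"[-_.]+", "-", name).lower()


def find_collisions(names: list[str]) -> dict[str, list[str]]:
    """Group project names by normalized form; keep groups with >1 member.

    Hypothetical helper for illustration only.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[pep503_normalize(name)].append(name)
    return {key: group for key, group in groups.items() if len(group) > 1}


print(find_collisions(["a.b", "a_b", "zope.interface", "Django"]))
# {'a-b': ['a.b', 'a_b']}
```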

So it seems that the name change unlocks optimisations and simplifications.

@bastienqb, if you would like, you can chip in to the discussion on https://discuss.python.org/t/change-in-pypi-upload-behavior-intentional-accidental-pebkac/27707 to explain why keeping the names in the format {namespace}.{package-name} is important. Otherwise, there seems to be a push in the community for a strict standard that normalises the file name (as a means to unlock the optimisations and simplifications I mentioned before).

jaraco (Member) commented Jun 16, 2023

Unfortunately, I don't think that answers my question - "why is . normalized to _?". They're very clearly different separators and have very different semantic meaning in Python. That is, if a Python user can't tell the difference between those characters, they're already headed for disaster.

Moreover, if the goal is to collapse any characters that a user might find confusing, it suggests that other normalization should occur. By this logic, PyPI should probably also normalize "I" and "l", maybe "j" and "i", "3" and "e", and probably others.

Since there's a strong push toward PyPI names being valid Python identifiers and since "jaraco.collections" and "jaraco_collections" are very much different Python identifiers, I feel strongly that either or both names should be allowed and should be different packages.

I'm very much in support of normalizing for security and to limit the diversity of the namespace and to do that in a way that's largely transparent to the user. What I'd really like to avoid is users seeing "downloading zope_interface" when the package they're downloading is "zope.interface" and the Python package that's installed is zope.interface.

The most important factor here is not to give namespace packages a second-class experience, and that's exactly what they'll get if they follow the convention of deriving the distribution package name from the Python package name, only to have the . replaced by _ in user-visible locations.

dstufft (Member) commented Jun 16, 2023

There's some confusion happening here.

Regardless of what happens, PyPI (and everyone else) is going to treat ., -, and _ as equal characters. This behavior has existed since basically the dawn of time in PyPI, setuptools, pip, etc. This isn't any different from the fact that we treat F as equal to f. This has been the status quo for ~20 years, and it isn't likely to change.

Some confusion came out of the specs regarding the Name field inside METADATA: some people interpreted them as saying that the Name field should be normalized. I don't believe there is widespread support for that, PyPI does not require it, and I think the people who believe it have essentially just misread the specs. I'm preparing a PEP that will clarify that the Name field is not normalized and is ultimately the "canonical" name, which should be used in any user-visible location. So when someone looks at the project on PyPI, or wherever, it should use the name as it exists in the Name field.

On PyPI we normalize the name in the Simple API URLs only. So for zope.interface the Simple API URL is /simple/zope-interface/. We do not consider this a user-visible location; it's part of the API contract between an installer and PyPI. From a practical standpoint, pip has to be able to take a user-entered name and get the URL, and if we didn't do this normalization in the URL, then pip install django would fail (because it's Django, not django), etc.
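A minimal sketch of the installer side of that contract, assuming the PEP 503 layout where each project's page lives at /simple/<normalized-name>/ under the index root. simple_url and its default index URL are illustrative, not pip's internals.

```python
import re


def simple_url(user_input: str, index: str = "https://pypi.org/simple/") -> str:
    """Map a user-entered project name to its Simple API page URL.

    Hypothetical helper: the name is normalized per PEP 503 (lowercase,
    runs of -, _ and . collapsed to '-') so that 'Django', 'django' and
    'DJANGO' all resolve to the same page.
    """
    normalized = re.sub(r"[-_.]+", "-", user_input).lower()
    return f"{index.rstrip('/')}/{normalized}/"


print(simple_url("Django"))          # https://pypi.org/simple/django/
print(simple_url("zope.interface"))  # https://pypi.org/simple/zope-interface/
```

Because the normalization happens only while building the URL, the project's displayed name (Django, zope.interface) is never altered.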

The question is largely around filenames. Does zope.interface need to produce a wheel named zope.interface-1.0.whatever.whl, or can it produce a wheel named zope_interface-1.0.whatever.whl? Note, of course, that no matter what we choose, a package named foo-bar is never going to have its name represented exactly in the wheel.

The specs as they're currently written decide that the filename is not a user-facing value, treating filenames much like the URLs in the Simple API: an interchange format between computer systems. Of course, filenames are also a little more visible than Simple API URLs; they do appear (as filenames) in the PyPI UI, etc.

So ultimately the question is:

  1. Given that zope.interface and zope-interface and zope_interface are all the same name as far as packaging is concerned.
  2. Given that the project's "canonical" name for display is zope.interface.
  3. Given that the PyPI index url for the project is going to be /simple/zope-interface/.

Is it OK for the filenames to be:

  • zope_interface-6.0.tar.gz
  • zope_interface-6.0-cp311-cp311-win_amd64.whl
  • zope_interface-6.0-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl

or MUST it be:

  • zope.interface-6.0.tar.gz
  • zope.interface-6.0-cp311-cp311-win_amd64.whl
  • zope.interface-6.0-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl

CAM-Gerlach commented Jun 17, 2023

> Unfortunately, I don't think that answers my question - "why is . normalized to _?".

@abravalheri shared your question here on the linked thread, and ended up being convinced by the chorus of responses, so I'll try to summarize the main reasons other PyPA maintainers cited:

  • It applies one consistent normalization everywhere, rather than using multiple different normalization schemes for specific contexts (beyond just escaping the normalization character to _)
  • It ensures the resulting sdist/wheel/dist-info filename is deterministic for a given project name, making various operations (e.g. package name and metadata lookups) much simpler and more efficient
  • It prevents multiple normalization-equivalent distribution archives (sdists, wheels) from existing at the filesystem/URL level and creating ambiguity
  • It is what ≈all other backends (flit-core, hatchling, meson-python, pdm-backend, poetry-core) already do, rather than inventing a whole new scheme, changing the standard yet again and expecting all the other backends to switch
  • It accurately reflects the underlying constraints of what names are actually considered equivalent, and how installers and package indices have always treated them
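The second bullet (deterministic filenames making lookups simpler) can be sketched as follows, assuming every installed .dist-info directory is named {normalized_name}-{version}.dist-info using the filename normalization (lowercase, runs of -, _ and . replaced by _). The directory listing and find_dist_info are hypothetical; a real installer would read them from site-packages.

```python
import re
from typing import List, Optional


def filename_normalize(name: str) -> str:
    # Lowercase and collapse runs of '-', '_' and '.' into a single '_'.
    return re.sub(r"[-_.]+", "_", name).lower()


def find_dist_info(entries: List[str], project: str) -> Optional[str]:
    """Return the .dist-info entry for `project`, or None if absent.

    Because every entry is assumed to be '{normalized}-{version}.dist-info',
    and a normalized name never contains '-', a plain prefix test on
    '{normalized}-' is enough; no entry has to be parsed or re-normalized.
    """
    prefix = filename_normalize(project) + "-"
    for entry in entries:
        if entry.startswith(prefix) and entry.endswith(".dist-info"):
            return entry
    return None


installed = [
    "zope_interface-6.0.dist-info",
    "zope_interface_extras-1.0.dist-info",
    "django-4.2.dist-info",
]
print(find_dist_info(installed, "zope.interface"))  # zope_interface-6.0.dist-info
print(find_dist_info(installed, "Django"))          # django-4.2.dist-info
```

If directory names were left unnormalized, the same lookup would have to normalize every entry before comparing, which is the cost the bullet refers to.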

However, there was equally strong support for only applying normalization to the identifiers that are not primarily user-facing, i.e. the artifact filenames and the .dist-info, and mandating that the METADATA Name field not be normalized, and that tools should always use that value whenever presenting the project name in a user-facing context (or if they do happen to rely on the distinction). This seems to address your main overriding concern—that the project name be presented to the user as the author intended.

Therefore, it seems a PEP formally declaring that Name MUST NOT be normalized and SHOULD always be what is presented to the user, while also stating that it MUST be normalized in new archive filenames and .dist-info, would come closest to giving everyone most of what they want here without regressing on the de facto status quo for either. As Dustin summarizes on the thread, that status quo is a mess for everyone involved, especially maintainers with . in their project names, which is actually what kick-started that discussion in the first place.
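Under that proposal the two values coexist: the Name field is read verbatim for display, and the normalized form is used only for artifact names. A minimal sketch, assuming a METADATA document in its usual RFC 822-style header format (parsed here with the standard library's email parser); display_and_file_names is a hypothetical helper and the metadata text is made up for illustration.

```python
import re
from email.parser import Parser
from typing import Tuple

# Illustrative METADATA content (made up for this example).
METADATA = """\
Metadata-Version: 2.1
Name: zope.interface
Version: 6.0
"""


def display_and_file_names(metadata_text: str) -> Tuple[str, str]:
    """Return (display name, sdist filename) for a METADATA document.

    The Name header is kept exactly as written for user-facing display;
    only the filename uses the normalized form (lowercase, runs of
    '-', '_' and '.' replaced by '_').
    """
    headers = Parser().parsestr(metadata_text, headersonly=True)
    name, version = headers["Name"], headers["Version"]
    normalized = re.sub(r"[-_.]+", "_", name).lower()
    return name, f"{normalized}-{version}.tar.gz"


print(display_and_file_names(METADATA))
# ('zope.interface', 'zope_interface-6.0.tar.gz')
```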

jaraco (Member) commented Jun 18, 2023

I concede. It doesn't matter what the motivation was to consider . and _ equivalent, but they are now by consensus.

CAM-Gerlach commented

Just to be clear, this was only the case for sdist and wheel filenames, dist-info directories, and requesting a package by name from an index. There was also strong consensus that they should not be considered equivalent in the canonical project name (the Name field of pyproject.toml, PKG-INFO and METADATA) or in the display name for user consumption, which should be kept exactly as originally written by the project author.
