Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a public API #262

Open
agronholm opened this issue Oct 3, 2018 · 21 comments · May be fixed by #472
Open

Provide a public API #262

agronholm opened this issue Oct 3, 2018 · 21 comments · May be fixed by #472
Labels
documentation enhancement needs-discussion Needs broader discussion / PyPA consensus
Milestone

Comments

@agronholm
Copy link
Contributor

Several third party libraries broke when the internal API of wheel was changed as the result of a major refactoring effort. Since there is clearly a need for wheel to function as a library as well, a public API should be defined and documented.

Here's what I plan to include in the API:

  • The WheelFile class
@matthew-brett
Copy link

Just a request to put something in the WheelFile class or other API that will:

Thanks for considering.

@wimglenn
Copy link
Contributor

wimglenn commented Oct 4, 2018

+1

pypa define a wheel class in at least 5 different places (!)

To make matters worse, pip also imports wheel and vendors distlib + setuptools, so it gets 4 different implementations at once. I think there would be great benefit to pypa/wheel providing a public API for other projects to use, and reduce this fragmented mess. It's important core infrastructure for packaging and should have some public surface area.

@agronholm
Copy link
Contributor Author

To comment @matthew-brett 's list:

  • read and write the information in the WHEEL package info - currently read_pkg_info and write_pkg_info
    • this sounds like a better fit for a more generic packaging library as it is not wheel specific
  • report the tags with which a particular wheel is compatible - e.g. https://github.com/pypa/wheel/blob/0.31.1/wheel/install.py#L122
    • doable, although I would still prefer a shared pep425 implementation if possible
  • read and write the RECORD file (read / write CSV as tuples or something).
    • not specific to wheel, so maybe should be put in a more generic packaging library too?

@matthew-brett
Copy link

Is the idea then, to have two copies of the code, one in wheel and another in a generic packaging library? Or would wheel depend on this library? Is it really worth building a new generic packaging library for this, given that the code is pretty small?

facebook-github-bot pushed a commit to facebookincubator/xar that referenced this issue Feb 21, 2019
Summary:
Upstream wheel refactored and deleted an API we depend on.
We would either have to copy in the specific code we need,
and potentially make modifications, or vendor in wheel. I
chose to vendor in wheel because it is cleaner, and we can
go back to upstream if they provide the API we need again.

See pypa/wheel#262.

Reviewed By: cooperlees

Differential Revision: D14146161

fbshipit-source-id: 5f6a3e49ce5c6d19964a3b067ec79ca6985d0d33
@pradyunsg
Copy link
Member

Is the plan to essentially document the WheelFile as it sits today as a supported API?

If so, I have one major-ish concern with that - it has a baked-in assumption about the wheel format always being a flat zip. It's likely not going to be possible to transparently add support for newer versions of the wheel format, which don't operate on ".data files as members of same zip". There's been a lot of discussion about nested archives for wheels' .data directory, which allows for smaller overall wheel files on https://discuss.python.org/t/3810/ and we shouldn't lock ourselves out of that enhancement due to this API.

Obviously, we can't support that kind of wheel format just yet because those aren't even close to being fully specified (!). However, I think it'd be really nice to have an API that can accommodate for such changes in a wheel 2.0 format, without needing significant user-side code change. It'd be nice to be able to gain support for reading from newer versions of wheel files by upgrading wheel (the package), without needing to update the code that handles the wheel files using the API.

In other words, I think we should avoid baking in the assumption in a supported API that wheel files' .data directory will be files directly in the same zip. It's OK if we don't do it now, but we'd be creating more design work, for when a wheel 2.0 format is specified.


I'd like it if we clearly from separate them from the supported API (so that pip can "easily" trim those files when vendoring). It'd be even nicer to have a separate library with the reusable code, so that pip doesn't have to do any trimming.


In terms of how pip uses wheel (the package) today, pip tries to import it to check if it is even installed, and that's all. :)

Beyond that, there's 2 main ways we deal with wheel files:

  • unpacking an existing wheel file (installation)
  • creating a wheel file, from a bunch of info we have at runtime (in tests)

I don't expect/want wheel to have any installation logic exposed (that's what github.com/pradyunsg/installer is for!).

In terms of "reading" a wheel, I think having the capability to perform random access on dist_info files, and only having sequential access to data files is all that's needed/can be guaranteed if we wanna be a bit more future-proof.

In terms of "writing" a wheel, I don't have a feel for what the API should look like - pip's use case is "dumb". We have a helper function that takes a dictionary of file_name: contents with the assumption that the contents is always a UTF-8 string. That's not OK for anything that is doing something more relevant to actual users (eg: dealing with large data files, or compiled code).

(typed on my phone, with typo happy thumbs)

@agronholm
Copy link
Contributor Author

If so, I have one major-ish concern with that - it has a baked-in assumption about the wheel format always being a flat zip.

I was going to refactor WheelFile so it doesn't inherit from ZipFile, to minimize the API surface.

I'd like it if we clearly from separate them from the supported API (so that pip can "easily" trim those files when vendoring). It'd be even nicer to have a separate library with the reusable code, so that pip doesn't have to do any trimming.

I suggested back on discuss that setuptools could absorb the bdist_wheel command which would leave wheel as just a reference implementation and a command line tool. Would this work for the pip maintainers?

@pradyunsg
Copy link
Member

Would this work for the pip maintainers?

I can't speak for others, but I'd certainly be a happy pip maintainer if this happened. :)

@mattip
Copy link
Contributor

mattip commented Jun 8, 2020

I commented on the setuptools issue. I think first wheel needs to publish its API, then setuptools would publish a release that uses the approved wheel API, only then could wheel drop bdist_wheel.

@pradyunsg
Copy link
Member

I was going to refactor WheelFile so it doesn't inherit from ZipFile, to minimize the API surface.

Awesome! I'm interested in what this would look like, since it's relevant to pip/any installer as a "reader" of wheel files. I'm happy to provide concrete feedback once there's a (draft?) API for me to provide feedback on, if that's welcome and useful. :)

@agronholm
Copy link
Contributor Author

Sure, I'll come up with a proposal and post it here. Probably within the next couple days.

@agronholm
Copy link
Contributor Author

I commented on the setuptools issue. I think first wheel needs to publish its API, then setuptools would publish a release that uses the approved wheel API, only then could wheel drop bdist_wheel.

Yes, that's what I was thinking too.

@pfmoore
Copy link
Member

pfmoore commented Jun 9, 2020

OK, so thinking about how I'd use a wheel library, I would like to see the following APIs:

Wheel filename handling

I'd like to be able to check if a file is a wheel, parse the individual components out of a filename, and build a filename from components. This is purely to abstract away the handling of the PEP-defined name format, to protect client code from possible future changes, and to avoid "home grown" over-simplified code.

  1. wheel.parse_filename(filename) -> metadata. The metadata should have attributes for name, version, build, and for the tag sets python, abi and platform. Raise an exception if the filename isn't in the correct format.
  2. wheel.make_filename(name, version, build, python, abi, platform) -> str. Construct a filename from the various parts. All apart from name and version should be optional.

Wheel reading

Maybe this is the responsibility of @pradyunsg's installer project. But outside of installation per se, I'd imagine needing the following functions:

  1. Check the hashes - validate that the wheel has not been tampered with by validating the hashes in RECORD against the file contents.
  2. Extract individual files, to memory or disk. Ideally, I'd like the API to abstract away the distribution-1.0.dist-info and distribution-1.0.data top-level directories, so I could do extract_metadata_file() or extract_data_file() for these.

Open question - do we expect clients to hard code the 'METADATA' filename? If not, then we need an API to get it.

Wheel writing

I'd want to be able to build wheels in memory or direct to disk. Taking an already-open file handle would work for this.

  1. Add files to the wheel - same considerations as for reading around .dist-info and .data directories, and the METADATA filename.
  2. The wheel file object needs to know the name and version to determine the correct .dist-info directory name.
  3. The code should automatically write the RECORD and WHEEL files. (Open question - does anyone need the ability to override what gets written? I don't).

General questions

  • Do we need to support signed wheels? I'd argue no, they are too niche, but others may differ.
  • How does Root-Is-Purelib get handled? I mostly ignore this, as my usage is pretty much entirely Python-only wheels, but more general backends would need this, I guess.

@agronholm
Copy link
Contributor Author

agronholm commented Jun 9, 2020

I've created a sketch of the new API (and its implementation) in the publicapi branch.

Some questions came up when I was writing this:

  • Should we try to retain any of the attributes from the original file? I see pros and cons here.
  • Do we want to retain the file dates or set them to 1970-01-01 00:00:00+00:00 (for reproducible wheels)? Or make it a switch?
  • Where should we read RECORD in? The old WheelFile class did that in __init__(), but is that something we want to keep doing?
  • Should the Generator field in WHEEL refer to the wheel library itself, or be given as an argument to the initializer? (currently we do bdist_wheel <wheel-library-version>)
  • The WheelFile class used write about its progress to a logger – should we have a global logger, instance logger, no logger or something else entirely?

@gaborbernat
Copy link

gaborbernat commented Jun 11, 2020

@agronholm can you make it a PR so we can comment inline? Would also help to see a few example usages of it, to see how it translates when trying to do some expected usages, the existence of such tests IMHO would help.

@pradyunsg
Copy link
Member

pradyunsg commented Jun 11, 2020

My thoughts:

Do we want to retain the file dates or set them to 1970-01-01 00:00:00+00:00 (for reproducible wheels)?

Retain file dates. Use SOURCE_DATE_EPOCH.

Where should we read RECORD in?

Upon initialization works well. I don't see why we'd have to change this.

Should the Generator field in WHEEL refer to the wheel library itself, or be given as an argument to the initializer?

The current implementation (default to wheel <version>, allowing override) is good enough IMO.

The WheelFile class used write about its progress to a logger – should we have a global logger, instance logger, no logger or something else entirely

No thoughts on this. global is likely easiest. :)

Should we try to retain any of the attributes from the original file? I see pros and cons here.

🤷

@agronholm
Copy link
Contributor Author

can you make it a PR so we can comment inline?

PR created: #357

Would also help to see a few example usages of it, to see how it translates when trying to do some expected usages, the existence of such tests IMHO would help.

I'll convert bdist_wheel to use this API.

@agronholm
Copy link
Contributor Author

I have now pushed changes that make bdist_wheel use the new API. I've migrated the old tests to match and they pass now too.

@agronholm agronholm added the needs-discussion Needs broader discussion / PyPA consensus label Aug 10, 2021
@abravalheri
Copy link
Contributor

abravalheri commented Jan 9, 2022

I think extracting the following into packaging (universally vendored in all packaging projects) would help a lot to solve the problem pointed out when moving bdist_wheel into setuptools:

read and write the information in the WHEEL package info - currently read_pkg_info and
write_pkg_info

  • this sounds like a better fit for a more generic packaging library as it is not wheel specific

read and write the RECORD file (read / write CSV as tuples or something).

  • not specific to wheel, so maybe should be put in a more generic packaging library too?

(but in a non-opaque way, e.g. dealing with dictionaries instead of raw-bytes)

@pradyunsg
Copy link
Member

@agronholm Anything I can do to help move this forward? (I'm interested in having wheel contain an implementation of installer's WheelFile API, so that we can use the canonical implementation here with installer. :)

@agronholm
Copy link
Contributor Author

agronholm commented Oct 22, 2022

@agronholm Anything I can do to help move this forward? (I'm interested in having wheel contain an implementation of installer's WheelFile API, so that we can use the canonical implementation here with installer. :)

Sorry for the delayed response. I haven't touched the publicapi branch in a while, so I needed to update it for the work the continue. I'm doing that today, and I'll push the changes once I'm done.

As for what you can do, making a list of changes we need for a release would be a good start (once I've pushed my changes). We can discuss the next steps once that has been done.

UPDATE: more tests are passing than failing :)
UPDATE 2: just one test still failing

@agronholm
Copy link
Contributor Author

@pradyunsg I've created a draft PR: #472. It's currently failing due to not finding the wheel package which is a bit strange, as this is working for me locally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation enhancement needs-discussion Needs broader discussion / PyPA consensus
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants