Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revisit relationship between Distribution and PathDistribution #445

Open
jaraco opened this issue Apr 9, 2023 · 1 comment
Open

Revisit relationship between Distribution and PathDistribution #445

jaraco opened this issue Apr 9, 2023 · 1 comment
Labels
help wanted Extra attention is needed

Comments

@jaraco
Copy link
Member

jaraco commented Apr 9, 2023

Should Distribution objects expose a method to resolve paths?

Yes, that is one solution. There is already .locate_file() which is not far off, except that its implementation in PathDistribution precisely removes the metadata directory that we need in this case.

More generally, I wonder if there is just a tad too much coupling between the Distribution and PathDistribution classes?

There is a mix of abstract and concrete methods in the Distribution base class, and several of the concrete methods seem fairly tightly bound to the PathDistribution subclass. For example, the at() static method creates and returns a PathDistribution instance outright, and the metadata(), entry_points(), files(), and requires() methods are all implemented with assumptions as to which files should be available and that the organization should resemble a dist-info or egg-info distribution.

Now, there is (AFAICS) only the one (PathDistribution) subclass of Distribution available in this project, and I don't really have enough experience with this project or its users to really imagine what other Distribution subclasses would or could exist in the wild.

Still, I would suspect that a subclass that was sufficiently different from PathDistribution to rather prefer subclassing Distribution directly, would then find itself not merely implementing the read_text() an locate_file() abstract methods, but would probably also need to _re_implement several of metadata(), entry_points(), files(), or requires() as well.

Hence, I would raise the question whether some of these methods would be better off with their concrete implementations moved into the PathDistribution subclass? (Of course leaving abstract methods behind in Distribution where that makes sense.) This would grant these concrete methods direct access to the stuff they need inside PathDistribution, instead of having to add more interfaces to Distribution for stuff that really only makes sense for PathDistribution.

Still, this is a much bigger refactoring than I set out to do, and I would not feel comfortable starting this without consulting you.

Originally posted by @jherland in #437 (comment)

@jaraco
Copy link
Member Author

jaraco commented Dec 3, 2023

I know I set this aside from the previous effort. I read it again today.

a subclass that was sufficiently different from PathDistribution would probably also need to _re_implement several of metadata(), entry_points(), files(), or requires().

All of these methods rely primarily on read_text, so I'd expect them to work without re-implementation. At least, that's the design - if the Distribution subclass implements a read_text, which given a filename returns the text of that metadata "file", the other methods rely on that interface to process the contents. The fact that some of these (files in particular) are highly aligned to the implementation details of egg-info and distinfo is an artifact, but there for practicality. I'd like to deprecate the egg-based solution, but only after suitable replacements are in place (e.g. Setuptools no longer generates them under any circumstances).

I don't really have enough experience with this project or its users to really imagine what other Distribution subclasses would or could exist in the wild.

It is the intention that anyone anywhere can install a DistributionFinder on sys.meta_path. You could, for example, build a database of Python packages and have them importable with metadata, or even build an importer system that imports directly from PyPI and loads the metadata from artifacts in PyPI without downloading them to disk. We explicitly tried not to couple too tightly to the existing implementation of Folders-on-Disk and Zipfiles on sys.path.

Prior to the 0.11 release (changes), there were two finders, one for folders on disk and another for wheels. That change consolidated those two finders into one by leveraging the abstractions in common between pathlib.Path and zipfile.Path.

It's conceivable that no one is relying on the Distribution subclass at this point, but it still seems like a meaningful abstraction to maintain - to avoid embedding reliance on the "pathlib-like" interface or for paths to exist on a file system.

the at() static method creates and returns a PathDistribution instance

Yes, that's a little awkward. Perhaps it would be better to be a top-level function or a classmethod on PathDistribution or something else.

I recommend to take each of these concerns and articulate an issue and possibly propose an alternative approach, depending on how you're feeling about it with the above context.

@jaraco jaraco added the help wanted Extra attention is needed label Mar 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

1 participant