Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate double book keeping in extern, just using _vendor directly #4330

Closed
wants to merge 9 commits into from

Conversation

abravalheri
Copy link
Contributor

@abravalheri abravalheri commented Apr 26, 2024

Main idea:

  • Use _vendor directly instead of extern

This is an investigation on #4320 (review)

Summary of changes

Intentional changes:

  • 836168b: Modify tool/vendored.py so that: <------------------- this is possibly the most relevant commit for review

    1. Transitive dependencies are automatically documented
      (so we don't loose track of what needs what).
      • This is done via pip-compile and vendored.in (direct dependencies)
        ==> vendored.txt. See e09f4ae.
    2. Instead of a case-by-case import rewrite, all vendored imports are rewritten at once
      • Case-by-case workarounds were kept (e.g. intentionally delayed imports)
    3. Sync versions of vendored packages in pkg_resources and setuptools
      so that it is easier for "de-vendoring" (this is done via --constraint in pip).
  • 2c45906: Remove setuptools.extern and pkg_resouces.extern.

Collateral changes:

  • b0dcf0d: Update on vendored packages
  • 9e7dece: Replace extern imports with _vendor

Changes that were needed just to make the tests pass:

Closes

Pull Request Checklist


Drawbacks

The separation between the direct dependencies (vendored.in) and the automatic logging and documentation of the transitive dependencies (vendored.txt) is nice from some point-of-view.

However, it also requires that we run tox -e vendor using the lowest version of Python supported by setuptools (otherwise conditional environment markers may be ignored). This may be easy to forget.

@abravalheri abravalheri changed the title Eliminate double book keeping for extern and just use _vendor directly Eliminate double book keeping in extern, just using _vendor directly Apr 26, 2024
@abravalheri
Copy link
Contributor Author

NOTE:

There are some people out there importing extern: https://github.com/search?q=%2Ffrom+setuptools.extern%2F+language%3APython&type=code&l=Python

Maybe we should have a deprecation here?
Although, I find it weird that a 3rd-party is to importing extern without at least a try-except.

@abravalheri
Copy link
Contributor Author

@jaraco is this an approach you would be interested for us to follow in setuptools?

@abravalheri abravalheri marked this pull request as ready for review April 26, 2024 21:28
@jaraco
Copy link
Member

jaraco commented Apr 26, 2024

Although, I find it weird that a 3rd-party is to importing extern without at least a try-except.

IIRC, my intention for extern was that it was providing the compatibility shim - if the vendored packages were present, they'd be imported, but if they were missing, the dependencies could be loaded from sys.path. That's what VendorImporter.search_path did. I wanted to make it trivially easy for a downstream distro to simply remove the _vendor directory and supply the dependencies and everything would just work. Hard-coding _vendor in the imports will break this assumption (and require distro packagers to re-write all of these imports).

My hope/expectation was that at some point the vendored packages would be removed in all but a few isolated scenarios and real dependencies would be used. That vision never panned out.

Maybe we should have a deprecation here?

Although I made extern a public-facing name, I never intended it to be used for anything except project specifically dependent on setuptools' dependencies (e.g. they need to catch the same exception from packaging). I never considered the backward compatibility implications here.

@jaraco is this an approach you would be interested for us to follow in setuptools?

My hope was that setuptools (and maybe pkg_resources) could simply import these dependencies instead of hard-coding the imports from their vendored location. I was thinking something like:

- from setuptools.extern import platformdirs
+ import platformdirs

And then early in the import of setuptools ensure that everything in setuptools._vendor appears in sys.path (such that the packages always prefer installed packages and only fall back to _vendor in bootstrap scenarios). I even have hopes to make the vendoring an exceptional case, where they're only installed (or added to sys.path) under special conditions that are known to need special bootstraping support, but otherwise rely entirely on standard packaging resolution. That is, under most circumstances, a PEP 517/518 installer will ensure dependencies are met before doing anything with setuptools... and only in source-only builds would there need to be a bootstrap step to break the cyclic dependency (e.g. setuptools --> packaging (from source dist) --> setuptools).

I'm not confident that transitioning to a hard-coded _vendor import will get us there.

Copy link
Contributor

@Avasam Avasam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I share @jaraco 's concerns about having to hard-code _vendor in the import.
In my mind, the best-case scenario would be if import packaging used the system's if it was already present, then fallbacks to _vendor.

For pkg_resources we could likely have a bit of logic really early in __init__ that adds to path if missing.

Setuptools isn't as easy with its multiple importable modules. Maybe every module that uses a vendored dependency should call a util function early that does that check and setup? How does the distutil hack do it? Maybe just adding that helper to __init__ is sufficient?

I'm not as experienced with advanced features of python's import system and setuptools' intricacies, sorry if my suggestions don't make sense.


Edit

if the vendored packages were present, they'd be imported, but if they were missing, the dependencies could be loaded from sys.path.

If you prepend _vendor on the sys.path so it has priority if a module is found. Wouldn't it work that way?


Anyway, here's a handful of comments concerning type-checking that would be useful even outside of this PR.

mypy.ini Outdated Show resolved Hide resolved
mypy.ini Outdated Show resolved Hide resolved
mypy.ini Outdated Show resolved Hide resolved
Comment on lines +76 to +86
from ._vendor.packaging.requirements import (
# These imports are explicit only to avoid mypy error
InvalidRequirement as _packaging_InvalidRequirement,
Requirement as _packaging_Requirement,
)

__import__('pkg_resources.extern.packaging.version')
__import__('pkg_resources.extern.packaging.specifiers')
__import__('pkg_resources.extern.packaging.requirements')
__import__('pkg_resources.extern.packaging.markers')
__import__('pkg_resources.extern.packaging.utils')
__import__('pkg_resources._vendor.packaging.version')
__import__('pkg_resources._vendor.packaging.specifiers')
__import__('pkg_resources._vendor.packaging.requirements')
__import__('pkg_resources._vendor.packaging.markers')
__import__('pkg_resources._vendor.packaging.utils')
Copy link
Contributor

@Avasam Avasam Apr 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this direct import might render __import__('pkg_resources._vendor.packaging.requirements') useless?
On that note I also don't see any packaging.specifiers in this module.
packaging.version is only used for Version and invalidVersion.
packaging.markers is only used for Marker and InvalidMarker.
packaging.utils is only used for canonicalize_name.

Unless I misunderstand what else these imports could be for, they can likely be removed even outside this PR.

Copy link
Contributor

@Avasam Avasam Apr 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I couldn't do a suggestion over some deleted lines, here's what it'd look like:

from ._vendor import platformdirs
from ._vendor.packaging.markers import Marker, InvalidMarker
from ._vendor.packaging.requirements import (
    InvalidRequirement as _packaging_InvalidRequirement,
    Requirement as _packaging_Requirement,
)
from ._vendor.packaging.utils import canonicalize_name
from ._vendor.packaging.version import Version, InvalidVersion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see...

My approach here was not to differ much from the original content. I just wanted to make mypy stop failing.

Removing __import__('pkg_resources._vendor.packaging.requirements') should be probably fine. But the other ones, maybe it is a separated PR.

Copy link
Contributor

@Avasam Avasam May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I extracted what I saw as good changes on their own into #4348

mypy.ini Outdated
Comment on lines 25 to 28
[mypy-pkg_resources._vendor.*,setuptools._vendor.*,setuptools.config._validate_pyproject.*,setuptools.tests.*]
# Even when excluded there might be problems: python/mypy/issues/11936#issuecomment-1466764006
# - Avoid raising issues when importing from "_vendor" modules
# - Avoid raising issues when importing dependencies in tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[mypy-pkg_resources._vendor.*,setuptools._vendor.*,setuptools.config._validate_pyproject.*,setuptools.tests.*]
# Even when excluded there might be problems: python/mypy/issues/11936#issuecomment-1466764006
# - Avoid raising issues when importing from "_vendor" modules
# - Avoid raising issues when importing dependencies in tests
[mypy-pkg_resources._vendor.*,setuptools._vendor.*]
# Avoid raising import issues in vendored modules. Even when excluded there might be problems: python/mypy/issues/11936#issuecomment-1466764006

Do you still need to specify tests here? These are modules that should also be found in exclude, tests are not excluded. Originally I added it because of the dynamically imported created modules that are imported, but you type: ignored those, so this shouldn't be needed.

Similarly the _validate_pyproject issue was fixed upstream.

Also seems to work locally w/o.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current state of the repo, think we need special handling for

pkg_resources._vendor.*
setuptools._vendor.*
setuptools.config._validate_pyproject.*
setuptools.tests.*

so that we can run tests from 3.8 to 3.12. I remember trying without them and hitting some problems.

Further changes might obliviate some of those, but I did not want to extend even more this PR (e.g. using the latest version of _validate_pyproject).

Co-authored-by: Avasam <samuel.06@hotmail.com>
Copy link
Contributor Author

@abravalheri abravalheri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @jaraco and @Avasam for the insightful comments. I think that for now we can close this PR since it is not the direction we want to go.

My main concern was that simply adding _vendor to sys.path is prone to dependency version conflicts (e.g. setuptools might require a minimum version of a certain package, but that is in conflict with the version installed in the system).


Note: I think that there are some ideas here that might be worthy adopting in the code base.

I really liked the division between vendored.in and vendored.txt via pip-tools since it creates a very clear documentation of what are the direct dependencies and what are the transitive dependencies (tracking the chain).

mypy.ini Outdated Show resolved Hide resolved
mypy.ini Outdated
Comment on lines 25 to 28
[mypy-pkg_resources._vendor.*,setuptools._vendor.*,setuptools.config._validate_pyproject.*,setuptools.tests.*]
# Even when excluded there might be problems: python/mypy/issues/11936#issuecomment-1466764006
# - Avoid raising issues when importing from "_vendor" modules
# - Avoid raising issues when importing dependencies in tests
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current state of the repo, think we need special handling for

pkg_resources._vendor.*
setuptools._vendor.*
setuptools.config._validate_pyproject.*
setuptools.tests.*

so that we can run tests from 3.8 to 3.12. I remember trying without them and hitting some problems.

Further changes might obliviate some of those, but I did not want to extend even more this PR (e.g. using the latest version of _validate_pyproject).

Comment on lines +76 to +86
from ._vendor.packaging.requirements import (
# These imports are explicit only to avoid mypy error
InvalidRequirement as _packaging_InvalidRequirement,
Requirement as _packaging_Requirement,
)

__import__('pkg_resources.extern.packaging.version')
__import__('pkg_resources.extern.packaging.specifiers')
__import__('pkg_resources.extern.packaging.requirements')
__import__('pkg_resources.extern.packaging.markers')
__import__('pkg_resources.extern.packaging.utils')
__import__('pkg_resources._vendor.packaging.version')
__import__('pkg_resources._vendor.packaging.specifiers')
__import__('pkg_resources._vendor.packaging.requirements')
__import__('pkg_resources._vendor.packaging.markers')
__import__('pkg_resources._vendor.packaging.utils')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see...

My approach here was not to differ much from the original content. I just wanted to make mypy stop failing.

Removing __import__('pkg_resources._vendor.packaging.requirements') should be probably fine. But the other ones, maybe it is a separated PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants