importlib_metadata handles PathLike objects on sys.path when importlib.PathFinder does not #372

Closed
graingert opened this issue Mar 9, 2022 · 10 comments

@graingert
Contributor

graingert commented Mar 9, 2022

demo script:

import contextlib
import os
import pathlib
import sys
import tempfile

import importlib_metadata


@contextlib.contextmanager
def _tmp_path(*args, **kwargs):
    with tempfile.TemporaryDirectory(*args, **kwargs) as tmp_dir:
        yield pathlib.Path(tmp_dir)


def main():
    with _tmp_path() as tmp_path:
        (tmp_path / "module.py").write_bytes(b"def function():\n    return 1\n")
        dist_info = tmp_path / "demo_package-0.0.0.dist-info"
        dist_info.mkdir()
        (dist_info / "entry_points.txt").write_bytes(
            b"[group]\nname = module:function\n"
        )
        sys.path.append(tmp_path)
        ep = next(
            iter(importlib_metadata.entry_points(name="name", group="group")), None
        )
        print(ep)
        print(ep.load())


if __name__ == "__main__":
    sys.exit(main())

results in:

EntryPoint(name='name', value='module:function', group='group')
Traceback (most recent call last):
  File "/home/graingert/projects/demo_importlib.py", line 33, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/lib/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/graingert/projects/demo_importlib.py", line 13, in _tmp_path
    yield pathlib.Path(tmp_dir)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/graingert/projects/demo_importlib.py", line 29, in main
    print(ep.load())
          ^^^^^^^^^
  File "/home/graingert/.virtualenvs/testing311/lib/python3.11/site-packages/importlib_metadata/__init__.py", line 203, in load
    module = import_module(match.group('module'))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1206, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1142, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'module'

however, if I os.fsdecode the tmp_path before appending it to sys.path, I get:

EntryPoint(name='name', value='module:function', group='group')
<function function at 0x7f2b12192700>

see also https://github.com/python/cpython/blob/23dcea5de736b367c0244042aaca10971538b2b4/Lib/importlib/_bootstrap_external.py#L1460-L1461
and https://bugs.python.org/issue32642
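
A minimal sketch of the workaround mentioned above, assuming the goal is only to make sure sys.path receives a plain str. The helper name `sys_path_entry` is hypothetical and is not part of importlib_metadata or the standard library; it just wraps the os.fsdecode call in a context manager so the entry is removed again afterwards.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def sys_path_entry(path):
    """Temporarily append `path` to sys.path as a plain str.

    Hypothetical helper for illustration: os.fsdecode turns a
    pathlib.Path (or bytes) into str, the type the import system expects.
    """
    entry = os.fsdecode(path)
    sys.path.append(entry)
    try:
        yield entry
    finally:
        # Remove one occurrence of the entry, even if the body raised.
        sys.path.remove(entry)
```

In the demo script this would replace the bare `sys.path.append(tmp_path)` with `with sys_path_entry(tmp_path): ...` around the entry point lookup and load.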

@jaraco
Member

jaraco commented Mar 12, 2022

I've uploaded the repro as a gist so I can replicate the error with this one-liner:

$ http https://gist.githubusercontent.com/jaraco/835328403b381be186f15ae943183031/raw/c876455f3ad994660139e6e1c472e1a1a934d792/metadata-issue372.py | pip-run -q importlib_metadata -- -
EntryPoint(name='name', value='module:function', group='group')
Traceback (most recent call last):
  File "<stdin>", line 33, in <module>
  File "<stdin>", line 29, in main
  File "/var/folders/c6/v7hnmq453xb6p2dbz1gqc6rr0000gn/T/pip-run-necvv5sn/importlib_metadata/__init__.py", line 203, in load
    module = import_module(match.group('module'))
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'module'

@jaraco
Member

jaraco commented Mar 12, 2022

I can confirm the same issue exists in the stdlib version:

$ http https://gist.githubusercontent.com/jaraco/835328403b381be186f15ae943183031/raw/c876455f3ad994660139e6e1c472e1a1a934d792/metadata-issue372.py | sed -e 's/import importlib_metadata/import importlib.metadata as importlib_metadata/' | python -
EntryPoint(name='name', value='module:function', group='group')
Traceback (most recent call last):
  File "<stdin>", line 33, in <module>
  File "<stdin>", line 29, in main
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/importlib/metadata/__init__.py", line 162, in load
    module = import_module(match.group('module'))
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'module'

I'm not sure there's much for importlib metadata to do here. Sure, it could exclude path entries that aren't strings.

I do see where importlib_metadata resolves a pathlib.Path to str.

It looks like that cast was added to introduce compatibility on Python 2 (see #121 (comment)).

Now that Python 2 support is dropped, that cast can be dropped too. I wonder what that does to the repro.

@graingert
Contributor Author

graingert commented Mar 12, 2022

afaik sys.path can have bytes in, which that str call would change to "b'/some/dir'"
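
To illustrate the point about the str cast: calling str() on a bytes object produces its repr rather than a decoded path, whereas os.fsdecode decodes it with the filesystem encoding. The path below is only an example value.

```python
import os

raw = b"/some/dir"

# str() on bytes yields the repr, including the b'' wrapper.
as_repr = str(raw)

# os.fsdecode decodes with the filesystem encoding instead.
as_path = os.fsdecode(raw)

print(as_repr)  # b'/some/dir'
print(as_path)  # /some/dir
```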

@jaraco
Member

jaraco commented Mar 12, 2022

Removing the cast to str doesn't change the error (importlib_metadata still finds the path):

$ http https://gist.githubusercontent.com/jaraco/835328403b381be186f15ae943183031/raw/c876455f3ad994660139e6e1c472e1a1a934d792/metadata-issue372.py | pip-run -q git+https://github.com/python/importlib_metadata@bugfix/372-no-cast-path -- -
EntryPoint(name='name', value='module:function', group='group')
Traceback (most recent call last):
  File "<stdin>", line 33, in <module>
  File "<stdin>", line 29, in main
  File "/var/folders/c6/v7hnmq453xb6p2dbz1gqc6rr0000gn/T/pip-run-mqdr02l5/importlib_metadata/__init__.py", line 203, in load
    module = import_module(match.group('module'))
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'module'

@graingert
Contributor Author

I think you have to skip paths that aren't isinstance(path, (str, bytes)) here

path.search(prepared) for path in map(FastPath, paths)
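
A rough sketch of what that filter could look like, written as a standalone function rather than against importlib_metadata's internals. The name `str_or_bytes_entries` is hypothetical and the sample entries are made up for illustration.

```python
import pathlib


def str_or_bytes_entries(paths):
    """Yield only the entries that are str or bytes.

    Hypothetical helper: anything else (such as a pathlib.Path)
    is skipped, mirroring the isinstance check suggested above.
    """
    for entry in paths:
        if isinstance(entry, (str, bytes)):
            yield entry


entries = ["/a", pathlib.Path("/b"), b"/c", None]
kept = list(str_or_bytes_entries(entries))
print(kept)  # ['/a', b'/c']
```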

@jaraco
Member

jaraco commented Mar 12, 2022

> afaik sys.path can have bytes in, which that str call would change to "b'/some/dir'"

Ugh. The docs are contradictory on the topic, stating that it's a list of strings, but implying that bytes are supported by saying that types other than bytes and str are ignored.

I'm not sure I want to add support for bytes unless a real world use case demonstrates the need.

@jaraco
Member

jaraco commented Mar 12, 2022

> I think you have to skip paths that aren't isinstance(path, (str, bytes)) here
>
> path.search(prepared) for path in map(FastPath, paths)

I agree that would stop it from discovering metadata on paths indicated by Path objects. It also would affect discovery by paths not on sys.path, possibly a behavior worth keeping.

I'm not sure what is the right behavior here. I'm inclined to sit on this one for now until importlib has settled on what should happen about sys.path, but even if it decides to retain the behavior that only str/bytes are allowed on sys.path, I'd lean toward importlib metadata supporting any paths that are str or Path-like.
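
For comparison, the alternative of accepting any path that is str or Path-like might be sketched as follows. This is an assumption about what such a policy could look like, not how importlib_metadata is implemented; `str_or_pathlike_entries` is a hypothetical name, and skipping bytes entries is a choice made for this sketch only.

```python
import os
import pathlib


def str_or_pathlike_entries(paths):
    """Yield str paths for entries that are str or os.PathLike.

    Hypothetical helper: bytes and unrelated types are skipped, and
    PathLike objects whose __fspath__ returns bytes are skipped too.
    """
    for entry in paths:
        if isinstance(entry, (str, os.PathLike)):
            resolved = os.fspath(entry)
            if isinstance(resolved, str):
                yield resolved


entries = ["/a", pathlib.Path("/b"), b"/c", 42]
normalized = list(str_or_pathlike_entries(entries))
```

Here the pathlib.Path entry survives as a str, while the bytes entry and the integer are dropped.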

What was the use-case that triggered this report? Is it something that can't be easily worked around?

@graingert
Contributor Author

> What was the use-case that triggered this report? Is it something that can't be easily worked around?

I was writing a test for pandas that mutates sys.path https://github.com/pandas-dev/pandas/pull/46302/files#diff-a363ecff783b66b30c1c2ea2481145430bba2e2c7a2afc279a1db24912eab756R130 and noticed the discrepancy

@graingert
Contributor Author

graingert commented Mar 15, 2022

so it turns out bytes don't actually work in sys.path (although they used to in py3.2) unless they are zip files, see https://bugs.python.org/issue47025

gist here https://gist.github.com/graingert/3422e812b0aa243c9719884f86be52ff

import contextlib
import os
import pathlib
import sys
import tempfile

import importlib_metadata


@contextlib.contextmanager
def _tmp_path(*args, **kwargs):
    with tempfile.TemporaryDirectory(*args, **kwargs) as tmp_dir:
        yield pathlib.Path(tmp_dir)


def main():
    with _tmp_path() as tmp_path:
        (tmp_path / "module.py").write_bytes(b"def function():\n    return 1\n")
        dist_info = tmp_path / "demo_package-0.0.0.dist-info"
        dist_info.mkdir()
        (dist_info / "entry_points.txt").write_bytes(
            b"[group]\nname = module:function\n"
        )
        sys.path.append(os.fsencode(tmp_path))
        import module
        assert module.function() == 1
        ep = next(
            iter(importlib_metadata.entry_points(name="name", group="group")), None
        )
        print(ep)
        print(ep.load())


if __name__ == "__main__":
    sys.exit(main())

Traceback (most recent call last):
  File "<frozen importlib._bootstrap_external>", line 1363, in _path_importer_cache
KeyError: b'/tmp/tmpokxmswlq'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/graingert/projects/demo_importlib.py", line 35, in <module>
    sys.exit(main())
  File "/home/graingert/projects/demo_importlib.py", line 25, in main
    import module
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1002, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 945, in _find_spec
  File "<frozen importlib._bootstrap_external>", line 1430, in find_spec
  File "<frozen importlib._bootstrap_external>", line 1399, in _get_spec
  File "<frozen importlib._bootstrap_external>", line 1365, in _path_importer_cache
  File "<frozen importlib._bootstrap_external>", line 1341, in _path_hooks
  File "<frozen importlib._bootstrap_external>", line 1623, in path_hook_for_FileFinder
  File "<frozen importlib._bootstrap_external>", line 1495, in __init__
  File "<frozen importlib._bootstrap_external>", line 182, in _path_isabs
TypeError: startswith first arg must be bytes or a tuple of bytes, not str

@jaraco jaraco closed this as completed in 4212eed Mar 19, 2022
@jaraco
Member

jaraco commented Mar 19, 2022

In the referenced commit, I've documented the current behavior. It's my opinion that the other importlib machinery should support this behavior, but I'll be happy to revisit this issue at such a time that Python is more explicit about what types are allowed. In the meantime, I'd advise just not putting pathlib.Path objects on sys.path. Let me know if following that advice would be problematic.
