Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

modulegraph: Use loaders to get module content. #5157

Merged
merged 9 commits into from
Oct 18, 2020

Conversation

htgoebel
Copy link
Member

modulegraph's heritage includes code still based on Python 2.x capabilities on
how to find a module and get its source or code. It also contained a anomaly
regarding packages and their __init__-file: If a package was detected,
it's __init__ file was loaded as a module. This, while being ugly, worked
in most cases, but failed if the __init__ module is an extension
module (see #5131, #4346), ending in an infinite loop. This was caused by
modulegraph distinguishing between the package and its __init__ module.

The solution is to switch to "modern" loaders, both being a loader for a
specific type of modules (source, extension, etc.) and having a package
characteristic (property is_package())

This commit does the following

  • In _find_module_path() no longer return "metadata" but a loader. This
    also removed huge part of this function, making it much easier to understand.
    As a side effect, this asymmetric closing of a file in a completely other
    part of the code (_safe_import_module) could be removed.

  • Change _load_module to use the loaders.

  • Merge _load_package into `__load_module``, getting rid of the anomaly
    described above.

  • Adjust the test-cases to the new behavior (esp. loader instead of
    metadata-tuple and filehandle)

Please note: Since we plan to change to modulegraph2 soon anyway, I did not
spend too much time on creating a clean solution.

See #4406, closes #5131, #4346.

@htgoebel

This comment has been minimized.

@agronholm
Copy link

agronholm commented Sep 10, 2020

Tried running this against a simple script containing just one line: import pydantic. This sends the released version of pyinstaller into an infinite loop. With this PR, I get a different error:

Command and output
$ pyinstaller -F script.py 
26 INFO: PyInstaller: 4.1.dev0
27 INFO: Python: 3.8.5
27 INFO: Platform: Linux-5.8.4-200.fc32.x86_64-x86_64-with-glibc2.2.5
27 INFO: wrote /tmp/script.spec
33 INFO: UPX is available.
35 INFO: Extending PYTHONPATH with paths
['/tmp', '/tmp']
37 INFO: checking Analysis
37 INFO: Building Analysis because Analysis-00.toc is non existent
37 INFO: Initializing module dependency graph...
38 INFO: Caching module graph hooks...
43 INFO: Analyzing base_library.zip ...
1625 INFO: Processing pre-find module path hook distutils from '/tmp/venv/lib64/python3.8/site-packages/PyInstaller/hooks/pre_find_module_path/hook-distutils.py'.
1626 INFO: distutils: retargeting to non-venv dir '/usr/lib64/python3.8'
2624 INFO: Caching module dependency graph...
2680 INFO: running Analysis Analysis-00.toc
2694 INFO: Analyzing /tmp/script.py
2701 INFO: Processing module hooks...
2702 INFO: Loading module hook 'hook-difflib.py' from '/tmp/venv/lib64/python3.8/site-packages/PyInstaller/hooks'...
2702 INFO: Excluding import 'doctest'
2703 INFO:   Removing import of doctest from module difflib
2703 INFO: Loading module hook 'hook-distutils.py' from '/tmp/venv/lib64/python3.8/site-packages/PyInstaller/hooks'...
2705 INFO: Loading module hook 'hook-encodings.py' from '/tmp/venv/lib64/python3.8/site-packages/PyInstaller/hooks'...
2734 INFO: Loading module hook 'hook-heapq.py' from '/tmp/venv/lib64/python3.8/site-packages/PyInstaller/hooks'...
2735 INFO: Excluding import 'doctest'
2736 INFO:   Removing import of doctest from module heapq
2736 INFO: Loading module hook 'hook-multiprocessing.util.py' from '/tmp/venv/lib64/python3.8/site-packages/PyInstaller/hooks'...
2736 INFO: Excluding import 'test'
2736 INFO:   Removing import of test from module multiprocessing.util
2736 INFO: Loading module hook 'hook-pickle.py' from '/tmp/venv/lib64/python3.8/site-packages/PyInstaller/hooks'...
2737 INFO: Excluding import 'argparse'
2737 INFO:   Removing import of argparse from module pickle
2737 INFO: Loading module hook 'hook-sysconfig.py' from '/tmp/venv/lib64/python3.8/site-packages/PyInstaller/hooks'...
2744 INFO: Loading module hook 'hook-xml.py' from '/tmp/venv/lib64/python3.8/site-packages/PyInstaller/hooks'...
2906 INFO: Looking for ctypes DLLs
2908 INFO: Analyzing run-time hooks ...
2909 INFO: Including run-time hook '/tmp/venv/lib64/python3.8/site-packages/PyInstaller/hooks/rthooks/pyi_rth_multiprocessing.py'
2913 INFO: Looking for dynamic libraries
3200 INFO: Looking for eggs
3200 INFO: Using Python library /lib64/libpython3.8.so.1.0
3203 INFO: Warnings written to /tmp/build/script/warn-script.txt
3218 INFO: Graph cross-reference written to /tmp/build/script/xref-script.html
3224 INFO: checking PYZ
3224 INFO: Building PYZ because PYZ-00.toc is non existent
3225 INFO: Building PYZ (ZlibArchive) /tmp/build/script/PYZ-00.pyz
Traceback (most recent call last):
  File "/tmp/venv/bin/pyinstaller", line 8, in <module>
    sys.exit(run())
  File "/tmp/venv/lib64/python3.8/site-packages/PyInstaller/__main__.py", line 114, in run
    run_build(pyi_config, spec_file, **vars(args))
  File "/tmp/venv/lib64/python3.8/site-packages/PyInstaller/__main__.py", line 65, in run_build
    PyInstaller.building.build_main.main(pyi_config, spec_file, **kwargs)
  File "/tmp/venv/lib64/python3.8/site-packages/PyInstaller/building/build_main.py", line 720, in main
    build(specfile, kw.get('distpath'), kw.get('workpath'), kw.get('clean_build'))
  File "/tmp/venv/lib64/python3.8/site-packages/PyInstaller/building/build_main.py", line 667, in build
    exec(code, spec_namespace)
  File "/tmp/script.spec", line 18, in <module>
    pyz = PYZ(a.pure, a.zipped_data,
  File "/tmp/venv/lib64/python3.8/site-packages/PyInstaller/building/api.py", line 101, in __init__
    self.__postinit__()
  File "/tmp/venv/lib64/python3.8/site-packages/PyInstaller/building/datastruct.py", line 160, in __postinit__
    self.assemble()
  File "/tmp/venv/lib64/python3.8/site-packages/PyInstaller/building/api.py", line 123, in assemble
    self.code_dict[entry[0]] = get_code_object(entry[0], entry[1])
  File "/tmp/venv/lib64/python3.8/site-packages/PyInstaller/building/utils.py", line 619, in get_code_object
    co = _load_code(modname, filename)
  File "/tmp/venv/lib64/python3.8/site-packages/PyInstaller/building/utils.py", line 597, in _load_code
    return compile(source, filename, 'exec')
ValueError: source code string cannot contain null bytes

@htgoebel

This comment has been minimized.

@agronholm

This comment has been minimized.

@htgoebel

This comment has been minimized.

@htgoebel
Copy link
Member Author

@agronholm Please give the new code a try, which should now fix the issue.

@htgoebel htgoebel force-pushed the issue-4158-pydantic branch 2 times, most recently from 61344fb to 1adbe4f Compare September 12, 2020 10:43
@agronholm
Copy link

Seems to work. I don't get any errors anymore. Was the hook a necessary part of this fix somehow?

@htgoebel
Copy link
Member Author

@agronholm The hook is not part of the fix, but the hook is required for freezing pydantic.

While working on the following commits, I stepped over
add_to_suffix__extension() returning wrong results. Obviously unit-tests have
been missing.
These tests check for "recursion too deep" caused by the `__init__` module
being an extension module.

See pyinstaller#5131, pyinstaller#4346.
modulegraph's heritage includes code still based on Python 2.x capabilities on
how to find a module and get its source or code. It also contained a anomaly
regarding packages and their ``__init__``-file: If a package was detected,
it's ``__init__`` file was loaded as a module. This, while being ugly, worked
in most cases, but failed if the ``__init__`` module is an extension
module (see pyinstaller#5131, pyinstaller#4346), ending in an infinite loop. This was caused by
modulegraph distinguishing between the package and its ``__init__`` module.

The solution is to switch to "modern" loaders, both being a loader for a
specific type of modules (source, extension, etc.) and having a package
characteristic (property ``is_package()``)

This commit does the following

- In ``_find_module_path()`` no longer return "metadata" but a loader. This
  also removed huge part of this function, making it much easier to understand.
  As a side effect, this asymmetric closing of a file in a completely other
  part of the code (``_safe_import_module``) could be removed.

- Change ``_load_module`` to use the loaders.

- Merge ``_load_package`` into `__load_module``, getting rid of the anomaly
  described above.

- Adjust the test-cases to the new behavior (esp. loader instead of
  metadata-tuple and filehandle)

Please note: Since we plan to change to modulegraph2 soon anyway, I did not
spend too much time on creating a clean solution.

See pyinstaller#4406, closes pyinstaller#5131, pyinstaller#4346.
While the latest changes fixed modulegraph to not go into an endless recursion
if a package's `__init__` is an extension module, there still was an issue:
The returned node was of type ``Package``, leading to ``PYZ()`` trying to load
and compile the source. Since the file is an extension module, compiling
fails, of course.

This change adds a test-case for this.
Introduce a new node type ExtensionPackage, which signals that a package's
`__init__` is an extension module.

While the latest changes fixed modulegraph to not go into an endless recursion
if a package's `__init__` is an extension module, there still was an issue:
The returned node was of type ``Package``, leading to ``PYZ()`` trying to load
and compile the source (to get "missing" the code-object). Since the file is
an extension module, compiling failed, of course.
This maps the modulegraph nodes of type ExtensionPackage to simple extensions
for the sake of the TOC.
Add modulegraph's new node type ExtensionPackage.
Converting modulegraph Nodes into TOC entries was done twice.
The code was the same, except that one implementation did not filter node types
and had less comments.
When converting the nodes to TOC entries, also change the filename for
ExtensionPackage: Actually convert the packages name (``mypkg``) into the
module name (``mypkg.__init__``).
@htgoebel htgoebel merged commit 0b496be into pyinstaller:develop Oct 18, 2020
@htgoebel htgoebel deleted the issue-4158-pydantic branch October 18, 2020 19:35
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Recursion error with Pydantic
3 participants