Inspect fails to retrieve source code inside frozen app (.py files included) #4764

Closed
lukasberbuer opened this issue Mar 25, 2020 · 38 comments
Labels
feature Feature request pull-request wanted Please submit a pull-request for this, maintainers will not actively work on this.

Comments

@lukasberbuer

lukasberbuer commented Mar 25, 2020

Description of the issue

I'm trying to use inspect.getsource(<object>), e.g. inspect.getsource(datetime.date) in a frozen application.
Although I manually added the source files to datas

    datas=[
        *collect_data_files("datetime", include_py_files=True),
    ],

the application crashes with OSError: could not get source code.
The file path seems to get extracted correctly with inspect.getsourcefile(datetime.date) though.
Edit: inspect.getsourcefile(datetime.date) returns a non-existing file path [...]\dist\main\datetime.py. It should be [...]\dist\main\lib\datetime.py instead.

All files to reproduce the problem are found here:
https://gist.github.com/lukasberbuer/d9cc1716ada09f54c53640a97a11f7a5
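The failing call can also be wrapped defensively so that a frozen app degrades gracefully instead of crashing. This is a minimal sketch and is not taken from the gist; `safe_getsource` is a hypothetical helper name, and `textwrap.dedent` is used only as an example of a pure-Python object whose source file is normally available.

```python
import inspect
import textwrap


def safe_getsource(obj):
    """Return the source of obj, or None if inspect cannot retrieve it.

    Hypothetical helper: inspect.getsource raises OSError when the source
    cannot be found or read, and TypeError for built-in objects.
    """
    try:
        return inspect.getsource(obj)
    except (OSError, TypeError):
        return None


# A pure-Python function whose .py file is available resolves normally.
source = safe_getsource(textwrap.dedent)
print("found" if source else "source not available")
```

Inside a frozen app without the source files, the same call would take the `None` branch rather than raising `OSError: could not get source code`.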

Context information (for bug reports)

  • Output of pyinstaller --version: 3.6
  • Version of Python: 3.7
  • Platform: both GNU/Linux (Ubuntu 19.04 LTS), Windows 10 Professional
@Legorooj
Member

This is weird... Do you have the time to track the line that raises the OSError?

@lukasberbuer
Author

Ah I was wrong. The problem already occurs in resolving the right filepath:
filepath = inspect.getsourcefile(datetime.date) should return [...]\dist\main\lib\datetime.py but returns [...]\dist\main\datetime.py

I stepped through the getsourcefile function (see comments for values and branches):

def getsourcefile(object):
    """Return the filename that can be used to locate an object's source.
    Return None if no way can be identified to get the source.
    """
    filename = getfile(object)  # = C:\Dev\pyinstaller_inspect\dist\main\datetime.pyc
    all_bytecode_suffixes = importlib.machinery.DEBUG_BYTECODE_SUFFIXES[:]
    all_bytecode_suffixes += importlib.machinery.OPTIMIZED_BYTECODE_SUFFIXES[:]  # = ['.pyc', '.pyc']
    if any(filename.endswith(s) for s in all_bytecode_suffixes):
        filename = (os.path.splitext(filename)[0] +
                    importlib.machinery.SOURCE_SUFFIXES[0])  # = C:\Dev\pyinstaller_inspect\dist\main\datetime.py
    elif any(filename.endswith(s) for s in
                 importlib.machinery.EXTENSION_SUFFIXES):
        return None
    if os.path.exists(filename):  # -> False
        return filename
    # only return a non-existent filename if the module has a PEP 302 loader
    if getattr(getmodule(object, filename), '__loader__', None) is not None:  # -> True
        return filename  # return here
    # or it is in the linecache
    if filename in linecache.cache:
        return filename

So already getfile(object) returns a non-existing file path C:\Dev\pyinstaller_inspect\dist\main\datetime.pyc.
If I copy datetime.py into the onedir root directory, everything works.
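The suffix handling traced above can be reproduced in isolation. The sketch below mirrors only the bytecode-to-source mapping step of getsourcefile; `guess_source_path` is a hypothetical name, and the path is modeled on the value reported above rather than taken from a real build.

```python
import importlib.machinery
import os


def guess_source_path(filename):
    """Map a bytecode path to the source path inspect would look for."""
    if any(filename.endswith(s) for s in importlib.machinery.BYTECODE_SUFFIXES):
        # Swap the bytecode suffix (.pyc) for the primary source suffix (.py).
        return os.path.splitext(filename)[0] + importlib.machinery.SOURCE_SUFFIXES[0]
    return filename


# Hypothetical frozen-app path, modeled on the one in the trace above.
candidate = guess_source_path(os.path.join("dist", "main", "datetime.pyc"))
print(candidate, "exists:", os.path.exists(candidate))
```

Because the mapping is purely textual, a module whose `__file__` points at the bundle root always yields a candidate in the bundle root, regardless of where the .py file was actually collected.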

@Legorooj
Member

@lukasberbuer getfile returns, in this case, datetime.__file__ - the datetime module, not the class. We need to figure out why it's returning what it is.

@lukasberbuer
Author

@Legorooj: I think the module's filename datetime.py is correct, because inspect will find and extract the date class definition there.

The getsourcefile(datetime.date) call returns the non-existent filename because the module has a __loader__ attribute (a PEP 302 loader), which is injected by PyInstaller.
getsource(...) then calls getsourcelines(...) -> findsource(...) -> linecache.getlines(...) -> linecache.updatecache(...). Because the file is not found, updatecache(...) falls back to lazycache(...), which calls <module>.__loader__.get_source(<module>.__name__). PyInstaller implements __loader__.get_source(...) in ./PyInstaller/loader/pyimod03_importers.py:

    def get_source(self, fullname):
        """
        Method should return the source code for the module as a string.
        But frozen modules does not contain source code.
        Return None.
        """
        if fullname in self.toc:
            return None
        else:
            # ImportError should be raised if module not found.
            raise ImportError('No module named ' + fullname)

So None is returned which causes the OSError exception in findsource(...).
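The fallback from linecache to the loader can be observed without PyInstaller. The following sketch uses a hypothetical `FakeLoader` class as a stand-in for a PEP 302 loader and deliberately non-existent file paths; it only illustrates the standard-library behavior described above.

```python
import linecache


class FakeLoader:
    """Hypothetical stand-in for a PEP 302 loader such as FrozenImporter."""

    def __init__(self, source):
        self._source = source

    def get_source(self, fullname):
        return self._source


def lines_via_loader(filename, loader):
    # linecache cannot stat the file, so it falls back to loader.get_source().
    module_globals = {"__name__": "frozen_mod", "__loader__": loader}
    return linecache.getlines(filename, module_globals)


with_source = lines_via_loader("/no/such/dir/with_source.py", FakeLoader("x = 1\ny = 2\n"))
without_source = lines_via_loader("/no/such/dir/without_source.py", FakeLoader(None))
print(with_source)
print(without_source)
```

When the loader returns None, linecache yields an empty list of lines, and it is this empty list that makes findsource(...) raise the OSError.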

Still, I wonder why datetime.pyc is reported as located in the onedir root directory. I don't see it there at runtime.

@Legorooj
Member

Legorooj commented Mar 30, 2020

I'm not actually sure - but it's likely something to do with the python import system. I'll try and trace this.

@lukasberbuer
Author

Do you have some ideas what I can do to narrow down the problem? I'm stuck at the moment...

@Legorooj
Member

Legorooj commented Apr 8, 2020

@lukasberbuer yes - I traced it to the relevant area of PyInstaller. Can you read C? You'll need it to trawl through the bootloader code - I've not had time yet. Sooo many of the issues that get opened on this repo are "please read the docs" etc. Or they could be a bug - but the author's primary language isn't English, and the grammar is impossible to understand.

So this issue got buried - it's on my to-do list... Somewhere. It'd be nice to get this fixed.

@lukasberbuer
Author

@Legorooj: Sorry for the late response - it was a busy week. I really appreciate your work as maintainers of such a big project.
I'll try to dig myself through the bootloader code this weekend...

@Legorooj
Member

@lukasberbuer before you do that, I had another idea. In PyInstaller.utils.hooks there's a function called copy_metadata. Can you try that on datetime?

@htgoebel
Member

@lukasberbuer Well done tracking this down :-)

As the code says, the source file is looked up in self.toc, which is the "table of contents" for the PYZ archive. (The PYZ archive is where datetime.pyc is located - see the manual for details.)

So the solution is to add the source files to the PYZ in the .spec file. PyInstaller currently does not provide support functions for this, so you need to build the correct data structures manually. Of course collect_data_files() can help you to collect the files, but the data structure is different. Again, please refer to the manual.

Scratch:

source_files = collect_my_sourcefiles(…)  # return same structure as `collect_data_files()`
# TOC entries are (name, path, typecode); collect_data_files() yields (source, dest) pairs.
source_files_toc = TOC((dest, source, 'DATA') for source, dest in source_files)
pyz = PYZ(a.pure, a.zipped_data, source_files_toc)

@htgoebel
Member

@lukasberbuer Uh, sorry, I was wrong - did not read the doc-string you copied.

In addition to the above you would need to change the loader along these lines:

if fullname in self.toc:
    src = fullname.replace(".", "/")  # convert to path
    if src in self.toc:
        return self._pyz_archive.extract(src)[1]  # extract the source entry, not the bytecode
    else:
        return None
else:
    …

Please let us know how this works for you.

@lukasberbuer
Author

lukasberbuer commented Apr 13, 2020

@htgoebel: Great, that works!

main.spec:

# -*- mode: python ; coding: utf-8 -*-

import inspect
import datetime

block_cipher = None

def collect_source_files(modules):
    datas = []
    for module in modules:
        source = inspect.getsourcefile(module)
        dest = f"src.{module.__name__}"  # use "src." prefix
        datas.append((source, dest))
    return datas

source_files = collect_source_files([datetime])  # return same structure as `collect_data_files()`
source_files_toc = TOC((name, path, 'DATA') for path, name in source_files)

a = Analysis(
    ['main.py'],
    pathex=[],
    binaries=[],
    datas=[
        # *collect_data_files("datetime", include_py_files=True),
        # *copy_metadata("datetime"),
    ],
    hiddenimports=[],
    hookspath=[],
    runtime_hooks=[],
    excludes=[],
    win_no_prefer_redirects=False,
    win_private_assemblies=False,
    cipher=block_cipher,
    noarchive=False
)

pyz = PYZ(
    a.pure,
    a.zipped_data,
    source_files_toc,
    cipher=block_cipher
)

exe = EXE(
    pyz,
    a.scripts,
    [],
    exclude_binaries=True,
    name='main',
    debug=True,
    bootloader_ignore_signals=False,
    strip=False,
    upx=False,
    console=True
)

coll = COLLECT(
    exe,
    a.binaries,
    a.zipfiles,
    a.datas,
    strip=False,
    upx=True,
    upx_exclude=[],
    name='main'
)

Modified FrozenImporter (pyimod03_importers.py):

class FrozenImporter(object):
...
    def get_source(self, fullname):
        """
        Method should return the source code for the module as a string.
        But frozen modules does not contain source code.

        Return None.
        """
        if fullname in self.toc:
            sourcename = f"src.{fullname}"
            if sourcename in self.toc:
                return self._pyz_archive.extract(sourcename)[1].decode("utf-8")
            return None
        else:
            # ImportError should be raised if module not found.
            raise ImportError('No module named ' + fullname)
...
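The lookup logic in the modified method can be tried outside PyInstaller with a toy model. In the sketch below, `ToyFrozenImporter` and its dict-based archive are hypothetical stand-ins for FrozenImporter and the PYZ archive; only the "src." prefix convention comes from the code above.

```python
class ToyFrozenImporter:
    """Toy model of the lookup above; not PyInstaller's FrozenImporter."""

    def __init__(self, archive):
        # archive maps entry names to raw bytes, standing in for the PYZ.
        self.archive = archive
        self.toc = set(archive)

    def get_source(self, fullname):
        if fullname not in self.toc:
            # ImportError should be raised if the module is not found.
            raise ImportError("No module named " + fullname)
        sourcename = "src.{}".format(fullname)
        if sourcename in self.toc:
            return self.archive[sourcename].decode("utf-8")
        return None


importer = ToyFrozenImporter({
    "datetime": b"<bytecode>",
    "src.datetime": b"class date:\n    pass\n",
    "json": b"<bytecode>",
})
print(importer.get_source("datetime"))
```

Modules without a matching "src." entry still return None, so the behavior for apps that ship no sources is unchanged.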

Is there any way to (monkey)patch the FrozenImporter from the .spec file so I don't have to modify PyInstaller's source code?

@Legorooj
Member

@lukasberbuer just to check, can you run the tests on that mod? If it all works, a PR would be nice!

@Legorooj Legorooj added pull-request wanted Please submit a pull-request for this, maintainers will not actively work on this. feature Feature request labels Apr 14, 2020
@lukasberbuer
Author

@Legorooj: I'll work on that. The f-strings limit it to Python 3.6 and above at the moment.
The prefix src.<module> feels like a bad workaround. Do you have a better idea for distinguishing between modules and their source files?

@Legorooj
Member

@lukasberbuer sourcename = 'src.{}'.format(fullname) should do the trick.

src.<module> is fine I think... Unless you want to use pyi_src? Anyway, get @htgoebel's opinion.

@bjones1
Contributor

bjones1 commented May 26, 2020

I wonder if this could more easily be solved by calling collect_data_files('module-name', include_py_files=True) in a hook where module-name gives the module that you want source code included for?

@Legorooj
Member

@bjones1 this was already tried; see the original comment.

@lukasberbuer
Author

> I wonder if this could more easily be solved by calling collect_data_files('module-name', include_py_files=True) in a hook where module-name gives the module that you want source code included for?

That would be wonderful. I need a way to make this work with hooks for skl2onnx. I couldn't figure out how to combine the current workaround with this idea.

@Legorooj
Member

@bjones1 @lukasberbuer idea:

  • Add the changes to the loader - it should work normally as well
  • Add another function collect_source to the hook utils. The output could be assigned to datas.

@lukasberbuer
Author

@Legorooj: Yeah, this would be the nicest solution. The problem I was facing is that datas is not added to the PYZ TOC and is therefore not accessible to the FrozenImporter. This is why we had to use this hack:

pyz = PYZ(
    a.pure,
    a.zipped_data,
    source_files_toc,  # <--
    cipher=block_cipher
)

Is there any way FrozenImporter can access the datas TOC?

@lukasberbuer
Author

@Legorooj: I was wrong. Accessing the datas TOC alone doesn't help us. The source file has to be included in the PYZ archive as well - right?

Imagine we have a collect_source_files function whose output can be assigned to datas. datas is consumed by the Analysis instance a. The PYZ archive - by default - is built with the a.pure and a.zipped_data TOCs.
Without changing the API, the only way from datas to PYZ using hooks would be to misuse a.pure for the additional source files. This still feels very hacky...
What do you think?

@Legorooj
Member

Hmm. How about a.source? It wouldn't be that hard to add a new attribute

@dwight9339

So I'm having the same problem as mentioned in #4815. I've tried adding the collect_source_files function to my spec file as shown above but still have not had success. Is there something else that I need to do? Here is my spec file:

# -*- mode: python ; coding: utf-8 -*-
import inspect
import scrapy

block_cipher = None

def collect_source_files(modules):
    datas = []
    for module in modules:
        source = inspect.getsourcefile(module)
        dest = f"src.{module.__name__}"  # use "src." prefix
        datas.append((source, dest))
    return datas

source_files = collect_source_files([scrapy])  # return same structure as `collect_data_files()`
source_files_toc = TOC((name, path, 'DATA') for path, name in source_files)

a = Analysis(['scraper.py'],
             pathex=['C:\\Users\\19705\\Documents\\fiverr_jobs\\anti-terrorism_scraper\\prototype'],
             binaries=[],
             datas=[],
             hiddenimports=[],
             hookspath=[],
             runtime_hooks=[],
             excludes=[],
             win_no_prefer_redirects=False,
             win_private_assemblies=False,
             cipher=block_cipher,
             noarchive=True)
pyz = PYZ(a.pure, a.zipped_data, source_files_toc, cipher=block_cipher)
exe = EXE(pyz,
          a.scripts,
          [('v', None, 'OPTION')],
          exclude_binaries=True,
          name='key_term_scraper',
          debug=True,
          bootloader_ignore_signals=False,
          strip=False,
          upx=True,
          console=True )
coll = COLLECT(exe,
               a.binaries,
               a.zipfiles,
               a.datas,
               strip=False,
               upx=True,
               upx_exclude=[],
               name='key_term_scraper')

@lukasberbuer
Author

@dwight9339: You need to modify PyInstaller's FrozenImporter class in pyimod03_importers.py as well:

class FrozenImporter(object):
...
    def get_source(self, fullname):
        """
        Method should return the source code for the module as a string.
        But frozen modules does not contain source code.

        Return None.
        """
        if fullname in self.toc:
            sourcename = f"src.{fullname}"
            if sourcename in self.toc:
                return self._pyz_archive.extract(sourcename)[1].decode("utf-8")
            return None
        else:
            # ImportError should be raised if module not found.
            raise ImportError('No module named ' + fullname)
...

@Alex-Mann

@lukasberbuer Your solution works for me. I feel like this should be merged into the main branch.

Is this code able to live in the source somewhere so that it can always read the actual source code of a frozen .py file?

def collect_source_files(modules):
    datas = []
    for module in modules:
        source = inspect.getsourcefile(module)
        dest = f"src.{module.__name__}"  # use "src." prefix
        datas.append((source, dest))
    return datas

source_files = collect_source_files([scrapy])  # return same structure as `collect_data_files()`
source_files_toc = TOC((name, path, 'DATA') for path, name in source_files)

@PATAPOsha

@lukasberbuer do you need to rebuild pyinstaller after modifying FrozenImporter?
I didn't rebuild, I just put a print inside FrozenImporter.get_source(). That print appeared in the output of my .exe, so I guess the changes took effect.
But that didn't solve the problem. I'm still getting OSError: could not get source code
I'm struggling with Scrapy and its spiders as mentioned in #4815

@PATAPOsha

Never mind, it worked.
In my case I had to include my spider source code along with scrapy itself.
So my .spec file:

# -*- mode: python ; coding: utf-8 -*-
import sys
import inspect
import scrapy
sys.path.extend(['C:\\Users\\path_to_your_scrapy_project', 'C:/Users/pata/path_to_your_scrapy_project'])

import my_project.spiders.my_spider

block_cipher = None

def collect_source_files(modules):
    datas = []
    for module in modules:
        source = inspect.getsourcefile(module)
        dest = f"src.{module.__name__}"  # use "src." prefix
        datas.append((source, dest))
    return datas

source_files = collect_source_files([scrapy, my_project.spiders.my_spider])  # return same structure as `collect_data_files()`
source_files_toc = TOC((name, path, 'DATA') for path, name in source_files)
...

@lukasberbuer
Author

lukasberbuer commented Nov 30, 2020

@PATAPOsha: I just pulled the PyInstaller repository, changed the few lines of code and installed it in the project in editable mode with pip install -e <path to the local pyinstaller repo> - that worked for me.

@Alex-Mann: I would like to have it in the release as well but couldn't figure out a handy integration without diving too deep into the PyInstaller source code (as discussed above). The solution should work with hook files as well.

In my opinion, we should have:

  1. the minor modification of the FrozenImporter
  2. a collect_source_files function
  3. a variable sources for hook files (counterpart to datas) and the corresponding kwarg in the Analysis class
  4. modify the Analysis class to generate the sources TOC
  5. modify the spec file generator to pass the sources TOC to the PYZ constructor

Would you agree on those implementation steps?
I'm struggling with (5)... I can work on a pull request in the next few days if somebody can help me out with the last step(s).

@Legorooj
Member

Legorooj commented Dec 2, 2020

@bwoodsend can you give a hand here?

@bwoodsend
Member

I've never worked with that corner of PyInstaller, but I believe (5) would simply be:

diff --git a/PyInstaller/building/templates.py b/PyInstaller/building/templates.py
index 20bc31e2..35a4d678 100644
--- a/PyInstaller/building/templates.py
+++ b/PyInstaller/building/templates.py
@@ -30,7 +30,8 @@
              cipher=block_cipher,
              noarchive=%(noarchive)s)
 pyz = PYZ(a.pure, a.zipped_data,
-             cipher=block_cipher)
+             cipher=block_cipher,
+             sources=a.sources)
 exe = EXE(pyz,
           a.scripts,
           a.binaries,
@@ -63,7 +64,8 @@
              cipher=block_cipher,
              noarchive=%(noarchive)s)
 pyz = PYZ(a.pure, a.zipped_data,
-             cipher=block_cipher)
+             cipher=block_cipher,
+             sources=a.sources)
 exe = EXE(pyz,
           a.scripts,
           %(options)s,

You'll also need a sources=%(sources)s somewhere in the a = Analysis(...) part.

@agronholm

I'm trying to figure out a way to pack an application depending on pytorch with PyInstaller, and I'm lost. Is there a way to do this with the current PyInstaller release? I tried this workaround but that just gave me a new error due to modulegraph.py trying to open the dist\$projectname directory as a regular file.

@bwoodsend
Member

#5697 resolves this. With pyinstaller>=4.3 (released this morning) you can just add source files as data files as originally suggested and inspect will find them.
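One way to picture why collecting the .py files as data is sufficient: the loader can map a module name to a file path under the application directory and read it from disk. The sketch below is an assumption about the general idea, not a copy of the change in #5697; `read_collected_source` and the temporary directory standing in for the application root are hypothetical.

```python
import os
import tempfile


def read_collected_source(base_dir, fullname, is_package=False):
    """Read <base_dir>/<pkg>/<mod>.py for a dotted module name, or return None."""
    rel = fullname.replace(".", os.sep)
    if is_package:
        rel = os.path.join(rel, "__init__")
    path = os.path.join(base_dir, rel + ".py")
    try:
        with open(path, "r", encoding="utf-8") as fp:
            return fp.read()
    except FileNotFoundError:
        return None


# Temporary directory standing in for the bundled application's root.
with tempfile.TemporaryDirectory() as app_root:
    os.makedirs(os.path.join(app_root, "pkg"))
    with open(os.path.join(app_root, "pkg", "mod.py"), "w", encoding="utf-8") as fp:
        fp.write("VALUE = 1\n")
    found = read_collected_source(app_root, "pkg.mod")
    missing = read_collected_source(app_root, "pkg.other")

print(repr(found), missing)
```

Under this model the destination directory of each data file matters: the .py file has to land where the module name maps to, not in an arbitrary subfolder.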

@bwoodsend
Member

Although I guess it would now make sense to add that collect_sources() hook utility...

@mfripp

mfripp commented Nov 14, 2022

I have a similar problem to the original poster, in that my app needs to be able to use inspect.getsource() on a function from an imported package. The discussion above lays out the solution, but it is hard to tell what the current solution is. So here's a quick description of what works as of Nov. 2022, using PyInstaller 5.6.2 (after the update in #5697):

Option 1: Do exactly what @lukasberbuer originally wanted to do. This method is simple and is particularly useful if you want to include all the source files from a package. This now works as requested due to the update in #5697. In the example below, I've tweaked the collect_data_files call slightly, to collect only the .py files and exclude the __pycache__ directories that hold the .pyc files. For option 1, add the following snippets to your script.spec file:

from PyInstaller.utils.hooks import collect_data_files
...
a = Analysis(
    ...
    datas=[
        *collect_data_files("imported_package", include_py_files=True, excludes=['**/__pycache__/*']),
    ],
)

Option 2: Collect source code for particular modules. This option is useful if you only need to include source code for one or a few specific modules. This is similar to the collect_source_files() method that @lukasberbuer suggested later, but updated to match the final implementation of FrozenImporter.get_source(), which was written to match the first suggestion, not the second. To use option 2, add the following snippets to your script.spec file:

import inspect
import os
import imported_package.module1, imported_package.module2

...

def collect_source_files(modules):
    datas = []
    for module in modules:
        source = inspect.getsourcefile(module)
        # updated to match final lookup logic in 
        # PyInstaller.loader.pyimod02_importers.FrozenImporter.get_source()
        dest = os.path.dirname(module.__name__.replace('.', os.sep))
        datas.append((source, dest))
    return datas

...

modules = [imported_package.module1, imported_package.module2]

a = Analysis(
    ...
    datas=[
        *collect_source_files(modules),
    ],
)
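A possible variant of Option 2 takes module names instead of imported module objects and also handles packages, whose source is an `__init__.py` inside the package directory. This is a sketch under stated assumptions: `collect_source_files_by_name` is a hypothetical helper, the "." destination for top-level modules follows the usual datas convention, and the standard-library names in the demo are placeholders for your own dependencies.

```python
import importlib.util
import os


def collect_source_files_by_name(module_names):
    """Return (source, dest) pairs for pure-Python modules, given their names.

    Hypothetical helper. find_spec() imports parent packages of dotted names
    as a side effect, but not the target module itself.
    """
    datas = []
    for name in module_names:
        spec = importlib.util.find_spec(name)
        origin = getattr(spec, "origin", None)
        if not origin or not origin.endswith(".py"):
            continue  # built-in, frozen, extension, or namespace module
        parts = name.split(".")
        if spec.submodule_search_locations is None:
            parts = parts[:-1]  # plain module: drop its own name
        dest = os.path.join(*parts) if parts else "."
        datas.append((origin, dest))
    return datas


datas = collect_source_files_by_name(["textwrap", "json.decoder", "json", "sys"])
for source, dest in datas:
    print(dest, os.path.basename(source))
```

The returned pairs have the same (source, dest) shape as the output of collect_source_files() above, so they could be unpacked into datas in the same way.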

For my app, both of these create an additional copy of the source code in imported_package/..., even though the imported package's source code is already in site-packages/imported_package/.... But that is where PyInstaller's FrozenImporter.get_source() expects to find it, and at least it works. More generally, I'm not sure why the .py and .pyc files for my dependencies are stored in site-packages/imported_package/... and the .so files are in imported_package/..., but again, it works so I won't worry about it.

Also note that PyInstaller automatically puts packages and modules from the Python standard library in the root of the packaged app, which is where FrozenImporter.get_source() looks for code. So after #5697 you don't actually need to do anything to solve @lukasberbuer's original problem if you are inspecting the standard library, i.e., inspect.getsource(datetime.date) will work automatically.

@rokm
Member

rokm commented Nov 14, 2022

> even though the imported package's source code is already in site-packages/imported_package/...

This should not be the case; we do not collect source code by default, and if we did, we would do so into top-level application directory (e.g., due to module collection mode that was added in #6945). So if you have source code collected in site-packages/, this is very likely caused by something you explicitly do in your spec.

By the same token, we do not add os.path.join(sys._MEIPASS, 'site-packages') to sys.path, so stuff in site-packages is not discoverable at runtime, unless you explicitly added that path to sys.path somewhere in your code.
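To check this from inside an application, the bundle directory and sys.path can be inspected at runtime. The sketch below relies on the sys.frozen and sys._MEIPASS attributes that PyInstaller sets in frozen apps; the helper names `bundle_dir` and `is_on_sys_path` are hypothetical.

```python
import os
import sys


def bundle_dir():
    """Return the bundle directory when running frozen, else None."""
    if getattr(sys, "frozen", False):
        return getattr(sys, "_MEIPASS", None)
    return None


def is_on_sys_path(directory):
    """Check whether directory is one of the sys.path entries."""
    target = os.path.normcase(os.path.abspath(directory))
    return any(
        os.path.normcase(os.path.abspath(entry or os.curdir)) == target
        for entry in sys.path
    )


root = bundle_dir()
if root is None:
    print("not running from a PyInstaller bundle")
else:
    site_packages = os.path.join(root, "site-packages")
    print(site_packages, "on sys.path:", is_on_sys_path(site_packages))
```

Run inside the frozen app, this reports whether a site-packages folder in the bundle is importable at all, which helps separate "files were collected" from "files are discoverable".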

@mfripp

mfripp commented Nov 14, 2022

> > even though the imported package's source code is already in site-packages/imported_package/...
>
> This should not be the case; we do not collect source code by default

Maybe this is a Mac-specific issue? On my Mac, running PyInstaller 5.6.2 under miniconda, the following produces a dist/script/site-packages/pandas folder with a lot of .py and __pycache__/*.pyc files. This is in addition to a dist/script/pandas folder that holds various .so and data files. The same seems to apply to other dependencies, e.g., numpy, pip or pytz.

mkdir /tmp/test
cd /tmp/test
conda create --name pyinstaller_test pyinstaller pandas
conda activate pyinstaller_test
echo "import pandas" > script.py
pyinstaller script.py
open dist/script   # open in file browser

@rokm
Member

rokm commented Nov 14, 2022

> Maybe this is a Mac-specific issue?

More likely an anaconda-specific issue, then (or an anaconda + macOS one). Our anaconda support is rather flaky and lacks thorough testing; this could be the same sort of duplication problem as observed in #7165 (comment). Either way, I suggest you switch to python.org + pip + venv for PyInstaller builds.

@rokm
Member

rokm commented Nov 15, 2022

@mfripp Can you check if the fix from #7247 improves the situation?

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 14, 2023