Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

building: collect built-in extensions into lib-dynload sub-directory #5604

Merged
merged 5 commits into from Apr 16, 2021

Conversation

rokm
Copy link
Member

@rokm rokm commented Mar 5, 2021

On macOS and linux, some of the python's built-ins have extension modules that originally reside in python3.X/lib-dynload directory. This directory is in sys.path, therefore the collected extensions have no parent directory and end up directly in the _MEIPASS.

This commit explicitly diverts such extensions into lib-dynload sub-directory in the _MEIPASS.

In addition to decluttering the _MEIPASS on linux and macOS, this also prevents ctypes.CDLL() from picking up the extensions' shared libraries and causing inconsistent behavior between frozen and unfrozen application, which in some corner cases leads to issues with shadowing, such as in #5583.

Fixes #5583. Fixes #3993.

Requires rebuilt bootloader, which will add _MEIPASS/lib-dynload to initial sys.path`.

@bwoodsend
Copy link
Member

How come this doesn't clash with the macOS/Mach-O library loader path rewriting magic we do on binaries?

@rokm
Copy link
Member Author

rokm commented Mar 5, 2021

How come this doesn't clash with the macOS/Mach-O library loader path rewriting magic we do on binaries?

Actually, it does, hence the macOS CI failures (on stuff that uses sqlite3) - I'll need to adjust the rewriting procedure as well.

@rokm
Copy link
Member Author

rokm commented Mar 5, 2021

Oh well, I guess multipackage needs to be fixed first...

@rokm
Copy link
Member Author

rokm commented Mar 8, 2021

This requires multipackage fix from #5606 for the multipackage test to pass (currently rebased on top of it).

The other failures on macOS were indeed due to library path rewriting, but for a different reason than I thought originally. For onedir builds, I had adjusted only the target file path but not the path in bincache, whereas for onefile builds, the change was made on the level that affected both. Therefore, the library paths in lib-dynload extensions were correctly adjusted in onefile, but not in onedir. I've now moved the path/name adjustment up to Analysis so that it affects all codepaths in the same way. So no modifications in library-path-rewriting itself is needed after all.

@rokm
Copy link
Member Author

rokm commented Mar 8, 2021

I'll squash the building-related commits into a single commit during next rebase; they're left separate for now to better illustrate the original and follow-up changes.

@rokm rokm marked this pull request as ready for review March 19, 2021 23:45
@rokm
Copy link
Member Author

rokm commented Apr 1, 2021

This change requires bootloader to be re-build. So in the period between merging this and committing rebuilt bootloaders, installing develop branch via github's snapshot archive will result in non-functioning PyInstaller on linux and macOS.

@bwoodsend
Copy link
Member

bwoodsend commented Apr 1, 2021

There's something broken with Github's CI it seems. Before 3.6 refused to start. Now 3.8 has done the same and 3.6 and 3.9 have been stuck on Starting workflow run... for the last 55 minutes.

This change requires bootloader to be re-build. So in the period between merging this and committing rebuilt bootloaders, installing develop branch via github's snapshot archive will result in non-functioning PyInstaller on linux and macOS.

In that case I think we should just rebuild them now and be done with it. We have several bug fixes like #5547 which will have people blindly installing from the development branch (me included) so merging this without building the bootloaders will cause a lot of disruption.

rokm added 4 commits April 8, 2021 11:45
…ion libraries

On linux and macOS, some of the built-ins are provided as extensions
(e.g., _sha1.cpython-39-x86_64-linux-gnu.so) that originally reside
in python3.X/lib-dynload directory. This directory is not in the
ctypes' library search path, therefore running
ctypes.CDLL('_sha1.cpython-39-x86_64-linux-gnu.so')
in python interpreter will come up empty.

In a frozen application, however, these extensions end up collected
directly in the _MEIPASS, which is searched by ctypes (because search
behavior is explicitly overriden by the hook). Therefore, trying to
load the extension's library file via ctypes.CDLL() will succeed,
resulting in inconsistent behavior between unfrozen and frozen
program.

On macOS, this causes issues with `pycryptodomex` (pyinstaller#5583), which,
due to its library search implementation, ends up with handle of
`_sha1` extension module instead of its private extension/library
 with the same name prefix.
On macOS and linux, some of the python's built-ins have extension
modules that originally reside in python3.X/lib-dynload directory.
This directory is in sys.path, therefore the collected extensions
have no parent directory and end up directly in the _MEIPASS.

This commit explicitly diverts such extensions into lib-dynload
sub-directory in the _MEIPASS.

In addition to decluttering the _MEIPASS on linux and macOS, this
also prevents ctypes.CDLL() from picking up the extensions'
shared libraries and causing inconsistent behavior between
frozen and unfrozen application, which in some corner cases
leads to issues with shadowing, such as in pyinstaller#5583.
Having pypath with greater size than pypath_w invites potential
trouble when path to executable is long enough that length of
pypath string exceeds PATH_MAX. And it makes little sense for the
two buffers to be of differently sized, anyway.
@Legorooj
Copy link
Member

Legorooj commented Apr 9, 2021

Going to attempt running the CI again.

@bwoodsend
Copy link
Member

Given that it's just Windows causing the problems and Windows passed on AppVeyor I think it's ok to say CI has passed.

@Legorooj
Copy link
Member

yeah, I agree. And the tests aren't actually failing, just taking too long. Needs debugging.

@Legorooj Legorooj merged commit 5c2dda9 into pyinstaller:develop Apr 16, 2021
@bwoodsend
Copy link
Member

Now for the next wave of antivirus false positives...

@rokm
Copy link
Member Author

rokm commented Apr 16, 2021

Ah, right, let's submit current bootloaders to virustotal before we officially make a release.

@bwoodsend
Copy link
Member

I'm going to see if I can get the virustotal API going...

@bwoodsend
Copy link
Member

Looks like someone has beaten us to it. Also, it now knows that it's a PyInstaller file thanks to this user-defined yaru rule. Didn't know that was a thing...

@rokm
Copy link
Member Author

rokm commented Apr 16, 2021

Looks like someone has beaten us to it.

Yeah, I figured manually uploading eight files will be faster than using API :)

Now let's wait and see how long after release they get flagged - i.e., are AV makers following PyInstaller releases, or are malware authors really quick on the uptake...

But we might want to look into that invalid-rich-pe-linker-version tag.

@bwoodsend
Copy link
Member

According to this, the rich header is just a very heavily concealed and undocumented collection of MSVC version numbers. Given MS refuse to publicly admit the thing exists, I think we'll struggle to work out how to fix it. It's also very odd that these files built using an official MSVC aren't automatically correct.

@rokm
Copy link
Member Author

rokm commented Apr 16, 2021

I suspect invalid-rich-pe-linker-version uses same test as "linker_test" in spoof_check.py script from https://github.com/RichHeaderResearch/RichPE.git. The "validity" of the rich PE linker version seems to be based on check between major and minor versions declared in PE header and in Rich header, and depends on the toolchain version:

  • current bootloaders (presumably VS2019): PE: 14, 28 vs Rich: 14, 0
  • rebuilt with VS2017 toolchain: PE: 14, 16 vs Rich: 14, 0
  • rebuild with VS2015 in Vagrant: PE: 14, 0 vs Rich: 14, 0

I've rebuilt all bootloaders with VS2015 using vagrant VM, and the invalid-rich-pe-linker-version tag is gone.

I'm not sure how much we should really care about that, though. The reports on bootloader files seem to be more or less the same (32-bit versions get more heat than 64-bit ones), and for all we know, this test/tag might actually be a honeypot to steer malware authors to older versions of VS.

@bwoodsend
Copy link
Member

Let's just assume it's harmless then. Incidentally, if an AV did infer that our bootloaders were malware because of the rich header then that AV would be so prone to false positives that no-one would use it. python38.dll taken from the official python.org installer also has this tag.

@rokm rokm deleted the lib-dynload-extensions branch May 5, 2021 09:39
rokm added a commit to rokm/pyinstaller that referenced this pull request May 20, 2021
Divert extensions, included by get_bootstrap_modules() (_struct,
zlib) into lib-dynload directory if applicable, to match the
behavior introduced in pyinstaller#5604. This prevents unused duplicates of
_struct and zlib extensions and fixed 5851.
rokm added a commit to rokm/pyinstaller that referenced this pull request May 20, 2021
Divert extensions, included by get_bootstrap_modules() (_struct,
zlib) into lib-dynload directory if applicable, to match the
behavior introduced in pyinstaller#5604. This prevents unused duplicates of
_struct and zlib extensions and fixed 5851.
rokm added a commit to rokm/pyinstaller that referenced this pull request May 23, 2021
Divert extensions, included by get_bootstrap_modules() (_struct,
zlib) into lib-dynload directory if applicable, to match the
behavior introduced in pyinstaller#5604. This prevents unused duplicates of
_struct and zlib extensions and fixed 5851.
bwoodsend pushed a commit that referenced this pull request May 26, 2021
Divert extensions, included by get_bootstrap_modules() (_struct,
zlib) into lib-dynload directory if applicable, to match the
behavior introduced in #5604. This prevents unused duplicates of
_struct and zlib extensions and fixed 5851.
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SHA1_init missing when using PyInstaller on OSX Symbol not found PyCryptoDome Hash
3 participants