
--onefile does not work properly #5615

Closed
hackenjoe opened this issue Mar 8, 2021 · 19 comments · Fixed by #5617
Labels
area:bootloader (Caused by or affecting the bootloader) · bug

Comments

@hackenjoe

When I run my script normally it works, and it also works when I build it with PyInstaller in the default one-folder mode (i.e. when all files are unpacked into the dist subfolder). However, the .exe file no longer works when I use --onefile mode. The following error message then appears:

Failed to decode wchar_t from UTF-8
MultiByteToWideChar: The data area passed to a system call is too small.
share\jupyter\lab\staging\node_modules\.cache\terser-webpack-plugin\content-v2\sha512\2e\ba\cfce62ec1f408830c0335f2b46219d58ee5b068473e7328690e542d2f92f2058865c600d845a2e404e282645529eb0322aa4429a84e189eb6b58c1b97c1a could not be extracted!
fopen: No such file or directory

This is somewhat strange, since the executable works when it is placed in dist next to the other packages. But as soon as I use --onefile to bundle everything into one big file, it gives this error message.

@bwoodsend
Member

Can you please fill out the issue template?

@bwoodsend bwoodsend added the state:need info (Need more information to solve or help) label Mar 8, 2021
@rokm rokm added the area:bootloader (Caused by or affecting the bootloader) and bug labels and removed the state:need info label Mar 8, 2021
@rokm
Member

rokm commented Mar 8, 2021

Uf... there's definitely something fishy going on in the bootloader when the CArchive contains long names...

rokm added a commit to rokm/pyinstaller that referenced this issue Mar 8, 2021
On Windows, `pyi_path_fopen()` erroneously uses `MAX_PATH` (260)
instead of `PATH_MAX` (4096) to convert the filename to wide
characters, which causes the `MultiByteToWideChar()` call in
`pyi_win32_utils_from_utf8()` to fail with
```
MultiByteToWideChar: The data area passed to a system call is too
small.
```
when we try to open a file with a long filename (> 260 characters).

Due to lack of error checking, the resulting `wfilename` array ends
up with random content, and the `_wfopen()` call either fails
(in read-only mode) or creates a randomly-named file (in write
mode).

Fixes pyinstaller#5615.
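
For reference, the Windows behaviour the commit describes can be illustrated from Python with a small ctypes sketch (Windows only). This is not the bootloader's code; the 300-character name below is an arbitrary stand-in for a long archive entry:

```python
# Illustrative sketch (Windows only): converting a long UTF-8 name into a
# MAX_PATH-sized wide-character buffer fails with ERROR_INSUFFICIENT_BUFFER,
# which Windows reports as "The data area passed to a system call is too small."
import ctypes

CP_UTF8 = 65001
MAX_PATH = 260

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)

long_name = ("x" * 300).encode("utf-8")            # name longer than MAX_PATH
wide_buf = ctypes.create_unicode_buffer(MAX_PATH)  # undersized destination buffer

# MultiByteToWideChar() returns 0 on failure; get_last_error() then yields 122
# (ERROR_INSUFFICIENT_BUFFER), matching the bootloader error in this issue.
converted = kernel32.MultiByteToWideChar(CP_UTF8, 0, long_name, -1,
                                         wide_buf, len(wide_buf))
print(converted, ctypes.get_last_error())
```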
@hackenjoe
Author

Maybe it is because of the additional files that I am loading? I tested an MWE and it led to the same error when using --onefile:

MWE:

import os
from google_drive_downloader import GoogleDriveDownloader as gdd
from pathlib import Path
from tensorflow import keras

f_path = 'data/face_model.h5'
if not Path(f_path).is_file():
    gdd.download_file_from_google_drive(file_id='xxx', dest_path=f_path)
my_model = keras.models.load_model(f_path)

@rokm
Member

rokm commented Mar 8, 2021

I can trigger the

Failed to decode wchar_t from UTF-8
MultiByteToWideChar: The data area passed to a system call is too small.

message if the full path length of the file to be extracted exceeds 260 characters. In your original post, the length of

share\jupyter\lab\staging\node_modules\.cache\terser-webpack-plugin\content-v2\sha512\2e\ba\cfce62ec1f408830c0335f2b46219d58ee5b068473e7328690e542d2f92f2058865c600d845a2e404e282645529eb0322aa4429a84e189eb6b58c1b97c1a

is 212 characters; but you also need to add the temporary extraction path for onefile (e.g., `C:\Users\<username>\AppData\Local\Temp\_MEIXXXXX`). So if your username is a bit longer (more than 10 characters in this case), you'll hit the issue.
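
To sanity-check that arithmetic, here is a small hedged sketch; the temp directory below is a placeholder, not the actual _MEI path from the report:

```python
# Hedged sketch: estimate the full extraction path length of the offending
# archive entry. temp_dir is a placeholder for the onefile _MEIxxxxx directory.
import ntpath

archive_entry = (
    r"share\jupyter\lab\staging\node_modules\.cache\terser-webpack-plugin"
    r"\content-v2\sha512\2e\ba"
    r"\cfce62ec1f408830c0335f2b46219d58ee5b068473e7328690e542d2f92f2058865c600d845a2e404e282645529eb0322aa4429a84e189eb6b58c1b97c1a"
)
temp_dir = r"C:\Users\some_longer_name\AppData\Local\Temp\_MEI123456"  # placeholder

full_path = ntpath.join(temp_dir, archive_entry)
# The pre-fix bootloader only allowed for MAX_PATH (260) characters here.
print(len(archive_entry), len(full_path), len(full_path) > 260)
```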

@rokm
Member

rokm commented Mar 8, 2021

import os
from google_drive_downloader import GoogleDriveDownloader as gdd
from pathlib import Path
from tensorflow import keras

f_path = 'data/face_model.h5'
if not Path(f_path).is_file():
    gdd.download_file_from_google_drive(file_id='xxx', dest_path=f_path)
my_model = keras.models.load_model(f_path)

What exactly is the error message with this MWE? Because downloading a file should not trigger the filename-length issue.

Unless google_drive_downloader has a cache with long file names and places it in a directory that's being included with the application (either intentionally or unintentionally).

@hackenjoe
Author

hackenjoe commented Mar 8, 2021

The error message is the same:

Failed to decode wchar_t from UTF-8
MultiByteToWideChar: The data area passed to a system call is too small.
share\jupyter\lab\staging\node_modules\.cache\terser-webpack-plugin\content-v2\sha512\2e\ba\cfce62ec1f408830c0335f2b46219d58ee5b068473e7328690e542d2f92f2058865c600d845a2e404e282645529eb0322aa4429a84e189eb6b58c1b97c1a could not be extracted!
fopen: No such file or directory

Moreover, it happens as soon as I import keras as follows:

from tensorflow import keras

Another interesting thing I observed: as soon as I imported tensorflow (v2.4.1 in conda), I was no longer able to run PyInstaller on the .py script due to a maximum recursion depth error. So I needed to increase the recursion limit in the .spec file and rerun the build, which took significantly longer to build the exe file (almost 20 minutes); see the sketch below.
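
(For reference, the usual workaround for the recursion-depth error, which PyInstaller's own error message suggests, is to raise the recursion limit at the top of the generated spec file; the limit value below is just an example:)

```python
# test.spec (excerpt) -- raise Python's recursion limit before the Analysis block.
# The value 5000 is an example; adjust it if the build still hits the limit.
import sys
sys.setrecursionlimit(5000)

# ... the rest of the generated spec file (Analysis, PYZ, EXE, ...) follows unchanged.
```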

@rokm
Member

rokm commented Mar 8, 2021

Hmmm... alright, so the actual problem here seems to be that importing tensorflow pulls in jupyter (if it is available), which is how that long-named file ends up in the build even of the MWE. Which in turn breaks the onefile build due to the MAX_PATH vs. PATH_MAX problem that #5617 fixes. At least one mystery solved there...

@hackenjoe
Author

That's our bet. But now the question arises: how can we fix this tensorflow issue when using onefile mode?

@bwoodsend
Member

You can just add --exclude-module=jupyter to get it started. Or better still, use a virtual environment.

@hackenjoe
Author

Unfortunately, still the same output. What I did observe is that the build spends some time at this step:

retargeting to fake-dir 'd:\\anaconda3\\envs\\my_env\\lib\\site-packages\\PyInstaller\\fake-modules'

@rokm
Member

rokm commented Mar 8, 2021

You can just add --exclude-module=jupyter

It should be --exclude-module notebook, actually.
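
(If you drive the build from a spec file rather than the command line, the same exclusion can be expressed there; a hedged excerpt, with the other Analysis arguments left exactly as your generated spec defines them:)

```python
# test.spec (excerpt) -- the spec-file equivalent of --exclude-module notebook.
a = Analysis(
    ['test.py'],
    # ... other arguments from the generated spec stay as they are ...
    excludes=['notebook'],
)
```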

@hackenjoe
Author

Okay, the build is now faster, but the same error message appears. I am using keras 2.4.3 and tensorflow 2.4.1. Maybe this output helps you identify the problem:

>>> from tensorflow import keras
2021-03-08 21:04:22.846991: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-03-08 21:04:22.847142: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
>>> print(keras.__version__)
2.4.0

@rokm
Member

rokm commented Mar 8, 2021

If the exact same error message appears, that means jupyter and its data in share\jupyter\... are still being pulled in. Can you attach your build log?

rokm added a commit to rokm/pyinstaller that referenced this issue Mar 9, 2021
@hackenjoe
Author

hackenjoe commented Mar 9, 2021

Do you mean the log file in the build folder, named warn...txt?

@rokm
Member

rokm commented Mar 9, 2021

No, I meant the output you get on the console... (also, next time please attach the log as a file attachment instead, as pasting it makes the issue difficult to navigate).

@rokm
Member

rokm commented Mar 9, 2021

But anyway, our Windows CI just errored out while testing #5617; the long-filename test that was added there is causing the same fopen: No such file or directory error as in your case (which I hadn't been able to reproduce earlier). I think this is caused by the system not having long paths enabled.

Can you try enabling long file paths on your system? (i.e., run Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem' -Name 'LongPathsEnabled' -Value 1 in PowerShell as administrator, or set the corresponding registry value manually with the Registry Editor, and then restart the system).
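
(To verify the setting after the restart without opening the Registry Editor, here is a small hedged sketch using Python's standard library, Windows only:)

```python
# Hedged sketch (Windows only): read back LongPathsEnabled to confirm that
# long path support is active after the reboot.
import winreg

key_path = r"SYSTEM\CurrentControlSet\Control\FileSystem"
with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
    value, _ = winreg.QueryValueEx(key, "LongPathsEnabled")

print("LongPathsEnabled =", value)  # 1 means long paths are enabled
```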

That might already take care of the issue (the offending file will still be created with a garbage filename, but that will go unnoticed if it is not actually used anywhere). To fix the issue fully, try installing the following PyInstaller branch, which is the same as #5617 but with a rebuilt bootloader:

pip install https://github.com/rokm/pyinstaller/archive/bootloader-carchive-longfilenames-test.zip

and then rebuild your program.

@hackenjoe
Author

I tried the new branch and the error message now changes to:

MWE:

import os
from google_drive_downloader import GoogleDriveDownloader as gdd
from pathlib import Path
from tensorflow import keras
print(os.listdir())

Output:

[20576] WARNING: file already exists but should not: C:\Users\Dennis\AppData\Local\Temp\_MEI205762\torch\_C.cp37-win_amd64.pyd
2021-03-10 15:45:14.310847: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-03-10 15:45:23.229585: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
['build', 'data', 'dist', 'test.py', 'test.spec', '__pycache__']

So basically it seems to solve the error. But it still takes a relatively long time to launch the executable (about 1-2 minutes).

@rokm
Member

rokm commented Mar 10, 2021

OK, so that long-filename issue is taken care of.

[20576] WARNING: file already exists but should not: C:\Users\Dennis\AppData\Local\Temp\_MEI205762\torch\_C.cp37-win_amd64.pyd

I've seen this one pop up in issue reports every now and then, but I haven't gotten around to investigating it yet. I suspect it's related to the pytorch hook collecting everything in a rather brute-force way and probably ending up with a duplicate... But it appears to be harmless.

2021-03-10 15:45:14.310847: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-03-10 15:45:23.229585: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

This is quite normal as well: you need to collect the CUDA DLLs manually if you want to use them (or the message simply reflects that you don't have an NVIDIA GPU and its drivers installed).

But it still takes a relatively long time to execute the file (about 1-2 minutes).

Since it's a onefile build, it has to unpack everything (and tensorflow is quite large...).

@hackenjoe
Author

Okay, let's summarise what we have so far. We know that if we import tensorflow 2.4.x and then use --onefile mode, the exe won't start because the extracted filenames are too long. Your branch fixes this bug and the standalone exe works as expected. It would be nice if the "warning message" you mentioned could be resolved as well.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 16, 2022