Error for binaries larger than 2 GB #3939

Closed
lsoica opened this issue Dec 20, 2018 · 40 comments · Fixed by #5667
Labels: area:bootloader (Caused by or affecting the bootloader) · help wanted (Seeking help by somebody with deeper knowledge on this topic) · pull-request wanted (Please submit a pull-request for this; maintainers will not actively work on this)

Comments

@lsoica commented Dec 20, 2018

For final binaries larger than 2 GB, the following exception message is printed:

struct.error: 'i' format requires -2147483648 <= number <= 2147483647

https://github.com/pyinstaller/pyinstaller/blob/develop/PyInstaller/archive/writers.py#L264

class CTOC(object):
    ENTRYSTRUCT = '!iiiiBB'  # (structlen, dpos, dlen, ulen, flag, typcd) followed by name
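
A minimal reproduction with the struct module:

import struct

ENTRYSTRUCT = '!iiiiBB'  # four signed 32-bit ints, two unsigned bytes

# Any offset or length of 2**31 (2 GB) or more overflows a signed 'i' field:
struct.pack(ENTRYSTRUCT, 24, 2**31, 0, 0, 1, ord('b'))
# struct.error: 'i' format requires -2147483648 <= number <= 2147483647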

@htgoebel added the area:bootloader and help wanted labels Jan 2, 2019
@htgoebel (Member) commented Jan 2, 2019

A possible solution would be to change these from signed to unsigned values (this needs to be done in the bootloader, too).

Can you please provide a test case we can include in the test suite? Thanks.

@htgoebel added the pull-request wanted label Jan 7, 2019
@lsoica (Author) commented Jan 7, 2019

A possible solution would be to change these from signed to unsigned values (this needs to be done in the bootloader, too).

Can you please provide a test case we can include in the test suite? Thanks.

The switch from signed to unsigned moves the limit from 2 GB to 4 GB, right? Is there a way to go over the 4 GB limit? Basically, could we go with an 8-byte specifier?

As for a test case, do you need a specific format or just a set of steps?

@htgoebel (Member) commented Jan 7, 2019

The switch from signed to unsigned moves the limit from 2 GB to 4 GB, right?

Right.

Is there a way to go over the 4 GB limit? Basically, could we go with an 8-byte specifier?

Basically we could, but this requires more changes. Also, we should keep in mind 32-bit platforms, which might have trouble with 8-byte specifiers.

As for a test case, do you need a specific format or just a set of steps?

Well, this should fit into our test suite, which uses pytest. The most problematic point is generating data which reliably ends up being > 4 GB when creating the CTOC.

I suggest several test cases, which (I assume) will also ease developing the tests:

  • a unit test for CArchiveWriter and CArchiveReader (testing CTOC and CTOCReader is not of much use IMHO). This should go into a new file in tests/unit/.
  • an end-to-end test to ensure the created executable actually works.
    This test could create a huge file containing some known data at relevant positions (e.g. start, just before 2 GB, just after 2 GB, just before 4 GB, just after 4 GB), pass it using pyi_args=["--add-data", "xxx:yyy"], and have the bundled code verify that the expected data can be read (a rough sketch follows below).
    This should go into tests/functional/test_basic.py.
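
A rough sketch of such an end-to-end test, assuming the suite's pyi_builder fixture; the sentinel bytes and offsets are illustrative:

import os

def test_huge_onefile_data(pyi_builder, tmp_path):
    GB = 1024 ** 3
    offsets = [0, 2 * GB - 1, 2 * GB, 4 * GB - 1, 4 * GB]
    data_file = tmp_path / 'huge.dat'
    with open(data_file, 'wb') as fh:
        fh.truncate(4 * GB + 1024)       # sparse file on most filesystems
        for i, offset in enumerate(offsets):
            fh.seek(offset)
            fh.write(bytes([0xA5 ^ i]))  # known sentinel at each boundary
    pyi_builder.test_source(
        """
        import os, sys
        GB = 1024 ** 3
        with open(os.path.join(sys._MEIPASS, 'huge.dat'), 'rb') as fh:
            for i, offset in enumerate([0, 2*GB - 1, 2*GB, 4*GB - 1, 4*GB]):
                fh.seek(offset)
                assert fh.read(1) == bytes([0xA5 ^ i])
        """,
        pyi_args=['--add-data', f'{data_file}{os.pathsep}.'],
    )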

@Kenblair1226 commented Sep 17, 2019

The switch from signed to unsigned moves the limit from 2 GB to 4 GB, right?

Right.

Sorry, this might be a dumb question, but how is that done? Change '!iiiiBB' to '!IIIIBB'? And how do we do this for the bootloaders, given that they are all binary files?
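
(For the Python side the change would look roughly like the sketch below; the bootloader binaries cannot be patched directly - the bootloader's C sources define a matching TOC struct, so the same change must be made there and the bootloader rebuilt from source.)

import struct

ENTRYSTRUCT_OLD = '!iiiiBB'  # signed: fields capped at 2**31 - 1 (2 GB)
ENTRYSTRUCT_NEW = '!IIIIBB'  # unsigned: fields capped at 2**32 - 1 (4 GB)

# A 3 GB offset now packs where the signed format raised struct.error:
struct.pack(ENTRYSTRUCT_NEW, 24, 3 * 1024**3, 0, 0, 1, ord('b'))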

@Legorooj (Member)

Pinging

@Red-Eyed

Any news?
I tried to change the data structure, but had no luck.

@Legorooj (Member)

@Red-Eyed I've added this to my to-do list; I'll get this into the 4.1 release, maybe even the 4.0 release, depending on when that actually gets released.

@sshuair commented Nov 9, 2020

@Legorooj is this fixed now?

@Red-Eyed commented Nov 9, 2020

@Legorooj sorry for the annoying comments, but this issue is important to me: currently I have to build my installer with additional tarballs rather than having just a single large executable.

So, if you don't mind, I just want to ask a few questions in order to get a better understanding of the status of this issue:

  1. Is any kind of work in progress?
  2. Are there some difficulties that prevent putting > 2 GB into the CArchive?
  3. When (if ever) do you think this will be resolved?

Thanks!

@Legorooj (Member) commented Nov 9, 2020

Ah right. Let me take a look at this right now - my apologies, I completely forgot about this.

  1. No. There will be within a week, though, hopefully.
  2. Moving the limit from 2 GB to 4 GB should be easy enough, I think; beyond that, I don't know.
  3. Very soon.

@bwoodsend (Member)

I question the usefulness of this. It'd take a good 30 minutes to pack a 4 GB application, then a good minute or so to unpack it again, followed by however long it takes for your antivirus to give it a good sniff, which would have to happen every time the user opens the application. I already find ~200 MB applications annoyingly slow to start up. I think you'd be better off turning your one-dir applications into an installable bundle using NSIS or something equivalent. That way your user only has to unpack it once.

@Red-Eyed commented Nov 9, 2020

@bwoodsend you described only ONE use case, and yes, there it's useless. Also, you do not take into account OSes other than Windows.

My use case is actually a cross-platform installer. And I do not want or need any third-party installer, as my self-written install code in Python does the job.

Linux distributions do not have problems with the filesystem or antivirus software, so it's fast to unpack zstd archives in multithreaded mode.

Also, you said that you've done it; could you please share that branch with me?

@Red-Eyed commented Nov 9, 2020

@bwoodsend

It'd take a good 30 minutes to pack a 4 GB application

I don't know, it takes me about 2 minutes to pack ~2 GB. But note, I just pack "data".

@bwoodsend (Member)

I don't have a branch for it. I was just messing about with zlib, which is what PyInstaller uses to pack and unpack.

@bwoodsend (Member)

If your large size is coming from data files, is it possible to just put those into a zip, have your code read directly from said zip, and then include the zip in your onefile app?
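
Something along these lines, with illustrative file names:

import os
import sys
import zipfile

# Locate the bundled zip whether running frozen (onefile) or from source.
base = getattr(sys, '_MEIPASS', os.path.dirname(os.path.abspath(__file__)))

with zipfile.ZipFile(os.path.join(base, 'models.zip')) as archive:
    # Read members directly; nothing is unpacked to disk.
    with archive.open('weights.bin') as member:
        header = member.read(16)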

@Red-Eyed commented Nov 9, 2020

I just include a tar.xz in my PyInstaller onefile and then unpack it.

@Red-Eyed commented Nov 9, 2020

My tar.xz is about 1.6 GB (the source size is 6 GB).
I want to use zstd instead, because unpacking is done in multithreaded mode (which is much faster), but the compression ratio is slightly lower compared to xz, and the resulting size is about 2.5 GB, which doesn't fit into the current PyInstaller implementation.
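
The streaming unpack I have in mind would look roughly like this (using the third-party zstandard package; paths illustrative):

import tarfile
import zstandard  # pip install zstandard

# Decompress and untar in a single streaming pass; only one chunk of
# compressed data is held in memory at a time.
with open('payload.tar.zst', 'rb') as fh:
    dctx = zstandard.ZstdDecompressor()
    with dctx.stream_reader(fh) as reader:
        with tarfile.open(fileobj=reader, mode='r|') as tar:
            tar.extractall('unpacked')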

@Red-Eyed commented Nov 9, 2020

I don't have a branch for it. I was just messing about with zlib, which is what PyInstaller uses to pack and unpack.

I played around with PyInstaller and CArchive, but I couldn't make it work.

So, if you're not going to create a PR for this issue, I would like to see any kind of investigation work, even if it doesn't meet PR requirements, just to see what you have done.

Or did you not change the CArchive logic?

Thanks

@sshuair commented Nov 10, 2020

@bwoodsend if you pack the CUDA/cuDNN libraries for deep learning, the package will end up over 2 GB. Please move the limit from 2 GB to 4 GB or larger. We really need this feature. Thanks bro.

@Kenblair1226

@bwoodsend if you pack the CUDA/cuDNN libraries for deep learning, the package will end up over 2 GB. Please move the limit from 2 GB to 4 GB or larger. We really need this feature. Thanks bro.

Yes, same use case. I ended up putting all model data into a password-protected zip. It would be great if we could go beyond the 2 GB limitation.

@rokm (Member) commented Nov 10, 2020

Uf, this is going to be all sorts of fun...

Here's an experimental branch that raises the limit from 2 GB to 4 GB by switching from signed integers to unsigned ones: https://github.com/rokm/pyinstaller/tree/large-file-support

I think before even considering the move to 64-bit integers for raising the limit further, we'll need to rework the archive extraction in the bootloader. Currently, it extracts the whole TOC entry into an allocated buffer and, if compression is enabled, decompresses it in one go. This is done both for internal use and for extraction onto the filesystem during unpacking... The decompression should definitely be done in a streaming manner (i.e., using smaller chunks, so that we avoid having the whole compressed data in memory at once). And when extraction is performed as part of pyi_arch_extract2fs(), the input file reading should be done in a streaming manner as well, even if there's no compression (so we avoid reading the whole file into memory). A sketch of the idea follows.
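
Sketched in Python for illustration (the real change would live in the bootloader's C code; names are hypothetical):

import zlib

CHUNK = 64 * 1024  # process the entry in 64 KB chunks

def extract_entry_to_fs(archive_fh, out_fh, offset, compressed_len, compressed):
    # Copy one archive entry to disk without buffering it whole in memory.
    archive_fh.seek(offset)
    decomp = zlib.decompressobj() if compressed else None
    remaining = compressed_len
    while remaining > 0:
        chunk = archive_fh.read(min(CHUNK, remaining))
        if not chunk:
            raise IOError('unexpected end of archive')
        remaining -= len(chunk)
        out_fh.write(decomp.decompress(chunk) if decomp else chunk)
    if decomp is not None:
        out_fh.write(decomp.flush())  # drain any buffered tail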

@bwoodsend (Member)

@rokm You mean it'll currently have all 2-4 GB in RAM during decompression? That'd be horrific on a 4 GB machine.

@rokm (Member) commented Nov 10, 2020

I'm not sure if the big data entries are actually compressed...

But even for uncompressed files, the current implementation of pyi_arch_extract2fs() uses pyi_arch_extract() to obtain the entry's data blob, and then writes it to the file in the _MEIPASS dir... So unless I'm mistaken, if we add a 2 GB data file to the program, it will end up whole in RAM during unpacking...

unsigned char *
pyi_arch_extract(ARCHIVE_STATUS *status, TOC *ptoc)
{
    unsigned char *data;
    unsigned char *tmp;

    if (pyi_arch_open_fp(status) != 0) {
        OTHERERROR("Cannot open archive file\n");
        return NULL;
    }

    fseek(status->fp, status->pkgstart + ntohl(ptoc->pos), SEEK_SET);

    /* Allocate a buffer for the entire (possibly compressed) entry at once. */
    data = (unsigned char *)malloc(ntohl(ptoc->len));
    if (data == NULL) {
        OTHERERROR("Could not allocate read buffer\n");
        return NULL;
    }
    if (fread(data, ntohl(ptoc->len), 1, status->fp) < 1) {
        OTHERERROR("Could not read from file\n");
        free(data);
        return NULL;
    }
    if (ptoc->cflag == '\1') {
        /* Decompress in one go; both blobs are live simultaneously here. */
        tmp = decompress(data, ptoc);
        free(data);
        data = tmp;
        if (data == NULL) {
            OTHERERROR("Error decompressing %s\n", ptoc->name);
            return NULL;
        }
    }

    pyi_arch_close_fp(status);
    return data;
}

/*
 * Extract from the archive and copy to the filesystem.
 * The path is relative to the directory the archive is in.
 */
int
pyi_arch_extract2fs(ARCHIVE_STATUS *status, TOC *ptoc)
{
    FILE *out;
    size_t result, len;
    unsigned char *data = pyi_arch_extract(status, ptoc);

    /* Create tmp dir _MEIPASSxxx. */
    if (pyi_create_temp_path(status) == -1) {
        return -1;
    }

    out = pyi_open_target(status->temppath, ptoc->name);
    len = ntohl(ptoc->ulen);
    if (out == NULL) {
        FATAL_PERROR("fopen", "%s could not be extracted!\n", ptoc->name);
        return -1;
    }
    else {
        /* Write the whole in-memory blob to disk in a single call. */
        result = fwrite(data, len, 1, out);
        if ((1 != result) && (len > 0)) {
            FATAL_PERROR("fwrite", "Failed to write all bytes for %s\n", ptoc->name);
            return -1;
        }
#ifndef WIN32
        fchmod(fileno(out), S_IRUSR | S_IWUSR | S_IXUSR);
#endif
        fclose(out);
    }
    free(data);
    return 0;
}

(And it's even worse if the entry is compressed, because then we keep both the whole compressed and the whole uncompressed data blobs in memory during decompression.)

@Red-Eyed commented Nov 10, 2020

@rokm
Thanks for the effort!

I just tried your branch large-file-support (I didn't build anything; should I?)
Unfortunately, it doesn't work on Ubuntu 20.04.
It packs into one file, but it throws an error when unpacking.

I ran into the same error, I guess, in the bootloader when I tried to work on this issue:

-> ~/my_cool_installer
[568688] Cannot open self /home/redeyed/my_cool_installer or archive /home/redeyed/my_cool_installer.pkg

@rokm (Member) commented Nov 10, 2020

(I didn't build anything; should I?)

Yes, you need to rebuild the bootloader yourself.

@Red-Eyed commented Nov 10, 2020

Okay, will do that tomorrow.
Until then, I would like to ask you:
could you please confirm that your branch actually unpacks a package (built in onefile mode) larger than 2 GB?

That would be awesome. Thanks!

@rokm (Member) commented Nov 11, 2020

Okay, will do that tomorrow.
Until then, I would like to ask you:
could you please confirm that your branch actually unpacks a package (built in onefile mode) larger than 2 GB?

That would be awesome. Thanks!

One of the commits adds a test that creates a 3 GB data file with random contents, computes its md5 hash, and then adds this file to a onefile build of a program, which in turn reads the unpacked file from its _MEIPASS dir, computes the md5 hash, and compares it to the one that was computed previously.

This test is now passing on my Fedora 33 box, and in a Windows 10 VM. (But you need to rebuild the bootloader, because that's where the unpacking actually takes place).

@sshuair commented Nov 11, 2020

Okay, will do that tomorrow.
Until then, I would like to ask you:
could you please confirm that your branch actually unpacks a package (built in onefile mode) larger than 2 GB?

That would be awesome. Thanks!

Yes! In my case, libtorch_cuda.so alone is 900+ MB. The package will be larger than 2 GB if you add the CUDA, cuDNN, and TensorRT libraries.

@Red-Eyed commented Nov 11, 2020

Okay, will do that tomorrow.
Until then, I would like to ask you:
could you please confirm that your branch actually unpacks a package (built in onefile mode) larger than 2 GB?
That would be awesome. Thanks!

Yes! In my case, libtorch_cuda.so alone is 900+ MB. The package will be larger than 2 GB if you add the CUDA, cuDNN, and TensorRT libraries.

Currently I have a 6 GB Python environment (PyTorch, TensorFlow, SciPy, etc.) and pack it into a 1.6 GB tar.xz archive to get under the 2 GB limit.

But I want to use zstandard compression (to speed up decompression); zstandard compresses to 2.6 GB, which is above the 2 GB limit.

@Red-Eyed

Note: I just built the bootloader and it works, thank you @rokm!

@rokm (Member) commented Nov 11, 2020

Here's a further branch, in which 64-bit integers are used: https://github.com/rokm/pyinstaller/tree/large-file-support-v2

So now, in theory, the sky's the limit - you can chuck in all your deep learning frameworks, CUDA libraries, pretrained models, ...

In practice, however, the 5 GB onefile test passes only on Linux (tested only 64-bit for now). Windows (even 64-bit) does not seem to support executables larger than 4 GB. On (64-bit) macOS, macholib, used in a signing preparation step in the assembly pipeline, seems to assume that the file size can be represented with an unsigned 32-bit integer, so 4 GB max as well (and this is consistent with the macOS signature searching code in the bootloader, which uses 32-bit unsigned integers).

So really huge onefile executables (> 4 GB) work only on Linux. But on all three OSes, this limitation can be worked around by using a .spec file and adding append_pkg=False as an extra argument to EXE(); see the snippet below. This will give you a small executable (e.g., program) and a single large archive file to go with it (e.g., program.pkg).
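
Roughly, in the spec file (other EXE arguments elided):

# Onefile EXE that keeps the PKG archive as a side-car file instead of
# appending it to the executable:
exe = EXE(
    pyz,
    a.scripts,
    a.binaries,
    a.zipfiles,
    a.datas,
    name='program',
    append_pkg=False,  # produces 'program' plus 'program.pkg'
)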

@sshuair commented Nov 18, 2020

@Legorooj any progress?

@Legorooj (Member)

I stopped work on this because, by the time I woke up the next morning, @rokm had already written everything that needed to be written 😅. Maybe he could submit a PR?

@sshuair commented Nov 18, 2020

I stopped work on this because, by the time I woke up the next morning, @rokm had already written everything that needed to be written 😅. Maybe he could submit a PR?

Nice job!

@rokm (Member) commented Nov 18, 2020

@sshuair the current plan is to have the changes from those experimental branches submitted and merged gradually - first the endian-handling cleanup, then the file extraction cleanup, then the switch to unsigned 32-bit ints, and finally the switch to 64-bit ones.

In the meantime, if you require this functionality, you can use either of the experimental branches linked above.

@Red-Eyed commented Nov 18, 2020

@sshuair
If you need > 2 GB, just use the branch that @rokm published. I use it and it works on Ubuntu 20.04 and Windows (but before using it, you need to rebuild the bootloader; it's easy, just read the documentation).

@sshuair commented Nov 20, 2020

@Red-Eyed the branch that @rokm published seems not to work for me. When the file PKG-00.pkg reached 2.7 GB, the error appeared again. My OS is Ubuntu 16.04.

@sshuair commented Nov 20, 2020

@Red-Eyed Sorry, it's my fault: I installed the package from the master branch and not from large-file-support-v2. Now it works for me.

@EricPengShuai

@rokm so how does one solve this error for binaries larger than 4 GB? My OS is CentOS Linux release 7.6.1810 (Core). The final error information is as follows.

struct.error: 'I' format requires 0 <= number <= 4294967295

@rokm (Member) commented Sep 14, 2022

@rokm so how does one solve this error for binaries larger than 4 GB? My OS is CentOS Linux release 7.6.1810 (Core). The final error information is as follows.

struct.error: 'I' format requires 0 <= number <= 4294967295

You cannot. If you wanted to generate an embedded archive that's larger than 4 GB, you would need to switch the types used by the corresponding PyInstaller code (both Python and C) to 64-bit integers. While this would work on Linux, neither Windows nor macOS support executables larger than 4 GB, so we only extended the max size from 2 GB to 4 GB by switching to unsigned 32-bit integers.

@github-actions bot locked as resolved and limited conversation to collaborators Nov 15, 2022