Error for binaries larger than 2 GB #3939

Closed
lsoica opened this issue Dec 20, 2018 · 40 comments · Fixed by #5667
Labels: area:bootloader (Caused by or affecting the bootloader) · help wanted (Seeking help by somebody with deeper knowledge on this topic) · pull-request wanted (Please submit a pull-request for this; maintainers will not actively work on this)

Comments

@lsoica commented Dec 20, 2018

For final binaries larger than 2 GB, the following exception message is printed:

struct.error: 'i' format requires -2147483648 <= number <= 2147483647

https://github.com/pyinstaller/pyinstaller/blob/develop/PyInstaller/archive/writers.py#L264

class CTOC(object):
    ENTRYSTRUCT = '!iiiiBB'  # (structlen, dpos, dlen, ulen, flag, typcd) followed by name
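
A minimal reproduction with the struct module:

import struct

ENTRYSTRUCT = '!iiiiBB'  # four signed 32-bit ints, two unsigned bytes

# Any offset or length of 2**31 (2 GB) or more overflows a signed 'i' field:
struct.pack(ENTRYSTRUCT, 24, 2**31, 0, 0, 1, ord('b'))
# struct.error: 'i' format requires -2147483648 <= number <= 2147483647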

@htgoebel added the area:bootloader and help wanted labels Jan 2, 2019
@htgoebel (Member) commented Jan 2, 2019

A possible solution would be to change these from signed to unsigned values (this needs to be done in the bootloader, too).

Can you please provide a test case we can include in the test suite? Thanks.

@htgoebel added the pull-request wanted label Jan 7, 2019
@lsoica (Author) commented Jan 7, 2019

A possible solution would be to change these from signed to unsigned values (this needs to be done in the bootloader, too).

Can you please provide a test case we can include in the test suite? Thanks.

The switch from signed to unsigned moves the limit from 2 GB to 4 GB, right? Is there a way to go over the 4 GB limit? Basically, could we go with an 8-byte specifier?

As for a test case, do you need a specific format or just a set of steps?

@htgoebel (Member) commented Jan 7, 2019

The switch from signed to unsigned moves the limit from 2 GB to 4 GB, right?

Right.

Is there a way to go over the 4 GB limit? Basically, could we go with an 8-byte specifier?

Basically we could, but this requires more changes. Also, we should keep in mind 32-bit platforms, which might have trouble with 8-byte specifiers.

As for a test case, do you need a specific format or just a set of steps?

Well, this should fit into our test suite, which uses pytest. The most problematic point is generating data which reliably ends up being > 4 GB when creating the CTOC.

I suggest several test cases, which (I assume) will also ease developing the tests:

  • a unit test for CArchiveWriter and CArchiveReader (testing CTOC and CTOCReader is not of much use IMHO). This should go into a new file in tests/unit/.
  • an end-to-end test to ensure the created executable actually works.
    This test could create a huge file containing some known data at relevant positions (e.g. start, just before 2 GB, just after 2 GB, just before 4 GB, just after 4 GB), pass it using pyi_args=["--add-data", "xxx:yyy"], and have the bundled code verify that the expected data can be read (a rough sketch follows below).
    This should go into tests/functional/test_basic.py.
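
A rough sketch of such an end-to-end test, assuming the suite's pyi_builder fixture; the sentinel bytes and offsets are illustrative:

import os

def test_huge_onefile_data(pyi_builder, tmp_path):
    GB = 1024 ** 3
    offsets = [0, 2 * GB - 1, 2 * GB, 4 * GB - 1, 4 * GB]
    data_file = tmp_path / 'huge.dat'
    with open(data_file, 'wb') as fh:
        fh.truncate(4 * GB + 1024)       # sparse file on most filesystems
        for i, offset in enumerate(offsets):
            fh.seek(offset)
            fh.write(bytes([0xA5 ^ i]))  # known sentinel at each boundary
    pyi_builder.test_source(
        """
        import os, sys
        GB = 1024 ** 3
        with open(os.path.join(sys._MEIPASS, 'huge.dat'), 'rb') as fh:
            for i, offset in enumerate([0, 2*GB - 1, 2*GB, 4*GB - 1, 4*GB]):
                fh.seek(offset)
                assert fh.read(1) == bytes([0xA5 ^ i])
        """,
        pyi_args=['--add-data', f'{data_file}{os.pathsep}.'],
    )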

@Kenblair1226 commented Sep 17, 2019

The switch from signed to unsigned moves the limit from 2 GB to 4 GB, right?

Right.

Sorry, this might be a dumb question, but how is that done? Change '!iiiiBB' to '!IIIIBB'? And how do we do this for the bootloaders, given that they are all binary files?
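
(For the Python side the change would look roughly like the sketch below; the bootloader binaries cannot be patched directly - the bootloader's C sources define a matching TOC struct, so the same change must be made there and the bootloader rebuilt from source.)

import struct

ENTRYSTRUCT_OLD = '!iiiiBB'  # signed: fields capped at 2**31 - 1 (2 GB)
ENTRYSTRUCT_NEW = '!IIIIBB'  # unsigned: fields capped at 2**32 - 1 (4 GB)

# A 3 GB offset now packs where the signed format raised struct.error:
struct.pack(ENTRYSTRUCT_NEW, 24, 3 * 1024**3, 0, 0, 1, ord('b'))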

@Legorooj (Member)

Pinging

@Red-Eyed

Any news?
I tried to change the data structure, but had no luck.

@Legorooj (Member)

@Red-Eyed I've added this to my to-do list; I'll get this into the 4.1 release, maybe even the 4.0 release, depending on when that actually gets released.

@sshuair commented Nov 9, 2020

@Legorooj is this fixed now?

@Red-Eyed commented Nov 9, 2020

@Legorooj sorry for the annoying comments, but this issue is important to me: currently I have to build my installer with additional tarballs rather than having just a single large executable.

So, if you don't mind, I just want to ask a few questions in order to get a better understanding of the status of this issue:

  1. Is any kind of work in progress?
  2. Are there some difficulties that prevent putting > 2 GB into the CArchive?
  3. When (if ever) do you think this will be resolved?

Thanks!

@Legorooj (Member) commented Nov 9, 2020

Ah right. Let me take a look at this right now - my apologies, I completely forgot about this.

  1. No. There will be within a week, though, hopefully.
  2. Moving the limit from 2 GB to 4 GB should be easy enough, I think; beyond that, I don't know.
  3. Very soon.

@bwoodsend (Member)

I question the usefulness of this. It'd take a good 30 minutes to pack a 4 GB application, then a good minute or so to unpack it again, followed by however long it takes for your antivirus to give it a good sniff, which would have to happen every time the user opens the application. I already find ~200 MB applications annoyingly slow to start up. I think you'd be better off turning your one-dir applications into an installable bundle using NSIS or something equivalent. That way your user only has to unpack it once.

@Red-Eyed commented Nov 9, 2020

@bwoodsend you described only ONE use case, and yes, there it's useless. Also, you do not take into account OSes other than Windows.

My use case is actually a cross-platform installer. And I do not want or need any third-party installer, as my self-written install code in Python does the job.

Linux distributions do not have problems with the filesystem or antivirus software, so it's fast to unpack zstd archives in multithreaded mode.

Also, you said that you've done it; could you please share that branch with me?

@Red-Eyed commented Nov 9, 2020

@bwoodsend

It'd take a good 30 minutes to pack a 4 GB application

I don't know, it takes me about 2 minutes to pack ~2 GB. But note, I just pack "data".

@bwoodsend (Member)

I don't have a branch for it. I was just messing about with zlib, which is what PyInstaller uses to pack and unpack.

@bwoodsend (Member)

If your large size is coming from data files, is it possible to just put those into a zip, have your code read directly from said zip, and then include the zip in your onefile app?
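
Something along these lines, with illustrative file names:

import os
import sys
import zipfile

# Locate the bundled zip whether running frozen (onefile) or from source.
base = getattr(sys, '_MEIPASS', os.path.dirname(os.path.abspath(__file__)))

with zipfile.ZipFile(os.path.join(base, 'models.zip')) as archive:
    # Read members directly; nothing is unpacked to disk.
    with archive.open('weights.bin') as member:
        header = member.read(16)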

@Red-Eyed commented Nov 9, 2020

I just include a tar.xz in my PyInstaller onefile and then unpack it.

@Red-Eyed commented Nov 9, 2020

My tar.xz is about 1.6 GB (the source size is 6 GB).
I want to use zstd instead, because unpacking is done in multithreaded mode (which is much faster), but the compression ratio is slightly lower compared to xz, and the resulting size is about 2.5 GB, which doesn't fit into the current PyInstaller implementation.
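
The streaming unpack I have in mind would look roughly like this (using the third-party zstandard package; paths illustrative):

import tarfile
import zstandard  # pip install zstandard

# Decompress and untar in a single streaming pass; only one chunk of
# compressed data is held in memory at a time.
with open('payload.tar.zst', 'rb') as fh:
    dctx = zstandard.ZstdDecompressor()
    with dctx.stream_reader(fh) as reader:
        with tarfile.open(fileobj=reader, mode='r|') as tar:
            tar.extractall('unpacked')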

@Red-Eyed commented Nov 9, 2020

I don't have a branch for it. I was just messing about with zlib, which is what PyInstaller uses to pack and unpack.

I played around with PyInstaller and CArchive, but I couldn't make it work.

So, if you're not going to create a PR for this issue, I would like to see any kind of investigation work, even if it doesn't meet PR requirements, just to see what you have done.

Or did you not change the CArchive logic?

Thanks

@sshuair commented Nov 10, 2020

@bwoodsend if you pack the CUDA/cuDNN libraries for deep learning, the package will end up over 2 GB. Please move the limit from 2 GB to 4 GB or larger. We really need this feature. Thanks bro.

@Kenblair1226

@bwoodsend if you pack the CUDA/cuDNN libraries for deep learning, the package will end up over 2 GB. Please move the limit from 2 GB to 4 GB or larger. We really need this feature. Thanks bro.

Yes, same use case. I ended up putting all model data into a password-protected zip. It would be great if we could go beyond the 2 GB limitation.

@rokm (Member) commented Nov 10, 2020

Uf, this is going to be all sorts of fun...

Here's an experimental branch that raises the limit from 2 GB to 4 GB by switching from signed integers to unsigned ones: https://github.com/rokm/pyinstaller/tree/large-file-support

I think before even considering the move to 64-bit integers for raising the limit further, we'll need to rework the archive extraction in the bootloader. Currently, it extracts the whole TOC entry into an allocated buffer and, if compression is enabled, decompresses it in one go. This is done both for internal use and for extraction onto the filesystem during unpacking... The decompression should definitely be done in a streaming manner (i.e., using smaller chunks, so that we avoid having the whole compressed data in memory at once). And when extraction is performed as part of pyi_arch_extract2fs(), the input file reading should be done in a streaming manner as well, even if there's no compression (so we avoid reading the whole file into memory). A sketch of the idea follows.
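
Sketched in Python for illustration (the real change would live in the bootloader's C code; names are hypothetical):

import zlib

CHUNK = 64 * 1024  # process the entry in 64 KB chunks

def extract_entry_to_fs(archive_fh, out_fh, offset, compressed_len, compressed):
    # Copy one archive entry to disk without buffering it whole in memory.
    archive_fh.seek(offset)
    decomp = zlib.decompressobj() if compressed else None
    remaining = compressed_len
    while remaining > 0:
        chunk = archive_fh.read(min(CHUNK, remaining))
        if not chunk:
            raise IOError('unexpected end of archive')
        remaining -= len(chunk)
        out_fh.write(decomp.decompress(chunk) if decomp else chunk)
    if decomp is not None:
        out_fh.write(decomp.flush())  # drain any buffered tail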

@bwoodsend (Member)

@rokm You mean it'll currently have all 2-4 GB in RAM during decompression? That'd be horrific on a 4 GB machine.

@rokm (Member) commented Nov 10, 2020

I'm not sure if the big data entries are actually compressed...

But even for uncompressed files, the current implementation of pyi_arch_extract2fs() uses pyi_arch_extract() to obtain the entry's data blob, and then writes it to the file in the _MEIPASS dir... So unless I'm mistaken, if we add a 2 GB data file to the program, it will end up whole in RAM during unpacking...

unsigned char *
pyi_arch_extract(ARCHIVE_STATUS *status, TOC *ptoc)
{
    unsigned char *data;
    unsigned char *tmp;

    if (pyi_arch_open_fp(status) != 0) {
        OTHERERROR("Cannot open archive file\n");
        return NULL;
    }

    fseek(status->fp, status->pkgstart + ntohl(ptoc->pos), SEEK_SET);

    /* Allocate a buffer for the entire (possibly compressed) entry at once. */
    data = (unsigned char *)malloc(ntohl(ptoc->len));
    if (data == NULL) {
        OTHERERROR("Could not allocate read buffer\n");
        return NULL;
    }
    if (fread(data, ntohl(ptoc->len), 1, status->fp) < 1) {
        OTHERERROR("Could not read from file\n");
        free(data);
        return NULL;
    }
    if (ptoc->cflag == '\1') {
        /* Decompress in one go; both blobs are live simultaneously here. */
        tmp = decompress(data, ptoc);
        free(data);
        data = tmp;
        if (data == NULL) {
            OTHERERROR("Error decompressing %s\n", ptoc->name);
            return NULL;
        }
    }

    pyi_arch_close_fp(status);
    return data;
}

/*
 * Extract from the archive and copy to the filesystem.
 * The path is relative to the directory the archive is in.
 */
int
pyi_arch_extract2fs(ARCHIVE_STATUS *status, TOC *ptoc)
{
    FILE *out;
    size_t result, len;
    unsigned char *data = pyi_arch_extract(status, ptoc);

    /* Create tmp dir _MEIPASSxxx. */
    if (pyi_create_temp_path(status) == -1) {
        return -1;
    }

    out = pyi_open_target(status->temppath, ptoc->name);
    len = ntohl(ptoc->ulen);
    if (out == NULL) {
        FATAL_PERROR("fopen", "%s could not be extracted!\n", ptoc->name);
        return -1;
    }
    else {
        /* Write the whole in-memory blob to disk in a single call. */
        result = fwrite(data, len, 1, out);
        if ((1 != result) && (len > 0)) {
            FATAL_PERROR("fwrite", "Failed to write all bytes for %s\n", ptoc->name);
            return -1;
        }
#ifndef WIN32
        fchmod(fileno(out), S_IRUSR | S_IWUSR | S_IXUSR);
#endif
        fclose(out);
    }
    free(data);
    return 0;
}

(And it's even worse if the entry is compressed, because then we keep both the whole compressed and the whole uncompressed data blobs in memory during decompression.)

@Red-Eyed commented Nov 10, 2020

@rokm
Thanks for the effort!

I just tried your branch large-file-support (I didn't build anything; should I?)
Unfortunately, it doesn't work on Ubuntu 20.04.
It packs into one file, but it throws an error when unpacking.

I ran into the same error, I guess, in the bootloader when I tried to work on this issue:

-> ~/my_cool_installer
[568688] Cannot open self /home/redeyed/my_cool_installer or archive /home/redeyed/my_cool_installer.pkg

@rokm (Member) commented Nov 10, 2020

(I didn't build anything; should I?)

Yes, you need to rebuild the bootloader yourself.

@Red-Eyed commented Nov 10, 2020

Okay, will do that tomorrow.
Until then, I would like to ask you:
could you please confirm that your branch actually unpacks a package (built in onefile mode) larger than 2 GB?

That would be awesome. Thanks!

@rokm (Member) commented Nov 11, 2020

Okay, will do that tomorrow.
Until then, I would like to ask you:
could you please confirm that your branch actually unpacks a package (built in onefile mode) larger than 2 GB?

That would be awesome. Thanks!

One of the commits adds a test that creates a 3 GB data file with random contents, computes its md5 hash, and then adds this file to a onefile build of a program, which in turn reads the unpacked file from its _MEIPASS dir, computes the md5 hash, and compares it to the one that was computed previously.

This test is now passing on my Fedora 33 box, and in a Windows 10 VM. (But you need to rebuild the bootloader, because that's where the unpacking actually takes place).

@sshuair commented Nov 11, 2020

Okay, will do that tomorrow.
Until then, I would like to ask you:
could you please confirm that your branch actually unpacks a package (built in onefile mode) larger than 2 GB?

That would be awesome. Thanks!

Yes! In my case, libtorch_cuda.so alone is 900+ MB. The package will be larger than 2 GB if you add the CUDA, cuDNN, and TensorRT libraries.

@Red-Eyed commented Nov 11, 2020

Okay, will do that tomorrow.
Until then, I would like to ask you:
could you please confirm that your branch actually unpacks a package (built in onefile mode) larger than 2 GB?
That would be awesome. Thanks!

Yes! In my case, libtorch_cuda.so alone is 900+ MB. The package will be larger than 2 GB if you add the CUDA, cuDNN, and TensorRT libraries.

Currently I have a 6 GB Python environment (PyTorch, TensorFlow, SciPy, etc.) and pack it into a 1.6 GB tar.xz archive to get under the 2 GB limit.

But I want to use zstandard compression (to speed up decompression); zstandard compresses to 2.6 GB, which is above the 2 GB limit.

@Red-Eyed

Note: I just built the bootloader and it works, thank you @rokm!

@rokm (Member) commented Nov 11, 2020

Here's a further branch, in which 64-bit integers are used: https://github.com/rokm/pyinstaller/tree/large-file-support-v2

So now, in theory, the sky's the limit - you can chuck in all your deep learning frameworks, CUDA libraries, pretrained models, ...

In practice, however, the 5 GB onefile test passes only on Linux (tested only 64-bit for now). Windows (even 64-bit) does not seem to support executables larger than 4 GB. On (64-bit) macOS, macholib, used in a signing preparation step in the assembly pipeline, seems to assume that the file size can be represented with an unsigned 32-bit integer, so 4 GB max as well (and this is consistent with the macOS signature searching code in the bootloader, which uses 32-bit unsigned integers).

So really huge onefile executables (> 4 GB) work only on Linux. But on all three OSes, this limitation can be worked around by using a .spec file and adding append_pkg=False as an extra argument to EXE(); see the snippet below. This will give you a small executable (e.g., program) and a single large archive file to go with it (e.g., program.pkg).
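
Roughly, in the spec file (other EXE arguments elided):

# Onefile EXE that keeps the PKG archive as a side-car file instead of
# appending it to the executable:
exe = EXE(
    pyz,
    a.scripts,
    a.binaries,
    a.zipfiles,
    a.datas,
    name='program',
    append_pkg=False,  # produces 'program' plus 'program.pkg'
)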

@sshuair commented Nov 18, 2020

@Legorooj any progress?

@Legorooj (Member)

I stopped work on this because, by the time I woke up the next morning, @rokm had already written everything that needed to be written 😅. Maybe he could submit a PR?

@sshuair commented Nov 18, 2020

I stopped work on this because, by the time I woke up the next morning, @rokm had already written everything that needed to be written 😅. Maybe he could submit a PR?

Nice job!

@rokm (Member) commented Nov 18, 2020

@sshuair the current plan is to have the changes from those experimental branches submitted and merged gradually - first the endian-handling cleanup, then the file extraction cleanup, then the switch to unsigned 32-bit ints, and finally the switch to 64-bit ones.

In the meantime, if you require this functionality, you can use either of the experimental branches linked above.

@Red-Eyed commented Nov 18, 2020

@sshuair
If you need > 2 GB, just use the branch that @rokm published. I use it and it works on Ubuntu 20.04 and Windows (but before using it, you need to rebuild the bootloader; it's easy, just read the documentation).

@sshuair commented Nov 20, 2020

@Red-Eyed the branch that @rokm published seems not to work for me. When the file PKG-00.pkg reached 2.7 GB, the error appeared again. My OS is Ubuntu 16.04.

@sshuair commented Nov 20, 2020

@Red-Eyed Sorry, it's my fault: I installed the package from the master branch and not from large-file-support-v2. Now it works for me.

@EricPengShuai

@rokm so how does one solve this error for binaries larger than 4 GB? My OS is CentOS Linux release 7.6.1810 (Core). The final error information is as follows.

struct.error: 'I' format requires 0 <= number <= 4294967295

@rokm (Member) commented Sep 14, 2022

@rokm so how does one solve this error for binaries larger than 4 GB? My OS is CentOS Linux release 7.6.1810 (Core). The final error information is as follows.

struct.error: 'I' format requires 0 <= number <= 4294967295

You cannot. If you wanted to generate an embedded archive that's larger than 4 GB, you would need to switch the types used by the corresponding PyInstaller code (both Python and C) to 64-bit integers. While this would work on Linux, neither Windows nor macOS support executables larger than 4 GB, so we only extended the max size from 2 GB to 4 GB by switching to unsigned 32-bit integers.

@github-actions bot locked as resolved and limited conversation to collaborators Nov 15, 2022