split feature is non-compliant #98

jstanley0 · 2013-09-25T20:36:54Z

The split feature implemented in Pull request 75 doesn't conform to the standard, in that:

1. Local and central directory header records must never be split across a segment boundary. (8.5.2) The rubyzip split code doesn't even look at the zip file content; it just blindly chops it up into fixed-size pieces.
2. The central directory entry for each file should indicate a disk number where the file starts (4.3.12); this is hard-coded to 0 in rubyzip.
3. The end-of-central-directory record should indicate disk numbers where the central directory begins and ends, and also the number of entries located on the last disk in addition to the total number of entries. (4.3.16) rubyzip is again hard-coded to assume there is only one disk.

(section numbers refer to version 6.5.2 of http://www.pkware.com/documents/casestudies/APPNOTE.TXT)
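
To make the disk-number fields concrete, here is a minimal sketch of packing an end-of-central-directory record with the field order given in section 4.3.16. The helper name `pack_eocd` and its keyword arguments are hypothetical and are not part of rubyzip's API; the sketch only shows which values a split-aware writer would have to compute instead of hard-coding.

```ruby
# Hypothetical helper, not rubyzip code: packs the fixed part of the
# end-of-central-directory record (APPNOTE 4.3.16), all little-endian.
EOCD_SIGNATURE = 0x06054b50

def pack_eocd(this_disk:, cd_start_disk:, entries_on_this_disk:,
              total_entries:, cd_size:, cd_offset:, comment: '')
  [
    EOCD_SIGNATURE,        # 4 bytes: record signature
    this_disk,             # 2 bytes: number of this disk
    cd_start_disk,         # 2 bytes: disk where the central directory starts
    entries_on_this_disk,  # 2 bytes: central directory entries on this disk
    total_entries,         # 2 bytes: central directory entries in total
    cd_size,               # 4 bytes: size of the central directory
    cd_offset,             # 4 bytes: offset of the central directory on its starting disk
    comment.bytesize       # 2 bytes: length of the archive comment
  ].pack('VvvvvVVv') + comment.b
end

# A single-disk archive is the special case where both disk numbers are 0
# and the per-disk entry count equals the total.
single = pack_eocd(this_disk: 0, cd_start_disk: 0, entries_on_this_disk: 10,
                   total_entries: 10, cd_size: 460, cd_offset: 1024)
```

With every disk number fixed at 0, a reader looking at the last segment of a split archive has no way to tell which segment holds the start of the central directory.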

In addition, the test code doesn't test whether anyone might be able to read the split archive; it merely strings the pieces back together and tests reading the reconstituted file. The zip specification is, however, designed not to require stringing pieces of split archives together--or even generating a single big archive to begin with, as fogeys like me who remember spanning archives across floppy disks would know. It's designed so that the central directory and any particular file can be located in-place in their segment (or "disk"). rubyzip doesn't accomplish this.
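
As a rough illustration of the difference between blind chopping and boundary-aware splitting, the sketch below plans segments so that header records stay whole while file data may continue onto the next segment. The function `plan_segments` and the `[kind, size]` record layout are assumptions made for this example, not rubyzip code, and the sketch leaves out every other requirement the specification places on split archives.

```ruby
# Hypothetical planner, not rubyzip code. `records` is an array of
# [kind, size] pairs in archive order. A :header record (local or central
# directory header) must stay whole; :data may be split across segments.
def plan_segments(records, segment_size)
  segments = [[]]
  free = segment_size

  records.each do |kind, size|
    if kind == :header
      raise ArgumentError, 'header larger than a segment' if size > segment_size

      # Start a new segment rather than splitting the header.
      if size > free
        segments << []
        free = segment_size
      end
      segments.last << [kind, size]
      free -= size
    else
      remaining = size
      while remaining > 0
        if free.zero?
          segments << []
          free = segment_size
        end
        chunk = [remaining, free].min
        segments.last << [kind, chunk]
        remaining -= chunk
        free -= chunk
      end
    end
  end

  segments
end

plan = plan_segments([[:data, 50], [:header, 30]], 64)
# The header does not fit in the 14 bytes left, so it opens the second segment.
```

A writer built along these lines would also know, for each entry, which segment its local header landed in, which is the value the central directory's disk-number field is meant to carry.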

jstanley0 · 2013-09-25T20:44:01Z

FYI: This feature actually isn't important to me. I just happen to be in the middle of implementing zip64 write support, which adds more fields related to disk numbers (as the zip64 end of central directory record itself can be split across disks). And I am continuing to hard-code everything to assume a single disk. This behavior makes me feel bad unless the inconsistency with the split feature is noted.

simonoff · 2013-10-20T20:13:24Z

I don't like the current implementation of split either, but I don't have time right now to do it the proper way.

hainesr · 2021-05-29T07:08:12Z

I've been looking at the splitting code myself (and well remember plugging multiple floppies in to unzip large files). I do wonder if this feature is relevant anymore - but as we have it, it should at least follow the standard. I'm going to highlight the distinction between spanning [1] and splitting [2] and assume we don't need to support spanning anymore!

Anyway, this is to say I will try and get round to this but probably won't hit v3.0.

[1] "segmenting a ZIP file across multiple removable media"
[2] "does not require writing each segment to a unique removable medium and instead supports placing all pieces onto local or non-removable locations such as file systems, local drives, folders, etc"

hainesr added the bug label May 29, 2021

hainesr added this to the Future milestone May 29, 2021
