Streaming currently works great, but only for streams that support arbitrary seeking, like a file on a hard disk. Specifically, when we close an archive, we write the table of entries by setting the stream's pos for each entry and performing the write (see rubyzip/lib/zip/output_stream.rb, lines 177 to 184 in 750d372).
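To make the dependency on seeking concrete, here is a generic illustration of the back-patching pattern. It is not rubyzip's code: `write_with_backpatch` and the 4-byte CRC slot are made up for this example. It only shows why a writer that reserves space and fills it in later needs `pos=` on the stream.

```ruby
require "stringio"
require "zlib"

# Hypothetical helper, not part of rubyzip: reserve a 4-byte CRC-32 slot,
# write the payload, then seek back and fill the slot in.
def write_with_backpatch(io, payload)
  header_pos = io.pos
  io.write([0].pack("V"))                     # placeholder for the CRC-32
  io.write(payload)
  end_pos = io.pos

  io.pos = header_pos                         # only works on a seekable stream
  io.write([Zlib.crc32(payload)].pack("V"))   # overwrite the placeholder
  io.pos = end_pos                            # restore position for later writes
end

buffer = StringIO.new(String.new)             # seekable, binary in-memory stream
write_with_backpatch(buffer, "hello")
```

This works on a `StringIO` or a `File`, but the `io.pos = header_pos` step has no equivalent once the bytes have left the process.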
It should be possible, as zip_tricks and other libraries outside the Ruby ecosystem demonstrate, to create a streaming zip archive across a network without the ability to seek to an arbitrary point in the stream. This approach relies on writing the entry table at the end. I haven't found a formal description of this approach outside of source code, but a plain-English version is nicely summarized by the authors of the Python stream-zip package:
It's not possible to completely stream-write ZIP files. Small bits of metadata for each member file, such as its name, must be placed at the end of the ZIP. In order to do this, stream-zip buffers this metadata in memory until it can be output.
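The idea in the quote above can be sketched in plain Ruby using only the standard library. This is a minimal sketch under stated assumptions, not how zip_tricks, stream-zip, or rubyzip implement it: the class name `StreamingZipSketch` and its methods are hypothetical, timestamps are fixed, and Zip64 is omitted (so sizes and offsets must stay below 4 GiB and entries below 65,535). It sets general-purpose flag bit 3 so the CRC and sizes follow each entry in a data descriptor, keeps per-entry metadata in memory, and writes the central directory at the end. The output object only needs `write`.

```ruby
require "zlib"

# Hypothetical forward-only ZIP writer: never seeks, never reads back.
class StreamingZipSketch
  Entry = Struct.new(:name, :crc, :csize, :usize, :offset)
  FLAGS    = 0x0808 # bit 3: data descriptor follows; bit 11: UTF-8 names
  DEFLATE  = 8
  DOS_TIME = 0      # fixed timestamp (00:00:00) to keep the sketch short
  DOS_DATE = 33     # 1980-01-01 in DOS date encoding

  def initialize(out)
    @out = out
    @offset = 0     # bytes written so far, tracked by counting
    @entries = []   # metadata buffered in memory until close
  end

  # chunks: any enumerable of strings, so the payload is never held whole.
  def add(name, chunks)
    name = name.b
    entry = Entry.new(name, 0, 0, 0, @offset)
    # Local header with CRC and sizes zeroed; they are not known yet.
    emit([0x04034b50, 20, FLAGS, DEFLATE, DOS_TIME, DOS_DATE,
          0, 0, 0, name.bytesize, 0].pack("VvvvvvVVVvv") + name)
    deflater = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, -Zlib::MAX_WBITS)
    chunks.each do |chunk|
      entry.crc = Zlib.crc32(chunk, entry.crc)
      entry.usize += chunk.bytesize
      entry.csize += emit(deflater.deflate(chunk))
    end
    entry.csize += emit(deflater.finish)
    deflater.close
    # Data descriptor: the values that a seekable writer would patch in.
    emit([0x08074b50, entry.crc, entry.csize, entry.usize].pack("VVVV"))
    @entries << entry
  end

  # Writes the central directory and the end-of-central-directory record.
  def close
    cd_start = @offset
    @entries.each do |e|
      emit([0x02014b50, 20, 20, FLAGS, DEFLATE, DOS_TIME, DOS_DATE,
            e.crc, e.csize, e.usize, e.name.bytesize, 0, 0, 0, 0, 0,
            e.offset].pack("VvvvvvvVVVvvvvvVV") + e.name)
    end
    emit([0x06054b50, 0, 0, @entries.size, @entries.size,
          @offset - cd_start, cd_start, 0].pack("VvvvvVVv"))
  end

  private

  def emit(bytes)
    @out.write(bytes)
    @offset += bytes.bytesize
    bytes.bytesize
  end
end

# Usage with a write-only sink standing in for a socket.
sink = String.new
out = Object.new
out.define_singleton_method(:write) { |bytes| sink << bytes; bytes.bytesize }
zip = StreamingZipSketch.new(out)
zip.add("a.txt", ["hello ", "world"])
zip.close
```

The memory cost is one `Entry` struct per file plus the deflate window, which matches the "small bits of metadata" described in the quote.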
I found that the zip_tricks gem can be configured to write zip files this way, and I tried to make rubyzip do the same. I created my own IO adapter wrapping a network socket to see whether rubyzip could stream a zip archive across a network. It could not: I had to implement pos=, and there is no way to seek the underlying stream correctly once its contents have been sent to a network socket. At that point the buffer has been flushed to the network and I cannot rewind.
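The limitation can be reproduced without a network by using a pipe in place of the socket. The adapter below is a hypothetical sketch, not the adapter from my experiment: `ForwardOnlyWriter` can answer `pos` by counting bytes, but the only `pos=` it can honor is a no-op. The raw seek on the pipe is expected to fail with `Errno::ESPIPE` on Linux and macOS; other platforms may behave differently.

```ruby
# Hypothetical adapter around a non-seekable IO such as a socket or pipe.
class ForwardOnlyWriter
  attr_reader :pos
  alias tell pos

  def initialize(io)
    @io = io
    @pos = 0
  end

  def write(bytes)
    @io.write(bytes)
    @pos += bytes.bytesize   # position is known by counting, not by seeking
    bytes.bytesize
  end

  def <<(bytes)
    write(bytes)
    self
  end

  # Rewinding is impossible once bytes are sent; only a no-op seek is allowed.
  def pos=(target)
    return if target == @pos
    raise Errno::ESPIPE, "cannot seek from #{@pos} to #{target}"
  end
end

reader, writer = IO.pipe

# The underlying pipe itself refuses to seek (assumed POSIX behaviour).
raw_seek_failed = false
begin
  writer.pos = 0
rescue Errno::ESPIPE
  raw_seek_failed = true
end

adapter = ForwardOnlyWriter.new(writer)
adapter << "abc" << "de"

adapter_seek_failed = false
begin
  adapter.pos = 0            # what a back-patching writer would attempt
rescue Errno::ESPIPE
  adapter_seek_failed = true
end

writer.close
received = reader.read
reader.close
```

Counting bytes is enough to record each entry's offset for an entry table written at the end, which is why the forward-only approach does not need `pos=` at all.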
My use case: I need to create a large zip archive composed of objects from an object store (S3) and send the resulting archive back to an object store while working within disk space limitations. In theory, this should require no disk space and only a small amount of memory for a stream buffer and an entries table, which would be written at the end of the stream.
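For the upload side, a forward-only zip stream pairs naturally with a multipart upload, since S3 multipart parts (minimum 5 MiB each, except the last) are also written strictly in order. The sketch below is an assumption about how that could be wired up, not code from any library: `PartBuffer` is a hypothetical class, and the block stands in for whatever call actually uploads a part.

```ruby
# Hypothetical write-only buffer that cuts a byte stream into fixed-size parts.
class PartBuffer
  def initialize(part_size, &on_part)
    @part_size = part_size
    @on_part = on_part          # called as on_part.call(part_number, bytes)
    @buffer = String.new        # binary buffer, at most part_size + one write
    @part_number = 0
  end

  def write(bytes)
    @buffer << bytes.b
    flush_part(@buffer.slice!(0, @part_size)) while @buffer.bytesize >= @part_size
    bytes.bytesize
  end

  # Flushes the final, possibly short, part.
  def close
    flush_part(@buffer.slice!(0, @buffer.bytesize)) unless @buffer.empty?
  end

  private

  def flush_part(part)
    @part_number += 1
    @on_part.call(@part_number, part)
  end
end

# Tiny part size for demonstration; a real S3 upload would use 5 MiB or more.
parts = []
buffer = PartBuffer.new(4) { |number, bytes| parts << [number, bytes] }
buffer.write("abcdef")
buffer.write("ghij")
buffer.close
```

Memory use stays bounded by the part size plus the largest single write, regardless of how large the archive grows.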
Hello, yes I agree we should support this feature; as you say the ZIP standard supports streaming where seek isn't available.
I'm not sure I'll be able to get this into v3, as it will require a fundamental change to how things are done quite deep in RubyZip. I'm working on this when I can, though, and will get it done as soon as I can.