Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

open_buffer mangles the content of the buffer it is given. #280

Closed
weshatheleopard opened this issue Mar 9, 2016 · 11 comments
Closed

open_buffer mangles the content of the buffer it is given. #280

weshatheleopard opened this issue Mar 9, 2016 · 11 comments

Comments

@weshatheleopard
Copy link
Contributor

Here, you can use absolutely any ZIP file for test.zip:

require 'zip'
buf = nil
File.open('test.zip', "rb") { |f| buf = f.read }
File.open("before.zip", "wb") { |f| f << buf } 
Zip::File.open_buffer(buf) { |zf| nil # Do nothing at all!!! }
File.open("after.zip", "wb") { |f| f << buf }

Note that after.zip differs from before.zip, despite the fact that there's absolutely no reason to be changed.

However, if you do:

File.open('test.zip', "rb") { |f| buf = f.read }
Zip::File.open_buffer(buf.freeze) { |zf| zf.entries.first }
File.open("after.zip", "wb") { |f| f << buf }

then everything works perfectly, buf is not damaged and nothing crashes.

@seth-at-at
Copy link

seth-at-at commented Aug 2, 2017

Using the first code
before.zip
after.zip
Second code
after#2.zip

What you are missing is you forgot the buf.freeze in the first code. if you add that in and do

require 'zip'
buf = nil
File.open('test.zip', "rb") { |f| buf = f.read }
File.open("before.zip", "wb") { |f| f << buf } 
Zip::File.open_buffer(buf.freeze) { |zf| nil }# Do nothing at all!!! 
File.open("after.zip", "wb") { |f| f << buf }

you get before, after

@seth-at-at
Copy link

The reason for the increased amount of file size is due to the fact that sending just nil also sends the instance variables of nil while sending nil.freeze sends just the object, and since the object is nothing it doesn't increase the size of the file.

@weshatheleopard
Copy link
Contributor Author

weshatheleopard commented Aug 2, 2017

@seth-at-at: According to the documentation:

.open_buffer(io, options = {}) {|zf| ... } Like #open, but reads zip archive contents from a String or open IO stream, and outputs data to a buffer.

It does not say anything about modifying the source data.

When you read something from a magazine, you do not expect the letters to disappear as you read them, or for the magazine to become crumbled or dirty; when I open a magazine and read from it, I expect to be able to put it down (after I'm done reading) in the original condition (so my wife can pick it up and read it as if it was never touched). When I read things from a zip file, I do not expect the file to be altered in any way, either; I expect to also be able to "put it down" in the original condition. Whatever is happening in rubyzip code violates the principle of least surprise.

My point about freeze was that if buf is frozen, then rubyzip acts normally, but if it's not frozen, then the gem takes the liberty to mangle it. Which once again violates the principle of least surprise. Again, reading from the source should not affect the source in any way, shape or form.

@hainesr
Copy link
Member

hainesr commented Apr 3, 2018

This is replicated in RuyZip's own tests, here: https://github.com/rubyzip/rubyzip/blob/master/test/file_test.rb#L113

def test_close_buffer_with_io
  f = File.open('test/data/rubycode.zip')
  zf = ::Zip::File.open_buffer f
  assert zf.close
  f.close
end

The file is simply opened and closed, but the file changes - it is listed as modified by git status.

I really don't think this should happen. I am trying to work out what is going on and will submit a PR if I manage to figure it out.

@hainesr
Copy link
Member

hainesr commented Apr 4, 2018

I think I figured it out and was able to fix this in #360. More testing by others would be appreciated.

@jdleesmiller
Copy link
Member

In the course of looking at this, I wrote a small utility to see why the zip file was changing when it was rewritten. Not all zip files change when rubyzip rewrites them, so we were a bit lucky to have the rubycode.zip example to work with. In that case, the difference was that the order of the extra fields was changing when rubyzip rewrote the file. The extra fields that were being swapped were tagged as follows:

0x5455 extended timestamp
0x7855 Info-ZIP UNIX (new)

For want of a better place to put that utility, I'll put it here:

require 'pp'

STDIN.binmode

LOCAL_FILE_HEADER_SIGNATURE = 0x04034b50
CENTRAL_DIRECTORY_FILE_HEADER_SIGNATURE = 0x02014b50

loop do
  signature, = STDIN.read(4).unpack('L')
  case signature
  when LOCAL_FILE_HEADER_SIGNATURE
    # keep going
  when CENTRAL_DIRECTORY_FILE_HEADER_SIGNATURE
    break
  else
    raise "bad signature #{signature.to_s(16)}"
  end

  header = STDIN.read(26)

  version_needed_to_extract,
    general_purpose_bit_flag,
    compression_method,
    last_mod_file_time,
    last_mod_file_date,
    crc32,
    compressed_size,
    uncompressed_size,
    file_name_length,
    extra_field_length = header.unpack('SSSSSLLLSS')

  file_name = STDIN.read(file_name_length) if file_name_length > 0

  extra_fields = []
  total_size = 0
  while total_size < extra_field_length
    tag, size = STDIN.read(4).unpack('SS')
    extra_fields << {
      tag: tag.to_s(16),
      size: size,
      data: STDIN.read(size)
    }
    total_size += 4 + size
  end
  raise 'bad extra field size' if total_size != extra_field_length

  _payload = STDIN.read(compressed_size)

  pp(
    version_needed_to_extract: version_needed_to_extract,
    general_purpose_bit_flag: general_purpose_bit_flag,
    compression_method: compression_method,
    last_mod_file_time: last_mod_file_time,
    last_mod_file_date: last_mod_file_date,
    crc32: crc32,
    compressed_size: compressed_size,
    uncompressed_size: uncompressed_size,
    file_name: file_name,
    extra_fields: extra_fields
  )
end

@weshatheleopard
Copy link
Contributor Author

Allegedly this has started happening again, see the report

@HosokawaR
Copy link

HosokawaR commented Apr 8, 2024

Thanks for fowading the issue.
I have opened the above report.

Reproduction

This is the reproduction.
#open_buffer modifled buffer when it's called with block.

$ echo "This is a test file." > test.txt

$ zip test.zip test.txt
updating: test.txt (stored 0%)

$ gem list | grep rubyzip
rubyzip (2.3.2)

$ irb
irb(main):001> require 'zip'
irb(main):002> buffer = File.read('./test.zip')
irb(main):003> buffer.size
=> 187
irb(main):004> Zip::File.open_buffer(buffer) { |b| nil } # do nothing
irb(main):005> buffer.size
=> 208  # changed

Cause

I think the cause is here.

zf.write_buffer(io)

Looking back at history, it appears that in the initilal implementation #write_buffer does not write to the received io.
However, this commit was changing it to write to the received io to fix #119.

Solution suggestion

If the above is indeed a problem, how about the following solutions? I would like to send a PR if necessary.

  • Avoid modifying the received buffer
    I hope it doesn't change from the name open_buffer, but I'm not familiar with Ruby culture so I don't know...
  • Make it clear in the documentation that the received buffer can be modified

@hainesr
Copy link
Member

hainesr commented Apr 8, 2024

Oh goodness me I fixed this over 2 years ago in trunk (14b63f6)... I will backport the fix to the 2.x branch and try and release 2.4 ASAP.

Thanks for the heads-up that this is still an issue. And sorry it's still causing problems.

(I will reopen this issue for now to track.)

@hainesr hainesr reopened this Apr 8, 2024
@hainesr
Copy link
Member

hainesr commented Apr 8, 2024

I've pushed 2.4.rc1 to rubygems with this fix included.

Please test it if you can.

@hainesr hainesr closed this as completed Apr 8, 2024
@HosokawaR
Copy link

HosokawaR commented Apr 9, 2024

Thank you for replying and quick fix !!

Looks good in my tests.

$ echo "This is a test file." > test.txt

$ zip test.zip test.txt
updating: test.txt (stored 0%)

$ gem list | grep rubyzip
rubyzip (2.4.rc1, 2.3.2)

$ irb
irb(main):001> require 'zip'
irb(main):002> buffer = File.read('./test.zip')
irb(main):003> buffer.size
=> 187
irb(main):004> Zip::File.open_buffer(buffer) { |b| nil } # do nothing
irb(main):005> buffer.size
=> 187  # same size !
irb(main):006> buffer2 = File.read('./test.zip')
irb(main):007> buffer.bytes == buffer2.bytes
=> true  # same !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants