Docx files are corrupted when zipped using RubyZip #449

Open
tinabel opened this issue May 28, 2020 · 4 comments

Comments

tinabel commented May 28, 2020

Hi there -- I'm having an issue with docx file headers being corrupted whenever I zip them up using RubyZip. I've tried to use the write_buffer solution, but I'm also zipping files recursively and it's not quite working right. Will there be any solution for docx that doesn't involve using write_buffer?

Author

tinabel commented May 28, 2020

I was able to mitigate this issue by using Nokogiri to parse the XML while zipping the files. Here is the gist for what I was able to do: https://gist.github.com/tinabel/ddd5cc9b0dd762986918520a132800d2

Member

hainesr commented May 29, 2020

I'd love to have a crack at investigating this issue, but I know nothing about Word documents and I'm struggling to reproduce it.

What I've tried:

  1. In Windows: Create new test.docx file in Word with a few random words in it.
  2. In Linux (in IRB):
       zipfile = Zip::File.open("/opt/windows/word.zip", true)
       zipfile.add("zipped.docx", "/opt/windows/test.docx")
       zipfile.close
  3. In Linux:
       $ zipinfo /opt/windows/word.zip
       Archive:  /opt/windows/word.zip
       Zip file size: 9683 bytes, number of entries: 1
       -rw-r--r--  5.2 unx    12278 t- defN 20-May-29 18:38 zipped.docx
       1 file, 12278 bytes uncompressed, 9563 bytes compressed:  22.1%
  4. In Windows: Unzipped archive using 7Zip
  5. In Linux (no differences):
       $ diff /opt/windows/test.docx /opt/windows/zipped.docx
  6. In Windows: Loaded zipped.docx into Word.

This all works. I'm using ruby 2.4.6p354 (2019-04-01 revision 67394) [x86_64-linux] and RubyZip HEAD.

I'm sure I'm missing something important!
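
For anyone repeating this test without `diff` available, the comparison in the steps above can be sketched in plain Ruby. This is a minimal sketch using only the standard library; `same_contents?` is a hypothetical helper name, not part of RubyZip, and the paths in the usage comment are the ones from the steps above.

```ruby
require "digest"

# Hypothetical helper (not RubyZip API): returns true when the two files
# hash to the same SHA-256 digest, i.e. their contents are identical.
def same_contents?(path_a, path_b)
  Digest::SHA256.file(path_a).hexdigest == Digest::SHA256.file(path_b).hexdigest
end

# Usage, with the paths from the steps above:
#   same_contents?("/opt/windows/test.docx", "/opt/windows/zipped.docx")
```

If this returns false for a file that went through the archive, the original and the extracted copy differ at the byte level, which would be the kind of sample worth attaching to this issue.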

Member

hainesr commented Jul 11, 2021

I have now tried downloading the gist linked above and running it on a directory of assorted .docx and .xlsx files - both with and without the Nokogiri step - and both times all the resulting files can be unzipped and then loaded by Word and Excel with no issues at all. The .docx files were complex and had background images, tables, foreground images, etc.

I'm using ruby 2.7.2 and have repeated this with RubyZip 2.3.x and 3.0 (HEAD).

If someone can send me a file that is corrupted when zipped with RubyZip I'd love to investigate this further, or is there another way I should be testing this? Otherwise I wonder if files are getting corrupted in another step before RubyZip gets hold of them?
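
One way to test that suspicion is to confirm each .docx is still a readable ZIP container before it is added to the archive, since .docx and .xlsx files are themselves ZIP archives. The sketch below only checks the first four bytes for the ZIP local file header signature; `looks_like_zip?` and `ZIP_LOCAL_HEADER` are hypothetical names, not RubyZip API, and passing this check does not prove the rest of the file is intact.

```ruby
# Signature of a ZIP local file header, which starts any non-empty archive.
ZIP_LOCAL_HEADER = "PK\x03\x04".b

# Hypothetical helper (not RubyZip API): returns true when the file at
# `path` begins with the ZIP local file header signature.
def looks_like_zip?(path)
  # binread returns nil for an empty file, so the comparison is simply false.
  File.binread(path, 4) == ZIP_LOCAL_HEADER
end
```

A file that fails this check before zipping was already damaged (or transformed, for example by a text-mode read or an encoding conversion) by an earlier step.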

I cannot reproduce this issue with the current information available, sorry.

sandstrom commented Mar 7, 2024

@hainesr I'd open the discussions tab on the repo, and move this issue there.
