Possible regression from 1.2.0 to 1.2.1 when ZIP entry's filename contains backslash #324

Capncavedan · 2017-04-10T14:43:23Z

I experienced an issue with rubyzip last Fall where a customer-supplied ZIP had backslashes instead of forward-slashes for all the directory separators; I resolved it at the time by updating from rubyzip 0.9.9 to 1.2.0. It translated the backslashes to forward slashes e.g. directory separators on OS X and Linux. (0.9.9 behaved "correctly" on OS X but not on Linux).

The recent update to 1.2.1 has changed behavior on Linux such that the backslash is considered part of the filename again. OS X continues to exhibit what I would consider to be the desired behavior, treating backslashes as directory separators. See below.

➜  zipinfo file-with-backslashes-in-path.zip
Archive:  file-with-backslashes-in-path.zip
Zip file size: 200 bytes, number of entries: 1
-rw-r--r--  3.0 unx        6 tx stor 16-Nov-30 19:44 sample\sample_file.txt
1 file, 6 bytes uncompressed, 6 bytes compressed:  0.0%

# with rubyzip 1.2.1:
> src = ::Zip::File.new "file-with-backslashes-in-path.zip"
> src.entries.first.name
=> "sample\\sample_file.txt"

# with rubyzip 1.2.0:
> src = ::Zip::File.new "file-with-backslashes-in-path.zip"
> src.entries.first.name
=> "sample/sample_file.txt"

Is this change in behavior from 1.2.0 to 1.2.1 intentional? Would you consider it a bug that it behavior is different from Linux to OS X?

The text was updated successfully, but these errors were encountered:

KierranM · 2017-06-13T23:39:16Z

It would seem that creating a zip in Windows via Right Click > Send To > Compressed Folder, or using PowerShell always creates the folders with backslashes as the separator. We've had failures because of this when attempting to access the files using a forward slash and getting an File does not exist error.

rlaveycal · 2018-03-07T12:09:42Z

There seems to be a missing translation of \ to / in the read_c_dir_entry method of Entry.rb. This means that when you iterate over the zip entries they have the "wrong" name.

rubyzip/lib/zip/entry.rb

Line 242 in 2f80da6

@name.tr!('\\', '/')

Just needs adding after

rubyzip/lib/zip/entry.rb

Line 369 in 2f80da6

@name = io.read(@name_length)

simonista · 2018-03-15T21:23:49Z

This is hitting us as well, definitely a regression between 1.2.0 and 1.2.1.

You can see the commit that caused it here: #308

I'm not an expert in character encodings, but it seems like the issue it was trying to address may only be an issue in older ruby versions? I have both ruby 2.2 and 2.4 installed locally (on macOS), and in both cases '表'.encode('windows-31j') == '表'.encode('windows-31j').tr('\\', '/').

@dogatana can you comment at all?

dogatana · 2018-03-18T03:16:03Z

Hi, guys.

The following irb session log explains the issue caused by tr.

Zip file doesn't have any encoding information about the file names in it.
So the name "表1.txt" is extracted as a string "\x95\1.txt" encoded as ASCII-8BIT.
If we apply tr('\', '/') on this string, the second byte of '表' will be removed.
We cannot use it as a valid file name.

Rubyzip provides us the way to handle file names in zip file, right?
If original byte sequence of the name is maintained, we can avoid trobules.
In this case, original sequences are altered, so I made PR for it.

Regards,

irb
>> file = '表1.txt'
=> "表1.txt"
>> p file
"表1.txt"
=> "表1.txt"
>> file.tr('\\', '/')
=> "表1.txt"
>> file.force_encoding('ascii-8bit')
=> "\x95\\1.txt"
>> new_name = file.tr('\\', '/')
=> "\x95/1.txt"
>> new_name.encode('windows-31j')
Encoding::UndefinedConversionError: "\x95" to UTF-8 in conversion from ASCII-8BIT to UTF-8 to
Windows-31J
        from (irb):17:in `encode'
        from (irb):17
        from C:/Ruby24/bin/irb.cmd:19:in `<main>'
>>

bbuchalter · 2021-01-26T12:08:37Z

@sofie-c and I were recently bit by this issue and were hoping to make an attempt at a fix. @dogatana, above you mentioned:

Rubyzip provides us the way to handle file names in zip file, right? If original byte sequence of the name is maintained, we can avoid troubles.

Can you clarify what you meant here? Are you suggesting we apply the .tr('\\', '/') change to the file name some where else, or restore the .tr('\\', '/'), but use it conditionally based on charactered encoding?

Either way, I think we need some failing tests to demonstrate both the issue #308 was meant to solve, alongside this issue. Perhaps that's where I'll start.

If anyone solution ideas, please share; we're new to this codebase.

bbuchalter · 2021-01-26T13:55:58Z

I modified the rubyzip test suite so that when it generates test zip files, it includes a file named p932_encoding_in_filename_表.txt. This causes many tests to fail.

The first failure I'm examining is against test/central_directory_test.rb:13. The purpose of this test is to assert that we get the expected entries in the zip. The test fails because we've told it to expect a file named cp932_encoding_in_filename_表.txt but the entry's name is cp932_encoding_in_filename_\xE8\xA1\xA8.txt.

表 is represented as \xE8\xA1\xA8 because:

irb(main):012:0> '表'.force_encoding('ascii-8bit')
=> "\xE8\xA1\xA8"
irb(main):006:0> '表'.bytes
=> [232, 161, 168]
irb(main):004:0> "\xE8\xA1\xA8".bytes
=> [232, 161, 168]

I'm trying to decide how to interpret this. Is this failure telling me the application code is wrong or if my test assertion is wrong?

hainesr · 2021-01-26T15:06:16Z

There are tests with Unicode filenames that work though, aren't there?

Do you have the Unicode option on when doing this?

bbuchalter · 2021-01-26T16:00:08Z

Thanks for the reply @hainesr! Are you talking about #340 - if so, no I don't. I'll give it a try.

hainesr · 2021-05-29T22:36:16Z

@bbuchalter any joy? One thing I have learned about character encoding over the years is that sometimes you just need to know what character encoding you have, and there's no way to guess/infer it.

hainesr · 2021-05-29T22:44:35Z

Maybe we could reinstate the .tr('\\', '/') after the encoding has been enforced?

rubyzip/lib/zip/entry.rb

Line 371 in 2f80da6

@name.force_encoding(::Zip.force_entry_names_encoding)

hainesr · 2021-06-01T20:58:38Z

Having played around with just about every combination of characters and encodings that I can think of, I think the solution to this to reinstate the .tr('\\', '/') after the encoding of the filename has been enforced. The various encodings of '表' will survive the tr intact if the encoding is correct.

If you have non-ascii characters in your filenames I think the solution is that you need to set ::Zip.force_entry_names_encoding appropriately. There is just no way of inferring what encoding a particular string of bytes is intended to be interpreted as.

@bbuchalter you are seeing '表'.force_encoding('ascii-8bit') => "\xE8\xA1\xA8" because your '表' is encoded in UTF-8! This is because irb and ruby source files are read as UTF-8 (I think). And force-encoding only changes the encoding, not the bytes, so you're seeing the raw bytes of the UTF-8 encoding of '表'. See this for an explanation.

Some working from irb:

> '表'.tr('\\', '/')
 => "表"

> '表\表'.tr('\\', '/')
 => "表/表"

> '表\表'.encode('windows-31j')
 => "\x{955C}\\\x{955C}"

> '表\表'.encode('windows-31j').tr('\\', '/')
 => "\x{955C}/\x{955C}"

> '表\表'.encode('windows-31j').tr('\\', '/').encode('utf-8')
 => "表/表"

But only do it after we have set filename encoding appropriately to avoid breaking multibyte characters with `\`s in them. Fixes rubyzip#324.

hainesr · 2021-06-02T07:55:35Z

Another thought on this: the zip spec says nothing about handling any other encodings other than IBM Code Page 437 and UTF-8. See APPENDIX D in the spec.

Obviously people are using other encodings out there in the wild, which is presumably why we have ::Zip.force_entry_names_encoding available.

But only do it after we have set filename encoding appropriately to avoid breaking multibyte characters with `\`s in them. Fixes #324.

But only do it after we have set filename encoding appropriately to avoid breaking multibyte characters with `\`s in them. Fixes rubyzip#324.

hainesr added the question label May 29, 2021

hainesr mentioned this issue Jun 1, 2021

Path traversal vulnerability #354

Closed

hainesr added the bug label Jun 1, 2021

hainesr added a commit to hainesr/rubyzip that referenced this issue Jun 2, 2021

Reinstate normalising pathname separators to /.

529634e

But only do it after we have set filename encoding appropriately to avoid breaking multibyte characters with `\`s in them. Fixes rubyzip#324.

hainesr mentioned this issue Jun 2, 2021

Reinstate normalising pathname separators to /. #487

Merged

hainesr closed this as completed in #487 Jun 4, 2021

hainesr added a commit that referenced this issue Jun 4, 2021

Reinstate normalising pathname separators to /.

22a5485

But only do it after we have set filename encoding appropriately to avoid breaking multibyte characters with `\`s in them. Fixes #324.

jspanjers pushed a commit to jspanjers/rubyzip that referenced this issue Jun 5, 2021

Reinstate normalising pathname separators to /.

9222242

But only do it after we have set filename encoding appropriately to avoid breaking multibyte characters with `\`s in them. Fixes rubyzip#324.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possible regression from 1.2.0 to 1.2.1 when ZIP entry's filename contains backslash #324

Possible regression from 1.2.0 to 1.2.1 when ZIP entry's filename contains backslash #324

Capncavedan commented Apr 10, 2017 •

edited

KierranM commented Jun 13, 2017

rlaveycal commented Mar 7, 2018

simonista commented Mar 15, 2018 •

edited

dogatana commented Mar 18, 2018

bbuchalter commented Jan 26, 2021 •

edited

bbuchalter commented Jan 26, 2021

hainesr commented Jan 26, 2021

bbuchalter commented Jan 26, 2021

hainesr commented May 29, 2021 •

edited

hainesr commented May 29, 2021

hainesr commented Jun 1, 2021

hainesr commented Jun 2, 2021

Possible regression from 1.2.0 to 1.2.1 when ZIP entry's filename contains backslash #324

Possible regression from 1.2.0 to 1.2.1 when ZIP entry's filename contains backslash #324

Comments

Capncavedan commented Apr 10, 2017 • edited

KierranM commented Jun 13, 2017

rlaveycal commented Mar 7, 2018

simonista commented Mar 15, 2018 • edited

dogatana commented Mar 18, 2018

bbuchalter commented Jan 26, 2021 • edited

bbuchalter commented Jan 26, 2021

hainesr commented Jan 26, 2021

bbuchalter commented Jan 26, 2021

hainesr commented May 29, 2021 • edited

hainesr commented May 29, 2021

hainesr commented Jun 1, 2021

hainesr commented Jun 2, 2021

Capncavedan commented Apr 10, 2017 •

edited

simonista commented Mar 15, 2018 •

edited

bbuchalter commented Jan 26, 2021 •

edited

hainesr commented May 29, 2021 •

edited