Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add length method to InputStream #277

Conversation

zacstewart
Copy link
Contributor

To make this thing act more realistically like an IO, it needs a length method.

One useful example is being able to iterate through a zip of images and upload them to an HTTP server using UploadIO. In order to multi-part such a POST body, the length of the IO must be known.

Fixes #276

To make this thing act more realistically like an IO, it needs a length
method.
@mnaberez
Copy link
Contributor

The omission of the length was likely intentional because there is not a way to know the size of the decompressed data without decompressing it. The size reported by a zip entry should be considered advisory only and can't be relied on in this way. The entry is easily manipulated and this is how zip bombs work; the entry reports a small size but the data decompresses much larger.

If you are creating the ZIP files yourself and know that the entry size is reliable, using the entry size in this way might be acceptable. However, I don't support this as a general addition to RubyZip because it can't be used reliably on ZIP files from other sources.

@zacstewart
Copy link
Contributor Author

Are there any risks that outweigh the benefits? From my perspective, a zip with inaccurately reported entry sizes is just invalid, and with invalid data you can expect invalid behavior.

Unless there are security implications, it seems useful to me to be able to treat the stream like an IO, even if it requires some implicit trust in the archive being truthful about the decompressed size of its entries

@mnaberez
Copy link
Contributor

Are there any risks that outweigh the benefits? From my perspective, a zip with inaccurately reported entry sizes is just invalid, and with invalid data you can expect invalid behavior.

RubyZip is sometimes used in web applications, and the ZIP files being decompressed may come from user uploads. Zip bombs are well known and commonly used as a denial of service attack. The ZIP bomb works by exploiting the possible inconsistency between the size reported in the entry and the actual size the data will decompress to. A very tiny ZIP file may expand to many gigabytes. If the consumer of the decompressed data is holding it in RAM, it may consume all available RAM and swap. If the consumer of the decompressed data is storing it on disk, it may consume all space on the disk. Web developers can't ignore these kinds of ZIP files as simply "invalid" or else they will be vulnerable to these attacks.

@hainesr
Copy link
Member

hainesr commented Feb 26, 2016

Playing Devil's Advocate a bit here, but is it the responsibility of RubyZip to save badly designed Web applications from themselves? Or, to put it another way: is it reasonable to omit a useful, sensible and (maybe) expected feature just because people may abuse it?

I suppose a third way of looking at it might also be: There's no length attribute now, so how would a Web application make decisions on how to unpack things as they stand? At least if there was an advertised length then one could stop reading when one hit that length.

@mnaberez
Copy link
Contributor

I suppose a third way of looking at it might also be: There's no length attribute now, so how would a Web application make decisions on how to unpack things as they stand?

The expected size of the decompressed data is already available from ZipEntry#size. The application can call entry.size to get the expected size before calling entry.get_input_stream to get the stream.

This pull request is about adding a length method to the stream itself, and having it return entry.size. I think the reason length was not added to the stream originally is because the real size of the decompressed data can't be known without decompressing. I don't support adding length for that reason and because I think it makes it easier for users to implement a bad practice. The size reported by the entry should not be trusted blindly.

@simonoff
Copy link
Member

simonoff commented Nov 8, 2017

I don't agree a reasons for it. If you need such method you can wrap it in your code.

@simonoff simonoff closed this Nov 8, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants