
Add ability to limit output size for zip unarchiver #115

Closed · wants to merge 3 commits

Conversation

@satamas (Contributor) commented Jul 15, 2019:

This is needed as a protection against zip bombs.
(Recently found bomb example).

@@ -35,6 +27,11 @@
import org.codehaus.plexus.util.IOUtil;
import org.codehaus.plexus.util.StringUtils;

import java.io.*;
Member:

We usually do not use star imports (and please avoid changing the order of imports).

@plamentotev (Member):
Hi,

Thanks for contributing. The zip bomb method you shared is quite clever and interesting. I just wonder if Apache Commons Compress is a more suitable place to address this issue? It has knowledge of the ZIP file internals, so it is probably better equipped to deal with zip bombs (and other archive bombs, for that matter). And solving it there would protect a wider audience (Plexus Archiver included).

@satamas (Contributor, Author) commented Jul 15, 2019:

Protection against a concrete way of creating a zip bomb is certainly a task for Apache Commons Compress, but the only way to protect against all kinds of such bombs is to limit the produced output size.
The same idea was mentioned by the author of the article about the new zip bomb:

Detecting the specific class of zip bomb we have developed in this article is easy: just look for overlapping files. Mark Adler has written a patch for Info-ZIP UnZip that does just that. In general, though, rejecting overlapping files does not protect against all classes of zip bomb. It is difficult to predict in advance whether a zip file is a bomb or not, unless you have precise knowledge of the internals of the parsers that will be used to parse it. Peeking into the headers and summing the "uncompressed size" fields of all files does not work, in general, because the value stored in the headers may not agree with the actual uncompressed size. (See the "permits too-short file size" row in the compatibility table.) Robust protection against zip bombs involves placing time, memory, and disk space limits on the zip parser while it operates. Handle zip parsing, as any complex operation on untrusted data, with caution.

And the zip stream from Apache Commons implements InputStreamStatistics for the same purpose. I didn't use it, in order to make the method reusable by other unarchivers in the future.
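The approach described above — counting actual bytes written rather than trusting the sizes declared in archive headers — can be sketched as a stream wrapper. This is a minimal illustration, not Plexus Archiver's actual implementation; the class name `SizeLimitedInputStream` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: fail once more than maxBytes have been read,
// regardless of what the archive headers claim (a zip bomb can lie).
class SizeLimitedInputStream extends FilterInputStream {
    private long remaining;

    SizeLimitedInputStream(InputStream in, long maxBytes) {
        super(in);
        this.remaining = maxBytes;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1 && --remaining < 0) {
            throw new IOException("Maximum output size exceeded");
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            remaining -= n;
            if (remaining < 0) {
                throw new IOException("Maximum output size exceeded");
            }
        }
        return n;
    }
}

public class Demo {
    public static void main(String[] args) throws IOException {
        // 16 bytes of "decompressed" data, but an 8-byte output limit.
        InputStream in = new SizeLimitedInputStream(
                new ByteArrayInputStream(new byte[16]), 8);
        byte[] buf = new byte[4];
        in.read(buf); // 4 bytes read, within the limit
        in.read(buf); // 8 bytes total, still within the limit
        boolean limited = false;
        try {
            in.read(buf); // would exceed the 8-byte limit
        } catch (IOException e) {
            limited = true;
        }
        System.out.println(limited); // prints "true"
    }
}
```

Because the check runs on the bytes actually produced, it works the same for any archive format, which matches the stated goal of keeping the method reusable by other unarchivers.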

@@ -44,6 +43,9 @@

private String encoding = "UTF8";

@Nullable
private Long maxOutputSize;
Member:
Would it make sense to use long with -1 meaning unlimited in order to avoid NPEs?

Member:
Would it make sense to have it for other archive formats as well (such as tar)?

@plamentotev (Member) left a comment:

In general it looks good. I have a couple of comments. But I think it would really help if you did the refactoring (using try-with-resources, inlining extractFileIfIncluded, etc.) in a separate pull request. I'll merge it and you can rebase your changes on top of it. That would make this pull request a lot easier to read.

Also, there are some places where spaces are missing around braces. Not a biggie; I can fix them myself before merging.

org.apache.commons.compress.archivers.zip.ZipFile zf = null;
InputStream in = null;
try
try(ZipFile zipFile = new ZipFile( getSourceFile(), encoding, true ))
Member:

Would you please keep the diff minimal? I think this change makes the diff difficult to read. Using try-with-resources is better, but it would be best if this refactoring were made in a separate merge request, to make the commit easier to understand.

final Enumeration e = zf.getEntriesInPhysicalOrder();
getLogger().debug( "Expanding: " + getSourceFile() + " into " + getDestDirectory() );
Long remainingSpace = maxOutputSize;
final Enumeration e = zipFile.getEntriesInPhysicalOrder();
Member:

The same here. Initially I thought the order in which the entries are iterated had changed :) Refactoring unrelated to the change (like renaming variables, changing the order of lines, etc.) is best left to a separate merge request.


in.close();
in = null;
try (InputStream in = zipFile.getInputStream(ze)) {
Member:

Same here.

protected void extractFile( final File srcF, final File dir, final InputStream compressedInputStream,
String entryName, final Date entryDate, final boolean isDirectory,
final Integer mode, String symlinkDestination, final FileMapper[] fileMappers )
protected File extractFile(final File srcF, final File dir, final InputStream compressedInputStream,
Member:

Please follow the coding style. There are spaces around the braces.

Please correct me if I'm wrong, but I think that would break existing code (if somebody has overridden the protected method). Would it make sense to keep this method void? If you don't have a limit set, then you don't care about the file anyway.
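One way to address the compatibility concern raised here is to keep the existing void signature and delegate to a new method that returns the target. A minimal sketch with simplified, hypothetical signatures (`extractFileReturningTarget` is not a real Plexus Archiver method):

```java
import java.io.File;

// Hypothetical sketch: subclasses overriding the old void method keep
// compiling, while new callers can observe the extracted target file.
class Unarchiver {
    // Existing API, signature unchanged for backward compatibility.
    protected void extractFile(File dir, String entryName) {
        extractFileReturningTarget(dir, entryName);
    }

    // New API: returns the target so callers can track output size.
    protected File extractFileReturningTarget(File dir, String entryName) {
        File target = new File(dir, entryName);
        // ... actual extraction elided ...
        return target;
    }
}

public class Demo {
    public static void main(String[] args) {
        Unarchiver u = new Unarchiver();
        File f = u.extractFileReturningTarget(new File("out"), "a.txt");
        System.out.println(f.getName()); // prints "a.txt"
    }
}
```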

}

protected File extractFile(final File srcF, final File dir, final InputStream compressedInputStream,
Long remainingSpace, String entryName, final Date entryDate, final boolean isDirectory,
Member:

I think sizeLimit or something along those lines makes more sense, as that is what this variable is used for. It is the remaining size from the caller's point of view; for the method itself, it is the size limit it should impose on the output stream.

Member:

Also, would it make sense to give this method a different name (as the other extractFile does not return a File) to avoid confusion? I'm not saying it is confusing; I'm just wondering.


finally
}

private String resolveSymlink( ZipFile zf, ZipArchiveEntry ze )
Member:

Is resolveSymlink changed?

@@ -345,7 +354,7 @@ protected void extractFile( final File srcF, final File dir, final InputStream c
{
if ( !isOverwrite() && f.exists() && ( f.lastModified() >= entryDate.getTime() ) )
{
return;
return f;
Member:

The file is not actually extracted in that case, so should the file still be returned?

resolveSymlink(zipFile, ze), getFileMappers());
if (file != null)
{
remainingSpace -= file.length();
Member:

Is file.length() a reliable way to get the file size? What I mean is (quoting the File class API docs):

Where it is required to distinguish an I/O exception from the case that 0L is returned, or where several attributes of the same file are required at the same time, then the java.nio.file.Files#readAttributes method may be used.
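The alternative the quoted docs point to can be sketched as follows: unlike File.length(), which returns 0L both for an empty file and on failure, Files.readAttributes throws an IOException on error, so the two cases are distinguishable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Sketch: read the file size via BasicFileAttributes, which fails
// loudly (IOException) instead of silently returning 0L.
public class Demo {
    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("size-demo", ".bin");
        Files.write(p, new byte[1234]);
        long size = Files.readAttributes(p, BasicFileAttributes.class).size();
        System.out.println(size); // prints "1234"
        Files.delete(p);
    }
}
```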

@plamentotev (Member):

I've asked because Apache Commons Compress already uses BoundedInputStream to limit the ZIP entry size, but the limit is set to the ZIP entry's declared size. I just wonder if it makes sense to be able to set a global limit there. That way not only Plexus Archiver users would benefit from your change. Anyway, doing it here also makes sense.

@plamentotev plamentotev added this to the 4.1.1 milestone Jul 27, 2019
@plamentotev (Member):

Hi @satamas, I've merged your try-with-resources changes into master. Just wanted to ping you to check if there is anything else I can do to help move this merge request forward.

@satamas (Contributor, Author) commented Aug 12, 2019:

I'm on a long vacation now. I've asked @serejke to take care of this PR.

@serejke commented Aug 13, 2019:

I've reworked this pull request quite completely, so to simplify reviewing I opened a new pull request, #117.
This pull request will be dismissed. Please refer to #117.

@plamentotev (Member):

Closing because it was implemented in #117.

@plamentotev plamentotev removed this from the 4.2.0 milestone Aug 23, 2019