Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for garbage after comment data #178

Merged
merged 3 commits into from Nov 15, 2020

Conversation

davide-romanini
Copy link
Contributor

Many zip in the wild could have some garbage after comment data, and normal zip tools silently ignore the garbage and preserve the good data.
This PR just removes the check that, in case of garbage, removes all the comment, preserving more zip data.

Copy link
Contributor

@rylev rylev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this. Could you add a test for this?

@Plecra Plecra mentioned this pull request Aug 19, 2020
@Plecra
Copy link
Member

Plecra commented Aug 19, 2020

I'm not sure this is permitted by the ZIP spec. @davide-romanini Could you give us example(s!) of ZIP files like this?

@davide-romanini
Copy link
Contributor Author

@rylev I've added a test with a small sample of the issue.
@Plecra specs may or may be not allow for this. But reality is often different, especially for a format so old.
In my case I found that python zipfile generates garbage when opening a zip in mode a and setting a comment with a shorter length than the initial one:

>>> from zipfile import ZipFile
>>> with ZipFile('comment_garbage.zip', 'a') as z:
...     z.comment = b'long comment bla bla bla'
...
>>> with ZipFile('comment_garbage.zip', 'a') as z:
...     z.comment = b'short.'
...
>>>

This doesn't cause any problem for other zip reading libraries, so I think it should be handled even if not officially supported by the spec.

@rylev
Copy link
Contributor

rylev commented Aug 25, 2020

@Plecra This seems reasonable to me. Thoughts?

@Plecra
Copy link
Member

Plecra commented Aug 25, 2020

@davide-romanini Thanks for such a simple example 😀 This has made it very clear.

7zip lists it as a warning, but the file doesn't seem to break any of the common ZIP readers. So the question is how far we should go in supporting these files: The comment length is the only thing that bounds the time it takes to open the ZIP files (which is important for negative ID), so will we still only search within the first ~65536 bytes? I'd like to check InfoZIP's implementation for how they handle this, in case there's another oddity.

@Plecra Plecra merged commit f5061c2 into zip-rs:master Nov 15, 2020
@Plecra
Copy link
Member

Plecra commented Nov 15, 2020

No need for this to wait any longer. Thanks for your work @davide-romanini

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants