Add new APIs that allow copying zip file entries between zip files #98
Just pinging this pull request. Please tell me any changes required to accept this feature. |
Any idea when this will be merged? Currently depending on this branch, no issues so far, great work! :) |
@robmv unfortunately this branch needs a bunch of conflicts to be resolved. Would you be able to do that? I can then give this a review |
Quickly skimming through, it doesn't look like this method validates the original zip file, which could result in a malformed archive. Ideally, the library shouldn't be able to do that by default. I don't think using CRC should impose much extra overhead. If it causes a regression, it'd be worth making a new issue to track it |
I will try to update it in the following days and look into @Plecra's suggestion about the CRC. If I am correct, the CRC check will need the file to be decompressed, which will add some overhead. It is probably negligible, because compression is the more CPU-intensive operation and it will not be needed. But if this adds noticeable overhead, I will try to add an option to do the CRC check or not. In my use case I really don't need the CRC check: the files are digitally signed, the signature is stored in an external file, and it is checked when the ZIP file is retrieved, so the CRC check is redundant and I need speed when removing things from the file. But I understand some people will want the CRC to be validated. |
Ah, I had been under the impression that the CRC was performed on the compressed data. I can completely understand why you wouldn't care for that, I'd just like to make the difference clear to users. Is there a convention like |
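For readers following the CRC point above: ZIP stores a CRC-32 of the uncompressed bytes, so validating it during a copy means decompressing the entry first. A minimal, std-only sketch of that checksum is below. The names `crc32` and `crc_matches` are hypothetical helpers for illustration and are not part of this PR or of the library's API.

```rust
/// CRC-32 (reflected polynomial 0xEDB88320), the variant used by ZIP.
/// Bitwise implementation: slow but short; real code would use a table.
fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // `mask` is all ones when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical helper: compare an entry's stored CRC with the checksum
/// of its *uncompressed* bytes (the caller must decompress first).
fn crc_matches(uncompressed: &[u8], stored_crc: u32) -> bool {
    crc32(uncompressed) == stored_crc
}
```

Because the input has to be the decompressed data, a copy that moves compressed bytes verbatim cannot run this check without paying for decompression, which is the trade-off discussed in this thread.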
Rebased. The only major difference with the previous patch is that the implementation of I still doesn't do decompression and CRC checks when copying files, if anyone has suggestions for naming an API that does that and one that doesn't, I would gladly update it. |
@rylev Still interested in doing the review? |
@Plecra can you review this? |
LGTM. Thanks @robmv, just one little nitpick
In time, I'd also like to expand on the documentation of this feature. A doctest would be great. I'd hope we could avoid using two methods here too, but I'm not sure there's a good solution with the current API |
@Plecra: done! Doctest added to the raw_copy* methods. |
@rylev I'm happy with this. Could you give the changes a once over? |
Does this new API work only for copying from an existing |
@eulerdisk True, only from an existing file. Your use case sounds interesting, but an API for that may be more complex. What if the files are larger? Do we keep the compressed version in RAM or dump it to a temporary file? With the API proposed in this PR you could create a temporary ZIP archive (RAM or file backed) and get the files to reuse from there. |
Yep, now I can do that thanks to your PR :-)
My idea is that you should be able to create a |
@eulerdisk Interesting idea! Compression is pretty fast, but there are bound to be use cases for caching its output. We couldn't create a simple wrapper around That said, I think it's beyond the scope of this PR and would prefer some design discussion before any code changes are proposed. |
Sorry about this @robmv but we had to change the API of the crypto methods, and this PR now has merge conflicts. I'll probably figure out how to resolve them in time, so I'm just letting you know it could take a while |
The copy is done directly using the raw compressed data, avoiding decompression and recompression.
Updated, hopefully this one is the winner! |
Lovely! I'm sorry to say I'm not sure when I'll next be able to complete the review, but this PR's a top priority 😉 |
Ace! Finally time to get this published. Thanks for your contribution @robmv |
refactor: Make `ZipWriter::finish()` consume the `ZipWriter`
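As an illustration of what a consuming `finish()` means in Rust, here is a minimal sketch with a hypothetical `ToyWriter` type. It is not the library's `ZipWriter`, and the `b"END"` trailer is a stand-in for whatever a real archive writer emits; the point is only that taking `self` by value makes the writer unusable after it has been finished.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for an archive writer (not the real ZipWriter).
struct ToyWriter<W: Write> {
    inner: W,
}

impl<W: Write> ToyWriter<W> {
    fn new(inner: W) -> Self {
        ToyWriter { inner }
    }

    /// Borrows the writer mutably, so it can be called many times.
    fn write_entry(&mut self, data: &[u8]) -> io::Result<()> {
        self.inner.write_all(data)
    }

    /// Takes `self` by value: writes a trailer, then hands back the sink.
    /// Any later use of the writer is rejected at compile time.
    fn finish(mut self) -> io::Result<W> {
        self.inner.write_all(b"END")?;
        self.inner.flush()?;
        Ok(self.inner)
    }
}
```

With this shape, calling `write_entry` after `finish` is a "use of moved value" compile error rather than a runtime failure.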
The copy is done directly using the raw compressed data, avoiding decompression and recompression.
In summary, the changes add `copy_file` and `copy_file_rename` to `ZipWriter`, which use that raw stream to copy the file.

I get nearly 95% faster Zip file modifications when I copy nearly all files, and just ignore the files I am removing from the Zip.
closes #95
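The workflow described above (copy nearly every entry, skip the ones being removed) can be sketched with a toy model. Everything here is an assumption for illustration: `RawEntry` and `copy_except` are hypothetical names, not the library's types or the methods added by this PR. The sketch only shows that the compressed bytes and stored CRC travel across untouched, with no decompression or recompression step.

```rust
/// Toy model of an archive entry: a name plus bytes that are
/// already compressed, and the CRC stored alongside them.
#[derive(Clone, Debug, PartialEq)]
struct RawEntry {
    name: String,
    compressed: Vec<u8>,
    crc32: u32,
}

/// Hypothetical helper: copy every entry whose name is not listed in
/// `remove`. The compressed payload and CRC are cloned as-is; nothing
/// is decompressed, recompressed, or re-validated.
fn copy_except(source: &[RawEntry], remove: &[&str]) -> Vec<RawEntry> {
    source
        .iter()
        .filter(|entry| !remove.iter().any(|name| *name == entry.name.as_str()))
        .cloned()
        .collect()
}
```

In this model, "removing" a file from an archive is just rebuilding the archive without it, which is why skipping the compression codec entirely can be so much faster.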