Decryption of Large Files/Streams - Update documentation and Guard Test case #1462
Conversation
Helpful advice for processing very large streams.
Thanks for proposing the changes and adding the test. We were looking into the best way to benchmark using large data, and using a generator does seem to work well!
I left some comments, since the code needs some changes to reduce variability of benchmark results across runs.
I would also add a second version of the same test, but setting allowUnauthenticatedStream = false
(like the version you committed before). For that reason, I left a comment on the "outdated" code too.
About documentation, I'd rather not include "how-to" information in the source code, as it will still miss a lot of context. Instead, we are planning to add a Wiki page about streaming with all the relevant info. This is also because, for instance, config.allowUnauthenticatedStream
will become less relevant once the new OpenPGP spec is released (#1442), thanks to better support for streaming at the protocol level.
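The generator-based data generation mentioned above could be sketched roughly as follows (the helper names `dataChunks` and `createDataStream` are illustrative, not taken from the PR). A deterministic byte pattern keeps the benchmark comparable across runs, avoiding the run-to-run variability that random or highly compressible data would introduce:

```javascript
// Produce the large test payload from a generator, so the whole payload
// never needs to sit in memory at once.
function* dataChunks(totalBytes, chunkSize = 64 * 1024) {
  let produced = 0;
  while (produced < totalBytes) {
    const size = Math.min(chunkSize, totalBytes - produced);
    const chunk = new Uint8Array(size);
    // Deterministic byte pattern: same content on every run.
    for (let i = 0; i < size; i++) chunk[i] = (produced + i) % 251;
    produced += size;
    yield chunk;
  }
}

// Wrap the generator in a ReadableStream so it can be handed to a
// streaming consumer as input (e.g. as the message source).
function createDataStream(totalBytes) {
  const iterator = dataChunks(totalBytes);
  return new ReadableStream({
    pull(controller) {
      const { done, value } = iterator.next();
      if (done) controller.close();
      else controller.enqueue(value);
    }
  });
}
```

Because the stream is pull-based, chunks are only materialized as the encryption/decryption pipeline consumes them, which is what keeps peak memory flat regardless of `totalBytes`.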
test/benchmarks/memory_usage.js
Outdated
```diff
@@ -213,6 +213,38 @@ class MemoryBenchamrkSuite {
       await openpgp.decrypt({ message: encryptedMessage, passwords, config });
     });

+    suite.add('openpgp.encrypt/decrypt (PGP, text @ 100MB, with streaming)', async () => {
```
I tested locally, and as implemented, this test has too much variability in memory usage across runs, due to the key generation step and to compression.
This is why, in the other tests, we encrypt using passwords and disable compression altogether.
Could you do the same here? It should suffice to take the code from the `MDC, text, with streaming`
test and change the input to `createMessage` to pass the stream, as you are already doing.
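As a hedged sketch of what is being asked for (assuming OpenPGP.js v5-style APIs and the benchmark file's own helpers; `suite.add` is the existing benchmark registration, and `createDataStream` is a hypothetical generator-backed stream source like the one already used in this PR):

```javascript
suite.add('openpgp.encrypt/decrypt (PGP, text @ 100MB, with streaming)', async () => {
  const passwords = 'password';
  // Disable compression so memory-usage numbers stay stable across runs.
  const config = {
    preferredCompressionAlgorithm: openpgp.enums.compression.uncompressed
  };
  // Streamed input instead of an in-memory string.
  const message = await openpgp.createMessage({ text: createDataStream(100 * 1024 * 1024) });
  const armored = await openpgp.encrypt({ message, passwords, config });
  const encryptedMessage = await openpgp.readMessage({ armoredMessage: armored });
  const { data } = await openpgp.decrypt({ message: encryptedMessage, passwords, config });
  // Drain the output so decryption actually runs to completion.
  const reader = data.getReader();
  while (!(await reader.read()).done) { /* discard chunks */ }
});
```

Password-based encryption avoids the cost and variance of generating a fresh key pair inside the benchmarked function.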
test/benchmarks/memory_usage.js
Outdated
```diff
@@ -213,6 +213,42 @@ class MemoryBenchamrkSuite {
       await openpgp.decrypt({ message: encryptedMessage, passwords, config });
     });

+    suite.add('openpgp.encrypt/decrypt (PGP, text @ 6000MB, with streaming)', async () => {
+      await stream.loadStreamsPonyfill();
```
For benchmark stability and consistency with the other tests, we need to use password encryption and disable compression here too (see the other comment).
Also, I don't think we need to decrypt 6GB of data to spot a regression in this case: 100MB will suffice, and it allows comparing this benchmark with the test where allowUnauthenticatedStream
is off (besides, we do not want to overload the CI too much).
Also add comparison large text data tests for AEAD and authenticated CFB stream
Thanks for the contribution @justin-lovell. I've taken over since it's been some time.
In addition to adding tests over large files, I made data generation uniform across all streaming tests so that the results can be compared. I also added output stream consumption, since otherwise the decryption operation isn't actually triggered in all cases (this was already an issue with the existing AEAD streaming tests as well).
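The point about output stream consumption can be illustrated with a small drain helper (illustrative code, not the PR's): a pull-based pipeline such as streamed decryption only does work as the consumer reads, so a benchmark that never reads the output never exercises decryption at all.

```javascript
// Read a ReadableStream to completion, returning the total number of
// bytes consumed. Until something pulls like this, a pull-driven
// producer upstream (e.g. a streaming decryption pipeline) stays idle.
async function drainStream(readableStream) {
  const reader = readableStream.getReader();
  let total = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) return total;
    total += value.length;
  }
}
```

In a benchmark, `await drainStream(decrypted.data)` (or an equivalent loop) ensures the measured run includes the full decryption work.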
Introduction
We are working with larger files, which causes the process to crash with the following stack trace; the behaviour is reproducible with the following commit: justin-lovell@68c7c29
This has been a long-standing problem, as per the following issues:
Solution
Memory allocations grow because the stream is eagerly loaded in order to perform the integrity check on the checksum. The key lines are the following:
openpgpjs/src/packet/sym_encrypted_integrity_protected_data.js
Lines 137 to 139 in 6da1c53
There is an option, allowUnauthenticatedStream, which may be set; it is largely undocumented and not mentioned anywhere. This PR would document the option and advise, via an assertion, that the client should perform a double stream pass so the integrity check has the opportunity to verify the packet.
Additional Test Case
An additional test case was added to guard against an accidental regression when processing large files.
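As a hedged usage sketch of the option discussed above (the option name allowUnauthenticatedStream comes from this PR; the surrounding OpenPGP.js v5-style calls and the `encryptedStream` source are illustrative assumptions):

```javascript
import * as openpgp from 'openpgp';

// 'encryptedStream' is assumed to be a ReadableStream of armored message
// data obtained elsewhere (e.g. from a file or a network response).
const message = await openpgp.readMessage({ armoredMessage: encryptedStream });

const { data: plaintext } = await openpgp.decrypt({
  message,
  passwords: ['example-passphrase'],
  // Release plaintext chunks before the MDC integrity check completes,
  // instead of buffering the whole message in memory.
  config: { allowUnauthenticatedStream: true }
});

// Because chunks are emitted before integrity verification, the output
// must be treated as untrusted until the stream has been fully consumed
// without throwing -- hence the advice to perform a double stream pass.
```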