Out of memory when decrypting large files without MDC #1449

Open

yegorpetrov opened this issue Dec 6, 2021 · 10 comments


yegorpetrov commented Dec 6, 2021

  • OpenPGP.js version: 5.0.1
  • Affected platform (Browser or Node.js version): both? Node.js v14.16.0, did not try browsers

OpenPGP.js is crashing with OOM issues when I feed it files larger than about 100 MB. Granted, it's working in a resource-constrained environment, but I wouldn't expect memory consumption to be proportional to the input file size in general.

So I found the line that swallows the entire input stream into memory before passing it to the decryption stage: await stream.readToEnd(stream.clone(this.encrypted)). Naturally, it fails for larger files, and for smaller ones you get all decrypted data inside one large chunk. It also ignores options like allowUnauthenticatedStream: true and allowUnauthenticatedMessages: true (not sure which one is applicable).

A limited-use workaround is to encrypt files with MDC enabled, so that they're processed by a different class. In other words, when I encrypt files using gpg --force-mdc ... they don't cause any similar issues when decrypted with this package.

I played around with the source code a little and came up with the following to replace that block of code:

let decrypted;
if (stream.isArrayStream(this.encrypted)) {
  // TODO remove this special case when ArrayStream.slice() is implemented in @openpgp/web-stream-tools
  const encrypted = await stream.readToEnd(stream.clone(this.encrypted));
  decrypted = await crypto.mode.cfb.decrypt(sessionKeyAlgorithm, key,
    encrypted.subarray(blockSize + 2),
    encrypted.subarray(2, blockSize + 2)
  );
} else {
  decrypted = await crypto.mode.cfb.decrypt(sessionKeyAlgorithm, key,
    stream.slice(stream.clone(this.encrypted), blockSize + 2),
    await stream.readToEnd(stream.slice(stream.clone(this.encrypted), 2, blockSize + 2))
  );
}

This way, it seems to deliver properly-chunked decrypted output without choking on large files. Does this look like a proper solution?

larabr (Collaborator) commented Dec 6, 2021

I am not following what kind of files you are trying to decrypt.
If it's a file without the MDC, then allowUnauthenticatedMessages does matter; otherwise you are not allowed to decrypt at all.
Could you share one of the encrypted files you are dealing with (a smaller one is fine too), and the code you are decrypting with?

yegorpetrov (Author) commented Dec 6, 2021

I'm trying to decrypt large files provided by a client, and this task needs to be automated on a daily basis. Those files contain PII, so I can't share them (even without decryption keys; sorry). The only special thing about them is that they're encrypted without MDC (tag=9), which means they're picked up by the SymmetricallyEncryptedDataPacket class of this package for decryption, which has the problem I described above.

I'm not saying allowUnauthenticatedMessages does not matter per se. It just does not (cannot) alleviate this problem. Actually, wouldn't readToEnd in that method create OOM/performance issues regardless of any configuration? It seems to read the entire input stream into a memory buffer unconditionally, which is something I wouldn't expect this package to do.

Anyway, here's a repro including test keys and files: openpgpjs_1449_repro.zip. Test files with and without MDC were generated using the ancient gpg 1.4.23, because recent versions seem to enforce MDC.

index.js for your convenience
const assert = require('assert');
const { createReadStream } = require('fs');
const { readFile } = require('fs').promises;
const { decrypt, readPrivateKey, readMessage } = require('openpgp');

async function decryptTestFile(filename) {
    const decryptionKeys = await readPrivateKey({
        armoredKey: await readFile('./test_private.pgp', { encoding: 'utf-8' })
    });
    // the underlying txt file size is 10000002 bytes
    const binaryMessage = createReadStream(filename);
    const { data } = await decrypt({
        message: await readMessage({ binaryMessage }),
        decryptionKeys,
        config: {
            allowUnauthenticatedMessages: true,
            allowUnauthenticatedStream: true
        }
    });

    const bytesReadInitially = binaryMessage.bytesRead;
    
    let chunksCount = 0;
    let decryptedLength = 0;
    for await (const chunk of data) {
        decryptedLength += chunk.length;
        chunksCount++;
    }

    const bytesReadInTheEnd = binaryMessage.bytesRead;

    return {
        bytesReadInitially,
        bytesReadInTheEnd,
        decryptedLength,
        chunksCount
    };
}

(async function() {
    try {
        const mdcFileName = './large_file_with_mdc.txt.gpg';
        const noMdcFileName = './large_file_without_mdc.txt.gpg';

        const [mdc, noMdc] = await Promise.all([mdcFileName, noMdcFileName].map(decryptTestFile));

        // MDC: file has been decrypted in numerous
        // chunks with little memory overhead.
        // Good.
        assert(mdc.bytesReadInitially < mdc.bytesReadInTheEnd / 10);
        assert(mdc.decryptedLength === 10000002);
        assert(mdc.chunksCount > 10);

        // no MDC: file has been loaded into memory,
        // then decrypted as a single chunk.
        // Bad.
        assert(noMdc.bytesReadInitially === noMdc.bytesReadInTheEnd);
        assert(noMdc.decryptedLength === 10000002);
        assert(noMdc.chunksCount === 1);

        console.log(mdc.chunksCount); // prints 1259 on my machine
        console.log(noMdc.chunksCount === 1); // prints true

        process.exit(0);

    } catch (e) {
        console.error(e);
    }
})();

@yegorpetrov yegorpetrov changed the title Out of memory when decrypting large files Out of memory when decrypting large files without MDC Dec 6, 2021
twiss (Member) commented Dec 7, 2021

Hey 👋 Thanks for the report. First of all, regardless of this issue, I would strongly suggest using --force-mdc, or using a newer version of GPG, as non-MDC-protected messages are insecure.

That being said, I'm not opposed to adding streaming decryption for non-MDC-protected messages, as indeed in principle the streaming should apply to everything. Your patch is a good start; to make it slightly more compact you could take a page from sym_encrypted_integrity_protected_data.js and do something like:

    let encrypted = stream.clone(this.encrypted);
    if (stream.isArrayStream(encrypted)) encrypted = await stream.readToEnd(encrypted);
    const iv = await stream.readToEnd(stream.slice(stream.clone(encrypted), 2, blockSize + 2));
    const ciphertext = stream.slice(encrypted, blockSize + 2);
    const decrypted = await crypto.mode.cfb.decrypt(sessionKeyAlgorithm, key, ciphertext, iv);

rather than having two entirely separate cases.

A PR would be welcome 👍

yegorpetrov (Author) commented

A PR would be welcome 👍

Can do. But let's agree on testing first.

Seeing this issue go undetected by tests makes me think a new kind of test is required to make sure there are no accidental readToEnd calls like this one. I even tried to botch decrypt() in SymEncryptedIntegrityProtectedDataPacket to make it readToEnd the entire input unconditionally, just like SymmetricallyEncryptedDataPacket, yet all tests remained green 🤷

Now, looking at streaming.js, I see tests operating on the higher-level API, i.e. openpgp.encrypt and openpgp.decrypt. The problem is that there's no configuration property to choose non-MDC-protected encryption (if there were one, it would probably be used to select SymmetricallyEncryptedDataPacket in message.js:encrypt()). And if I add a high-level test to detect accidental readToEnd calls alongside the other tests in streaming.js, it won't be able to cover the original issue.

I have doubts that a new configuration property should be defined just for the sake of testing. But maybe it's OK to add one (like disableIntegrityProtectedEncryption or something)? Or maybe I'm taking this issue too far and a lower-level test against SymmetricallyEncryptedDataPacket would suffice? A sketch of what such a test might look like follows.
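To be clear, the following is only a sketch, not an existing test. It assumes the v5 packet classes (SymmetricallyEncryptedDataPacket, LiteralDataPacket, PacketList) and config are importable from the public API as in the rest of the test suite, that encrypt()/decrypt() have the signatures used in sym_encrypted_data.js, and it uses a byte-counting transform from @openpgp/web-stream-tools to catch an accidental readToEnd:

const { expect } = require('chai');
const { randomBytes } = require('crypto');
const stream = require('@openpgp/web-stream-tools');
const { SymmetricallyEncryptedDataPacket, LiteralDataPacket, PacketList, enums, config: defaultConfig } = require('openpgp');

it('stream-decrypts a tag 9 packet without reading the whole input first', async () => {
    const config = { ...defaultConfig, allowUnauthenticatedMessages: true };
    const key = new Uint8Array(randomBytes(32)); // raw AES-256 session key

    // build a ~1 MiB literal packet and encrypt it without integrity protection (tag 9)
    const literal = new LiteralDataPacket();
    literal.setBytes(new Uint8Array(1024 * 1024), enums.literal.binary);
    const packet = new SymmetricallyEncryptedDataPacket();
    packet.packets = new PacketList();
    packet.packets.push(literal);
    await packet.encrypt(enums.symmetric.aes256, key, config);

    // re-wrap the ciphertext in a stream and count how many bytes decrypt() pulls eagerly
    let bytesRead = 0;
    packet.encrypted = stream.transform(stream.toStream(packet.encrypted), bytes => {
        bytesRead += bytes.length;
        return bytes;
    });
    await packet.decrypt(enums.symmetric.aes256, key, config);

    // with streaming in place, decrypt() should resolve after reading only a small prefix
    expect(bytesRead).to.be.below(1024 * 1024);
});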

twiss (Member) commented Dec 22, 2021

Hey 👋 Sorry, I forgot to respond to this.

Seeing this issue go undetected by tests makes me think a new kind of test is required to make sure there are no accidental readToEnd calls like this one. I even tried to botch decrypt() in SymEncryptedIntegrityProtectedDataPacket to make it readToEnd the entire input unconditionally, just like SymmetricallyEncryptedDataPacket, yet all tests remained green 🤷

Just as a general remark, OpenPGP.js today does not really offer the guarantee that when streaming, the data is not buffered in memory. So in that sense, the readToEnd here was not really accidental, but rather a lack of support for stream-decrypting SymmetricallyEncryptedDataPackets. Even though I agree it would be a good thing to guarantee streaming in all cases, we can't always do so. A few other examples:

  • We don't currently support stream-decompressing bzip2 (see compressed_data.js). We probably should, but this would require switching to a different bzip2 library.
  • We don't stream unauthenticated data by default (config.allowUnauthenticatedStream), as you know.
  • Similarly, when decrypting AEAD-encrypted messages (from the draft of the next version of the spec, disabled by default), we only release one chunk at a time. The default chunk size (for us) is 256 KiB, but the maximum chunk size is 4 MiB, which may matter in resource-constrained environments (how resource-constrained is yours? There has been a lot of debate about this in the spec, so some real-world feedback would be valuable). See the sketch after this list for how the chunk size is derived.
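For concreteness, the draft encodes the chunk size as an octet c, with the actual chunk size being 2^(c+6) bytes; in OpenPGP.js the corresponding setting is config.aeadChunkSizeByte. A quick sketch of the arithmetic (not library code):

// chunk-size octet -> chunk size in bytes, per the draft: 2 ** (c + 6)
const chunkSizeBytes = c => 2 ** (c + 6);
console.log(chunkSizeBytes(12)); // 262144  = 256 KiB, the OpenPGP.js default (aeadChunkSizeByte = 12)
console.log(chunkSizeBytes(16)); // 4194304 = 4 MiB, the maximum chunk size mentioned above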

That being said, there are some tests in streaming.js that aim to test that stream-decrypting SymEncryptedIntegrityProtectedDataPackets works correctly, but perhaps they fail to detect this; I'll look into that.

I have doubts that a new configuration property should be defined just for the sake of testing. But maybe it's OK to add one (like disableIntegrityProtectedEncryption or something)?

I would prefer not to, as we're trying to get rid of configuration options that are insecure.

Or maybe I'm taking this issue too far and a lower-level test against SymmetricallyEncryptedDataPacket would suffice?

Yeah, that could work 👍


fi4sk0 commented Jan 10, 2022

Sorry to intrude, I'd just like to know if the problem described here is what I'm experiencing too: I'm trying to decrypt a file while downloading it on the fly:

import { readMessage, decrypt } from 'openpgp'; // import added for completeness

const url = "some url to encrypted file";
const res = await fetch(url);
const message = await readMessage({
  binaryMessage: res.body
});

console.log("before decrypt");
const decrypted = await decrypt({
  message,
  passwords: 'secret',
  format: "binary",
});
console.log("after decrypt");

// code to consume decrypted.data as ReadableStream

Unfortunately, I observe that "before decrypt" is logged practically right away, and "after decrypt" only shortly after the whole file has been downloaded (as I can see in the Network tab of Chrome). I expected the decrypt call to fulfill the returned promise as soon as there is enough data to construct a ReadableStream. From my limited knowledge I assume I'm facing the same problem that is described in this issue. Is this correct? If so, I'll keep tabs on the linked WIP.

larabr (Collaborator) commented Jan 17, 2022

Hi @fi4sk0 , the problem described here is relevant to a legacy encryption option, so I am not sure whether it applies to your case. Could you check if passing config: { allowUnauthenticatedStream: true } to decrypt results in the behaviour you expect (see documentation)?
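Applied to the snippet above, the suggested change would look like this (a sketch):

const decrypted = await decrypt({
  message,
  passwords: 'secret',
  format: "binary",
  config: { allowUnauthenticatedStream: true } // release decrypted chunks before the integrity check completes
});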


fi4sk0 commented Jan 21, 2022

Hey @larabr,
thanks for getting back to me. You're right, { allowUnauthenticatedStream: true } does solve my problem!


fi4sk0 commented Jan 23, 2022

Sorry to bother you again @larabr, but I'd like to know the implications of that flag: the way I read the documentation, although I won't be notified of message modifications during streaming, decrypt will throw after the stream has completed and the message was recognized as modified. Is this correct?

larabr (Collaborator) commented Jan 24, 2022

@fi4sk0 correct, it will still throw when you finish reading out the decrypted data stream returned by decrypt. The flag simply allows the function to release decrypted chunks before any modification could be detected.
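In other words, the consumer should be prepared for an error at the very end of the read loop, after it has already processed some chunks. A minimal sketch (handleChunk and rollback are hypothetical consumer functions):

try {
  for await (const chunk of decrypted.data) {
    await handleChunk(chunk); // hypothetical: process each decrypted chunk as it arrives
  }
  // stream fully read without throwing: the integrity check passed
} catch (e) {
  // thrown here if a modification was detected, which only happens at stream end
  await rollback(); // hypothetical: discard or quarantine the chunks already processed
}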
