Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ChaCha20 API does not allow updating the nonce #10193

Open
jlaine opened this issue Jan 16, 2024 · 7 comments
Open

ChaCha20 API does not allow updating the nonce #10193

jlaine opened this issue Jan 16, 2024 · 7 comments

Comments

@jlaine
Copy link
Contributor

jlaine commented Jan 16, 2024

Hello cryptographistas!

Currently aioquic's QUIC header protection and encryption features use C code linked against OpenSSL. I am considering porting this to pure Python using only cryptography. On the whole things work, and a work in progress is here:

aiortc/aioquic#457

One thing I noticed is that the ChaCha20 class takes the nonce in its constructor. Unfortunately QUIC uses a nonce which is derived from each packet's header. This means that when using the CHACHA20_POLY1305_SHA256 cipher suite, for every packet we need to tear down / recreate the ChaCha20 instance which is not cheap.

Would you consider an API which allows updating the nonce? This can be implemented by calling:

EVP_CipherInit_ex(ctx, NULL, NULL, NULL, nonce, operation)

@reaperhulk
Copy link
Member

reaperhulk commented Jan 17, 2024 via email

@alex
Copy link
Member

alex commented Jan 17, 2024 via email

@jlaine
Copy link
Contributor Author

jlaine commented Jan 26, 2024

I tried putting together a very crude benchmark to get feel for how much it cost to create a new Cipher instance vs how much time actual encryption takes. Note this benchmark is somewhat artificial since it does not address the initial problem which is "how to cheaply update the nonce".

import time

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms

ITERATIONS = 1000000
KEY = bytes(32)
NONCE = bytes(16)
DATA = bytes(64)

start = time.time()
for i in range(ITERATIONS):
    r = Cipher(
        algorithm=algorithms.ChaCha20(KEY, NONCE),
        mode=None,
    ).encryptor().update(DATA)
elapsed = time.time() - start
print(f"{ITERATIONS} iterations with teardown in {elapsed:.1f} s")

start = time.time()
encryptor = Cipher(
    algorithm=algorithms.ChaCha20(KEY, NONCE),
    mode=None,
).encryptor()
for i in range(ITERATIONS):
    encryptor.update(DATA)
elapsed = time.time() - start
print(f"{ITERATIONS} iterations without teardown in {elapsed:.1f} s")

I then ran these tests against both main and @alex's crypto-rust branch.

main - OpenSSL 3.0

1000000 iterations with teardown in 11.1 s
1000000 iterations without teardown in 1.5 s

crypto-rust - OpenSSL 3.0

1000000 iterations with teardown in 7.5 s
1000000 iterations without teardown in 1.5 s

AFAICT it tells us the crypto-rust does bring down the setup cost, but the cost is definitely still significant. I also tried against OpenSSL 3.2 but saw no meaningful changes.

@reaperhulk
Copy link
Member

In your test example you're using the Cipher/ChaCha20 construction but in your draft PR you're using ChaCha20Poly1305 so I'm a little confused 😄

In cryptography 42.0.x we shifted ChaCha20Poly1305 to rust. As you pointed out, there's a separate branch moving Cipher to rust, but that's not shipped yet.

If we use this test script

import random
import time

from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

ITERATIONS = 1000000
KEY = bytes(32)
DATA = bytes(64)

start = time.time()
c = ChaCha20Poly1305(KEY)
for i in range(ITERATIONS):
    nonce = random.randbytes(12) # non-CSPRNG but just used to ensure there's always a new nonce
    c.encrypt(nonce, DATA, None)
elapsed = time.time() - start
print(f"{ITERATIONS} iterations with teardown in {elapsed:.1f} s")

And some performance:

Version Time
42.0.1 2.2s
ChaCha20Reuseable 2.1s
#10266 1.6s

So with the latest changes we are now >20% faster than the old reuseable context hack (which is broken for 42.0.0+ anyway).

Is there an easy way for me to do aioquic perf testing?

@reaperhulk
Copy link
Member

reaperhulk commented Jan 26, 2024

Looking a bit more closely I realize you're also using the traditional Cipher API later in the file. Well, I made ChaCha20Poly1305 much faster at least 😄. We would definitely need to come up with a new API for allowing nonce update on an instantiated CipherContext.

@jlaine
Copy link
Contributor Author

jlaine commented Jan 27, 2024

Looking a bit more closely I realize you're also using the traditional Cipher API later in the file. Well, I made ChaCha20Poly1305 much faster at least 😄. We would definitely need to come up with a new API for allowing nonce update on an instantiated CipherContext.

QUIC indeed uses both AEAD for payload encryption (AES-GCM or ChaCha20Poly1305) and AES-ECB or ChaCha20 to mask the packet headers. I currently have a huge performance gap (about 10x) between my C implementation and the Python one, but its not all due to encryption, I suspect my bytearray manipulations are sub-optimal. I'd really like to get rid of the C code, both because of the inherent risks of a foot-gun and because it forces me to vendor another copy of OpenSSL,

I know there are performance gains coming thanks to Alex's PRs, but what struck me was the assumption in the Cryptography API that a nonce is set only once for ChaCha20, which does not line up with how QUIC uses it. Hence my ticket :)

@reaperhulk
Copy link
Member

Some draft work on this in #10437

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants