Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hex-literal instead of blobby? #642

Open
tarcieri opened this issue May 26, 2021 · 3 comments
Open

hex-literal instead of blobby? #642

tarcieri opened this issue May 26, 2021 · 3 comments

Comments

@tarcieri
Copy link
Member

tarcieri commented May 26, 2021

In RustCrypto/stream-ciphers#244 and RustCrypto/stream-ciphers#245 we encountered the problem of the blobby test vectors being opaque. As serendipity would have it two people opened PRs because they thought test vectors were missing.

It appears they were, but it's presently difficult to ascertain that because all of the test vectors are in an opaque binary blob. I ended up having to write a small tool to dump the test vectors as hex so I could compare them to the ones in the RFC.

If the goal of blobby is to reduce the size of the test vectors in a compiled binary, it seems like hex-literal accomplishes that by encoding the test vectors as binary at compile time.

If the goal is to reduce the size of the crate, I really doubt it's accomplishing much over gzipped hex. But in the process, we also lose comments about the progeny of test vectors, which makes it difficult to answer questions like what ones we have included for each crate (see aforementioned PRs).

I would suggest that each of the crates in traits defines a test vector struct, and we populate an array of those using hex-literal.

Separately I think it'd be great to have a test-vectors crate feature for each of the algorithms that would allow such vectors to be used as part of an initialization self-test (which is a common thing for e.g. FIPS).

@newpavlov
Copy link
Member

I do think that we should improve blobby tooling and its visibility, but it's a bit hard for me to do since I already know how it works. So I would appreciate a fresh pair of eyes to look at this problem.

I am not sure why you have written your own tool. We already can inspect content of blb files using the conversion utility, so I wouldn't call them opaque, but I guess we should improve it by allowing it to accept column names (e.g. key, plaintext, ciphertext), which would be used in generated text files (i.e. we would get key=01AB...\nplaintext=01AB..\nciphertext=01AB...\n\nkey=...). Also probably having it in the examples folder is a bit confusing as well.

in the process, we also lose comments about the progeny of test vectors, which makes it difficult to answer questions like what ones we have included for each crate (see aforementioned PRs).

We don't have to dump all vectors into the same file, i.e. you can have a separate file per vectors origin with a comment in Rust files containing a link to it. IIRC in some crates we do exactly that. In hindsight we should've been more rigorous with recording test vector origins.

If the goal is to reduce the size of the crate, I really doubt it's accomplishing much over gzipped hex.

Crates are not always get gzipped. The most obvious and common examples are repository cloning and crate vendoring.

Also I don't think that having a ton of hex!("..") lines in test files would improve readability, especially if number of test vectors will be in hundreds.

@tarcieri
Copy link
Member Author

tarcieri commented May 27, 2021

The most obvious and common examples are repository cloning

...which always grabs gzipped git pack files 😉

But perhaps more importantly, the size of the test vectors uncompressed is still not particularly large. I think these vectors (aside from the blobby Wycheproof test vectors) are relevant examples:

https://github.com/RustCrypto/elliptic-curves/tree/master/p256/src/test_vectors

These files are on the order of 10-20kB. The Wycheproof blobby files weigh in a bit bigger at ~30kB, but we're mostly talking about files that are <100kB tops.

Also I don't think that having a ton of hex!("..") lines in test files would improve readability, especially if number of test vectors will be in hundreds.

The main part I really like about this is it makes it possible to directly compare the hex strings in the code versus ones in e.g. RFCs, and also annotate each of those hex strings with code comments about their provenance.

I think RustCrypto/stream-ciphers#245 is a good example of this. If we look at RFC8439 Section 2.4.2:

For a test vector, we will use the following inputs to the ChaCha20
block function:

o Key = 00:01:02:03:04:05:06:07:08:09:0a:0b:0c:0d:0e:0f:10:11:12:13:
14:15:16:17:18:19:1a:1b:1c:1d:1e:1f.

o Nonce = (00:00:00:00:00:00:00:4a:00:00:00:00).

o Initial Counter = 1.
[...]
Plaintext Sunscreen:
000 4c 61 64 69 65 73 20 61 6e 64 20 47 65 6e 74 6c Ladies and Gentl
016 65 6d 65 6e 20 6f 66 20 74 68 65 20 63 6c 61 73 emen of the clas
032 73 20 6f 66 20 27 39 39 3a 20 49 66 20 49 20 63 s of '99: If I c
048 6f 75 6c 64 20 6f 66 66 65 72 20 79 6f 75 20 6f ould offer you o
064 6e 6c 79 20 6f 6e 65 20 74 69 70 20 66 6f 72 20 nly one tip for
080 74 68 65 20 66 75 74 75 72 65 2c 20 73 75 6e 73 the future, suns
096 63 72 65 65 6e 20 77 6f 75 6c 64 20 62 65 20 69 creen would be i
112 74 2e t.

This is easy to compare to the hex strings in the PR:

    //
    // ChaCha20 test vectors from:
    // <https://datatracker.ietf.org/doc/html/rfc8439#section-2.4.2>
    //

    const KEY: [u8; 32] = hex!(
        "
        000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f
    "
    );

    const IV: [u8; 12] = hex!(
        "
        000000000000004a00000000
    "
    );

    const PLAINTEXT: [u8; 114] = hex!(
        "
        4c616469657320616e642047656e746c
        656d656e206f662074686520636c6173
        73206f66202739393a20496620492063
        6f756c64206f6666657220796f75206f
        6e6c79206f6e652074697020666f7220
        746865206675747572652c2073756e73
        637265656e20776f756c642062652069
        742e
    "
    );

See also these FIPS 186-4 test vectors for ECDSA expressed using hex!:

https://github.com/RustCrypto/elliptic-curves/blob/master/p256/src/test_vectors/ecdsa.rs

@tarcieri
Copy link
Member Author

tarcieri commented May 27, 2021

Regarding the inspectability of .blb files:

I am not sure why you have written your own tool. We already can inspect content of blb files using the conversion utility, so I wouldn't call them opaque, but I guess we should improve it by allowing it to accept column names (e.g. key, plaintext, ciphertext), which would be used in generated text files (i.e. we would get key=01AB...\nplaintext=01AB..\nciphertext=01AB...\n\nkey=...). Also probably having it in the examples folder is a bit confusing as well.

The nice part of using a text/hex format is there's no tool at all required, and it's easy to compare things via just looking at the original test vectors versus the hex! ones in whatever text viewer/editor or web UI that you prefer. Blobby requires a "transformation" of the original vectors into something that requires tools to inspect/compare and can't easily be reviewed in e.g. GitHub PRs.

You've mentioned blobby's convert.rs example before, but I guess I forgot. Nevertheless if you want to go this way, I think we need a real CLI for .blb files, ideally one you can install via cargo install. The discoverability of convert.rs is not good and it's implemented as an example rather than a [[bin]] crate.

If you really want to put all test vectors in a binary format, I say this reluctantly but maybe ASN.1 DER would be a better choice. We have a der crate (which has an order of magnitude more code than blobby, but it's not that much larger, 3 klocs vs ~300ish lines for blobby), but the real benefit is being able to get a rich inspection of the data using e.g. web based tools:

https://lapo.it/asn1js/#MIGTME8GCSqGSIb3DQEFDTBCMCEGCSsGAQQB2kcECzAUBAjmIR4jSK1p4AICQAACAQgCAQEwHQYJYIZIAWUDBAEqBBCb0KYlHyJU-f1ZY4h8J88BBEDMYrp3PA9JX6s2aOT8782wjnig7hXgoVAT9iq-CNqnQgZe6zZtbmyYzDsOfmm9yGHIiv648D26HixtmdBtFzYM

It would also make it easier to have a more flexible, evolvable binary format, with things like optional description strings, or retroactively adding additional fields in a way where the parser can still interpret the old format in a backwards compatible way without requiring an eager transformation of the .blb files to upgrade them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants
@tarcieri @newpavlov and others