Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use fixed width serde impls for keys #406

Merged
merged 4 commits into from Jun 15, 2022

Conversation

tcharding
Copy link
Member

@tcharding tcharding commented Feb 16, 2022

Currently we serialize keys using the BytesVisitor, this causes the serialized data to contain additional metadata encoding the length (an extra 8 bytes) when serialized with bincode.. This extra data is unnecessary since we know in advance the length of these two types.

We do not control the data output by serialization of our types because it depends on which crate is used to do the serialization. This PR improves the situation for serialization using the bincode crate, and this PR introduces mentions of bincode in the rustdocs, is this acceptable? See below for a table that describes binary serialization by other crates.

Implement a sequence based visitor that encodes the keys as fixed width data for:

  • SecretKey
  • PublicKey
  • KeyPair
  • XOnlyPublicKey

Fixes: #295

Question: PR only does keys, do we want to do signatures as well?

src/serde_util.rs Outdated Show resolved Hide resolved
@tcharding tcharding force-pushed the fixed-width-serde branch 6 times, most recently from 6e3e256 to d8a0399 Compare February 21, 2022 14:15
@tcharding tcharding changed the title Use fixed width serde impls for secret/public key Use fixed width serde impls for key Feb 21, 2022
@tcharding tcharding changed the title Use fixed width serde impls for key Use fixed width serde impls for keys Feb 21, 2022
@tcharding tcharding force-pushed the fixed-width-serde branch 6 times, most recently from cd44138 to 4e8e095 Compare February 23, 2022 10:33
@tcharding tcharding marked this pull request as ready for review February 24, 2022 09:21
@apoelstra
Copy link
Member

I think we can just use a fixed-size array of size 74 rather than a Vec. AFAIK there is no object in this library which has a longer serialization.

Copy link
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea! I'd have to see some analysis on real impact of this change on existing formats to believe we should do this. I worry many formats will store type along with each item.

let _ = SharedSecret::new(&public_key, &secret_key);
let _ = ecdh::shared_secret_point(&public_key, &secret_key);

#[cfg(feature = "use-serde")] {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This style is inconsistent with the rest of the code and quite confusing.

Copy link
Member Author

@tcharding tcharding Mar 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean inconsistent with the rust-secp256k1 crate? This is a separate crate and the features are only used, as far as I can tell, to run this crate and selectively run parts of the test (start function). scratch through because removal of Vec removes the need for this feature anyways.

FTR I'm not sure why this is a separate crate, I'm not super deep in the no-std stuff, @elichai do you remember off the top of your head why we had to put it in a separate crate and not in the tests directory (i.e., standard integration tests directory)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal was to simulate macro usage from rust-bitcoin, which has less visibility of symbols than anything within the crate (including, I think (?) stuff in the tests/ directory).

I believe this also predates the tests directory.

src/key.rs Outdated
impl<'de> ::serde::Deserialize<'de> for PublicKey {
fn deserialize<D: ::serde::Deserializer<'de>>(d: D) -> Result<PublicKey, D::Error> {
if d.is_human_readable() {
d.deserialize_str(super::serde_util::FromStrVisitor::new(
"an ASCII hex string representing a public key"
))
} else {
d.deserialize_bytes(super::serde_util::BytesVisitor::new(
d.deserialize_seq(super::serde_util::SeqVisitor::new(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be deserialize_tuple

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh so implement de::Visitor for each fixed size type we have and use deserialize_tuple, interesting. I'll give it a shot.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if you want to keep it generic I suspect you'll have to use PhantomData<fn() -> T>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a macro for the serde impls, and serializing with tuple this PR is way cleaner now!

src/key.rs Outdated Show resolved Hide resolved
Cargo.toml Outdated
@@ -39,6 +39,7 @@ global-context-less-secure = []
secp256k1-sys = { version = "0.4.2", default-features = false, path = "./secp256k1-sys" }
bitcoin_hashes = { version = "0.10", optional = true }
rand = { version = "0.6", default-features = false, optional = true }
# Our custom serde code requires an allocator, enable secp256k1 feature `std` or `alloc`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does it allocate? I don't see any such instance.

Besides, TODO after MSRV bump: use crate renaming trick instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allocates due to use of the Vec in visit_seq method.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Looks like using array is feasible?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides, TODO after MSRV bump: use crate renaming trick instead.

Not sure what you mean here mate, is this comment still relevant with the removal of alloc/vec stuff?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not.

@tcharding
Copy link
Member Author

I think we can just use a fixed-size array of size 74 rather than a Vec. AFAIK there is no object in this library which has a longer serialization.

If we follow @Kixunil's suggestion above the Vec goes away also. Commenting to acknowledge seeing this comment (and not ignoring it :)

Copy link
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generics would be more elegant than macro but without const generics it'd be a bit more boilerplate, so I think it's OK.

Do we know for sure binary codecs don't store type with each element?

src/serde_util.rs Outdated Show resolved Hide resolved
src/serde_util.rs Outdated Show resolved Hide resolved
v[i] = value;
} else {
let msg = format!("tuple too short, index: {}, expected length: {}", i, $len);
return Err(de::Error::custom(msg));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

invalid_length() would be better. Also I think de::Error is a bit confusing because it refers to the trait not type. Err:: would look better but Err is also special - even used here - how come it even works? I suggest either calling it Error or just E.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do, thanks.

return Err(de::Error::custom(msg));
}
}
(self.parse_fn)(&v).map_err(de::Error::custom)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error could be specialized too. Maybe parse_fn should return appropriate serde error already?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will re-think, thanks.

macro_rules! impl_tuple_visitor {
($thing:ident, $len:expr) => {
/// Visitor type for de/ser a 32 byte sequence.
#[cfg(any(feature = "std", feature = "alloc"))]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is no longer needed and the reason why test fails. FTR, if it was needed it'd be a demonstration of the compatibility issues I mentioned a few times.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sloppy work by me. This is not needed.

@tcharding
Copy link
Member Author

Do we know for sure binary codecs don't store type with each element?

Don't the unit tests check that? They test the token is exactly the expected type? Or do you mean that we should pick a binary codec and add a unit test that serializes and checks the number of bytes is as expected?

@tcharding
Copy link
Member Author

Thanks for the review @Kixunil, I'll re-spin tomorrow.

@Kixunil
Copy link
Collaborator

Kixunil commented Mar 1, 2022

I mean this could be regression in data size for some formats. We don't have tests for specific data.

/// Implements de/serialization with the `serde` feature enabled. Serialized data is fixed width (32
/// bytes) if you use `bincode` to do serialization, other binary formats may add metadata to the
/// serialized data (e.g. cbor and tycho).
///
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be very useful to explicitly say we treat it as tuple of 32 u8s for non-human formats. I think it's a good idea to mention serialization formats but I would word it like this:

"This representation is optimal for e.g. bincode format. Some formats may be less optimal."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While implementing this suggestion I realized that we do not differentiate between human-readable and non-humand-readable in secp as we do in bitcoin. Is this by design or an omission?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also means that the docs are now slightly misleading because they imply that we do something different for human-readable formats.

Implements de/serialization with the serde feature enabled. We treat the byte value as a tuple of 33 u8s for non-human-readable formats. This representation is optimal for for some formats (e.g. bincode) however other formats may be less optimal (e.g. cbor).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Damn, I realized it may sound like we don't implement human-readable format. So should say we do it as hex.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I documented the hex string for human-readable formats.

tests/serde.rs Outdated
let mut e = cbor::Encoder::from_memory();
e.encode(sk.as_ref()).unwrap();

assert_ne!(e.as_bytes().len(), 32);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If cbor adds a byte for each item shouldn't this be 64?

Copy link
Member Author

@tcharding tcharding Mar 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It only adds metadata if the value is < 24 (no clue why). I've changed the test to check against len==52 and explained the '52' in a code comment. This is a regression test after all so I think its ok to have that hard coded value.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's weird.

tests/serde.rs Outdated
let pk = PublicKey::from_slice(&PK_BYTES).expect("failed to create pk from slice");
let ser = bincode::serialize(&pk).unwrap();

// We cannot use assert_eq on the whole array because 33 byte array does not implement Debug.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe assert_eq!(ser, &PK_BYTES as &[u8]) should work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Legend, thanks!

@tcharding tcharding force-pushed the fixed-width-serde branch 2 times, most recently from f498e1b to 7ebb6a5 Compare March 8, 2022 21:56
@tcharding
Copy link
Member Author

Changes in foce-push:

  • Fixed docs as suggested by @Kixunil
  • Improved unit test as discussed above

@tcharding
Copy link
Member Author

Changes in force-push: Updated serde unit test for XOnlyPublicKey to include the new return type of from_keypair (returns parity now too).

@tcharding tcharding force-pushed the fixed-width-serde branch 2 times, most recently from f7eba2b to db656a5 Compare May 11, 2022 23:35
@tcharding
Copy link
Member Author

And added the hex docs suggestion from above.

apoelstra
apoelstra previously approved these changes May 13, 2022
Copy link
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK db656a5

Thank you for checking the actual behavior! And for adding integration tests that will detect if we break things in the future.

I am a little bit wary of this PR beacuse it will result in serialization breakage. It will definitely need to go in a major release at least.

@tcharding
Copy link
Member Author

I am a little bit wary of this PR beacuse it will result in serialization breakage.

Do you want me to look into adding some sort of compatibility code for migrating from the current serialization to the one in this PR?

@apoelstra
Copy link
Member

@tcharding sure -- but I'm not optimistic because there is no versioning or anything in our encoding.

One strategy might be to continue to allow decoding the old format, while only encoding the new format ... but that has the potential to just make the breakage harder-to-find for downstream users.

@tcharding
Copy link
Member Author

Ok, without a clear positive way forward I'll leave it as is and hide behind 'not version 1.0 yet' :)

@tcharding
Copy link
Member Author

Changes in force push: rebase, the only merge conflict was a path (add crate prefix) in unit tests. No other changes.

Currently we serialize keys using the `BytesVisitor`, this causes the
serialized data to contain additional metadata encoding the length (an
extra 8 bytes) when serialized with the `bincode` crate. This extra data
is unnecessary since we know in advance the length of these types.

It would be useful for users of the lib to be able to get a fixed width
binary serialization, this can be done but it depends on the crate used
to do the serialization. We elect to optimise for `bincode` and add docs
noting that other binary serialization crates may differ (rustdocs added
in separate patches).

Implement a tuple based visitor that encodes the keys as fixed width
data.

Do fixed width serde implementations for:

- `SecretKey`
- `PublicKey`
- `KeyPair`
- `XOnlyPublicKey`
Currently the rustdocs for `KeyPair` are stale in regards to serde, we
_do_ implement `Serialize` and `Deserialize` for `KeyPair`.

Improve the rustdocs for `KeyPair` by removing stale docs and adding
docs on fixed width binary serialization.
We recently added fixed width serialization for some types however
serialization is only fixed width when data is serialized with the
`bincode` crate.

Add rustdocs describing fixed width serde to `SecretKey`, `PublicKey`,
and `XOnlyPublicKey` (`KeyPair` is already done).
Add a `tests` directory. Add `serde` tests for the recently added fixed
width binary serialization code.

Please note, serialization is only fixed width when serialized with
the `bincode` crate.
@tcharding
Copy link
Member Author

FTR I do not intend to push for this PR to be merged, I am unclear whether it is worth having or not. I will keep rebasing it to give others a chance to comment.

@Kixunil
Copy link
Collaborator

Kixunil commented Jun 14, 2022

I think up to 25% space saving is worth breaking already unstable library. :)

@apoelstra
Copy link
Member

I have mixed feelings, but concept ACK from me because @Kixunil is so persuasive.

Also, selfishly, I think that my personal applications will be unaffected because I always store keys in the ASCII serialization..

Will re-review and merge this shortly.

Copy link
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 3ca7f49

@apoelstra apoelstra merged commit 613d7dc into rust-bitcoin:master Jun 15, 2022
@tcharding tcharding deleted the fixed-width-serde branch July 21, 2022 03:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants