Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PublicKey::to_sort_key method for use with sorting by key #1084

Merged
merged 1 commit into from Jul 13, 2022

Conversation

junderw
Copy link
Contributor

@junderw junderw commented Jul 3, 2022

Replaces #524

See previous PR for reasoning.

This solution is a little more straightforward. The name and documentation should be enough to prevent misuse.

We can also impl a to_sort_key for any CompressedKey added later. (or just impl Ord in a BIP67 compliant way)

TODO:

  • Add more sorting test vectors. Ideas of edge cases to test are welcome.

@junderw
Copy link
Contributor Author

junderw commented Jul 4, 2022

I added a vector with mixed uncompressed and compressed.

Other vectors are welcome.

Copy link
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks usable. Though I have some more ideas, I'm not opposed to merging, except I'd really like to not have the get_ prefix.

It'd be nice to have some benchmarks too. I suspect for small slices (2-3 elements or so) sort_by_cached_key is slower than sort_unstable_by_key and if so it'd be handy to document it.

src/util/key.rs Outdated Show resolved Hide resolved
src/util/key.rs Show resolved Hide resolved
@junderw
Copy link
Contributor Author

junderw commented Jul 4, 2022

New fixes:

  1. 1.41.1 CI was broken because const generics don't exist, which means [u8; 65] is not Ord. This is fixed by converting to a tuple of (u8, [u8; 32], Option<[u8; 32]>) which will make the same order. (Only compressed keys will have a None in the last position)
  2. Changed the name to to_sort_key which is in line with the API Guidelines
  3. Added a key to the last vector which mirrors another key on the X axis. This creates a sort with the first 33 bytes exactly the same and only the Y value differs.

@junderw junderw changed the title Add PublicKey::get_sort_key method for use with sorting by key Add PublicKey::to_sort_key method for use with sorting by key Jul 4, 2022
@Kixunil
Copy link
Collaborator

Kixunil commented Jul 4, 2022

Hmm, Option adds one more byte to the type and given your reasoning it shouldn't be needed. Either way the type is becoming ugly so I propose to either:

  • Make it opaque; we can change it to type alias one day when we bump MSRV again
  • Throw it away and change to_bytes to return SerializedPublicKey exactly as I wrote it. (You reminded me Ord/PartialOrd is missing, will add soon.)

I'm starting to like the latter because it serves another purpose too, so it's more useful but the former may be very slightly faster when sorting (skips computing length; though length computing is very fast)

@junderw
Copy link
Contributor Author

junderw commented Jul 4, 2022

Removed Option. Wrapped tuple in opaque struct that derives Debug, PartialEq, Eq, PartialOrd, Ord, Clone, Copy

Copy link
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to accepting this (modulo doc) but would like to see more discussion from others, especially regarding idea of changing to_bytes.

if self.compressed {
let bytes = self.inner.serialize();
let mut res = [0; 32];
res[..].copy_from_slice(&bytes[1..33]);
(bytes[0], res, None)
SortKey(bytes[0], res, [0; 32])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For #1060 we could use slice patterns here: let [prefix, x @ ..] = self.inner.serialize(); SortKey(prefix, x, [0; u32])

src/util/key.rs Outdated Show resolved Hide resolved
src/util/key.rs Outdated Show resolved Hide resolved
@junderw
Copy link
Contributor Author

junderw commented Jul 7, 2022

I improved the doc example and squashed everything into 1 commit.

Copy link
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 24f0441

@apoelstra
Copy link
Member

I'm a little tempted to bikeshed on the Option thing ... yes, we save a byte, but we plausibly speed up comparison time.

On the other hand I really doubt the speedup is measurable, so without benchmarks I won't push it. I'm just mentioning it for the record.

@junderw
Copy link
Contributor Author

junderw commented Jul 7, 2022

re: Option: I don't care either way. I ran some simple benchmarks locally (on rust playground) and it seems like an extra 32 0 == 0 u8 comparisons are about 0.8 ns.

However, like I said in the discussion, if you think about the most common use cases for this sorting, having two or more compressed pubkeys that are equal (which is the condition for hitting the 0 comparisons) is always a sign of something going wrong.

I can't think of one use case for sorting PublicKey objects where having two identical compressed keys isn't an indication of a serious bug somewhere else, and most correct programs will therefore never even hit the condition for the extra compares.

I would be interested if someone can think of a use case where two equal pubkeys isn't a sign of another issue...

But like you said, I think we can just merge as-is.

Just wanted to share my self-justification.

@Kixunil
Copy link
Collaborator

Kixunil commented Jul 7, 2022

@apoelstra the good thing about the newtype is we can change our minds without breaking the API. :) We could also use zeroth byte to decide whether to compare whole array or not, it can even be branch-free.

@junderw block explorers counting address reuse within blocks. Not sure if useful but at least shouldn't be obviously wrong.

@junderw
Copy link
Contributor Author

junderw commented Jul 10, 2022

Also as an added note for reference on how Bitcoin Core does it:

Here is the area where sortedmulti() logic is written. It just uses std::sort on a vector of CPubKey.

Here is the logic for CPubKey comparison operators. vch is a 65 byte array with the first byte being 0x02|0x03|0x04 (so same format as we have here) and being compared... then memcmp is used to compare the rest (size() returns 33 when compressed, so it won't compare the 32 empty bytes in the array)

Copy link
Member

@tcharding tcharding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 24f0441

@Kixunil
Copy link
Collaborator

Kixunil commented Jul 11, 2022

@junderw the Core logic looks very similar to my suggestion. :)

Copy link
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm willing to accept this code if others say "we don't want to have SerializedPublicKey". If others want SerializedPublicKey then this would be duplication of code since that one would provide same thing with additional features.

However if others think that to_sort_key returning type alias of SerializedPublicKey is good for consumer code clarity, I'm also willing to accept this. :)

/// `SortKey` is not too useful by itself, but it can be used to sort a
/// `[PublicKey]` slice using `sort_unstable_by_key`, `sort_by_cached_key`,
/// `sort_by_key`, or any of the other `*_by_key` methods on slice.
/// Pass the method into the sort method directly. (ie. `PublicKey::to_sort_key`)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be nice to document that the conversion is non-trivial, so cached versions may perform better.

@apoelstra
Copy link
Member

I kinda like the idea of having SerializedPublicKey actually ... it would make a lot of Core behavior easy to emulate, not just sort order.

@junderw
Copy link
Contributor Author

junderw commented Jul 12, 2022

Since the link wasn't posted here I'll post a link to the SerializedPublicKey that was posted in the other thread.

@junderw
Copy link
Contributor Author

junderw commented Jul 12, 2022

tbh I think it would be better if SerializedPublicKey Derefed into bitcoin::PublicKey.

Since we already would be manually impl all the Ord Eq etc. we might as well hold on to the bitcoin::PublicKey internally.

@junderw
Copy link
Contributor Author

junderw commented Jul 12, 2022

Then it would be:

  1. All functionality is on bitcoin::PublicKey
  2. If the serialization is important, .into() will consume and wrap the bitcoin::PublicKey in a SerializedPublicKey with the 65 byte array.
  3. Perhaps also offer an .into_inner() for SerializedPublicKey that returns bitcoin::PublicKey and throws away the [u8; 65].

It will increase the data size of each key, but makes it more flexible, and we don't need to reimplement anything / worry about conversions when dealing with secp256k1 methods.

@junderw
Copy link
Contributor Author

junderw commented Jul 12, 2022

Looking into it, it seems like most functionality on bitcoin::PublicKey serializes internally, so a simple Deref would leave us with an awkward inefficient API where SPK would be caching the serialization, but using one of the Derefed methods would serialize again and not make use of the cached serialization in SPK.

@Kixunil
Copy link
Collaborator

Kixunil commented Jul 12, 2022

I think it would be better if SerializedPublicKey Derefed into bitcoin::PublicKey.

Even if I agreed, it's impossible without ~doubling it's size. SerializedPublicKey is just a bunch of bytes and it's equivalent to SmallVec<const LEN = 65, u8>

@Kixunil
Copy link
Collaborator

Kixunil commented Jul 12, 2022

Hmm, but maybe methods for computing hashes of SerializedPublicKey would be helpful. And maybe also CachedPublicKey as a new type.

@apoelstra
Copy link
Member

If SerializedPublicKey actually contained a public key then I would no longer understand its purpose. I suppose in theory you would save a couple nanoseconds on comparisons, but you'd no longer be able to parse quickly and your data structure would be much larger.

I'm also not sure what CachedPublicKey would do?

@junderw
Copy link
Contributor Author

junderw commented Jul 12, 2022

I think everyone's vision of what SerializedPublicKey should be is all over the place and causing confusion.

Currently, bitcoin::PublicKey can:

  1. Create an instance from secp256k1::Pubkey which is guaranteed valid key.
  2. Get keyhash and wkeyhash (uses serialize*())
  3. Write into writer / output a Vec (uses serialize*())
  4. Read from reader / instanciate from &[u8] (uses the fallible secp256k1::Pubkey::from_slice)

So SerializedPublicKey should:

  1. Use secp256k1::Pubkey::from_slice etc. for any instantiation to ensure valid point / accept secp256k1::Pubkey
  2. Get keyhash and wkeyhash
  3. Write into writer
  4. Read from reader (get an instance)

In which case... we'd basically be replacing PublicKey completely...

Maybe the SerializedPublicKey discussion should happen separately and be introduced along with a deprecation of PublicKey... or replace PublicKey with it and release as breaking change.

SortKey and to_sort_key seems like a minimal non-breaking addition that provides a simple use case that isn't too performance critical to begin with.

Perhaps we should just merge and continue discussion about the specifics of SerializedPublicKey for a breaking change?

Thoughts?

@apoelstra
Copy link
Member

If we validate SerializedPublicKey then we ought to just go all the way and convert it to a PublicKey. This is barely more expensive and would save space.

Serializing is basically free. Parsing and validation are not. If we were to have a SerializedPublicKey the point would be to basically be a wrapper around a bytestring, analogous to CPublicKey in Core.

@Kixunil
Copy link
Collaborator

Kixunil commented Jul 12, 2022

@apoelstra CachedPublicKey would do exactly what @junderw proposes but being another type people could just choose it if they find it better based on their benchmarks. However since we don't have such benchmarks perhaps we should ignore it for now and only add it if many people report it's actually useful.

@junderw
Copy link
Contributor Author

junderw commented Jul 13, 2022

So basically, what @apoelstra wants is something similar to what @Kixunil except without using secp256k1::PublicKey for instantiation at all.

Basically any instance would not be validated upon instantiation beyond a simple length check and maybe check that first byte is in the set [2,3,4] (Bitcoin Core also allows [6,7] which I am not sure of their meaning beyond the [2,3] bit patterns with a 4 bit added on)

There is a public method on CPubKey that does simple validation (which checks that it isn't the Invalid default constructor... not sure if this is needed for Rust) and then another method for more expensive validation.

It also looks like they add overrides to allow CPubKey to be used similar to its underlying data structure, so we should implement Deref<Target = [u8]> and Index< usize >

We might as well just name it CPubKey, I guess.

However, I wonder about the utility of some of the methods on Bitcoin Core's CPubKey for the port to this library. (some of them seem to be working around C++ limitations)

@Kixunil
Copy link
Collaborator

Kixunil commented Jul 13, 2022

Firstly, we're programming in Rust, not C++ so please avoid taking too much inspiration from Core design.

I'm not sure if we should support directly converting arbitrary slices to SerializedPublicKey. I'd prefer to avoid it and only implement it i someone has an actual use case.

Bitcoin Core also allows [6,7] which I am not sure of their meaning

Those are exotic encodings we don't support here.

we should implement Deref<Target = [u8]> and Index< usize >

The code I reference indeed has Deref but not Index because that one is provided by deref coercions.

We might as well just name it CPubKey, I guess.

Please no. This is C++/Core naming convention ("C" meaning "class") and it could be confusing since we will not match it entirely. I'm not opposed to saying in the doc "This is similar, but not exactly same as CPubKey in core." though.

However, I wonder about the utility of some of the methods on Bitcoin Core's CPubKey for the port to this library. (some of them seem to be working around C++ limitations)

Exactly, I think we should stick to it being a heap-allocation-free replacement for Vec<u8> returned from to_bytes(), maybe with hashing methods.

@junderw
Copy link
Contributor Author

junderw commented Jul 13, 2022

I'm just trying to figure out what it is about the CPubKey design that we want to emulate besides the ordering.

It (Core's CPubKey) validates signatures, verifies signatures, checks if signatures are lowS, recovers pubkeys from signatures+hashes into itself, derives child keys using bip32 by passing chain codes as an argument etc... all of which are... not things I would normally expect we'd want to emulate.

My stance is still unchanged:

  • This should be merged as-is to at least give a method for ordering pubkeys, and it's a simple opaque type.
  • Discussion on SerializedPublicKey and its API should be in a different issue / PR (since it is a source of endless bike-shedding on the nitty gritty details)
    • I think there is a high chance that whatever SerializedPublicKey eventually turns into will end up completely replacing bitcoin::PublicKey, as well.

@Kixunil
Copy link
Collaborator

Kixunil commented Jul 13, 2022

OK, I take back my willingness to mention CPubKey in the docs, it's completely confusing and it shouldn't affect the design at all. I'm taking Rust-first approach. I only want SerializedPublicKey to be a bunch of bytes similar to Box<[u8]> (yep, not even Vec<u8>) with the only difference being it's not heap allocated. The fact that it happens to provide ordering that happens to be in line with Core/BIP-67 is a nice side effect.

I think there is a high chance that whatever SerializedPublicKey eventually turns into will end up completely replacing bitcoin::PublicKey

I think the chance is near zero. It has concept NACK from me and I'd need some very serious argument to consider it.

Copy link
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 24f0441

On the basis that this can be type aliased anyway and perhaps the clarity is useful. The discussion derailed and it's not worth it holding this longer.

@Kixunil Kixunil mentioned this pull request Jul 13, 2022
@apoelstra
Copy link
Member

Sounds good to me.

@apoelstra apoelstra merged commit 758dbfa into rust-bitcoin:master Jul 13, 2022
Copy link
Member

@sanket1729 sanket1729 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

post merge ACk 24f0441

ChallengeDev210 pushed a commit to ChallengeDev210/rust-bitcoin that referenced this pull request Aug 1, 2022
…od for use with sorting by key

24f0441 Add PublicKey::to_sort_key method for use with sorting by key (junderw)

Pull request description:

  Replaces #524

  See previous PR for reasoning.

  This solution is a little more straightforward. The name and documentation should be enough to prevent misuse.

  We can also impl a to_sort_key for any CompressedKey added later. (or just impl Ord in a BIP67 compliant way)

  TODO:

  - [x] Add more sorting test vectors. Ideas of edge cases to test are welcome.

ACKs for top commit:
  apoelstra:
    ACK 24f0441
  tcharding:
    ACK 24f0441
  Kixunil:
    ACK 24f0441

Tree-SHA512: 92d68cccaf32e224dd7328edeb3481dd7dcefb2f9090b7381e135e897c21f79ade922815cc766c5adb7ba751b71b51a773d103af6ba13fc081e1f5bfb846b201
@Kixunil Kixunil added the minor API Change This PR should get a minor version bump label Aug 1, 2022
@Kixunil Kixunil added this to the 0.29.0 milestone Aug 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
minor API Change This PR should get a minor version bump
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants