Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement human-readable serde for Witness #1068

Merged
merged 6 commits into from Jul 18, 2022

Conversation

tcharding
Copy link
Member

@tcharding tcharding commented Jun 28, 2022

This is dr-orlovsky's PR picked up at his permission in the discussion thread.

I went through the review comments and implemented everything except the perf optimisations. Also includes a patch at the front of the PR that adds a unit test that can be run to see the "before and after", not sure if we want it in, perhaps it should be removed before merge.

This PR implicitly fixes 942.

To test this PR works as advertised run cargo test display_transaction --features=serde -- --nocapture after creating a unit test as follows:

    // Used to verify that parts of a transaction pretty print.
    // `cargo test display_transaction --features=serde -- --nocapture`
    #[cfg(feature = "serde")]
    #[test]
    fn serde_display_transaction() {
        let tx_bytes = Vec::from_hex(
            "02000000000101595895ea20179de87052b4046dfe6fd515860505d6511a9004cf12a1f93cac7c01000000\
            00ffffffff01deb807000000000017a9140f3444e271620c736808aa7b33e370bd87cb5a078702483045022\
            100fb60dad8df4af2841adc0346638c16d0b8035f5e3f3753b88db122e70c79f9370220756e6633b17fd271\
            0e626347d28d60b0a2d6cbb41de51740644b9fb3ba7751040121028fa937ca8cba2197a37c007176ed89410\
            55d3bcb8627d085e94553e62f057dcc00000000"
        ).unwrap();
        let tx: Transaction = deserialize(&tx_bytes).unwrap();
        let ser = serde_json::to_string_pretty(&tx).unwrap();
        println!("{}", ser);
    }

Fixes: #942

src/blockdata/witness.rs Outdated Show resolved Hide resolved
DanGould added a commit to DanGould/rust-bitcoin that referenced this pull request Jun 28, 2022
@tcharding tcharding force-pushed the 06-28-display-witness branch 2 times, most recently from 5e47fe4 to 2181c0a Compare June 28, 2022 23:58
@tcharding
Copy link
Member Author

Changes in force push:

  • Use loop suggestion from @RCasatta to remove allocation

@tcharding tcharding added this to the 0.29.0 milestone Jun 29, 2022
Copy link
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick review for now.


for elem in self.iter() {
if human_readable {
seq.serialize_element(&elem.to_hex())?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allocates, however if we use collect_str it will be slower for default implementations because we can't reserve(). So I suggest not to worry about it at least for now.

let vec = Vec::<u8>::from_hex(&elem).map_err(|e| {
match e {
InvalidChar(c) => {
let c = core::char::from_u32(c as u32).unwrap_or('-');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid as, use .into() instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, I think you told me this before. I must not have fully groked it then, so I went looking :) Is it because as can be lossy and from only does lossless conversions?

ref: https://stackoverflow.com/questions/48795329/what-is-the-difference-between-fromfrom-and-as-in-rust

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in this case as is guaranteed to be lossless but it's better to not have to manually check that and the property survives across refactors. (Such refactors almost certainly will not happen here but let's keep good habits, OK? :))

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this still contains as and was marked resolved? Also still uses -.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that fresh mountain air on my week off broke my brain, I thought we had a discussion about no using default '-' and I don't see that. Not to mention that the '-' is still there like you say. Anyways, I've removed it and use Unexpected::Unsigned when its not a char (like I remember you suggesting previously). Thanks for patiently, repeatedly pointing out my mistakes :)

src/blockdata/witness.rs Outdated Show resolved Hide resolved
src/blockdata/witness.rs Outdated Show resolved Hide resolved
if deserializer.is_human_readable() {
deserializer.deserialize_seq(Visitor)
} else {
let vec: Vec<Vec<u8>> = serde::Deserialize::deserialize(deserializer)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could avoid allocations if we used custom visitor. That's why I originally suggested to call the one above HRVisitor so that the other one can be called BinaryVisitor (suggest better name please).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Righto, I'll have a play with it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must not be fully understanding where all the allocations are occurring. I came up with the code below but it seems to me that this code is going to contain the exact same number of allocations as line 343 above (one for each element, and the one outer vector).

            fn visit_seq<A: serde::de::SeqAccess<'de>>(self, mut a: A) -> Result<Self::Value, A::Error>
            {
                let mut ret = Vec::new();
                while let Some(elem) = a.next_element::<Vec<u8>>()? {
                    ret.push(elem);
                }
                Ok(Witness::from_vec(ret))
            }

What am I missing?

Copy link
Collaborator

@Kixunil Kixunil Jun 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to build Witness as non-nested Vec directly. You need to use next_element_seed and a bit more machinery to support it. This example looks very similar to what we have here, you can pretty much copy it minus some generics (we have u8). Note that that example forgot to call reserve() in front of the inner for loop. Should have this:

if let Some(size_hint) = seq.size_hint() {
    self.0.reserve(size_hint);
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a good play with this, was educational. Its separate from this PR because its an optimisation, right? I think the correct way to go about it is merge this naive version (fixes the ugly serialization) and then create an separate issue/PR to do it as in the example you link. Would that be fair?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't care if it's merged in this one or a separate one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I'll do it as a separate PR. Cheers.

apoelstra added a commit that referenced this pull request Jun 29, 2022
9bf9591 Optimize Witness Serialization (DanGould)

Pull request description:

  fix #942
  > self.to_vec() allocates, it should be possible to avoid it - just feed the items into serializer.

  based on #1068

ACKs for top commit:
  Kixunil:
    ACK 9bf9591
  apoelstra:
    ACK 9bf9591

Tree-SHA512: 14553dfed20aee50bb6361d44986b38556cbb3e112e1b4d9e3b401c3da831b6bdf159089966254cf371c225ae929fc78516c96a6114b40a7bc1fda7305295e4a
As is customary leave a line of white space between functions.
No need for a line of whitespace immediately starting an impl block.
Re-order the import statements in `witness` module to separate `crate`
imports from dependency imports.
@apoelstra
Copy link
Member

I suspect that serializaiton could also avoid the to_hex allocation -- but yes, let's go ahead and do that in another PR :)

apoelstra
apoelstra previously approved these changes Jul 11, 2022
Copy link
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 4c1dfaf

@Kixunil
Copy link
Collaborator

Kixunil commented Jul 12, 2022

@apoelstra It's not a clear win - if the serializer uses the default implementation we end up having slower code. I just posted to IRLO about this

Copy link
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were some changes lost by accident?

).unwrap();
let tx: Transaction = deserialize(&tx_bytes).unwrap();
let ser = serde_json::to_string_pretty(&tx).unwrap();
println!("{}", ser);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use assert_eq. I know changes to serde would break this but seems less bad than manually checking with funny flags.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to remove the test all together. I put it in the PR description in case anyone wants to verify that this PR works as advertised.

use hashes::hex::FromHex;
use hashes::hex::Error::*;

let mut ret = Vec::new();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with_capacity

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, cheers.

let vec = Vec::<u8>::from_hex(&elem).map_err(|e| {
match e {
InvalidChar(c) => {
let c = core::char::from_u32(c as u32).unwrap_or('-');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this still contains as and was marked resolved? Also still uses -.

dr-orlovsky and others added 3 commits July 13, 2022 09:23
This also removes tests for JSON backward-compatible encoding. Human-readable 
encoding will be changed in the next commit and this will break backward 
compatibility, thus that part of the test is removed.
Previous implementations of Witness (and Vec<Vec<u8>>) serde serialization
didn't support human-readable representations. This resulted in long unreadable
JSON/YAML byte arrays, which were especially ugly when pretty-printed (a line
per each byte).

Co-authored-by: Tobin C. Harding <me@tobin.cc>
@tcharding
Copy link
Member Author

Changes in force push:

  • Dropped patch that added the unit test displaying Witness
  • Modified commit 68577dfb Witness human-readable serde by doing
    • Added import for serde stuff to shorten types/lines
    • Removed use of unwrap_or('-'), used multiple different serde error types
    • Used into instead of as

Copy link
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK a1df62a

@tcharding tcharding added the one ack PRs that have one ACK, so one more can progress them label Jul 18, 2022
Copy link
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK a1df62a

@Kixunil Kixunil added the API break This PR requires a version bump for the next release label Jul 18, 2022
@apoelstra apoelstra merged commit cd790dc into rust-bitcoin:master Jul 18, 2022
@tcharding tcharding deleted the 06-28-display-witness branch July 20, 2022 01:37
ChallengeDev210 pushed a commit to ChallengeDev210/rust-bitcoin that referenced this pull request Aug 1, 2022
9bf9591 Optimize Witness Serialization (DanGould)

Pull request description:

  fix #942
  > self.to_vec() allocates, it should be possible to avoid it - just feed the items into serializer.

  based on rust-bitcoin/rust-bitcoin#1068

ACKs for top commit:
  Kixunil:
    ACK 9bf9591
  apoelstra:
    ACK 9bf9591

Tree-SHA512: 14553dfed20aee50bb6361d44986b38556cbb3e112e1b4d9e3b401c3da831b6bdf159089966254cf371c225ae929fc78516c96a6114b40a7bc1fda7305295e4a
ChallengeDev210 pushed a commit to ChallengeDev210/rust-bitcoin that referenced this pull request Aug 1, 2022
…for `Witness`

a1df62a Witness human-readable serde test (Dr Maxim Orlovsky)
68577df Witness human-readable serde (Dr Maxim Orlovsky)
93b66c5 Witness serde: test binary encoding to be backward-compatible (Dr Maxim Orlovsky)
b409ae7 witness: Refactor import statements (Tobin C. Harding)
e23d3a8 Remove unnecessary whitespace (Tobin C. Harding)
ac55b10 Add whitespace between functions (Tobin C. Harding)

Pull request description:

  This is dr-orlovsky's [PR](rust-bitcoin/rust-bitcoin#899) picked up at his permission in the discussion thread.

  I went through the review comments and implemented everything except the perf optimisations. Also includes a patch at the front of the PR that adds a unit test that can be run to see the "before and after", not sure if we want it in, perhaps it should be removed before merge.

  This PR implicitly fixes 942.

  To test this PR works as advertised run `cargo test display_transaction --features=serde -- --nocapture` after creating a unit test as follows:
  ```rust

      // Used to verify that parts of a transaction pretty print.
      // `cargo test display_transaction --features=serde -- --nocapture`
      #[cfg(feature = "serde")]
      #[test]
      fn serde_display_transaction() {
          let tx_bytes = Vec::from_hex(
              "02000000000101595895ea20179de87052b4046dfe6fd515860505d6511a9004cf12a1f93cac7c01000000\
              00ffffffff01deb807000000000017a9140f3444e271620c736808aa7b33e370bd87cb5a078702483045022\
              100fb60dad8df4af2841adc0346638c16d0b8035f5e3f3753b88db122e70c79f9370220756e6633b17fd271\
              0e626347d28d60b0a2d6cbb41de51740644b9fb3ba7751040121028fa937ca8cba2197a37c007176ed89410\
              55d3bcb8627d085e94553e62f057dcc00000000"
          ).unwrap();
          let tx: Transaction = deserialize(&tx_bytes).unwrap();
          let ser = serde_json::to_string_pretty(&tx).unwrap();
          println!("{}", ser);
      }
  ```

  Fixes: #942

ACKs for top commit:
  apoelstra:
    ACK a1df62a
  Kixunil:
    ACK a1df62a

Tree-SHA512: d0ef5b8cbf1cf8456eaaea490a793f1ac7dfb18067c4019a2c3a1bdd9627a231a4dd0a0151a4df9af2b32b909d4b384a5bec1dd3e38d44dc6a23f9c40aa4f1f9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API break This PR requires a version bump for the next release one ack PRs that have one ACK, so one more can progress them
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize Witness serialization
6 participants