Add a performant way to (de-)serialize fixed-size byte arrays #127

hrxi · 2024-01-27T22:54:41Z

First observed in est31/serde-big-array#19, there's no performant way to serialize fixed-size byte arrays with serde/postcard.

Add a FixedSizeByteArray type that can be used to serialize fixed-size byte arrays faster than all other available methods. This type only works with postcard. A more general solution would need to be implemented in serde itself.

This works around serde-rs/serde#2680.

serialize16/own                 [3.8846 ns 3.8909 ns 3.8988 ns]
serialize16/bytes               [9.9192 ns 9.9503 ns 9.9907 ns]
serialize16/byte_array          [4.1374 ns 4.1461 ns 4.1564 ns]
serialize16/big_array           [68.245 ns 68.344 ns 68.454 ns]
serialize16/fixed_size          [9.9966 ns 10.030 ns 10.076 ns]
serialize16/variable_size       [15.696 ns 15.795 ns 15.898 ns]

serialize32/own                 [4.1744 ns 4.1857 ns 4.2014 ns]
serialize32/bytes               [9.6590 ns 9.6825 ns 9.7151 ns]
serialize32/byte_array          [4.5584 ns 4.5663 ns 4.5747 ns]
serialize32/big_array           [135.38 ns 135.78 ns 136.34 ns]
serialize32/fixed_size          [13.989 ns 14.034 ns 14.091 ns]
serialize32/variable_size       [21.223 ns 21.270 ns 21.328 ns]

deserialize16/own               [5.9018 ns 5.9143 ns 5.9272 ns]
deserialize16/bytes             [21.198 ns 21.278 ns 21.376 ns]
deserialize16/byte_array        [6.8560 ns 6.8719 ns 6.8934 ns]
deserialize16/big_array         [19.027 ns 19.098 ns 19.187 ns]
deserialize16/fixed_size        [7.9197 ns 7.9471 ns 7.9817 ns]
deserialize16/variable_size     [36.804 ns 36.871 ns 36.946 ns]

deserialize32/own               [8.8860 ns 8.8989 ns 8.9139 ns]
deserialize32/bytes             [21.241 ns 21.271 ns 21.321 ns]
deserialize32/byte_array        [11.441 ns 11.459 ns 11.481 ns]
deserialize32/big_array         [31.553 ns 31.618 ns 31.697 ns]
deserialize32/fixed_size        [17.334 ns 17.360 ns 17.386 ns]
deserialize32/variable_size     [59.705 ns 59.815 ns 59.949 ns]

own is the new FixedSizeByteArray, no length prefix.
bytes is serde_bytes::Bytes, it has a length prefix.
byte_array is serde_byte_array::ByteArray, it has a length prefix.
big_array is serde_big_array::Array, no length prefix.
fixed_size is [u8; _], no length prefix.
variable_size is [u8], it has a length prefix.

First observed in est31/serde-big-array#19, there's no performant way to serialize fixed-size byte arrays with serde/postcard. Add a `FixedSizeByteArray` type that can be used to serialize fixed-size byte arrays faster than all other available methods. This type only works with postcard. A more general solution would need to be implemented in serde itself. This works around serde-rs/serde#2680. ``` serialize16/own [3.8846 ns 3.8909 ns 3.8988 ns] serialize16/bytes [9.9192 ns 9.9503 ns 9.9907 ns] serialize16/byte_array [4.1374 ns 4.1461 ns 4.1564 ns] serialize16/big_array [68.245 ns 68.344 ns 68.454 ns] serialize16/fixed_size [9.9966 ns 10.030 ns 10.076 ns] serialize16/variable_size [15.696 ns 15.795 ns 15.898 ns] serialize32/own [4.1744 ns 4.1857 ns 4.2014 ns] serialize32/bytes [9.6590 ns 9.6825 ns 9.7151 ns] serialize32/byte_array [4.5584 ns 4.5663 ns 4.5747 ns] serialize32/big_array [135.38 ns 135.78 ns 136.34 ns] serialize32/fixed_size [13.989 ns 14.034 ns 14.091 ns] serialize32/variable_size [21.223 ns 21.270 ns 21.328 ns] deserialize16/own [5.9018 ns 5.9143 ns 5.9272 ns] deserialize16/bytes [21.198 ns 21.278 ns 21.376 ns] deserialize16/byte_array [6.8560 ns 6.8719 ns 6.8934 ns] deserialize16/big_array [19.027 ns 19.098 ns 19.187 ns] deserialize16/fixed_size [7.9197 ns 7.9471 ns 7.9817 ns] deserialize16/variable_size [36.804 ns 36.871 ns 36.946 ns] deserialize32/own [8.8860 ns 8.8989 ns 8.9139 ns] deserialize32/bytes [21.241 ns 21.271 ns 21.321 ns] deserialize32/byte_array [11.441 ns 11.459 ns 11.481 ns] deserialize32/big_array [31.553 ns 31.618 ns 31.697 ns] deserialize32/fixed_size [17.334 ns 17.360 ns 17.386 ns] deserialize32/variable_size [59.705 ns 59.815 ns 59.949 ns] ``` - `own` is the new `FixedSizeByteArray`, no length prefix. - `bytes` is `serde_bytes::Bytes`, it has a length prefix. - `byte_array` is `serde_byte_array::ByteArray`, it has a length prefix. - `big_array` is `serde_big_array::Array`, no length prefix. - `fixed_size` is `[u8; _]`, no length prefix. - `variable_size` is `[u8]`, it has a length prefix.

netlify · 2024-01-27T22:55:10Z

✅ Deploy Preview for cute-starship-2d9c9b canceled.

Name	Link
🔨 Latest commit	`b0a07fd`
🔍 Latest deploy log	https://app.netlify.com/sites/cute-starship-2d9c9b/deploys/65bbc34b4cfe0d0008d490b9

hrxi · 2024-01-27T22:56:28Z

This was motivated by benchmarking the serialization of a Merkle tree structure that serializes lots of hashes represented as fixed size byte arrays. The serialization of that tree has seen a >2x performance increase using this code.

jamesmunns · 2024-01-31T10:47:04Z

src/ser/serializer.rs

@@ -300,8 +300,12 @@ where
    }

    #[inline]
-    fn serialize_struct(self, _name: &'static str, _len: usize) -> Result<Self::SerializeStruct> {
-        Ok(self)
+    fn serialize_struct(self, name: &'static str, _len: usize) -> Result<Self::SerializeStruct> {


This makes me a little worried about degrading performance for all other cases to optimize specifically for the byte array case.

I believe (have not measured) that the performance degradation is going to be minimal. The extra code is a single pointer comparison, likely optimized out because this is in monomorphized code.

It's also the same way serde_json::value::RawValue is implemented. I believe serde_json is a high-performance serde JSON implementation ensured by benchmarks.

src/de/deserializer.rs

jamesmunns · 2024-01-31T10:57:00Z

Hi @hrxi, thanks for the PR! I'm sympathetic to this issue, but I'm not sure if I'm convinced this is a good solution for postcard.

This adds conditionals and panicking branches to every ser/de struct field path.

I think it should be possible to implement Serialize and Deserialize for FixedSizeByteArray without modifying the serialization and deserialization code of postcard itself, essentially serializing it "as if it was a tuple", like what other fixed size arrays already do today.

hrxi · 2024-01-31T14:32:00Z

I think it should be possible to implement Serialize and Deserialize for FixedSizeByteArray without modifying the serialization and deserialization code of postcard itself, essentially serializing it "as if it was a tuple"

Yes, that would be possible. It's what e.g. serde_big_array does. However, it leads to subpar performance, seen in the PR description. The motivation of this PR is precisely fixing this subpar performance. If there's another way to do that, I'd be more than happy to implement it. I wasn't able to come up with another strategy though.

This adds conditionals and panicking branches to every ser/de struct field path.

I'm not sure if this additional conditional/panic is a problem, I'd guess "no". Would you be interested in me figuring out what the impact of this is precisely? If it does have an impact, would you consider adding this FixedSizeByteArray as a crate feature?

hrxi · 2024-01-31T14:48:01Z

Hi @hrxi, thanks for the PR! I'm sympathetic to this issue, but I'm not sure if I'm convinced this is a good solution for postcard.

Thanks for taking a look at the PR and being quick to answer with your feelings. :)

jamesmunns · 2024-01-31T14:59:45Z

Would you be interested in me figuring out what the impact of this is precisely? If it does have an impact, would you consider adding this FixedSizeByteArray as a crate feature?

I am interested in knowing the difference! https://github.com/djkoloski/rust_serialization_benchmark is likely a good baseline, it should be possible to add a "patched postcard" to do a side-by-side with stock postcard.

I'm interested in poking around with what serde-big-array is doing. I wonder if you had a SPECIFIC type for bytes, rather than "any T" like serde-big-array, if the performance delta would be so great.

This isn't a hard no on merging this, it just doesn't sit great with me due to the special-cased nature of it. If it is really the only way and doesn't have other perf impact tradeoffs, then I'm happy to consider it further.

I'll plan to poke around the "what if you do the tuple thing but specifically for bytes where you don't need the whole drop-partially-init data" stuff that serde-big-array does, unless you beat me to it.

hrxi · 2024-01-31T15:25:30Z

I'll plan to poke around the "what if you do the tuple thing but specifically for bytes where you don't need the whole drop-partially-init data" stuff that serde-big-array does, unless you beat me to it.

Note that the serialization in serde-big-array doesn't do anything weird, it's just a simple for loop, but it's still slow.

jamesmunns · 2024-01-31T15:31:53Z

I see what you mean, but there's a fair bit of state in PartiallyInitialized to allow for dropping on partially intialized data, and I wonder if that + the unsafe pointer code is inhibiting optimization.

Feel free to ignore my gut until I (or you!) come up with data to prove it right/wrong.

hrxi · 2024-01-31T15:49:57Z

I see what you mean, but there's a fair bit of state in PartiallyInitialized to allow for dropping on partially intialized data, and I wonder if that + the unsafe pointer code is inhibiting optimization.

AFAICT, this is only in the Deserialize implementation. The Serialize implementation looks completely plain to me:

impl<'de, T, const N: usize> BigArray<'de, T> for [T; N] {
    fn serialize<S>(&self, serializer: S) -> result::Result<S::Ok, S::Error>
    where
        S: Serializer,
        T: Serialize,
    {
        let mut seq = serializer.serialize_tuple(self.len())?;
        for elem in &self[..] {
            seq.serialize_element(elem)?;
        }
        seq.end()
    }
    […]
}

I agree that the Deserialize implementation might have some overhead.

hrxi · 2024-01-31T17:24:28Z

I am interested in knowing the difference! https://github.com/djkoloski/rust_serialization_benchmark is likely a good baseline, it should be possible to add a "patched postcard" to do a side-by-side with stock postcard.

Not looking good from a first benchmark:

log/postcard/serialize    [317.97 µs 319.60 µs 321.70 µs]
log/postcard127/serialize [427.51 µs 429.38 µs 431.44 µs]
log/postcard/deserialize    [2.3656 ms 2.3784 ms 2.3933 ms]
log/postcard127/deserialize [2.3488 ms 2.3555 ms 2.3632 ms]

mesh/postcard/serialize    [669.90 µs 673.52 µs 678.01 µs]
mesh/postcard127/serialize [959.39 µs 962.73 µs 966.47 µs]
mesh/postcard/deserialize    [1.6938 ms 1.7085 ms 1.7258 ms]
mesh/postcard127/deserialize [1.3297 ms 1.3331 ms 1.3367 ms]

minecraft_savedata/postcard/serialize    [383.24 µs 386.63 µs 390.71 µs]
minecraft_savedata/postcard127/serialize [451.78 µs 453.60 µs 455.74 µs]
minecraft_savedata/postcard/deserialize    [2.0008 ms 2.0067 ms 2.0133 ms]
minecraft_savedata/postcard127/deserialize [1.9978 ms 2.0061 ms 2.0156 ms]

mk48/postcard/serialize    [1.7979 ms 1.8187 ms 1.8451 ms]
mk48/postcard127/serialize [1.8483 ms 1.8572 ms 1.8679 ms]
mk48/postcard/deserialize    [4.2330 ms 4.2469 ms 4.2622 ms]
mk48/postcard127/deserialize [4.2577 ms 4.2743 ms 4.2925 ms]

hrxi · 2024-01-31T18:28:43Z

Moving the serialization hack from serialize_struct to serialize_tuple_struct, the performance difference disappears, likely because tuple structs are rare to non-existent in Rust code. Just another hack though…

hrxi · 2024-02-01T15:54:15Z

Moving the serialization hack from serialize_struct to serialize_tuple_struct, the performance difference disappears, likely because tuple structs are rare to non-existent in Rust code. Just another hack though…

Would you be sympathetic to have this, or to have this as an opt-in feature?

jamesmunns · 2024-02-01T15:55:30Z

I'm not sure yet! I likely won't be able to poke around in this before the weekend.

They're used less often in real code.

hrxi · 2024-02-19T15:13:54Z

I'm interested in the general sentiment on this PR because depending on the result, I have to take different routes for the serialization in my project. Have you had time to look at it yet?

jamesmunns · 2024-02-19T15:21:39Z

Hey @hrxi, I haven't had a chance to prioritize this. I have a client deadline coming up soon, and will likely not have time to spend on this in the near future.

I'm interested in hearing about your exploration in this space, but I'd generally like to keep postcard as straightforward and predictable as possible, rather than optimizing any one specific use case.

If this blocks you - I'd suggest you fork postcard and use that for your project if this optimization is critical. As the wire format of postcard is stable, it should be straightforward to use postcard or your-optimized-postcard interchangeably as long as the wire format does not change.

hrxi · 2024-02-19T15:30:44Z

Thanks for the quick answer!

If this blocks you - I'd suggest you fork postcard and use that for your project if this optimization is critical. As the wire format of postcard is stable, it should be straightforward to use postcard or your-optimized-postcard interchangeably as long as the wire format does not change.

Yup, that's definitely an option I'll consider.

Add a FixedSizeByteArray type that can be used to serialize fixed-size byte arrays faster than all other available methods. This type only works with postcard. A more general solution would need to be implemented in serde itself. This works around serde-rs/serde#2680.

jamesmunns reviewed Jan 31, 2024

View reviewed changes

src/de/deserializer.rs Show resolved Hide resolved

Move fixed size byte array stuff to tuple structs

b0a07fd

They're used less often in real code.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a performant way to (de-)serialize fixed-size byte arrays #127

Add a performant way to (de-)serialize fixed-size byte arrays #127

hrxi commented Jan 27, 2024

netlify bot commented Jan 27, 2024 •

edited

hrxi commented Jan 27, 2024

jamesmunns Jan 31, 2024

hrxi Jan 31, 2024

jamesmunns commented Jan 31, 2024

hrxi commented Jan 31, 2024

hrxi commented Jan 31, 2024

jamesmunns commented Jan 31, 2024

hrxi commented Jan 31, 2024

jamesmunns commented Jan 31, 2024

hrxi commented Jan 31, 2024

hrxi commented Jan 31, 2024

hrxi commented Jan 31, 2024

hrxi commented Feb 1, 2024

jamesmunns commented Feb 1, 2024

hrxi commented Feb 19, 2024

jamesmunns commented Feb 19, 2024

hrxi commented Feb 19, 2024

Add a performant way to (de-)serialize fixed-size byte arrays #127

Are you sure you want to change the base?

Add a performant way to (de-)serialize fixed-size byte arrays #127

Conversation

hrxi commented Jan 27, 2024

netlify bot commented Jan 27, 2024 • edited

✅ Deploy Preview for cute-starship-2d9c9b canceled.

hrxi commented Jan 27, 2024

jamesmunns Jan 31, 2024

Choose a reason for hiding this comment

hrxi Jan 31, 2024

Choose a reason for hiding this comment

jamesmunns commented Jan 31, 2024

hrxi commented Jan 31, 2024

hrxi commented Jan 31, 2024

jamesmunns commented Jan 31, 2024

hrxi commented Jan 31, 2024

jamesmunns commented Jan 31, 2024

hrxi commented Jan 31, 2024

hrxi commented Jan 31, 2024

hrxi commented Jan 31, 2024

hrxi commented Feb 1, 2024

jamesmunns commented Feb 1, 2024

hrxi commented Feb 19, 2024

jamesmunns commented Feb 19, 2024

hrxi commented Feb 19, 2024

netlify bot commented Jan 27, 2024 •

edited