Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reintroduce unsafe in encoding::string::merge #196

Merged
merged 1 commit into from
Jan 12, 2020
Merged

Conversation

danburkert
Copy link
Collaborator

#194 introduced a ~3% regression on some decoding
benchmarks. My best guess is that swapping the value temporarily with a
new string is not being optimized well. This commit reverts to using
String::as_mut_vec, but carefully clears the string on decode failure.
It also fixes up the associated regression test, which wasn't properly
covering the malformed UTF-8 case.

// Before #194
dataset/google_message3_2/decode
                        time:   [219.45 ms 219.48 ms 219.51 ms]
                        thrpt:  [458.64 MiB/s 458.72 MiB/s 458.78 MiB/s]

// #194
dataset/google_message3_2/decode
                        time:   [226.08 ms 226.11 ms 226.18 ms]
                        thrpt:  [445.12 MiB/s 445.25 MiB/s 445.32 MiB/s]

// This commit
dataset/google_message3_2/decode
                        time:   [219.05 ms 219.76 ms 221.78 ms]
                        thrpt:  [453.94 MiB/s 458.12 MiB/s 459.61 MiB/s]

cc @mzabaluev @nrc

@danburkert
Copy link
Collaborator Author

The benchmark numbers above were run on the https://github.com/danburkert/prost/tree/benchmarks branch. I'm putting some TLC into the benchmark suite so I can properly review #187, which lead to finding this regression.

src/encoding.rs Outdated
// inserted into it the resulting vec are valid UTF-8. We check
// explicitly in order to ensure this is safe.
let value = value.as_mut_vec();
super::bytes::merge(wire_type, value, buf, ctx)?;
Copy link
Contributor

@mzabaluev mzabaluev Jun 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still leaves the String value in a potentially invalid state if any method on buf panics.
A scope guard, like originally proposed in 55f644e for #194 but working on the &mut Vec<u8> to avoid technical unsoundness, would be more bulletproof.

@danburkert
Copy link
Collaborator Author

@mzabaluev

This still leaves the String value in a potentially invalid state if any method on buf panics.
A scope guard, like originally proposed in 55f644e for #194 but working on the &mut Vec to > avoid technical unsoundness, would be more bulletproof.

Great point. I've pushed a new implementation with a drop guard. It's simpler than the original, since it turned out the implemention should have been clearing the string/bytes values on merge all along. This was revealed to be an issue in the conformance tests during the move from protobuf 3.7 to 3.11. I'm happy to report this drop guard impl has the same performance as without the drop guard.

3ab36fda9a1788e0 introduced a ~3% regression on some decoding
benchmarks. My best guess is that swapping the value temporarily with a
new string is not being optimized well. This commit reverts to using
String::as_mut_vec, but carefully clears the string on decode failure.
It also fixes up the associated regression test, which wasn't properly
covering the malformed UTF-8 case.

    // Before #194
    dataset/google_message3_2/decode
                            time:   [219.45 ms 219.48 ms 219.51 ms]
                            thrpt:  [458.64 MiB/s 458.72 MiB/s 458.78 MiB/s]

    // #194
    dataset/google_message3_2/decode
                            time:   [226.08 ms 226.11 ms 226.18 ms]
                            thrpt:  [445.12 MiB/s 445.25 MiB/s 445.32 MiB/s]

    // This commit
    dataset/google_message3_2/decode
                            time:   [219.05 ms 219.76 ms 221.78 ms]
                            thrpt:  [453.94 MiB/s 458.12 MiB/s 459.61 MiB/s]
@danburkert danburkert merged commit 420e032 into master Jan 12, 2020
@danburkert danburkert deleted the 194-followup branch January 12, 2020 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants