Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of String::push_str #229

Merged
merged 1 commit into from
Feb 14, 2024

Conversation

overlookmotel
Copy link
Contributor

@overlookmotel overlookmotel commented Feb 9, 2024

I noticed when using collections::String::push_str with large slices that the performance was very poor compared to std::string::String.

I believe the reason is that std's Vec has a specialised implementation of extend_from_slice for Vec<u8>, but bumpalo's Vec does not (and probably can't without using nightly features).

Consequently, push_str(str) where str is 16 KB is currently equivalent to 16000 x individual calls to push() for each byte.

This PR fixes that by using ptr::copy_nonoverlapping to copy the entire slice in one go.

For a 16 KiB string, it's a 80x speed-up:

time:  [403.92 ns 429.47 ns 453.96 ns]                           
       thrpt:  [36.091 Gelem/s 38.149 Gelem/s 40.563 Gelem/s]
change:
       time:   [-98.836% -98.766% -98.700%] (p = 0.00 < 0.05)
       thrpt:  [+7592.7% +8002.3% +8491.5%]
       Performance has improved.

Also added a test for push_str with a long string, and a benchmark.

I've written the code in quite a verbose style with lengthy "SAFETY" comments. Maybe this is overkill, but you don't know me, so I wanted to make it clear from the code that the change is valid.

@overlookmotel
Copy link
Contributor Author

This change should also produce similar speed-up on String::from_str_in because it uses push_str internally.

It may also be worth investigating whether from_str_in can get a further speed-up if it doesn't use push_str at all, because the length + offset calculations and call to Vec::reserve in push_str are all extraneous when the String has been freshly created with sufficient capacity. If the compiler inlines push_str, it would hopefully figure that out itself, but I'm not sure if it will. If it doesn't:

pub fn from_str_in(s: &str, bump: &'bump Bump) -> String<'bump> {
  let len = s.len();
  let mut t = String::with_capacity_in(len, bump);
  unsafe {
    ptr::copy_nonoverlapping(s.as_ptr(), t.vec.as_mut_ptr(), len);
    t.vec.set_len(len);
  }
  t
}

Would hugely appreciate it if someone had time to review this PR. It appears to be a significant speed boost with no downside, as far as I can see.

OXC compiler uses bumpalo (to great effect, bumpalo is brilliant) and I imagine this small optimization to bumpalo would have a significant impact on OXC's benchmarks.

Copy link
Owner

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks for your patience.

@fitzgen
Copy link
Owner

fitzgen commented Feb 13, 2024

It looks like there are some test failures in CI: https://github.com/fitzgen/bumpalo/actions/runs/7848854221/job/21528762407?pr=229#step:6:110

@overlookmotel
Copy link
Contributor Author

overlookmotel commented Feb 13, 2024

My apologies! That's really weird, I was pretty sure I'd covered all the bases. I'll look into it.

And thanks very much for coming back. Sorry I hassled you.

@overlookmotel
Copy link
Contributor Author

Is it possible to re-run the "Rust / build (stable, --features collections,boxed)" CI job? Am wondering if it could be related to something in nightly, or genuinely 100% my stupidity.

@fitzgen
Copy link
Owner

fitzgen commented Feb 13, 2024

Retriggered.

Also possible that this test is failing on main right now, but I don't have time to investigate at the moment.

@overlookmotel
Copy link
Contributor Author

I've reproduced it locally. Yes, it is failing on main too.

Have opened #230 with the details.

The failure does appear to be completely unrelated to this PR, so would you consider merging this in the meantime?

@overlookmotel
Copy link
Contributor Author

I've also added a bit to the test to include pushing an empty string.

@overlookmotel
Copy link
Contributor Author

CI should pass if rebased on top of #232.

@fitzgen
Copy link
Owner

fitzgen commented Feb 14, 2024

This should be ready for a rebase.

@overlookmotel
Copy link
Contributor Author

Thanks. Now rebased on main.

@fitzgen fitzgen merged commit 1d31c76 into fitzgen:main Feb 14, 2024
8 checks passed
@overlookmotel overlookmotel deleted the faster-push-str branch February 14, 2024 18:03
@overlookmotel
Copy link
Contributor Author

@fitzgen I know I've failed on the follow-on for #232, but could I possibly make a request if you could do a release on crates.io containing the changes in this and #231?

It's a fairly significant speed-boost for commonly used functions, and would be a real help to be able to integrate these changes into OXC.

@fitzgen
Copy link
Owner

fitzgen commented Feb 15, 2024

Yeah I can do a release shortly. Thanks for the PRs!

@overlookmotel
Copy link
Contributor Author

Thanks @fitzgen. That'd be very much appreciated. I hope I can contribute more to bumpalo in future, just right now is not ideal timing for me.

@fitzgen
Copy link
Owner

fitzgen commented Feb 15, 2024

FYI 3.15.0 is now published on crates.io

@overlookmotel
Copy link
Contributor Author

FYI 3.15.0 is now published on crates.io

Amazing! Thank you.

Boshen pushed a commit to oxc-project/oxc that referenced this pull request Feb 18, 2024
Latest version of `bumpalo` includes a couple of performance fixes for
`String` (e.g. fitzgen/bumpalo#229) which may
help the parser a little.
IWANABETHATGUY pushed a commit to IWANABETHATGUY/oxc that referenced this pull request May 29, 2024
Latest version of `bumpalo` includes a couple of performance fixes for
`String` (e.g. fitzgen/bumpalo#229) which may
help the parser a little.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants