Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimise rfc3339 (and rfc2822) #844

Merged
merged 6 commits into from Oct 26, 2022
Merged

Conversation

conradludgate
Copy link
Contributor

@conradludgate conradludgate commented Oct 15, 2022

We use to_rfc3339() a lot in our observability libraries at TrueLayer. Upon profiling, I noticed that it took up a large portion of time, so I looked into optimising it.

Screenshot 2022-10-15 at 10 43 30

Screenshot 2022-10-15 at 10 43 39

I was able to do more optimisations but the requirement of the year formatting causes the code to be pretty tricky. I can attempt it in a later PR if this is accepted.

In our observability code I know that the year is 0..=9999 and the timezone is Utc so I had a bit more gains from cheating 😅

@conradludgate conradludgate force-pushed the optimise-rfc3339 branch 2 times, most recently from a2e61fc to 431307c Compare October 15, 2022 09:46
@djc
Copy link
Contributor

djc commented Oct 15, 2022

Nice results!

Is the datetime_from_str regression stable? Why did you move all that code out of the Debug impls into an inherent method?

@conradludgate
Copy link
Contributor Author

conradludgate commented Oct 15, 2022

Why did you move all that code out of the Debug impls into an inherent method

I was just adding some comments to the code 😅
This allows for better code-reuse while allowing more inlining to avoid dynamic dispatch overhead

Is the datetime_from_str regression stable

Yeah I'm not sure on that one. I'm still looking into it

The regression is a false positive. It's a little flaky on my hardware but I can't reproduce any conclusive evidence that it's slower on either branch

Screenshot 2022-10-15 at 11 42 41

It dances around between 122ns to 127ns for both 0.4.x and this branch.

@conradludgate
Copy link
Contributor Author

Ok, I'm done tinkering 🙏

@conradludgate conradludgate changed the title optimise rfc3339 optimise rfc3339 (and rfc2822) Oct 15, 2022
Copy link
Collaborator

@esheppa esheppa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, it's nice to specialize the to_rfc* implementations and that is an impressive performance uplift

src/format/mod.rs Outdated Show resolved Hide resolved
src/format/mod.rs Outdated Show resolved Hide resolved
@djc
Copy link
Contributor

djc commented Oct 16, 2022

So can you explain what changes caused the majority of the performance improvements?

In particular I'm still unclear why it's good for performance to move the Debug code into write_into() methods; is that actually necessary for performance or is there a way of calling Debug::fmt without a performance penalty relative to calling write_into()? All other things being equal, I think leaving that code in the Debug impl would be more idiomatic.

@conradludgate
Copy link
Contributor Author

conradludgate commented Oct 16, 2022

Using the fmt methods needs a Formatter type. We don't always have access to one in our implementations. Currently in std there's no way to use fmt into a String without using dynamic dispatch. This is what the generic write_into methods provide by making it generic over fmt::Write. This then supports Formatter and String directly, and it inlines very well.

I'm no longer sure if the byte arrays provide a huge benefit of speed, but it is the only way to avoid the dyn overhead of integer writing, so it remains for now. Initially the idea was that writing into a static buffer would reduce the number of allocation checks and potential reallocs, although considering we specify the capacity upfront I believe this is negligible

@conradludgate
Copy link
Contributor Author

I've just stumbled upon ufmt, which is a dyn-free formatting machinery library and it shows similar gains https://docs.rs/ufmt/latest/ufmt/#benchmarks

Their Debug trait is basically what the write_into is trying to be, since its generic over the writer

Copy link
Contributor

@djc djc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I reviewed this in more detail and I think all the changes here are good, but I would like to have it split up in smaller commits in order to review the changes more carefully. Probably one commit for moving things into write_into(), also added a bunch of suggestions for things I think should be separate commits (separate PRs is also fine if you prefer).

I also think the docstrings you added need some work. Please write docstrings that describe the invariants/interface the function provides, instead of adding comments-as-docstrings. Preferably the first line of a docstring can stand alone and is no longer than a single line.

I would also like some benchmarks that compare writing into a byte array, then copying into a String with directly writing into the String (should probably reserve capacity up front). I'm definitely not a fan of the write_utf8_bytes() strategy...

Thank you for working on this!

src/format/mod.rs Show resolved Hide resolved
src/format/mod.rs Show resolved Hide resolved
src/format/mod.rs Show resolved Hide resolved
src/format/mod.rs Show resolved Hide resolved
src/naive/time/mod.rs Outdated Show resolved Hide resolved
src/format/mod.rs Outdated Show resolved Hide resolved
@conradludgate
Copy link
Contributor Author

Latest net improvements from the refactor

Screenshot 2022-10-17 at 11 48 34

@esheppa
Copy link
Collaborator

esheppa commented Oct 17, 2022

So can you explain what changes caused the majority of the performance improvements?

I did some experimentation around this, and found:

  • Pre allocating in the format function, gave about 15% of speedup
  • The new implementations of RFC3339 and RFC2822 alone, but with the original code in to_rfc* (eg with the Item::Fixed(Fixed::Rfc3339), gives about half of the speedup
  • My assumption is that the rest comes from the specialized impl avoiding the format function

@conradludgate
Copy link
Contributor Author

@djc the commits are all there now, in small logical chunks with the benchmark results in the commit message. Hope that helps

Copy link
Contributor

@djc djc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great, thanks!

@djc djc mentioned this pull request Oct 18, 2022
Copy link
Collaborator

@esheppa esheppa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @conradludgate - I'm happy with this as is but I've left a few comments to discuss prior to merging

src/format/mod.rs Show resolved Hide resolved
src/format/mod.rs Outdated Show resolved Hide resolved
src/naive/date.rs Outdated Show resolved Hide resolved
@greyblake
Copy link
Contributor

greyblake commented Oct 21, 2022

@djc @conradludgate Any updates? (looking forward for the release #850 👀 )

@djc djc merged commit 3e2f151 into chronotope:0.4.x Oct 26, 2022
@djc
Copy link
Contributor

djc commented Oct 26, 2022

Thanks for sticking with it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants