
Store Spanless Tokens as Strings #160

rylev opened this issue Jun 17, 2020 · 15 comments

rylev commented Jun 17, 2020

Thanks for the awesome crate! 🎉 For the WinRT crate we generate a lot of code, so performance is a concern. We noticed that when using the quote! macro, a lot of destructors and Clone implementations were being called. This is because the quote! macro conservatively borrows the variables it interpolates, calling ToTokens::to_tokens (which takes &self) on each of them. This is presumably to support interpolating the same variable multiple times, which seems like a good default. However, constantly cloning and destroying temporary variables that are only needed inside the quote! macro can be quite expensive. This may be unavoidable for some values like Strings, but we have many temporary TokenStreams where the cloning can be avoided.

As a workaround, I created a wrapper type which implements ToTokens and effectively moves the value it wraps into the resulting TokenStream like so:

use std::cell::RefCell;
use proc_macro2::TokenStream;
use quote::{quote, ToTokens};

struct TokenStreamWrapper(RefCell<Option<TokenStream>>);

impl TokenStreamWrapper {
    fn new(stream: TokenStream) -> Self {
        Self(RefCell::new(Some(stream)))
    }
}

impl ToTokens for TokenStreamWrapper {
    fn to_tokens(&self, tokens: &mut TokenStream) {
        // Take the wrapped stream out so it is moved rather than cloned.
        // Note: this panics if the wrapper is interpolated more than once.
        let stream = self.0.replace(None).unwrap();
        tokens.extend(stream)
    }
}

This dramatically reduces the number of times we run destructors. Is this the best way to tackle this issue? Would it be good for quote to have an API that allows this in cases where performance matters?

@rylev rylev changed the title Temporary items are copied and destroyed Temporary items are cloned and destroyed even when they don't need to be Jun 17, 2020
@rylev rylev changed the title Temporary items are cloned and destroyed even when they don't need to be Temporary items are cloned and destroyed Jun 17, 2020

dtolnay commented Jun 17, 2020

proc_macro::TokenStream is reference counted so I would not expect that wrapper to make a dramatic difference when using quote inside of a proc macro. Can you share commands I can run to reproduce this? Based on what I can tell from https://github.com/microsoft/winrt-rs you only use quote inside of a proc macro (as opposed to a build script or something), but based on https://github.com/rylev/winrt-profile it appears that your profiling is not in a proc macro context. Those are going to have very different performance characteristics because proc macros use real libproc_macro objects from the compiler while non-macros use a minimal reimplementation of that API provided by proc-macro2.


rylev commented Jun 17, 2020

Correct, this is only being used inside of proc macros, so this won't help. Then I need to figure out how to profile inside of a proc macro, as we need to get our code generation time down. If you want to see this code used in a test project you can find it here.


mystor commented Jun 18, 2020

To properly test the compiler's performance, your best bet is probably to run the benchmark within a proc macro while compiling a dummy crate. The macro will be dynamically loaded into the rustc executable, so profiling rustc while running the benchmark might get you useful information.

Trying to stand up a minimal slice of the compiler to run it separately is possible, but might not win you much. You need a significant amount of the compiler initialized to run the libsyntax proc_macro server. Using something like https://github.com/mystor/pm_fallback_server would get you a tiny bit closer to actual libsyntax proc_macro behaviour, as it goes through the libproc_macro server, but the actual performance of the server implementation would also be completely different, so it wouldn't be a great choice for profiling for the same reasons.


rylev commented Jun 18, 2020

@mystor Unfortunately it seems that self-profiling rustc does not yield much information, as proc macros show up as "macro_expand_crate" without any further detail.


dtolnay commented Jun 18, 2020

perf record should work on macro expansion. https://gist.githubusercontent.com/dtolnay/5c5f4c403fa45d2b7d839e8d4b0497d9/raw/5328ec712bfe3e381f755a485cf667b7292b0b9a/winrt-flamegraph.svg
I just ran cargo check --verbose to grab the last rustc invocation and ran flamegraph rustc ....


rylev commented Jun 18, 2020

Thanks! That helps! I notice a few interesting tidbits, like the fact that quote::__private::push_ident appears to be much more expensive than I would have thought.

(screenshot: flamegraph excerpt, 2020-06-18)

Also, a good place to focus on is this particular method in the winrt code. I wouldn't say it seems very complicated, but it's dominating the time. It does, however, get called fairly often.

Is going through this flamegraph together helpful? I still think that for WinRT we're far from a level of performance we're comfortable with.


dtolnay commented Jun 18, 2020

I think the flamegraph is a good direction for what to optimize. I am not able to dedicate more time to this right now but I would accept perf fixes in quote and proc-macro2. My guess is the main improvements will come from doing fewer calls over the proc macro bridge by smarter grouping/caching/short-circuiting in proc-macro2.

Also, since your use case doesn't seem like it particularly cares about preserving input token spans into the macro output, it wouldn't be fun but you can compare against doing all the expansion using String and doing only a big parse at the end to make tokens.


rylev commented Jun 19, 2020

After doing more profiling, I think going the route of string concatenation is best. The only downside is losing the nice experience of using the quote! macro. I think I will work on a new macro that works very similarly to quote! but concatenates strings instead of building TokenStreams.


rylev commented Jun 22, 2020

@dtolnay So I worked on adding a new TokenString type to quote which simply wraps a string buffer instead of a TokenStream. I also created a squote! macro which mimics quote! but uses the TokenString type instead. This is much more performant for my use case: simply switching over cut winrt's code generation time to about 20% of what it was before. Is this something you'd be interested in supporting in quote, or should I create a new crate for it?


rylev commented Jul 1, 2020

@dtolnay ping! 😄 Just need to know if a string-backed quote macro would be interesting to add to this crate or if I should make a new crate for it. Cheers! 🎉


dtolnay commented Jul 1, 2020

I would prefer not to put that in quote. I would be concerned about people using it unjustifiably (in procedural macros where the spans do matter, without having measured whether expansion is a perceptible fraction of their compile time) and thereby harming the ecosystem through worse diagnostics and user experience.


dtolnay commented Jul 1, 2020

OTOH I would accept a fix in proc-macro2 + quote which defers instantiation of libproc_macro objects, which would give just about all the performance benefit without harming diagnostics or developer experience. It doesn't need any API change.

let b = /* from macro input */;

let t = quote! { a #b c d };

// `t` stores "a" and "c d" as strings not backed by libproc_macro object

let u = quote! { #t e };
// still string

let v: proc_macro::TokenStream = u.into(); // deferred libproc_macro calls take place here


rylev commented Jul 2, 2020

@dtolnay Is the suggestion to do the following:

  • change proc_macro2::TokenStream to contain an ordered collection of proc_macro2::imp::TokenStream and String. The proc_macro2::imp::TokenStreams would be for any case where eager evaluation of the token stream is necessary (e.g., to preserve span information). Only when <proc_macro::TokenStream as From<proc_macro2::TokenStream>>::from is called do we fold the collection into a proc_macro::TokenStream by parsing the strings and concatenating the resulting token streams together.
  • remove all push_$punct functions (leaving the spanned varieties), and store all punctuation directly in the string buffer.
  • runtime::push_ident would no longer call proc_macro2::Ident::new, but rather just push the ident string onto the string buffer. The same would happen with runtime::parse.

proc_macro2::TokenStreams would then be composed of proc_macro::TokenStreams for when spans are needed and Strings for when they are not. Do I have this correct?


dtolnay commented Jul 3, 2020

Yes exactly. That means that in main/build.rs use cases (as opposed to procedural macros) with no existing spanned tokens coming in from the macro input, everything would end up being done in terms of String only, matching the performance characteristics of squote. Similarly if a particular proc macro only interpolates spanned input tokens in a few places but the majority of the output is built up using default spans, the majority of the processing would be done using the String backend only.

@rylev rylev changed the title Temporary items are cloned and destroyed Store Spanless Tokens as Strings Jul 3, 2020

dtolnay commented Dec 27, 2021

I've put up dtolnay/proc-macro2#320 with a draft of what I had in mind.
