Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using syn for code generation instead of quote #2251

Closed
pvdrz opened this issue Aug 8, 2022 · 6 comments
Closed

Using syn for code generation instead of quote #2251

pvdrz opened this issue Aug 8, 2022 · 6 comments

Comments

@pvdrz
Copy link
Contributor

pvdrz commented Aug 8, 2022

This is a rather loose proposal but I'd like to start this discussion as it might provide a more straightforward way to handle issues like #564 and #1743. I've seen that most of the code generation is done relying on quote! which is a very straightforward and simple way to emit rust code.

However, tacking any issue that requires reordering or modifying the generated code would require several workarounds due to the unstructured nature of proc_macro2::TokenStream.

I was wondering if it would be feasible and useful use syn::parse_quote! and other tools provided by the syn crate for code generation in order to keep all the generated code in a more structured way before emitting it so bindgen can manipulate it in later stages in an easier way.

cc @amanjeev

Edit: I forgot to mention that one advantage of being able to handle all the items generated by bindgen as syn types is that we could implement "passes" in a modular way so we can easily enable/disable them. So for example, we could deduplicate extern "C" { ... } blocks in one pass and independently write another pass to reorganize items. In general, it would help a lot when having to handle generated code in my opinion.

Another important point is that we should be able to do this transition incrementally as we can always produce TokenStreams from the syn data structures by using the ToToken trait.

However, we think it's pretty important to know if this change would be worth it from the maintainers point of view or if a less intrusive way to solve the issues mentioned above would be better.

@kulp
Copy link
Member

kulp commented Aug 9, 2022

we think it's pretty important to know if this change would be worth it from the maintainers point of view or if a less intrusive way to solve the issues mentioned above would be better.

@emilio surely knows a lot more about the tradeoffs than I do. Personally, I have only ever written ordinary macros, and no procedural ones, so my opinion should not carry much weight, but to the level that I understand this issue's proposal, it sounds worthwhile to me.

Another important point is that we should be able to do this transition incrementally

An incremental approach, using relatively small, reversible changes, is a strongly favorable thing for me.

@pvdrz
Copy link
Contributor Author

pvdrz commented Aug 9, 2022

Hi @kulp thanks for you answer :)

An incremental approach, using relatively small, reversible changes, is a strongly favorable thing for me.

Yeah I think this is definitely possible, We could take functions and self-contained sections of code like this:

fn top_level_path(
    ctx: &BindgenContext,
    item: &Item,
) -> Vec<proc_macro2::TokenStream> {
    let mut path = vec![quote! { self }];

    if ctx.options().enable_cxx_namespaces {
        for _ in 0..item.codegen_depth(ctx) {
            path.push(quote! { super });
        }
    }

    path
}

and replace them by

fn top_level_path(
    ctx: &BindgenContext,
    item: &Item,
) -> Path {
    let mut segments = Punctuated::<PathSegment, Colon2>::new();

    segments.push(parse_quote!(self));

    if ctx.options().enable_cxx_namespaces {
        for _ in 0..item.codegen_depth(ctx) {
            path.push(parse_quote!(super));
        }
    }

    Path {
        leading_colon: None,
        segments,
    }
}

Then we can replace each use of top_level_path to either handle this as an actual Path (like when you want to write quote!(use #path)) or as a token stream (by calling path.into_token_stream()). This could be done in a contained and incremental manner in several places.

One aspect I'm not sure about is performance issues so we would also have to consider this.

@emilio
Copy link
Contributor

emilio commented Aug 10, 2022

Can you elaborate what use cases do you envision for this? And why are they not addressable for example if you use parse_quote! on the output of bindgen?

@pvdrz
Copy link
Contributor Author

pvdrz commented Aug 10, 2022

it would be definitely possible to use the output of codegen::codegen and doing parse_quote!(mod something { #( #items )* }), and then taking the contents of the module but it feels a bit hacky imo. Parsing each individual "item" in the output wouldn't be as straightforward because each "item" can be actually more than one item (For example, structs/enums and their impls are on the same "item").

@pvdrz
Copy link
Contributor Author

pvdrz commented Aug 10, 2022

That being said, moving to syn completely could be used to do some of these fixes "earlier" in code generation. But for the two issues I mentioned above, we can just parse the codegen output and do it afterwards.

@pvdrz
Copy link
Contributor Author

pvdrz commented Aug 19, 2022

Given that #2254 is close to being merged I'm marking this as solved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants