Using `syn` for code generation instead of `quote` #2251

pvdrz · 2022-08-08T19:11:08Z

This is a rather loose proposal but I'd like to start this discussion as it might provide a more straightforward way to handle issues like #564 and #1743. I've seen that most of the code generation is done relying on quote! which is a very straightforward and simple way to emit rust code.

However, tacking any issue that requires reordering or modifying the generated code would require several workarounds due to the unstructured nature of proc_macro2::TokenStream.

I was wondering if it would be feasible and useful use syn::parse_quote! and other tools provided by the syn crate for code generation in order to keep all the generated code in a more structured way before emitting it so bindgen can manipulate it in later stages in an easier way.

cc @amanjeev

Edit: I forgot to mention that one advantage of being able to handle all the items generated by bindgen as syn types is that we could implement "passes" in a modular way so we can easily enable/disable them. So for example, we could deduplicate extern "C" { ... } blocks in one pass and independently write another pass to reorganize items. In general, it would help a lot when having to handle generated code in my opinion.

Another important point is that we should be able to do this transition incrementally as we can always produce TokenStreams from the syn data structures by using the ToToken trait.

However, we think it's pretty important to know if this change would be worth it from the maintainers point of view or if a less intrusive way to solve the issues mentioned above would be better.

The text was updated successfully, but these errors were encountered:

kulp · 2022-08-09T18:43:20Z

we think it's pretty important to know if this change would be worth it from the maintainers point of view or if a less intrusive way to solve the issues mentioned above would be better.

@emilio surely knows a lot more about the tradeoffs than I do. Personally, I have only ever written ordinary macros, and no procedural ones, so my opinion should not carry much weight, but to the level that I understand this issue's proposal, it sounds worthwhile to me.

Another important point is that we should be able to do this transition incrementally

An incremental approach, using relatively small, reversible changes, is a strongly favorable thing for me.

pvdrz · 2022-08-09T20:38:00Z

Hi @kulp thanks for you answer :)

An incremental approach, using relatively small, reversible changes, is a strongly favorable thing for me.

Yeah I think this is definitely possible, We could take functions and self-contained sections of code like this:

fn top_level_path(
    ctx: &BindgenContext,
    item: &Item,
) -> Vec<proc_macro2::TokenStream> {
    let mut path = vec![quote! { self }];

    if ctx.options().enable_cxx_namespaces {
        for _ in 0..item.codegen_depth(ctx) {
            path.push(quote! { super });
        }
    }

    path
}

and replace them by

fn top_level_path(
    ctx: &BindgenContext,
    item: &Item,
) -> Path {
    let mut segments = Punctuated::<PathSegment, Colon2>::new();

    segments.push(parse_quote!(self));

    if ctx.options().enable_cxx_namespaces {
        for _ in 0..item.codegen_depth(ctx) {
            path.push(parse_quote!(super));
        }
    }

    Path {
        leading_colon: None,
        segments,
    }
}

Then we can replace each use of top_level_path to either handle this as an actual Path (like when you want to write quote!(use #path)) or as a token stream (by calling path.into_token_stream()). This could be done in a contained and incremental manner in several places.

One aspect I'm not sure about is performance issues so we would also have to consider this.

emilio · 2022-08-10T11:10:55Z

Can you elaborate what use cases do you envision for this? And why are they not addressable for example if you use parse_quote! on the output of bindgen?

pvdrz · 2022-08-10T15:11:54Z

it would be definitely possible to use the output of codegen::codegen and doing parse_quote!(mod something { #( #items )* }), and then taking the contents of the module but it feels a bit hacky imo. Parsing each individual "item" in the output wouldn't be as straightforward because each "item" can be actually more than one item (For example, structs/enums and their impls are on the same "item").

pvdrz · 2022-08-10T15:19:12Z

That being said, moving to syn completely could be used to do some of these fixes "earlier" in code generation. But for the two issues I mentioned above, we can just parse the codegen output and do it afterwards.

pvdrz · 2022-08-19T15:28:14Z

Given that #2254 is close to being merged I'm marking this as solved

pvdrz closed this as completed Aug 19, 2022

kulp mentioned this issue Sep 10, 2022

add BindgenOptions::require_syn method #2265

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using `syn` for code generation instead of `quote` #2251

Using `syn` for code generation instead of `quote` #2251

pvdrz commented Aug 8, 2022 •

edited

kulp commented Aug 9, 2022

pvdrz commented Aug 9, 2022

emilio commented Aug 10, 2022

pvdrz commented Aug 10, 2022 •

edited

pvdrz commented Aug 10, 2022

pvdrz commented Aug 19, 2022

Using syn for code generation instead of quote #2251

Using syn for code generation instead of quote #2251

Comments

pvdrz commented Aug 8, 2022 • edited

kulp commented Aug 9, 2022

pvdrz commented Aug 9, 2022

emilio commented Aug 10, 2022

pvdrz commented Aug 10, 2022 • edited

pvdrz commented Aug 10, 2022

pvdrz commented Aug 19, 2022

Using `syn` for code generation instead of `quote` #2251

Using `syn` for code generation instead of `quote` #2251

pvdrz commented Aug 8, 2022 •

edited

pvdrz commented Aug 10, 2022 •

edited