Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Macros 2.0: #[cfg_attr] makes .to_string() and TokenStream disagree #48644

Closed
alexcrichton opened this issue Mar 1, 2018 · 2 comments
Closed
Labels
A-macros-2.0 Area: Declarative macros 2.0 (#39412)

Comments

@alexcrichton
Copy link
Member

alexcrichton commented Mar 1, 2018

I ran across a very curious case today... Given this procedural macro:

#![crate_type = "proc-macro"]
#![feature(proc_macro)]

extern crate proc_macro;

use proc_macro::TokenStream;

#[proc_macro_derive(Foo, attributes(foo))]
pub fn foo(input: TokenStream) -> TokenStream {
    println!("{}", input.to_string());
    print_tts(input, 0);

    TokenStream::empty()
}

fn print_tts(input: TokenStream, tab: usize) {
    let mut t = String::new();
    for _ in 0..tab {
        t.push_str("  ");
    }
    for token in input {
        match token.kind {
            proc_macro::TokenNode::Group(d, others) => {
                println!("{}{:?}", t, d);
                print_tts(others.into(), tab + 1);
            }
            s => println!("{}{:?}", t, s),
        }
    }
}

and this expansion:

#![crate_type = "rlib"]
#![feature(proc_macro)]

#[macro_use]
extern crate foo;

#[derive(Foo)]
pub struct MyStructc {
    #[cfg_attr(my_cfg, foo)]
    _a: i32,
}

when compiled I get:

$ rustc +nightly foo.rs && rustc +nightly bar.rs -L .
pub struct MyStructc {
    _a: i32,
}
Term(Term(pub))
Term(Term(struct))
Term(Term(MyStructc))
Brace
  Op('#', Alone)
  Bracket
    Term(Term(cfg_attr))
    Parenthesis
      Term(Term(my_cfg))
      Op(',', Alone)
      Term(Term(foo))
  Term(Term(_a))
  Op(':', Alone)
  Term(Term(i32))
Op(',', Alone)

which is quite curious!

The to_string() representation has the field present (even though the --cfg isn't supplied) and the token stream has the actual #[cfg_attr] there.

I was actually expecting neither outcome to happen (how naive of me!) in terms of #[cfg] processing typically happening before custom derives, but maybe that's not possible? -- er, just wrong thinking here

cc @dtolnay
cc @jseyfried
cc @nrc

@alexcrichton alexcrichton added the A-macros-2.0 Area: Declarative macros 2.0 (#39412) label Mar 1, 2018
@dtolnay
Copy link
Member

dtolnay commented Mar 1, 2018

Seems similar to how mod also causes an inconsistency. #47627

@alexcrichton
Copy link
Member Author

Ok I think I see what's going on here.

This has to do with this comment introduced in #43230. Basically I think this boils down to #43081, we fundamentally cannot losslessly tokenize an AST item into a list of tokens to hand to a procedural macro.

What's happening here is that all the tokens of the struct MyStructc are saved off in the parser. This is the original source token stream which produced the struct. Later on when we tokenize the struct to hand off to a procedural macro we will use this cached token stream, if available, and tokenize it out and return it. When stringifying, however, we use the actual AST node itself (the source of truth).

The bug is that cfg processing does indeed happen before the procedural macro runs. The cfg processing is modifying the AST item's internal fields but it's not invalidating the cached token stream or modifying it. Hence when the procedural macro runs the cached token stream is invalid wrt the internal state of the AST node, causing two different results to come out depending on how you look at it.

Possible solutions here are:

  1. Invalidate the cached list of tokens if the AST node is modified. This seems tricky to get right, are there other locations where a modification can happen?
  2. Actually implemented Compiler loses location information before calling macros (sometimes) #43081, but holy cow that's a lot of work.
  3. Compare the two results of tokenization (stringify + parse and the tokens themselves) and return the string one if they differ.

None of them seem like great solutions :(

alexcrichton added a commit to alexcrichton/rust that referenced this issue Apr 10, 2018
This commit adds even more pessimization to use the cached `TokenStream` inside
of an AST node. As a reminder the `proc_macro` API requires taking an arbitrary
AST node and transforming it back into a `TokenStream` to hand off to a
procedural macro. Such functionality isn't actually implemented in rustc today,
so the way `proc_macro` works today is that it stringifies an AST node and then
reparses for a list of tokens.

This strategy unfortunately loses all span information, so we try to avoid it
whenever possible. Implemented in rust-lang#43230 some AST nodes have a `TokenStream`
cache representing the tokens they were originally parsed from. This
`TokenStream` cache, however, has turned out to not always reflect the current
state of the item when it's being tokenized. For example `#[cfg]` processing or
macro expansion could modify the state of an item. Consequently we've seen a
number of bugs (rust-lang#48644 and rust-lang#49846) related to using this stale cache.

This commit tweaks the usage of the cached `TokenStream` to compare it to our
lossy stringification of the token stream. If the tokens that make up the cache
and the stringified token stream are the same then we return the cached version
(which has correct span information). If they differ, however, then we will
return the stringified version as the cache has been invalidated and we just
haven't figured that out.

Closes rust-lang#48644
Closes rust-lang#49846
kennytm added a commit to kennytm/rust that referenced this issue Apr 14, 2018
…r=nrc

proc_macro: Avoid cached TokenStream more often

This commit adds even more pessimization to use the cached `TokenStream` inside
of an AST node. As a reminder the `proc_macro` API requires taking an arbitrary
AST node and transforming it back into a `TokenStream` to hand off to a
procedural macro. Such functionality isn't actually implemented in rustc today,
so the way `proc_macro` works today is that it stringifies an AST node and then
reparses for a list of tokens.

This strategy unfortunately loses all span information, so we try to avoid it
whenever possible. Implemented in rust-lang#43230 some AST nodes have a `TokenStream`
cache representing the tokens they were originally parsed from. This
`TokenStream` cache, however, has turned out to not always reflect the current
state of the item when it's being tokenized. For example `#[cfg]` processing or
macro expansion could modify the state of an item. Consequently we've seen a
number of bugs (rust-lang#48644 and rust-lang#49846) related to using this stale cache.

This commit tweaks the usage of the cached `TokenStream` to compare it to our
lossy stringification of the token stream. If the tokens that make up the cache
and the stringified token stream are the same then we return the cached version
(which has correct span information). If they differ, however, then we will
return the stringified version as the cache has been invalidated and we just
haven't figured that out.

Closes rust-lang#48644
Closes rust-lang#49846
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-macros-2.0 Area: Declarative macros 2.0 (#39412)
Projects
None yet
Development

No branches or pull requests

2 participants