proc_macro: Avoid cached TokenStream more often #49852

alexcrichton · 2018-04-10T20:06:15Z

This commit makes rustc even more pessimistic about using the cached TokenStream
stored inside an AST node. As a reminder, the proc_macro API requires taking an
arbitrary AST node and transforming it back into a TokenStream to hand off to a
procedural macro. Such functionality isn't actually implemented in rustc today,
so instead proc_macro stringifies the AST node and then reparses the resulting
string into a list of tokens.
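To make the span loss concrete, here is a toy Rust model of the stringify-and-reparse round trip. The `Span`, `Token`, `stringify`, and `reparse` names are hypothetical and much simpler than rustc's real types. The point is only that a string has nowhere to carry source locations, so every reparsed token comes back with a placeholder span.

```rust
// Toy model only: these types are hypothetical and far simpler than rustc's.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Span {
    lo: u32,
    hi: u32,
}

// Placeholder span standing in for "no real source location".
const DUMMY_SP: Span = Span { lo: 0, hi: 0 };

#[derive(Clone, Debug, PartialEq)]
struct Token {
    text: String,
    span: Span,
}

/// Pretty-print tokens back to source text; spans cannot be represented in a string.
fn stringify(tokens: &[Token]) -> String {
    tokens
        .iter()
        .map(|t| t.text.as_str())
        .collect::<Vec<_>>()
        .join(" ")
}

/// Re-lex the string (toy lexer: split on whitespace). Every token gets the
/// dummy span because `stringify` already threw the original locations away.
fn reparse(src: &str) -> Vec<Token> {
    src.split_whitespace()
        .map(|w| Token { text: w.to_string(), span: DUMMY_SP })
        .collect()
}

fn main() {
    let original = vec![
        Token { text: "struct".into(), span: Span { lo: 0, hi: 6 } },
        Token { text: "Foo".into(), span: Span { lo: 7, hi: 10 } },
        Token { text: ";".into(), span: Span { lo: 10, hi: 11 } },
    ];
    let round_tripped = reparse(&stringify(&original));

    // The token text survives the round trip...
    let texts: Vec<&str> = round_tripped.iter().map(|t| t.text.as_str()).collect();
    assert_eq!(texts, ["struct", "Foo", ";"]);
    // ...but every span has been replaced by the dummy span.
    assert!(round_tripped.iter().all(|t| t.span == DUMMY_SP));
}
```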

This strategy unfortunately loses all span information, so we try to avoid it
whenever possible. As implemented in #43230, some AST nodes have a TokenStream
cache representing the tokens they were originally parsed from. This
TokenStream cache, however, turns out not to always reflect the current state
of the item when it's being tokenized: #[cfg] processing or macro expansion,
for example, can modify an item after it was parsed. Consequently we've seen a
number of bugs (#48644 and #49846) caused by using this stale cache.

This commit changes how the cached TokenStream is used: it is now compared
against the token stream produced by our lossy stringify-and-reparse path. If
the tokens in the cache and in the stringified token stream are the same, we
return the cached version (which has correct span information). If they
differ, we return the stringified version instead, since the cache has been
invalidated and we just haven't noticed.
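A minimal sketch of the cache check described above, again using a hypothetical toy model rather than rustc's real TokenStream API. `tokens_for_proc_macro`, `eq_ignoring_spans`, `with_spans`, and the whitespace-splitting "lexer" are all illustrative assumptions. The sketch compares token text only, since the stringified stream has no real spans to compare against.

```rust
// Hypothetical toy model of the strategy described above; not rustc's real API.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Span {
    lo: u32,
    hi: u32,
}

const DUMMY_SP: Span = Span { lo: 0, hi: 0 };

#[derive(Clone, Debug)]
struct Token {
    text: String,
    span: Span,
}

/// Build a "cached" stream with real spans (assumes single-space separators).
fn with_spans(src: &str) -> Vec<Token> {
    let mut pos = 0u32;
    src.split_whitespace()
        .map(|w| {
            let lo = pos;
            let hi = lo + w.len() as u32;
            pos = hi + 1;
            Token { text: w.to_string(), span: Span { lo, hi } }
        })
        .collect()
}

/// Lossy path: tokens re-lexed from the pretty-printed AST node (dummy spans).
fn stringified(src: &str) -> Vec<Token> {
    src.split_whitespace()
        .map(|w| Token { text: w.to_string(), span: DUMMY_SP })
        .collect()
}

/// Compare two streams by token text only, ignoring spans.
fn eq_ignoring_spans(a: &[Token], b: &[Token]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.text == y.text)
}

/// Use the cache (real spans) only when it still matches the current AST node.
fn tokens_for_proc_macro(cached: Option<&[Token]>, pretty_printed: &str) -> Vec<Token> {
    let fresh = stringified(pretty_printed);
    match cached {
        Some(c) if eq_ignoring_spans(c, &fresh) => c.to_vec(),
        // Cache missing or stale (e.g. #[cfg] stripped something): fall back.
        _ => fresh,
    }
}

fn main() {
    let src = "struct Foo { a : u8 , b : u16 }";
    let cache = with_spans(src);

    // Unchanged item: the cache matches, so real spans are kept.
    let out = tokens_for_proc_macro(Some(&cache[..]), src);
    assert!(out.iter().all(|t| t.span != DUMMY_SP));

    // After a #[cfg] strips field `b`, the item no longer matches the cache.
    let stripped = "struct Foo { a : u8 , }";
    let out = tokens_for_proc_macro(Some(&cache[..]), stripped);
    assert!(out.iter().all(|t| t.span == DUMMY_SP));
    assert_eq!(out.len(), stripped.split_whitespace().count());
}
```

In this model, falling back on a mismatch is always safe: a false mismatch costs only span information, never correctness of the tokens themselves.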

Closes #48644
Closes #49846

rust-highfive · 2018-04-10T20:06:20Z

r? @eddyb

(rust_highfive has picked a reviewer for you, use r? to override)

alexcrichton · 2018-04-10T20:06:23Z

r? @nrc

cc @dtolnay

alexcrichton · 2018-04-10T20:08:42Z

Note that I intend to leave the real underlying issue here, #43081, open, as it'll just get more pressing over time.

nrc · 2018-04-10T22:33:18Z

I don't mind landing this, but it feels like a hack on top of a hack :-( We should definitely be thinking about a long-term solution.

Have you confirmed that the simple equality check works in practice? I could imagine it failing if there has been macro expansion resulting in interpolated tokens, or with tokens which are otherwise ignored (comments, whitespace, etc.).

Assuming that it works most of the time and is thus an improvement, even if imperfect, then r=me.
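One way to read the concern about interpolated tokens, sketched with a hypothetical toy token type (`Tok` and `naive_eq` are illustrative, not rustc's types): if the cached stream holds a macro-substituted fragment as a single opaque token while the reparsed stream holds the same fragment as flat tokens, a plain equality check reports a mismatch even though the item is not stale. Under the strategy above, that would mean falling back to the stringified tokens and losing spans unnecessarily.

```rust
// Hypothetical toy token model, only to illustrate the reviewer's concern.
#[derive(Clone, Debug, PartialEq)]
enum Tok {
    Ident(String),
    Punct(char),
    // An already-parsed fragment from macro expansion, kept as one opaque token.
    Interpolated(Vec<Tok>),
}

/// Naive check: plain structural equality of the two streams.
fn naive_eq(a: &[Tok], b: &[Tok]) -> bool {
    a == b
}

fn main() {
    let id = |s: &str| Tok::Ident(s.to_string());

    // Cached stream for `let x = $e ;`, where `$e` was substituted as one token.
    let cached = vec![
        id("let"),
        id("x"),
        Tok::Punct('='),
        Tok::Interpolated(vec![id("a"), Tok::Punct('+'), id("b")]),
        Tok::Punct(';'),
    ];
    // Stringify-and-reparse only ever produces flat tokens.
    let reparsed = vec![
        id("let"),
        id("x"),
        Tok::Punct('='),
        id("a"),
        Tok::Punct('+'),
        id("b"),
        Tok::Punct(';'),
    ];

    // The item is not stale, yet the naive check reports a mismatch.
    assert!(!naive_eq(&cached, &reparsed));
}
```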

alexcrichton · 2018-04-11T02:11:23Z

I did some testing locally and it looks like this at least papers over the issue for now. I definitely agree this is piling hacks on hacks :(

The only real way forward that I know though is to take a strategy like syn and refactor the entire AST to have a lossless to_tokens method. That's a pretty significant undertaking though and a pretty massive refactoring in the compiler...

@bors: r=nrc

bors · 2018-04-11T02:11:24Z

📌 Commit 6d7cfd4 has been approved by nrc

…r=nrc proc_macro: Avoid cached TokenStream more often

Rollup of 14 pull requests Successful merges: #49908, #49876, #49916, #49951, #49465, #49922, #49866, #49915, #49886, #49913, #49852, #49958, #49871, #49864 Failed merges:

rust-highfive assigned eddyb Apr 10, 2018

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 10, 2018

rust-highfive assigned nrc and unassigned eddyb Apr 10, 2018

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 11, 2018

kennytm mentioned this pull request Apr 14, 2018

Rollup of 14 pull requests #49939

Merged

bors merged commit 6d7cfd4 into rust-lang:master Apr 14, 2018

alexcrichton mentioned this pull request Apr 19, 2018

Hygiene break in macros involving string containing single quote #50061

Closed

alexcrichton deleted the fix-more-proc-macros branch April 20, 2018 06:03