Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proc_macro: Avoid cached TokenStream more often #49852

Merged
merged 1 commit into from
Apr 14, 2018

Conversation

alexcrichton
Copy link
Member

This commit adds even more pessimization to use the cached TokenStream inside
of an AST node. As a reminder the proc_macro API requires taking an arbitrary
AST node and transforming it back into a TokenStream to hand off to a
procedural macro. Such functionality isn't actually implemented in rustc today,
so the way proc_macro works today is that it stringifies an AST node and then
reparses for a list of tokens.

This strategy unfortunately loses all span information, so we try to avoid it
whenever possible. Implemented in #43230 some AST nodes have a TokenStream
cache representing the tokens they were originally parsed from. This
TokenStream cache, however, has turned out to not always reflect the current
state of the item when it's being tokenized. For example #[cfg] processing or
macro expansion could modify the state of an item. Consequently we've seen a
number of bugs (#48644 and #49846) related to using this stale cache.

This commit tweaks the usage of the cached TokenStream to compare it to our
lossy stringification of the token stream. If the tokens that make up the cache
and the stringified token stream are the same then we return the cached version
(which has correct span information). If they differ, however, then we will
return the stringified version as the cache has been invalidated and we just
haven't figured that out.

Closes #48644
Closes #49846

This commit adds even more pessimization to use the cached `TokenStream` inside
of an AST node. As a reminder the `proc_macro` API requires taking an arbitrary
AST node and transforming it back into a `TokenStream` to hand off to a
procedural macro. Such functionality isn't actually implemented in rustc today,
so the way `proc_macro` works today is that it stringifies an AST node and then
reparses for a list of tokens.

This strategy unfortunately loses all span information, so we try to avoid it
whenever possible. Implemented in rust-lang#43230 some AST nodes have a `TokenStream`
cache representing the tokens they were originally parsed from. This
`TokenStream` cache, however, has turned out to not always reflect the current
state of the item when it's being tokenized. For example `#[cfg]` processing or
macro expansion could modify the state of an item. Consequently we've seen a
number of bugs (rust-lang#48644 and rust-lang#49846) related to using this stale cache.

This commit tweaks the usage of the cached `TokenStream` to compare it to our
lossy stringification of the token stream. If the tokens that make up the cache
and the stringified token stream are the same then we return the cached version
(which has correct span information). If they differ, however, then we will
return the stringified version as the cache has been invalidated and we just
haven't figured that out.

Closes rust-lang#48644
Closes rust-lang#49846
@rust-highfive
Copy link
Collaborator

r? @eddyb

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 10, 2018
@alexcrichton
Copy link
Member Author

r? @nrc

cc @dtolnay

@rust-highfive rust-highfive assigned nrc and unassigned eddyb Apr 10, 2018
@alexcrichton
Copy link
Member Author

Note that the real underlying issue here, #43081, I intend to leave open as that'll just get more pressing over time

@nrc
Copy link
Member

nrc commented Apr 10, 2018

I don't mind landing this, but it feels like a hack on top of a hack :-( We should definitely be thinking about a long-term solution.

Have you confirmed that the simple equality check works in practice? I could imagine it failing if there has been macro expansion resulting in interpolated tokens or with tokens which are otherwise ignored (comments, whitespace, etc.), etc.

Assuming that it works most of the time and it is thus an improvement, even if imperfect then r=me.

@alexcrichton
Copy link
Member Author

I did some testing locally and it looks like this at least papers over the issue for now. I definitely agree this is piling hacks on hacks :(

The only real way forward that I know though is to take a strategy like syn and refactor the entire AST to have a lossless to_tokens method. That's a pretty significant undertaking though and a pretty massive refactoring in the compiler...

@bors: r=nrc

@bors
Copy link
Contributor

bors commented Apr 11, 2018

📌 Commit 6d7cfd4 has been approved by nrc

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 11, 2018
kennytm added a commit to kennytm/rust that referenced this pull request Apr 14, 2018
…r=nrc

proc_macro: Avoid cached TokenStream more often

This commit adds even more pessimization to use the cached `TokenStream` inside
of an AST node. As a reminder the `proc_macro` API requires taking an arbitrary
AST node and transforming it back into a `TokenStream` to hand off to a
procedural macro. Such functionality isn't actually implemented in rustc today,
so the way `proc_macro` works today is that it stringifies an AST node and then
reparses for a list of tokens.

This strategy unfortunately loses all span information, so we try to avoid it
whenever possible. Implemented in rust-lang#43230 some AST nodes have a `TokenStream`
cache representing the tokens they were originally parsed from. This
`TokenStream` cache, however, has turned out to not always reflect the current
state of the item when it's being tokenized. For example `#[cfg]` processing or
macro expansion could modify the state of an item. Consequently we've seen a
number of bugs (rust-lang#48644 and rust-lang#49846) related to using this stale cache.

This commit tweaks the usage of the cached `TokenStream` to compare it to our
lossy stringification of the token stream. If the tokens that make up the cache
and the stringified token stream are the same then we return the cached version
(which has correct span information). If they differ, however, then we will
return the stringified version as the cache has been invalidated and we just
haven't figured that out.

Closes rust-lang#48644
Closes rust-lang#49846
bors added a commit that referenced this pull request Apr 14, 2018
Rollup of 14 pull requests

Successful merges: #49908, #49876, #49916, #49951, #49465, #49922, #49866, #49915, #49886, #49913, #49852, #49958, #49871, #49864

Failed merges:
@bors bors merged commit 6d7cfd4 into rust-lang:master Apr 14, 2018
@alexcrichton alexcrichton deleted the fix-more-proc-macros branch April 20, 2018 06:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants