
Implement tokenization for some items in proc_macro #43230

Merged
merged 4 commits into rust-lang:master from alexcrichton:more-tokenstream on Jul 28, 2017

Conversation

alexcrichton
Member

This PR is a partial implementation of #43081 targeted at preserving span information in attribute-like procedural macros. Currently all attribute-like macros lose span information on the input token stream if it's iterated over, because the compiler cannot losslessly tokenize an AST node. This PR takes the strategy of saving off a list of tokens on particular AST nodes so that a lossless tokenized version can be returned. There are a few limitations with this PR, however, so the old fallback remains in place.
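
For readers unfamiliar with the problem, here is a minimal sketch of an attribute-like procedural macro that iterates its input; the macro name and crate layout are hypothetical and not part of this PR. Without the token cache added here, the `item` stream below is produced by stringifying and re-lexing the AST node, so the spans on its tokens no longer point back into the user's original source.

// Hypothetical example; `preserve_spans` is not a real macro from this PR.
extern crate proc_macro;
use proc_macro::TokenStream;

#[proc_macro_attribute]
pub fn preserve_spans(_args: TokenStream, item: TokenStream) -> TokenStream {
    // Iterating the input forces the compiler to tokenize the annotated item.
    // With the stringification fallback, every token here carries a synthetic
    // span, so any diagnostics the macro emits against these spans cannot
    // point into the original source code.
    item.into_iter().collect()
}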

@rust-highfive
Collaborator

r? @pnkfelix

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton
Member Author

r? @jseyfried

cc @nrc

There are a few comments in the code, particularly around handling inner attributes, which I'd be very curious to hear others' thoughts on! I'm not sure this is the best solution long-term, but I figured it'd be good to at least put this up for review!

rust-highfive assigned jseyfried and unassigned pnkfelix on Jul 14, 2017
@kennytm
Member

kennytm commented Jul 14, 2017

Several compile-fail tests failed.

[00:40:29] failures:
[00:40:29]     [compile-fail] compile-fail/import-prefix-macro-1.rs
[00:40:29]     [compile-fail] compile-fail/issue-20616-2.rs
[00:40:29]     [compile-fail] compile-fail/issue-39616.rs
[00:40:29]     [compile-fail] compile-fail/privacy/restricted/tuple-struct-fields/test.rs
[00:40:29]     [compile-fail] compile-fail/privacy/restricted/tuple-struct-fields/test2.rs
[00:40:29]     [compile-fail] compile-fail/privacy/restricted/tuple-struct-fields/test3.rs
[00:40:29]     [compile-fail] compile-fail/self-vs-path-ambiguity.rs
[00:40:29] 
[00:40:29] test result: FAILED. 2683 passed; 7 failed; 13 ignored; 0 measured; 0 filtered out

shepmaster added the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties) on Jul 14, 2017
@arielb1
Contributor

arielb1 commented Jul 18, 2017

ping @jseyfried for a review.

@alexcrichton
Member Author

r? @nrc

rust-highfive assigned nrc and unassigned jseyfried on Jul 22, 2017
@jseyfried
Contributor

Reviewed, r=me once @nrc gets a chance to peruse (fn collect_tokens could use a second pair of eyes).

I believe we can improve the LastToken data model, but deferring that to another PR is fine. I think the best way forward is to extend the Cursor API to allow efficient slicing.

@alexcrichton
Member Author

Friendly triage ping for you @nrc!


match nt.0 {
    Nonterminal::NtItem(ref item) => {
        tokens = item.tokens.as_ref();
Contributor

Since `item.tokens` doesn't include outer attributes, shouldn't we add them back in here?
For example, consider:

#[my_proc_macro]
#[some_attribute]
fn f() {}

Today, my_proc_macro sees #[some_attribute] fn f() {}; after this PR, I think it would just see fn f() {}.

Member Author

Ah ok, excellent point! I wasn't exactly sure how this worked. I think for now that'll have to fall back to the stringification path, but I'll implement that and add a test.

Contributor

Could we just use stringification to get the attributes' tokens and then prepend them to the item's real tokens?

Member Author

Another excellent point!
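
A rough sketch of the suggestion above, written against the proc-macro2 crate purely for illustration (the function name is made up and this is not the rustc-internal code): the outer attributes round-trip through a string, losing their spans, while the item's cached tokens keep theirs.

// Illustrative only; rustc's internal token types differ from proc-macro2's.
use proc_macro2::TokenStream;
use std::str::FromStr;

fn prepend_attrs(stringified_attrs: &str, item_tokens: TokenStream) -> TokenStream {
    // Re-lex the stringified attributes (their spans are now synthetic) and
    // splice them in front of the item's span-preserving token stream.
    let attr_tokens = TokenStream::from_str(stringified_attrs)
        .expect("stringified attributes should re-lex");
    attr_tokens.into_iter().chain(item_tokens).collect()
}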

@nrc
Member

nrc commented Jul 28, 2017

So, I thought I understood inner attributes, but I recently found out that I don't :-s So I'm not sure I'll be helpful on those things.

I feel like adding tokens directly to Item is wrong somehow. I think it would be better to add it to the Span. I suppose the former is more practical for now, since we're only adding it to items, not all nodes. We might also replace the span with tokens, so I'm not sure which is best in the long term. Which is to say, this is probably fine in the short term.

nrc left a comment (Member)

r+ with jseyfried's attribute comment addressed and the extra comment (on LastToken) added.

#[derive(Clone)]
enum LastToken {
    Collecting(Vec<TokenTree>),
    Was(Option<TokenTree>),
Member

Could you comment on how this enum is used please?
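
For readers following the thread, here is one plausible reading of how a two-state tracker like this gets used during token collection, as a self-contained sketch; the surrounding types and driver code are stand-ins, not the actual parser.

// Sketch with stand-in types; not the rustc parser.
#[derive(Clone, Debug)]
struct TokenTree(String); // placeholder for the real token type

#[derive(Clone)]
enum LastToken {
    // `collect_tokens` is active: every token the parser advances past is
    // buffered so the whole item can later be replayed losslessly.
    Collecting(Vec<TokenTree>),
    // Normal parsing: only the most recently consumed token is remembered.
    Was(Option<TokenTree>),
}

struct Parser {
    last_token: LastToken,
}

impl Parser {
    // Called whenever the parser advances past a token.
    fn bump(&mut self, tok: TokenTree) {
        match self.last_token {
            LastToken::Collecting(ref mut buf) => buf.push(tok),
            LastToken::Was(ref mut last) => *last = Some(tok),
        }
    }
}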

This commit adds a new field to the `Item` AST node in libsyntax to optionally
contain the original token stream that the item itself was parsed from. This is
currently `None` everywhere but is intended for use later with procedural
macros.
This partly resolves the `FIXME` located in `src/libproc_macro/lib.rs` when
interpreting interpolated tokens. All instances of `ast::Item` which have a list
of tokens attached to them now use that list to be losslessly converted into a
`TokenTree`, instead of going through stringification and losing span
information.

cc rust-lang#43081
This test currently fails because the tokenization of an AST item during the
expansion of a procedural macro attribute round-trips through strings, losing
span information.
The collected token stream is then later used by `proc_macro` to generate a new
`proc_macro::TokenTree` which preserves span information. Unfortunately this
isn't a bulletproof approach, as it doesn't handle the case where there are
still other attributes on the item, especially inner attributes.

Despite this, the intention here is to solve the primary use case for procedural
attributes: outer attributes attached to functions, likely bare. In this
situation we should now be able to yield a lossless stream of tokens that
preserves span information.
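
Taken together, the commits above amount to the following decision whenever `proc_macro` needs tokens for an item. This is a hedged sketch with stand-in types, not the actual libproc_macro code.

// Stand-in types for illustration; the real code works on AST and TokenStream types.
struct Item {
    // Cached tokens the item was parsed from; `None` when no cache exists.
    tokens: Option<Vec<String>>,
}

fn tokens_for_macro(item: &Item, stringify_and_relex: impl Fn(&Item) -> Vec<String>) -> Vec<String> {
    match item.tokens {
        // Lossless path: replay the original tokens, keeping their spans.
        Some(ref toks) => toks.clone(),
        // Old fallback: stringify the item and re-lex it, losing span info.
        None => stringify_and_relex(item),
    }
}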
@alexcrichton
Member Author

@bors: r=nrc,jseyfried

@bors
Contributor

bors commented Jul 28, 2017

📌 Commit 4886ec8 has been approved by nrc,jseyfried

@alexcrichton
Member Author

@bors: r=nrc,jseyfried

@bors
Contributor

bors commented Jul 28, 2017

💡 This pull request was already approved, no need to approve it again.

@bors
Contributor

bors commented Jul 28, 2017

📌 Commit 4886ec8 has been approved by nrc,jseyfried

@bors
Contributor

bors commented Jul 28, 2017

⌛ Testing commit 4886ec8 with merge 126321e...

bors added a commit that referenced this pull request Jul 28, 2017
Implement tokenization for some items in proc_macro

@bors
Contributor

bors commented Jul 28, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: nrc,jseyfried
Pushing 126321e to master...

bors merged commit 4886ec8 into rust-lang:master on Jul 28, 2017
alexcrichton deleted the more-tokenstream branch on August 22, 2017
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Apr 10, 2018
This commit adds even more pessimization around using the cached `TokenStream` inside
an AST node. As a reminder, the `proc_macro` API requires taking an arbitrary
AST node and transforming it back into a `TokenStream` to hand off to a
procedural macro. Such functionality isn't actually implemented in rustc today,
so the way `proc_macro` works today is that it stringifies an AST node and then
reparses it to get a list of tokens.

This strategy unfortunately loses all span information, so we try to avoid it
whenever possible. As implemented in rust-lang#43230, some AST nodes have a `TokenStream`
cache representing the tokens they were originally parsed from. This
`TokenStream` cache, however, has turned out to not always reflect the current
state of the item when it's being tokenized. For example, `#[cfg]` processing or
macro expansion could modify the state of an item. Consequently we've seen a
number of bugs (rust-lang#48644 and rust-lang#49846) related to using this stale cache.

This commit tweaks the usage of the cached `TokenStream` to compare it to our
lossy stringification of the token stream. If the tokens that make up the cache
and the stringified token stream are the same, then we return the cached version
(which has correct span information). If they differ, however, then we return
the stringified version, as the cache has been invalidated and we just
haven't figured that out yet.

Closes rust-lang#48644
Closes rust-lang#49846
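
In other words, the cache is now only trusted when it still agrees with the lossy round-trip. A hedged sketch of that check, with strings standing in for token streams (the real code compares token streams, not text):

// Illustrative stand-ins; not the actual rustc implementation.
fn tokens_for_proc_macro(cached: Option<&str>, stringified: &str) -> String {
    match cached {
        // Cache still matches the item's current state: use it, keeping spans.
        Some(c) if tokens_match(c, stringified) => c.to_string(),
        // The item changed after parsing (e.g. `#[cfg]` stripping or macro
        // expansion), so the cache is stale: fall back to the lossy version.
        _ => stringified.to_string(),
    }
}

// Stand-in for a span-insensitive comparison of two token sequences.
fn tokens_match(a: &str, b: &str) -> bool {
    a.split_whitespace().eq(b.split_whitespace())
}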
kennytm added a commit to kennytm/rust that referenced this pull request Apr 14, 2018
…r=nrc

proc_macro: Avoid cached TokenStream more often

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Jul 22, 2018
Ever plagued by rust-lang#43081, the compiler can return surprising spans in situations
related to procedural macros. This is exhibited by rust-lang#47983, where a
procedural macro invoked in a nested item context fails to have correct span
information.

While rust-lang#43230 provided a "hack" to cache the token stream used for each item in
the compiler, it's not a full-blown solution. This commit continues to extend
this "hack" a bit more to work for nested items.

Previously in the parser the `parse_item` method would collect the tokens for an
item into a cache on the item itself. It turned out, however, that nested items
were parsed through the `parse_item_` method, so they didn't receive similar
treatment. To remedy this situation the hook for collecting tokens was moved
into `parse_item_` instead of `parse_item`.

Afterwards the token collection scheme was updated to support nested collection
of tokens. This is implemented by tracking `TokenStream` tokens instead of
`TokenTree` tokens, which allows collecting items into streams at intermediate
layers and interleaving them in the upper layers.

All in all, this...

Closes rust-lang#47983
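
A hedged sketch of what nested collection means here, with simple stand-in types rather than the parser's real ones: each in-progress item collects into its own buffer, and when an inner item finishes, its stream is both returned and spliced into the enclosing item's buffer so the outer collection stays complete.

// Stand-in types for illustration; not the rustc parser.
type TokenStream = Vec<String>;

struct Collector {
    // One buffer per in-progress item; the innermost item is the last entry.
    stack: Vec<TokenStream>,
}

impl Collector {
    fn start_item(&mut self) {
        self.stack.push(TokenStream::new());
    }

    // Record a token against the innermost item currently being collected.
    fn push_token(&mut self, tok: &str) {
        if let Some(buf) = self.stack.last_mut() {
            buf.push(tok.to_string());
        }
    }

    // Finish the innermost item: return its stream and splice it into the
    // enclosing item's stream, if any, so nesting does not lose tokens.
    fn finish_item(&mut self) -> TokenStream {
        let inner = self.stack.pop().unwrap_or_default();
        if let Some(outer) = self.stack.last_mut() {
            outer.extend(inner.iter().cloned());
        }
        inner
    }
}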
bors added a commit that referenced this pull request Jul 24, 2018
rustc: Implement tokenization of nested items
