Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow using $:literal containing integer to index into a tuple #91166

Closed
wants to merge 2 commits into from

Conversation

dtolnay
Copy link
Member

@dtolnay dtolnay commented Nov 24, 2021

Example of the failure prior to this PR:

macro_rules! m {
    ($thing:ident . $field:literal) => {
        $thing . $field
    };
}

fn main() {
    let x = ('x',);
    let _ = m!(x.0);
}
error: unexpected token: `0`
 --> src/main.rs:3:18
  |
3 |         $thing . $field
  |                  ^^^^^^
...
9 |     let _ = m!(x.0);
  |             ------- in this macro invocation
  |
  = note: this error originates in the macro `m` (in Nightly builds, run with -Z macro-backtrace for more info)

@rust-highfive
Copy link
Collaborator

r? @cjgillot

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 24, 2021
@camelid camelid added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Nov 25, 2021
@cjgillot
Copy link
Contributor

The implementation seems fine.
Since this allows more things in macro_rules, does lang team need to sign off on this?

@dtolnay dtolnay assigned scottmcm and unassigned cjgillot Nov 27, 2021
@scottmcm scottmcm added the I-lang-nominated The issue / PR has been nominated for discussion during a lang team meeting. label Dec 3, 2021
@scottmcm
Copy link
Member

scottmcm commented Dec 3, 2021

Nominated for lang team discussion.

@dtolnay What is this going to do for the ol' x.0.0? For example, can I do this?

fn main() {
    let x = (('x',),);
    let _ = m!(x.0.0);
}

(Where 0.0 gets matched as a float literal.)

@dtolnay
Copy link
Member Author

dtolnay commented Dec 4, 2021

This PR keeps the current behavior for $ident . $float but I can change that too if we want it.

@petrochenkov
Copy link
Contributor

petrochenkov commented Dec 4, 2021

There's a definitive method to determine whether something interpolated should parse or not - checking how it would work in token-based expansion model (aka "how it would work in proc macro output"), as opposed to AST-based model that is currently used for implementing $interpolated fragments internally.

$thing . $field in the example above will expand into x . ⟪ 0 ⟫ where ⟪ ... ⟫ is a Delimiter::None group that is more or less equivalent to regular ( ... ) parentheses.
Only single-token fragments should be expanded without Delimiter::None, now it's only $tt ($ident is also a single-token fragment, but it's currently wrapped, but we should change that and remove the wrapping).

Do we want x . ( 0 ) to parse as a field access? No.
So this change should not go alone and should be accompanied with a change to $interpolated fragment wrapping, e.g. we need to stop wrapping single-token $lit fragments and only keep the wrapping for literals with -, or something like that.

Also see #67062 for the current issues with parsing Delimiter::Nones.

@dtolnay
Copy link
Member Author

dtolnay commented Dec 4, 2021

I'm not sure I agree this change would need to be bundled together with eliminating Delimiter::None wrappers around $:literal passed into a proc macro input. Tweaking the token stream passed into proc macros can be quite disruptive, as we saw during the process of fixing #43081 (which caused a new None wrapper around $:ident in some cases). Meanwhile this change for macro_rules is important for tt-munchers to be able to handle Rust expression syntax, as I came across in anyhow::ensure!:

(() . $member:ident $($rest:tt)*) => {};
(() . $member:literal $($rest:tt)*) => {};

It doesn't seem great to have to disrupt proc macros in order for macro_rules to get this improvement.

@scottmcm
Copy link
Member

scottmcm commented Dec 21, 2021

We had a long conversation about this at the last lang team meeting. My takeaway was that we were open for some kind of change around this area, but weren't sure what form that should be.

To attempt to summarize some of the things that came up:

  • What's the right way to think of the distinction between literal or tt here? There are of course a ton of literals that definitely wouldn't work after expansion, like 0_i32 or 0.0e0. Could the use case work with just a tt instead? Why or why not?
  • We'd expect this to not work if a 0 would be matched as an expr, right? Is there something fundamental about the literal vs the general expression here?
  • From a "well it's just tokens" perspective, why should this work for 0 but not 0.0 when the 0.0 version works when matched as a tt? https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=9c7eeb41783c34c656364b1b885e4fe7
  • Should we add a new fieldname for use in macro_rules? It might be useful for things like matching struct definitions and literals too.

@scottmcm scottmcm added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed I-lang-nominated The issue / PR has been nominated for discussion during a lang team meeting. labels Dec 21, 2021
@petrochenkov petrochenkov self-assigned this Dec 21, 2021
@petrochenkov
Copy link
Contributor

petrochenkov commented Dec 22, 2021

What's the right way to think of the distinction between literal or tt here?

tts are meaningless and can be inserted into any token stream and blend into it without keeping any "parsing priorities".
E.g. if 1 + 2 is inserted into $interpolated * 3 as tts, then the result will be 1 + 2 * 3, resulting in 2 blending with * 3 into an expression.

Any other matchers are meaningful and must keep their meaning (or at least grouping) when inserted into other token streams.
E.g. if 1 + 2 is inserted into $interpolated * 3 as expr, then the result will be ⟪ 1 + 2 ⟫ * 3, so 1 + 2 keeps being an expression after re-parsing and its parts don't blend with tokens from the adjacent context.

We conservatively emit the Delimiter::None brackets for all matchers except tt to avoid their tokens partially blending with the context, but it's not always necessary.
For example, ident matcher is just a subset of tt, it always represents a single token, it cannot blend with anything (it's currently conservatively wrapped into Delimiter::None anyway, but I'd like to remove that wrapping).
The literal matcher is not always a single token, fortunately or not, it accepts things like - 1, which can be torn apart after insertion.
So the ⟪ - 1 ⟫ brackets are necessary at least in some cases, otherwise things like - 1 inserted into $literal () will parse incorrectly after insertion.
If 0.0 turns into 3 tokens (which I want to do, but it's not done yet), then it will requires the ⟪ 0.0 ⟫ brackets too, because without the brackets it will blend with the context in cases like tuple . 0.0, which it should not do because it must keep its meaning as a literal.

@petrochenkov
Copy link
Contributor

petrochenkov commented Dec 22, 2021

The most important part to realize is that non-terminal tokens do not exist during parsing.
Yes, they may exist in the current implementation because the current implementation is old, but the language rules should work as if they didn't exist.

Imagine that we pass pass all tokens through a proc macro before parsing, proc macros do not support AST pieces by design, then all interpolated tokens are turned into their corresponding token streams, possibly including Delimiter::None brackets.
The language syntax must then be defined in terms of such token streams, and never non-terminal tokens.

@petrochenkov
Copy link
Contributor

petrochenkov commented Dec 22, 2021

We'd expect this to not work if a 0 would be matched as an expr, right? Is there something fundamental about the literal vs the general expression here?

I don't see any difference between expr and literal matchers here, both can be potentially multi-token (or potentially single token).

If they are multi-token then they surely be guarded by Delimiter::None brackets during tokenization before parsing.
If they are single-token, then it's kind of a grey territory, there's no need to shield them, so the answer depends on the amount of conservativeness with which we add extra Delimiter::None brackets.

Right now we always add the brackets around all non-tt matchers, so both tuple.$literal and tuple.$expr are equivalent to tuple.⟪ 0 ⟫ (remember, nonterminal tokens do not exist) so if tuple.⟪ 0 ⟫ is syntactically correct, then both tuple.$literal and tuple.$expr must parse.

In theory we can omit the brackets from all single-token cases.
(This shouldn't even be a large breaking change, the problems usually arise when you add brackets and parsers don't expect them like #91166 (comment) says.)
In this case both tuple.$literal and tuple.$expr will also be equivalent, but now to tuple . 0 (again, nonterminal tokens do not exist + single-token matchers are not wrapped into brackets now), so again both tuple.$literal and tuple.$expr must parse.

Some hybrid schemes are also possible - we omit unnecessary Delimiter::Nones in some cases but not in others.
Then you can make literal and expr behave differently.
It just doesn't seem to be a very consistent approach.

@petrochenkov
Copy link
Contributor

From a "well it's just tokens" perspective, why should this work for 0 but not 0.0 when the 0.0 version works when matched as a tt? https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=9c7eeb41783c34c656364b1b885e4fe7

See #91166 (comment) on the difference between "unstructured" tt and other matchers.

Should we add a new fieldname for use in macro_rules? It might be useful for things like matching struct definitions and literals too.

No opinion on this, I don't see anything harmful it it except for generally expanding the language surface.
Note: the fieldname matcher will always produce a single token, so it doesn't strictly require the Delimiter::None wrapping, similarly to ident.

@petrochenkov petrochenkov removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 22, 2021
@dtolnay
Copy link
Member Author

dtolnay commented Dec 22, 2021

The literal matcher is not always a single token, fortunately or not, it accepts things like - 1, which can be torn apart after insertion.
So the ⟪ - 1 ⟫ brackets are necessary at least in some cases, otherwise things like - 1 inserted into $literal () will parse incorrectly after insertion.

A None group containing two tokens -1 is not what I would want. IMO ideally $:ident would be 1:1 with TokenTree::Ident (i.e. passed to proc macros with no None around it ever) and $:literal would be 1:1 with TokenTree::Literal. Proc macro literals already support negatives just the same as $:literal does. For example you can create a negative literal single token with Literal::i8_unsuffixed(-1).

As long as nonterminal tokens still exist in the old parser implementation, I'd like to make them behave consistently with the above treatment of $:ident and $:literal. I think it's okay for fixes in that direction to land in the parser independently of whether the $:ident and $:literal nonterminals passed to a proc macro are still being wrapped in None.

@bors
Copy link
Contributor

bors commented Jan 19, 2022

☔ The latest upstream changes (presumably #93048) made this pull request unmergeable. Please resolve the merge conflicts.

Currently fails to build:

    error: unexpected token: `0`
      --> src/test/ui/macros/macro-interpolation.rs:19:16
       |
    LL |         $var . $field
       |                ^^^^^^
    ...
    LL |     let _ = field!(tuple.0);
       |             --------------- in this macro invocation
       |
       = note: this error originates in the macro `field` (in Nightly builds, run with -Z macro-backtrace for more info)

    error: macro expansion ignores token `0` and any following
      --> src/test/ui/macros/macro-interpolation.rs:19:16
       |
    LL |         $var . $field
       |                ^^^^^^
    ...
    LL |     let _ = field!(tuple.0);
       |             --------------- caused by the macro expansion here
       |
       = note: the usage of `field!` is likely invalid in expression context

    error: aborting due to 2 previous errors
@dtolnay
Copy link
Member Author

dtolnay commented Jan 19, 2022

@dtolnay
Copy link
Member Author

dtolnay commented Jan 19, 2022

Gonna mark this as blocked on at least #92392 to put $:literal on a path to using a single token always on the proc macro side. Personally I'd be fine landing this one first but I respect that folks want some guarantee that the macro_rules behavior and proc macro behavior on literals will be able to stay in sync.

@dtolnay dtolnay added S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 19, 2022
@dtolnay
Copy link
Member Author

dtolnay commented Jan 25, 2022

I was able to work around this not working by duplicating all my macro input tokens: ($($tt)*) ($($tt)*) and then matching one set of them for $:literal and using the other set for the actual indexing expression:

(($expr:expr) ($dot:tt $literal:tt $($rest:tt)*) (. $unusable_literal:literal $($rest:tt)*)) => {
    $expr $dot $literal
};

(The full macro: https://github.com/dtolnay/anyhow/blob/1.0.53/src/ensure.rs#L310)

#92392 has been closed so I'll close this as well. I still think this should work but for now I'm unblocked so someone else will need to drive this. :)

@dtolnay dtolnay closed this Jan 25, 2022
@dtolnay dtolnay deleted the literal branch January 25, 2022 20:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants