Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

- ruby31.y: handle local variables as hash labels with omitted values #820

Merged

Conversation

iliabylich
Copy link
Collaborator

There's a new rule in 3.1 that allows hash value to be omitted if it's equal to key:

$ ./miniruby -ve 'p({ rand: })'
ruby 3.1.0dev (2021-09-24T13:56:38Z master 40a65030e5) [x86_64-darwin20]
{:rand=>0.6152290626750991}

We backported it in #818, but there's one case that we missed - local variables can also be used:

./miniruby -ve 'foo = 1; p({ foo: })'
ruby 3.1.0dev (2021-09-24T13:56:38Z master 40a65030e5) [x86_64-darwin20]
{:foo=>1}

This PR amends this logic.

And of course, there are implications like circular arguments validation that was missing:

$ /bin/cat ../ruby/test.rb
def m(a = { a: })
end

$ bin/ruby-parse --31 ../ruby/test.rb
../ruby/test.rb:1:13: error: circular argument reference a
../ruby/test.rb:1: def m(a = { a: })
../ruby/test.rb:1:             ^

$ ../ruby/miniruby -cv ../ruby/test.rb
ruby 3.1.0dev (2021-09-24T13:56:38Z master 40a65030e5) [x86_64-darwin20]
../ruby/test.rb:1: circular argument reference - a

@marcandre @koic @mbj @palkan Can I get a review of this API please? Are there any issues with multiple nodes having the same source location?

@palkan
Copy link
Contributor

palkan commented Sep 24, 2021

Are there any issues with multiple nodes having the same source location?

As soon as it's safe to use the fact that source locations are equal for both key and value nodes to detect implicit hashes, that would work for me.

(In my implementation, I used a separate node type; but I guess that's not an option for Parser 🙂).

@marcandre
Copy link
Contributor

marcandre commented Sep 24, 2021

Tricky.
Was it considered to have the location be nil instead?
The reservation I have with reusing the location (instead of nil) makes it so close to the normal form that it makes it very difficult to distinguish them.
Writing a method to know if a particular lvar is of such a form, one has to use the parent (not actually available), check that it is a pair and that the lvar in question is the second child, and finally compare the location with that of the first child.
This non-locality of the information doesn't seem ideal. I don't feel strongly about this, but I would intuitively think that the sym should have a location but the lvar/send/... shouldn't.

@marcandre
Copy link
Contributor

To comment further, in RuboCop we do have access to the parent, so whichever way this ends up, we can implement a method to check this (albeit ugly). Only risk factor is that a cop would rewrite the source without realizing this new case, and instead of crashing, and the resulting generated code could be incompatible. Maybe @koic or @dvandersluis can think of other impacts for RuboCop?

@iliabylich
Copy link
Collaborator Author

@marcandre There's a rule that all nodes have some location (at least expression_l), I'm pretty sure that changing it would have bigger impact than sharing location between two nodes.

There's always an alternative - creating a new node type 😄

@marcandre
Copy link
Contributor

Ah ah, new mode type sounds really bad indeed. 😬

Empty expression (just after the colon) would be an alternative.

@iliabylich
Copy link
Collaborator Author

Empty expression (just after the colon) would be an alternative.

That also would be a breaking change (node without expr_l). Also I don't know how I feel about "dummy" nodes in general, mostly because it splits nodes into two groups and it feels like this separation sits on top of node-type groups (i.e. this way we get two groups of types, dummy(typeA, typeB, ...) and real(typeC, typeD, ...)). I'd like to keep it flat.

I did quick research in other languages that support shorthand object field syntax.

JavaScript (Babel/parser) - the doc says that Object node has a list of members where each member is a sum type of ObjectProperty | ObjectMethod | SpreadElement where ObjectProperty has

key: Expression
shorthand: boolean;
value: Expression

The doc has T | null in many places, so I guess key and value are non-nullable, which means they share location. The flag is just a nice addition.

Rust (syn crate) - AST has a type called FieldValue that represents field in Foo { field } struct "instantiation", and it has the following fields:

pub member: Member,
pub colon_token: Option<Colon>,
pub expr: Expr,

Here's how it's used internally - if branch handles explicit (i.e. Foo { field: value }) key-value pairs, else branch handles Foo { field } syntax, and syn also copies location of the key to value.

At least we are not inventing anything new in this PR.

Another alternative that I see (and dislike a lot but still would like it be mentioned) is making pair.value nullable. I suppose that'd be a huge change for parser users, right?

I'm going to keep this PR open for several days, maybe someone can suggest a better approach.

@pocke
Copy link
Collaborator

pocke commented Sep 25, 2021

FYI, I found two differences between MRI and parser gem other than lvars. I'll create a new issue if you need it.

First, MRI accepts a constant as key/value but parser gem doesn't.

$ ruby -ve 'p({String:})' 
ruby 3.1.0dev (2021-09-24T23:41:02Z master 39a6bf5513) [x86_64-linux]
{:String=>String}

$ bin/ruby-parse --31 -e 'p({String:})' 
(send nil :p
  (hash
    (pair
      (sym :String)
      (send nil :String))))

Second, MRI rejects a label with ! or ?, but parser gem doesn't.

$ ruby -e p({foo!:})
-e:1: identifier foo! is not valid to get

$ ruby -e p({foo?:})
-e:1: identifier foo? is not valid to get

$ bin/ruby-parse --31 -e 'p({foo!:})'
(send nil :p
  (hash
    (pair
      (sym :foo!)
      (send nil :foo!))))

$ bin/ruby-parse --31 -e 'p({foo?:})'
(send nil :p
  (hash
    (pair
      (sym :foo?)
      (send nil :foo?))))

@mbj
Copy link
Collaborator

mbj commented Sep 27, 2021

Can I get a review of this API please? Are there any issues with multiple nodes having the same source location?

Unparser by definition does not use source locations and instead tries its best to generate concrete syntax "just" from the structure of the AST nodes. So the source location aspect does not concern me - at this point.

@iliabylich
Copy link
Collaborator Author

ok, time to finish it 😄

I've spent some time investigating what gettable function does in MRI and it's quite tricky because of how obfuscated it is:

  1. MRI calls TOK_INTERN macro (or a wrapper/helper function tokenize_ident) to attach id to a token. ID is basically a Ruby Symbol.
  2. this macro calls rb_intern3 to construct an ID that is internally represented as a tagged pointer (i.e. a pointer or a value depending on leading bytes that is &-checked every time it's dereferenced, constant known-at-compile-time symbols can be compared by value). This whole process is performed only for tokens that have variadic values like tCONSTANT, tIDENTIFIER, tLABEL and so on. Tokens like kCLASS don't need it because their value are constant and can be inferred based on a token type that is simply a flat number.
  3. rb_intern3 is defined in symbol.c and it calls intern_str -> rb_str_symname_type -> rb_enc_symname_type (and also -> enc_synmane_type_leading_chars) that attaches additional bitflags:
    • ID_GLOBAL for $-prefixed values
    • or ID_INSTANCE for @-prefixed values
    • or ID_CLASS for @@-prefixed values
    • or ID_CONST for values starting with capital letter
    • or ID_LOCAL otherwise
  4. Later MRI uses these flags to check what's the type of the token.val

We do the same thing in many places in Builder, I think it makes sense to mirror these checks as separate methods to simplify backporting in the future.

This gettable methods is called Builder::assignable in parser, so I'm going to add missing checks there.

Also I've noticed that this method checks for assignment to numparams in numblocks and errors if there are any, but we check it directly in .y file. I've found a bug:

proc { 1 in _1 }

MRI rejects this code but we accept it. It will be fixed by moving numparam check to the builder (wrapped with @version check, assignment to numparam is valid for Ruby < 27)

@iliabylich
Copy link
Collaborator Author

iliabylich commented Nov 19, 2021

@marcandre @koic @mbj @palkan @pocke Can I get a review please? Do you see any missing test cases?

@iliabylich
Copy link
Collaborator Author

Merging to unblock myself, feel free to post thoughts/requests here.

@iliabylich iliabylich merged commit 57bd371 into whitequark:master Nov 19, 2021
@iliabylich iliabylich deleted the lvar-in-hash-value-omission branch November 19, 2021 14:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants