- lexer.rl: fix incompatible delimiters on percent literal #808

@pocke pocke commented Jul 10, 2021

CRuby accepts only ASCII characters other than [A-Za-z0-9] as the delimiter
of a percent literal, but the lexer accepts a different set of characters.
For example:

  • CRuby accepts %w^Dfoo^D, but parser does not (note: ^D means 0x04)
  • CRuby rejects %w1foo1, but parser accepts it
  • CRuby rejects %w★foo★, but parser accepts it

This patch fixes the problems.
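
The rule described above can be sketched as a small predicate. This is only an illustration of the rule as stated in this description, not the code in lexer.rl; `percent_literal_delimiter?` is a hypothetical helper name introduced here.

```ruby
# Hypothetical helper, not part of the parser gem: returns true when +char+
# may delimit a percent literal under the rule stated in this description.
def percent_literal_delimiter?(char)
  return false unless char.bytesize == 1    # multibyte characters are rejected
  return false if char.getbyte(0) > 0x7F    # non-ASCII bytes are rejected
  !char.match?(/[A-Za-z0-9]/)               # ASCII letters and digits are rejected
end

percent_literal_delimiter?("\x04") # => true
percent_literal_delimiter?("1")    # => false
percent_literal_delimiter?("★")    # => false
```

Bracket delimiters pass this check as well; pairing an opening bracket with its closing counterpart is a separate concern that the sketch does not model.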

Investigation of CRuby

CRuby parses percent literals here: https://github.com/ruby/ruby/blob/6072239121360293dbd2ed607f16b6a11668999a/parse.y#L8718-L8815

ASCII delimiters

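I confirmed that Ruby 1.8 and later accept every ASCII character except alphanumerics as a delimiter, using the following code. The script also skips the bracket characters ()<>{}[], since an opening bracket is closed by its counterpart rather than by itself.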

test.rb

(0..127).each do |n|
  next if /[a-zA-Z0-9()<>{}\[\]]/ =~ n.chr
  eval "%q#{n.chr}foo#{n.chr}"
end

$ docker run -it --rm -v $(pwd)/test.rb:/tmp/test.rb rubylang/all-ruby env ALL_RUBY_SINCE=1.8 ./all-ruby /tmp/test.rb
ruby-1.8.0
...
ruby-2.0.0-p648
ruby-2.1.0-preview1   (eval):1: warning: encountered \r in middle of line, treated as a mere space
...
ruby-2.7.4            (eval):1: warning: encountered \r in middle of line, treated as a mere space
ruby-3.0.0-preview1
...
ruby-3.0.2

Numeric and multibyte delimiters

I also confirmed that Ruby 1.8 and later reject 1 and ★ as delimiters, using the following commands.

$ docker run -it --rm rubylang/all-ruby env ALL_RUBY_SINCE=1.8 ./all-ruby -ce '%q1foo1'
ruby-1.8.0            -e:1: unknown type of %string
                      %q1foo1
                         ^
                  exit 1
...
ruby-2.4.10           -e:1: unknown type of %string
                      %q1foo1
                         ^
                  exit 1
ruby-2.5.0-preview1   -e:1: unknown type of %string
                      %q1foo1
                      ^~~
                  exit 1
...
ruby-2.6.8            -e:1: unknown type of %string
                      %q1foo1
                      ^~~
                  exit 1
ruby-2.7.0-preview1   -e:1: unknown type of %string
                      %q1foo1
                      ^~~
                  exit 1
ruby-2.7.0-preview2   -e:1: unknown type of %string
                      %q1foo1
                      ^~~
                  exit 1
...
ruby-3.0.2            -e:1: unknown type of %string
                      %q1foo1
                      ^~~
                  exit 1
$ docker run -it --rm rubylang/all-ruby env ALL_RUBY_SINCE=1.8 ./all-ruby -ce '%q★foo★'
ruby-1.8.0            -e:1: Invalid char `\230' in expression
                      -e:1: Invalid char `\205' in expression
                  exit 2
ruby-1.8.1            -e:1: Invalid char `\230' in expression
                      -e:1: Invalid char `\205' in expression
                  exit 1
...
ruby-1.8.7-p374       -e:1: Invalid char `\230' in expression
                      -e:1: Invalid char `\205' in expression
                  exit 1
ruby-1.9.0-0          -e:1: unknown type of %string
                      %q★foo★
                         ^
                  exit 1
...
ruby-2.4.10           -e:1: unknown type of %string
                      %q★foo★
                         ^
                  exit 1
ruby-2.5.0-preview1   -e:1: unknown type of %string
                      %q★foo★
                      ^~~
                  exit 1
...
ruby-2.6.8            -e:1: unknown type of %string
                      %q★foo★
                      ^~~
                  exit 1
ruby-2.7.0-preview1   -e:1: unknown type of %string
                      %q★foo★
                      ^~~
                  exit 1
ruby-2.7.0-preview2   -e:1: unknown type of %string
                      %q★foo★
                      ^~~
                  exit 1
...
ruby-2.7.4            -e:1: unknown type of %string
                      %q★foo★
                      ^~~
                  exit 1
ruby-3.0.0-preview1   -e:1: invalid multibyte char (US-ASCII)
                  exit 1
...
ruby-3.0.2            -e:1: invalid multibyte char (US-ASCII)
                  exit 1
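
For a quick in-process check without the all-ruby image, the same probes can be run against whichever Ruby is at hand. This is a minimal sketch: `delimiter_accepted?` is a hypothetical helper name, and the result reflects only the interpreter that runs it, not the full version range shown above.

```ruby
# Hypothetical helper: true when the running Ruby parses %q<char>foo<char>.
def delimiter_accepted?(char)
  eval("%q#{char}foo#{char}") == "foo"
rescue SyntaxError
  # Raised for delimiters the running Ruby refuses.
  false
end

# Probe the delimiters discussed in this pull request.
["\x04", "!", "1", "★"].each do |char|
  verdict = delimiter_accepted?(char) ? "accepted" : "rejected"
  puts "#{char.inspect}: #{verdict}"
end
```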

=> {
type, delimiter = @source_buffer.slice(@ts).chr, tok[-1].chr
fgoto *push_literal(type, delimiter, @ts);
};

# %w(we are the people)
'%' [A-Za-z]+ c_any

I believe the + is redundant, so I removed it. All test cases still pass, so I think this is fine. Please tell me if the + is necessary here.

@iliabylich iliabylich left a comment

LGTM, thanks!

@iliabylich iliabylich merged commit a48a8f6 into whitequark:master Jul 12, 2021
@pocke pocke deleted the __lexer_rl__fix_incompatible_delimiters_on_percent_literal branch July 12, 2021 09:01