Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mishandled comments with extended mode regexp? #66

Closed
owst opened this issue Sep 11, 2020 · 2 comments
Closed

Mishandled comments with extended mode regexp? #66

owst opened this issue Sep 11, 2020 · 2 comments

Comments

@owst
Copy link
Contributor

owst commented Sep 11, 2020

My understanding (based on the docs) is that the following are equivalent:

  /
    a # bc+
  /x

  /
    a #bc+
  /x

  /
    a# bc+
  /x

  /
    a#bc+
  /x

  /a # bc+/x
  /a #bc+/x
  /a# bc+/x
  /a#bc+/x

in all cases, the pattern should match only a single 'a' as whitespace and anything on a line following a # is ignored. Indeed it seems that Ruby (2.6.6) does indeed treat them as equivalent. However, it seems that regexp_parser does not, and only recognises the comment if:

  1. The pattern is across multiple lines (examples 1-4), and
  2. There is a space before the # (examples 1-2)

To demonstrate, run the following:

require 'regexp_parser'

p Regexp::Parser::VERSION

[
  /
    a # bc+
  /x,
  /
    a #bc+
  /x,
  /
    a# bc+
  /x,
  /
    a#bc+
  /x,
  /a # bc+/x,
  /a #bc+/x,
  /a# bc+/x,
  /a#bc+/x
].each do |pat|
  puts "Pattern: #{pat.inspect}"
  pat =~ 'abcc'
  puts "pat =~ 'abccc' last_match: #{Regexp.last_match}"
  puts Regexp::Parser.parse(pat).each_expression.map { |exp, _| exp.class.name.sub(/.*::/, "") }.join(', ')

  puts
end

prints:

"1.7.1"
Pattern: /
    a # bc+
  /x
pat =~ 'abccc' last_match: a
WhiteSpace, Literal, WhiteSpace, Comment, WhiteSpace

Pattern: /
    a #bc+
  /x
pat =~ 'abccc' last_match: a
WhiteSpace, Literal, WhiteSpace, Comment, WhiteSpace

Pattern: /
    a# bc+
  /x
pat =~ 'abccc' last_match: a
WhiteSpace, Literal, WhiteSpace, Literal, Literal, WhiteSpace

Pattern: /
    a#bc+
  /x
pat =~ 'abccc' last_match: a
WhiteSpace, Literal, Literal, WhiteSpace

Pattern: /a # bc+/x
pat =~ 'abccc' last_match: a
Literal, WhiteSpace, Literal, Literal

Pattern: /a #bc+/x
pat =~ 'abccc' last_match: a
Literal, WhiteSpace, Literal, Literal

Pattern: /a# bc+/x
pat =~ 'abccc' last_match: a
Literal, WhiteSpace, Literal, Literal

Pattern: /a#bc+/x
pat =~ 'abccc' last_match: a
Literal, Literal

Notice how all patterns match just 'a', but only the first two parse include a Comment expression when parsed.

Am I misunderstanding something, or is this a bug?

owst added a commit to owst/regexp_parser that referenced this issue Sep 11, 2020
jaynetics added a commit that referenced this issue Sep 12, 2020
Allow no-whitespace and single-line comments (#66)
@jaynetics
Copy link
Collaborator

@owst you were correct, this was a bug.

thank you for the excellent contribution!

i'll release a new version with your changes in the coming days.

@jaynetics
Copy link
Collaborator

released with v1.8.0

jaynetics added a commit that referenced this issue Sep 28, 2020
Issue #70.

The comment scanner was to greedy. In a way this bug always existed.

Comment-like patterns with a specific shape have always been scanned incorrectly in normal mode, e.g.

```ruby
/foo # \d
/
```

This was just very rare. Prior to the fix of issue #66 via PR #67, the comment scanner only fired for a limited, incomplete subset of valid comments like the one above. With the broadening of the scanner, this bug became much easier to hit upon.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants