Mishandled comments with extended mode regexp? #66

owst · 2020-09-11T21:09:32Z

My understanding (based on the docs) is that the following are equivalent:

  /
    a # bc+
  /x

  /
    a #bc+
  /x

  /
    a# bc+
  /x

  /
    a#bc+
  /x

  /a # bc+/x
  /a #bc+/x
  /a# bc+/x
  /a#bc+/x

In all cases, the pattern should match only a single 'a', since whitespace and anything following a # on the same line are ignored. Indeed, Ruby (2.6.6) treats them as equivalent. However, regexp_parser does not: it only recognises the comment if:

1. The pattern spans multiple lines (examples 1-4), and
2. There is a space before the # (examples 1-2)
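For reference, the line-scoped comment rule described above can be checked in plain Ruby without any gems. This is only a minimal sketch: the variable names and the extra `d` token are illustrative assumptions, not part of the original examples.

```ruby
# Plain Ruby, no gems required. Names here are illustrative only.

# Multi-line: the comment ends at the newline, so the `d` on the next line
# is still part of the pattern.
multi_line = /
  a # bc+
  d
/x

# Single-line: the comment runs to the end of the pattern, even without
# whitespace before the #.
single_line = /a# bc+/x

multi_match  = multi_line.match("adbcc")
single_match = single_line.match("abccc")

puts multi_match[0]   # prints "ad"
puts single_match[0]  # prints "a"
```

The multi-line case shows that only the rest of the line after # is dropped, which is why the single-line forms reduce to just `a`.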

To demonstrate, run the following:

require 'regexp_parser'

p Regexp::Parser::VERSION

[
  /
    a # bc+
  /x,
  /
    a #bc+
  /x,
  /
    a# bc+
  /x,
  /
    a#bc+
  /x,
  /a # bc+/x,
  /a #bc+/x,
  /a# bc+/x,
  /a#bc+/x
].each do |pat|
  puts "Pattern: #{pat.inspect}"
  pat =~ 'abccc'
  puts "pat =~ 'abccc' last_match: #{Regexp.last_match}"
  puts Regexp::Parser.parse(pat).each_expression.map { |exp, _| exp.class.name.sub(/.*::/, "") }.join(', ')

  puts
end

prints:

"1.7.1"
Pattern: /
    a # bc+
  /x
pat =~ 'abccc' last_match: a
WhiteSpace, Literal, WhiteSpace, Comment, WhiteSpace

Pattern: /
    a #bc+
  /x
pat =~ 'abccc' last_match: a
WhiteSpace, Literal, WhiteSpace, Comment, WhiteSpace

Pattern: /
    a# bc+
  /x
pat =~ 'abccc' last_match: a
WhiteSpace, Literal, WhiteSpace, Literal, Literal, WhiteSpace

Pattern: /
    a#bc+
  /x
pat =~ 'abccc' last_match: a
WhiteSpace, Literal, Literal, WhiteSpace

Pattern: /a # bc+/x
pat =~ 'abccc' last_match: a
Literal, WhiteSpace, Literal, Literal

Pattern: /a #bc+/x
pat =~ 'abccc' last_match: a
Literal, WhiteSpace, Literal, Literal

Pattern: /a# bc+/x
pat =~ 'abccc' last_match: a
Literal, WhiteSpace, Literal, Literal

Pattern: /a#bc+/x
pat =~ 'abccc' last_match: a
Literal, Literal

Notice how all patterns match just 'a', but only the first two include a Comment expression when parsed.

Am I misunderstanding something, or is this a bug?



jaynetics · 2020-09-12T09:59:43Z

@owst you were correct, this was a bug.

thank you for the excellent contribution!

i'll release a new version with your changes in the coming days.

jaynetics · 2020-09-20T10:21:34Z

released with v1.8.0

Issue #70. The comment scanner was too greedy. In a way, this bug always existed: comment-like patterns with a specific shape have always been scanned incorrectly in normal mode, e.g.

```ruby
/foo # \d /
```

This was just very rare. Prior to the fix of issue #66 via PR #67, the comment scanner only fired for a limited, incomplete subset of valid comments like the one above. With the broadening of the scanner, this bug became much easier to hit.
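To illustrate why the mode matters here, the following sketch shows how Ruby itself treats the same pattern text with and without the x flag. It uses only core Ruby; the variable names and test strings are assumptions chosen for illustration.

```ruby
# Same pattern text, with and without extended mode.
normal   = /foo # \d /
extended = /foo # \d /x

# In normal mode, the spaces, the # and \d are all part of the pattern.
normal_bare   = normal.match?("foo")       # false
normal_full   = normal.match?("foo # 7 ")  # true

# In extended mode, everything from # onward is a comment, leaving just `foo`.
extended_bare = extended.match?("foo")     # true

puts normal_bare, normal_full, extended_bare
```

A scanner that emits a comment token for the first pattern would therefore disagree with Ruby, which is the kind of over-eager match the note above describes.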

owst added a commit to owst/regexp_parser that referenced this issue Sep 11, 2020

Allow no-whitespace and single-line comments (ammar#66)

6085eb5

jaynetics added a commit that referenced this issue Sep 12, 2020

Merge pull request #67 from owst/allow_no_whitespace_comments

58a00d6

Allow no-whitespace and single-line comments (#66)

jaynetics closed this as completed Sep 20, 2020
