Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lint/MixedRegexpCaptureTypes and "premature end of pattern" #8083

Closed
zverok opened this issue Jun 2, 2020 · 8 comments · Fixed by #8093
Closed

Lint/MixedRegexpCaptureTypes and "premature end of pattern" #8083

zverok opened this issue Jun 2, 2020 · 8 comments · Fixed by #8093
Labels

Comments

@zverok
Copy link
Contributor

zverok commented Jun 2, 2020

New cops, relying on Regexp parser might fell on something that is valid Ruby regexp. My example:

response.body.scan(/data = ({"words":.+}}}[^}]*})/m)

Here we are doing a pretty nasty thing (in specs): scanning HTML-with-JS rendered view, find one JSON object there (and then parse and check this object). But this is still valid Ruby, and it "just works".

When Lint/MixedRegexpCaptureTypes tries to scan it, it falls with the "Premature end of pattern" exception in the scanner.

The deal is, formally, in Ruby regexps {} is for quantifiers (for ex., x{15}), but if it is not a valid quantifier:

regexp-parser is kinda "theoretically right", but it leads to what is perceived as a "bug" by Rubocop users. It seems like one of the solutions below might be reasonable:

  • catch such errors in cop, and report them specially (something like "non-standard-conforming Regexp")
  • persuade regexp-parser's to follow all Ruby quirks, either by default, or maybe in special "compatibility" mode, maybe even reporting non-standard stuff as warnings
$ [bundle exec] rubocop -V
0.85.0 (using Parser 2.7.1.3, rubocop-ast 0.0.3, running on ruby 2.6.3 x86_64-linux)
@aried3r
Copy link

aried3r commented Jun 2, 2020

We are facing a similar issue. I've created a SSCCE as well.

# rubocop_regexp.rb
'Hello'.gsub(/]\[/, "_")
$ rubocop --only Lint/MixedRegexpCaptureTypes rubocop_regexp.rb                                                                                                                   
Inspecting 1 file
An error occurred while Lint/MixedRegexpCaptureTypes cop was inspecting /private/tmp/rubocop_regexp.rb:1:13.
To see the complete backtrace run rubocop -d.
.

1 file inspected, no offenses detected

1 error occurred:
An error occurred while Lint/MixedRegexpCaptureTypes cop was inspecting /private/tmp/rubocop_regexp.rb:1:13.
Errors are usually caused by RuboCop bugs.

Relevant part of the stacktrace:

Premature end of pattern at [
.../2.7.0/gems/regexp_parser-1.7.0/lib/regexp_parser/scanner.rb:1209:in `scan'
.../2.7.0/gems/regexp_parser-1.7.0/lib/regexp_parser/scanner.rb:71:in `scan'
.../2.7.0/gems/regexp_parser-1.7.0/lib/regexp_parser/lexer.rb:28:in `lex'
.../2.7.0/gems/regexp_parser-1.7.0/lib/regexp_parser/lexer.rb:15:in `lex'
.../2.7.0/gems/regexp_parser-1.7.0/lib/regexp_parser/parser.rb:38:in `parse'
.../2.7.0/gems/regexp_parser-1.7.0/lib/regexp_parser/parser.rb:22:in `parse'

And here a working example in Rubular.

The fix for us would be to write /\]\[/, but it was not needed until now because of how Ruby treats regular expressions.

@koic koic added the bug label Jun 2, 2020
@bbatsov
Copy link
Collaborator

bbatsov commented Jun 2, 2020

catch such errors in cop, and report them specially (something like "non-standard-conforming Regexp")

I guess this can be some other cop, as that's an orthogonal concern issues in any specific cop. For the cops - I think it'd be best to just ignore regular expressions that raise such an error in the cops.

@thetizzo
Copy link

thetizzo commented Jun 2, 2020

Another example that is raising this error for us in case it is needed:

Regex: /({.+})/
Error Message with stack:

An error occurred while Lint/MixedRegexpCaptureTypes cop was inspecting <file name>
Premature end of pattern at ({.+})
.../2.6.0/gems/regexp_parser-1.7.0/lib/regexp_parser/scanner.rb:1209:in `scan'
.../2.6.0/gems/regexp_parser-1.7.0/lib/regexp_parser/scanner.rb:71:in `scan'
.../2.6.0/gems/regexp_parser-1.7.0/lib/regexp_parser/lexer.rb:28:in `lex'
.../2.6.0/gems/regexp_parser-1.7.0/lib/regexp_parser/lexer.rb:15:in `lex'
.../2.6.0/gems/regexp_parser-1.7.0/lib/regexp_parser/parser.rb:38:in `parse'
.../2.6.0/gems/regexp_parser-1.7.0/lib/regexp_parser/parser.rb:22:in `parse'
.../2.6.0/gems/rubocop-0.85.0/lib/rubocop/cop/lint/mixed_regexp_capture_types.rb:30:in `on_regexp'

@marcandre
Copy link
Contributor

I hope the gem can be modified. JS (and I suspect many other languages) parse regexp this way and it doesn't seem such a quirk to me.

@lucascaton
Copy link

lucascaton commented Jun 3, 2020

I believe this error (exception) is related:

An error occurred while Lint/MixedRegexpCaptureTypes cop was inspecting <file>

This is the line it's scanning:

body.gsub(/( ?){{ ?first_name( ?\| ?default: '.+')? ?}}/) do

I'm using version:

0.85.0 (using Parser 2.7.1.3, rubocop-ast 0.0.3, running on ruby 2.7.1 x86_64-darwin19)


This might be helpful:

$ rubocop -d file.rb

Output

For .: configuration from ./.rubocop.yml
configuration from ~/.asdf/installs/ruby/2.7.1/lib/ruby/gems/2.7.0/gems/rubocop-performance-1.6.0/config/default.yml
configuration from ~/.asdf/installs/ruby/2.7.1/lib/ruby/gems/2.7.0/gems/rubocop-performance-1.6.0/config/default.yml
Default configuration from ~/.asdf/installs/ruby/2.7.1/lib/ruby/gems/2.7.0/gems/rubocop-0.85.0/config/default.yml
configuration from ~/.asdf/installs/ruby/2.7.1/lib/ruby/gems/2.7.0/gems/rubocop-rails-2.5.2/config/default.yml
configuration from ~/.asdf/installs/ruby/2.7.1/lib/ruby/gems/2.7.0/gems/rubocop-rails-2.5.2/config/default.yml
configuration from ~/.asdf/installs/ruby/2.7.1/lib/ruby/gems/2.7.0/gems/rubocop-rspec-1.39.0/config/default.yml
configuration from ~/.asdf/installs/ruby/2.7.1/lib/ruby/gems/2.7.0/gems/rubocop-rspec-1.39.0/config/default.yml
Inheriting configuration from ./.rubocop_todo.yml
Inspecting 1 file
Scanning ./file.rb
An error occurred while Lint/MixedRegexpCaptureTypes cop was inspecting ./file.rb:21:14.
Premature end of pattern at ){{ ?first_name( ?\| ?default: '.+')? ?}}
~/.asdf/installs/ruby/2.7.1/lib/ruby/gems/2.7.0/gems/regexp_parser-1.7.0/lib/regexp_parser/scanner.rb:1209:in `scan'
~/.asdf/installs/ruby/2.7.1/lib/ruby/gems/2.7.0/gems/regexp_parser-1.7.0/lib/regexp_parser/scanner.rb:71:in `scan'
~/.asdf/installs/ruby/2.7.1/lib/ruby/gems/2.7.0/gems/regexp_parser-1.7.0/lib/regexp_parser/lexer.rb:28:in `lex'
~/.asdf/installs/ruby/2.7.1/lib/ruby/gems/2.7.0/gems/regexp_parser-1.7.0/lib/regexp_parser/lexer.rb:15:in `lex'
~/.asdf/installs/ruby/2.7.1/lib/ruby/gems/2.7.0/gems/regexp_parser-1.7.0/lib/regexp_parser/parser.rb:38:in `parse'
~/.asdf/installs/ruby/2.7.1/lib/ruby/gems/2.7.0/gems/regexp_parser-1.7.0/lib/regexp_parser/parser.rb:22:in `parse'

@zverok
Copy link
Contributor Author

zverok commented Jun 3, 2020

I created ammar/regexp_parser#63 for regexp_parser. If nobody will act on it till weekend, I'll try looking at it myself.

@TonyArra
Copy link

TonyArra commented Jun 3, 2020

hit the same issue:

An error occurred while Lint/MixedRegexpCaptureTypes cop was inspecting [filename]

Premature end of pattern at ^{(.*)}$

~/.rvm/gems/ruby-2.7.0/gems/regexp_parser-1.7.0/lib/regexp_parser/scanner.rb:1209:in `scan'
~/.rvm/gems/ruby-2.7.0/gems/regexp_parser-1.7.0/lib/regexp_parser/scanner.rb:71:in `scan'
~/.rvm/gems/ruby-2.7.0/gems/regexp_parser-1.7.0/lib/regexp_parser/lexer.rb:28:in `lex'
~/.rvm/gems/ruby-2.7.0/gems/regexp_parser-1.7.0/lib/regexp_parser/lexer.rb:15:in `lex'
~/.rvm/gems/ruby-2.7.0/gems/regexp_parser-1.7.0/lib/regexp_parser/parser.rb:38:in `parse'
~/.rvm/gems/ruby-2.7.0/gems/regexp_parser-1.7.0/lib/regexp_parser/parser.rb:22:in `parse'
~/.rvm/gems/ruby-2.7.0/gems/rubocop-0.85.0/lib/rubocop/cop/lint/mixed_regexp_capture_types.rb:30:in `on_regexp'

koic added a commit to koic/rubocop that referenced this issue Jun 3, 2020
Fixes rubocop#8083.

This PR fixes an error for `Lint/MixedRegexpCaptureTypes` cop
when using a regular expression that cannot be processed by regexp_parser gem.
https://github.com/ammar/regexp_parser

Probably false negatives will be made, however false negatives will be
reduced when a processing pattern of regexp_parser increases.

So this PR relies on regexp_parser for error resolution.
@jcraigk
Copy link

jcraigk commented Jun 3, 2020

Same issue on a regexp that looks like this: /{id: \d+, text: 'Attribute \d+' }/

0.85.0 (using Parser 2.7.1.3, rubocop-ast 0.0.3, running on ruby 2.6.6 x86_64-darwin19)

bbatsov pushed a commit that referenced this issue Jun 3, 2020
Fixes #8083.

This PR fixes an error for `Lint/MixedRegexpCaptureTypes` cop
when using a regular expression that cannot be processed by regexp_parser gem.
https://github.com/ammar/regexp_parser

Probably false negatives will be made, however false negatives will be
reduced when a processing pattern of regexp_parser increases.

So this PR relies on regexp_parser for error resolution.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants