Rethink fallbacks for formally incorrect grammar #63
the argument about usability and prevalence in other engines looks really convincing to me. regexp_parser seems to be used more and more on regular expressions encountered "in the wild", and there is no good workaround for these cases there.
as far as rubocop goes, you might catch any
generally speaking, the current limitation might push people to either not use regexp_parser for such cases, or to attempt to pre-escape their input with their own regexp-scanning implementation, or catch the error and unexpectedly do nothing.
so i'm all on board. any opposition, @ammar?
if there is no documentation it might be best to look at the onigmo code for details about the behavior. some things to keep in mind / investigate:
Thanks for the mention @jaynetics. I also think that is convincing for some cases. Also, for ruby at least, the supported variants of regexp have stabilized (for example: https://bugs.ruby-lang.org/issues/8133#note-5).
I think the following cases are reasonable:
I have doubts about other empty or unbalanced cases, like
I can take a stab at it this coming weekend.
balanced curlies with content that doesn't match:
/a{2, 3}/ =~ 'a{2, 3}' # => 0
/a{2,3,4}/ =~ 'a{2,3,4}' # => 0
empty
Treating a
I haven't encountered that usage of
My understanding of the concerns of the parser has evolved over time. I no longer see "validation" as one of them. These cases reinforce that understanding.
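To make the balanced-curlies cases above concrete, here is a rough sketch in plain Ruby. The names `INTERVAL_BODY`, `interval_body?` and `literal_in_ruby?` are hypothetical helpers made up for illustration; they are not part of regexp_parser or Onigmo, and the pattern is only an assumption about which curly contents Ruby reads as an interval quantifier. The second helper cross-checks that guess against Ruby's own engine.

```ruby
# Hypothetical helper, not regexp_parser's API: true when `body` (the text
# between the curlies) has the shape of an interval quantifier:
# {n}, {n,}, {n,m} or {,m}, with no whitespace and no extra commas.
INTERVAL_BODY = /\A(?:\d+(?:,\d*)?|,\d+)\z/

def interval_body?(body)
  INTERVAL_BODY.match?(body)
end

# Cross-check against Ruby's engine: if `a{body}` matches its own source
# text in full, the curlies were read as literal characters.
def literal_in_ruby?(body)
  source = "a{#{body}}"
  Regexp.new("\\A#{source}\\z").match?(source)
end

['2', '2,3', '2,', ',3', '2, 3', '2,3,4'].each do |body|
  puts format('%-9s interval? %-5s literal in Ruby? %s',
              "{#{body}}", interval_body?(body), literal_in_ruby?(body))
end
```

For these samples the two columns should be opposites of each other: a body that looks like an interval is consumed as a quantifier, and anything else falls back to literal characters. Empty or unbalanced curlies are deliberately left out, since those are the cases still in doubt here.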
Hi, and thanks for the awesome gem!
Recently regexp_parser started to be used in RuboCop to check regexp redundancy.
That led to uncovering what can be considered a bug (rubocop bug: rubocop/rubocop#8083). When parsing regexps like this, for example:
/{.+}/
(which is a valid Ruby regexp), regexp_parser fails (thinking that {} is an incorrect quantifier). The same applies to some other forms, like /]\[/
I found out that the matter was discussed at #15, with verdict being, that it
Actually, I believe that it is not an "MRI quirk", but sane behavior of the Regexp parser: some characters have special meaning only in context. The behavior when parsing {something that is not a quantifier} and ] is consistent through:
So, it seems that a parser that fails on those cases becomes less useful than it might be.
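As a quick way to reproduce the claim for Ruby itself, the sketch below uses only core Ruby, no gems. The helper name `compiles?` and the `SAMPLES` table are made up for this example. It shows what Ruby's own engine does with these patterns and says nothing about how regexp_parser handles them.

```ruby
# Returns true when Ruby's own regexp engine accepts `source`.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

# Pattern source => input that should match if the characters are literals.
SAMPLES = {
  '{.+}' => '{abc}', # curlies that do not form a quantifier
  ']\['  => 'a][b'   # closing bracket outside of a character set
}.freeze

SAMPLES.each do |source, input|
  matched = compiles?(source) && Regexp.new(source).match?(input)
  puts "#{source.inspect} matches #{input.inspect}: #{matched}"
end

# For contrast, a pattern that Ruby does reject.
puts "'(' compiles: #{compiles?('(')}"
```

With warnings enabled, Ruby may print a notice about the unescaped `]`, but it still compiles the pattern and matches the bracket literally.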