(bug) Fix `beginKeywords` to ignore `.` matches #2434
I don't think I understood all of it, but it looks reasonable.
I think if you understand what's going on in Wrapping a bunch of MultiRegexes was just the simplest way I could think of to accomplish the goal... but if someone knows a simpler way... the external API is trivial enough:
And actually this now allows for lazy-compiling of multi regexes, so for grammars with a ton of complex modes, some of those regexes may never have to be compiled at all if their modes are never actually hit. That's a nice small win. Would
Tried to improve naming.
Yes, it's a bit clearer this way.
- split out complex multi-regex matching into discrete classes
- new classes allow multiple regex matches at the same index position, so if one mode match is found unsuitable another mode match can be attempted
- add private `__onBegin` hook for mode matching (to allow rejecting `.` matches for `beginKeywords`)
Related: #2429 Resolves #2428
Attempts to restore as little as possible of the previous `beginKeywords` magic behavior as necessary just to prevent `.keyword` or `keyword.` from matching. This is done via an internal `onBegin` callback that essentially ignores the match if a `.` is found.

What behavior does this NOT restore?

A failed match will NOT consume any content from the input stream, i.e. it will not advance the cursor.
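
To make the "ignore the match, do not advance the cursor" idea concrete, here is a minimal sketch. It is not the code from this PR: `skipIfPrecededByDot` and `tryBegin` are hypothetical names, and only the leading `.` case is checked, to keep it short.

```javascript
// Hypothetical sketch, NOT the highlight.js implementation.
// An onBegin-style callback: returns true when the match should be ignored
// because the character directly before it is a ".".
function skipIfPrecededByDot(match, input) {
  return match.index > 0 && input[match.index - 1] === ".";
}

// Tries to begin a mode at or after `cursor`. `keywordRe` must have the
// "g" flag so that `lastIndex` is honored by exec().
function tryBegin(input, cursor, keywordRe, onBegin) {
  keywordRe.lastIndex = cursor;
  const match = keywordRe.exec(input);
  if (match === null) return { matched: false, cursor };
  // Rejected match: report failure and leave the cursor where it was,
  // so other rules still get a chance at the same content.
  if (onBegin(match, input)) return { matched: false, cursor };
  return {
    matched: true,
    text: match[0],
    cursor: match.index + match[0].length,
  };
}

const keywordRe = /\bclass\b/g;
const rejected = tryBegin("foo.class Bar", 0, keywordRe, skipIfPrecededByDot);
const accepted = tryBegin("class Bar", 0, keywordRe, skipIfPrecededByDot);
console.log(rejected, accepted);
```

In this sketch the rejected attempt returns the cursor it was given, which is the property described above: a non-match consumes nothing.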
Previously:
Now:
Ex:
Also previously the match would actually start at the `.` (when one was present), which would prevent any other rules from matching if there was a failed keyword match - that is no longer the case. The match now starts with the keyword itself and then looks backwards to see if there was a preceding `.` or not.

What else might still be an issue...
This new behavior means that when `beginKeywords` fails to match, `keywords` now has a chance to. I think this is new behavior, and correct behavior. No other place in the parser does a rule silently consume content for a non-match, and I don't think `beginKeywords` should do so either.

`MultiRegex` is just a re-factoring of the behavior that was there before into a class, allowing us to work with a large group of regexes as a single search unit.

`ResumableMultiRegex` wraps multiple `MultiRegex` objects to allow resuming a multi-regex search at the same cursor position, allowing for the possibility of multiple regexes matching, instead of just the first.

See comments in source for more details.
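
For readers who have not opened the source yet, the following is a rough sketch of how the two classes could fit together. It is an illustration under assumptions, not the code in this PR: the `Sketch*` class names, `addRule`, `considerAll`, and the result shape are all hypothetical, and rule sources are assumed to contain no capture groups of their own.

```javascript
// Hypothetical sketch: names and internals are assumptions, not the PR's code.
// Treats a group of rule regexes as one search unit, compiled lazily.
class SketchMultiRegex {
  constructor() {
    this.rules = [];      // [{ source, data }]
    this.compiled = null; // built on first exec()
  }
  addRule(source, data) {
    this.rules.push({ source, data });
    this.compiled = null;
  }
  exec(input, from) {
    if (this.rules.length === 0) return null;
    if (this.compiled === null) {
      // One capture group per rule, so the matching rule can be identified.
      const joined = this.rules.map((r) => `(${r.source})`).join("|");
      this.compiled = new RegExp(joined, "g");
    }
    this.compiled.lastIndex = from;
    const m = this.compiled.exec(input);
    if (m === null) return null;
    const position = m.slice(1).findIndex((g) => g !== undefined);
    return { index: m.index, text: m[0], position, rule: this.rules[position].data };
  }
}

// Wraps several SketchMultiRegex objects, one per "start at rule N" variant,
// so a search can be retried at the same cursor without the rule that just matched.
class SketchResumableMultiRegex {
  constructor() {
    this.rules = [];
    this.matchers = new Map(); // startRule -> SketchMultiRegex, built on demand
    this.startRule = 0;
  }
  addRule(source, data) {
    this.rules.push({ source, data });
  }
  matcherFrom(startRule) {
    if (!this.matchers.has(startRule)) {
      const matcher = new SketchMultiRegex();
      this.rules.slice(startRule).forEach((r) => matcher.addRule(r.source, r.data));
      this.matchers.set(startRule, matcher);
    }
    return this.matchers.get(startRule);
  }
  exec(input, from) {
    const result = this.matcherFrom(this.startRule).exec(input, from);
    if (result === null) {
      this.startRule = 0;
      return null;
    }
    const position = this.startRule + result.position;
    // A retry should skip the rule that just matched.
    this.startRule = (position + 1) % this.rules.length;
    return { ...result, position };
  }
  // Call after a match is accepted, so the next search considers every rule again.
  considerAll() {
    this.startRule = 0;
  }
}

const resumable = new SketchResumableMultiRegex();
resumable.addRule("\\bclass\\b", "beginKeywords");
resumable.addRule("[a-z]+", "keywords");
const first = resumable.exec("x.class", 2);  // first rule matches at index 2
const second = resumable.exec("x.class", 2); // retry at the same cursor
console.log(first.rule, second.rule);
```

The second call stands in for the case where the first match was rejected: the search resumes at the same cursor and the next rule gets its turn. Only the matcher variants that are actually requested get built, which is the lazy-compilation benefit mentioned in the discussion.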