Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(bug) Fix beginKeywords to ignore . matches #2434

Merged
merged 15 commits into from Mar 2, 2020

Conversation

joshgoebel
Copy link
Member

@joshgoebel joshgoebel commented Feb 27, 2020

Related: #2429 Resolves #2428

Attempts to restore as little as possible of the previous beginKeywords magic behavior as necessary just to prevent .keyword or keyword. from matching. This is done via an internal onBegin callback that essentially ignores the match if a . is found.

What behavior does this NOT restore? A failed match will NOT consume any content from the input stream, ie it will not advance the cursor.

Previously:

       v Parse cursor is here after failure to match.
A.class

Now:

  v Parse cursor is here (or earlier) after failure to match.
A.class

Ex:

    contains: [
      { beginKeywords: "class" },
      // this rule will still match class if the above rules fails to (because of a `.`)
      { begin: "class", className: "found" }
    ]

Also previously the match would actually start at the . (when one was present), which would prevent any other rules from matching if there was a failed keyword match - that is no longer the case. The match now starts with the keyword itself and then looks backwards to see if there was a preceding . or not.


What else might still be an issue...

This new behavior means that when beginKeywords fails to match keywords now has a chance to. I think this is new behavior, and correct behavior. No other place in the parser does a rule silently consume content for a non-match, and I don't think beginKeywords should do so either.


  • MultiRegex is just a re-factoring of the behavior that was there before into a class, allowing us to work with a large group of regexes as a single search unit.
  • ResumableMultiRegex wraps multiple MultiRegex objects to allow resuming a multi-regex search at the same cursor position, allowing for the possibility of multiple regexes matching, instead of just the first.

See comments in source for more details

@joshgoebel joshgoebel force-pushed the keywords_fix branch 3 times, most recently from bda81f1 to bebc9b7 Compare February 27, 2020 23:09
@joshgoebel joshgoebel changed the title WIP: (bug) Fix beginKeywords to ignore . matches WIP: Fix beginKeywords to ignore . matches Feb 27, 2020
@joshgoebel joshgoebel changed the title WIP: Fix beginKeywords to ignore . matches (bug) Fix beginKeywords to ignore . matches Feb 27, 2020
@joshgoebel joshgoebel force-pushed the keywords_fix branch 3 times, most recently from 75e5954 to 013ccff Compare February 28, 2020 18:27
@joshgoebel joshgoebel added this to the 10.0 milestone Feb 29, 2020
Copy link
Collaborator

@egor-rogov egor-rogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understood all of it, but it looks reasonable.

@joshgoebel
Copy link
Member Author

joshgoebel commented Mar 2, 2020

I think if you understand what's going on in highlight.js you understand the gist of it. I wish I could think of better names though. It's weird now because the ResumableMultiRegex (or whatever I called it, lol) has not only an index now but also a "which regex" to begin searching with index (which I called startAt), since now it's resumable at the same position to later regexes in the stack can be tried.

Wrapping a bunch of MultiRegexes was just the simplest way I could think of to accomplish the goal... but if someone knows a simpler way... the external API is trivial enough:

  • exec(string)
  • matchAt=
  • lastIndex=

And actually this now allows for lazy-compiling of multi regexes... so for some grammars with a ton of complex modes it might be possible that some of those regex never even have to be compiled if those modes are never actually hit - so that's a nice small win.

Would regexIndex be clearer?

@joshgoebel
Copy link
Member Author

Tried to improve naming.

@egor-rogov
Copy link
Collaborator

Yes, it's a bit clearer this way.

@joshgoebel joshgoebel merged commit 8c248fd into highlightjs:master Mar 2, 2020
joshgoebel added a commit that referenced this pull request Mar 3, 2020
- split out complex multi-regex matching into discrete classes
- new classes allow multiple regex matches at the same index position so if
  a one mode match is found unsuitable another mode matchcan be attempted
- add private ` __onBegin` hook for mode matching 
  (to allow rejecting `.` matches for `beginKeywords`)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(java) .class should not detect as keyword
2 participants