(bug) Fix `beginKeywords` to ignore . matches #2434

joshgoebel · 2020-02-27T22:42:02Z

Related: #2429 Resolves #2428

Attempts to restore only as much of the previous `beginKeywords` magic behavior as is necessary to prevent `.keyword` or `keyword.` from matching. This is done via an internal onBegin callback that essentially ignores the match if a `.` is found immediately before or after the keyword.
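
As a rough illustration of that check (not the actual callback in this PR), something like the following would do it. The function name `isDotAdjacent` is made up for this sketch, and it only assumes `match` is the array returned by `RegExp.prototype.exec`.

```javascript
// Hypothetical helper: report whether a keyword match is directly
// preceded or followed by a `.`, in which case it should be ignored.
function isDotAdjacent(match) {
  const before = match.input[match.index - 1]; // undefined at start of input
  const after = match.input[match.index + match[0].length]; // undefined at end
  return before === "." || after === ".";
}

const keyword = /\bclass\b/;
console.log(isDotAdjacent(keyword.exec("A.class")));      // true: preceded by `.`
console.log(isDotAdjacent(keyword.exec("class.name")));   // true: followed by `.`
console.log(isDotAdjacent(keyword.exec("class Foo {}"))); // false
```

Note that the regex itself still matches the bare keyword; the `.` is only inspected afterwards, so nothing extra is consumed.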

What behavior does this NOT restore? A failed match will NOT consume any content from the input stream, i.e., it will not advance the cursor.

Previously:

       v Parse cursor is here after failure to match.
A.class

Now:

  v Parse cursor is here (or earlier) after failure to match.
A.class

Ex:

    contains: [
      { beginKeywords: "class" },
      // this rule will still match class if the above rule fails to (because of a `.`)
      { begin: "class", className: "found" }
    ]

Also previously the match would actually start at the . (when one was present), which would prevent any other rules from matching if there was a failed keyword match - that is no longer the case. The match now starts with the keyword itself and then looks backwards to see if there was a preceding . or not.
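
To make the fall-through concrete, here is a toy simulation of the two rules from the example. It is not the parser's code: the `rules` array shape, `dotAdjacent`, and `firstAcceptedRule` are all hypothetical names, and sticky (`y`) regexes are used only to pin each attempt to one position.

```javascript
// Hypothetical rule list modeled on the example above.
const rules = [
  { name: "beginKeywords", re: /\bclass\b/y, reject: dotAdjacent },
  { name: "found", re: /class/y },
];

// Look at the characters on either side of the match, without consuming them.
function dotAdjacent(text, index, length) {
  return text[index - 1] === "." || text[index + length] === ".";
}

// Try each rule at `index`. A rejected match consumes nothing, so the
// next rule gets its chance at the very same position.
function firstAcceptedRule(text, index) {
  for (const rule of rules) {
    rule.re.lastIndex = index;
    const m = rule.re.exec(text);
    if (!m) continue;
    if (rule.reject && rule.reject(text, m.index, m[0].length)) continue;
    return rule.name;
  }
  return null;
}

console.log(firstAcceptedRule("A.class", 2)); // "found": keyword rule was rejected
console.log(firstAcceptedRule("class A", 0)); // "beginKeywords"
```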

What else might still be an issue...

This new behavior means that when beginKeywords fails to match, keywords now has a chance to. I think this is new behavior, and correct behavior. Nowhere else in the parser does a rule silently consume content for a non-match, and I don't think beginKeywords should do so either.

MultiRegex is just a refactoring of the behavior that was there before into a class, allowing us to work with a large group of regexes as a single search unit.
ResumableMultiRegex wraps multiple MultiRegex objects to allow resuming a multi-regex search at the same cursor position, allowing for the possibility of multiple regexes matching, instead of just the first.
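
The "single search unit" idea can be sketched roughly like this. It is a simplified stand-in, not the class from this PR: `MultiRegexSketch` and the shape of its result are assumptions, and it assumes the individual patterns contain no capture groups of their own.

```javascript
// Simplified sketch: search several patterns as one unit and report
// which one matched. Assumes the patterns have no capture groups.
class MultiRegexSketch {
  constructor(sources) {
    // One capture group per pattern lets us tell which alternative matched.
    this.re = new RegExp(sources.map((s) => `(${s})`).join("|"), "g");
    this.lastIndex = 0;
  }

  exec(text) {
    this.re.lastIndex = this.lastIndex;
    const m = this.re.exec(text);
    if (!m) return null;
    // Capture groups are 1-based; the first defined one identifies the rule.
    const rule = m.findIndex((g, i) => i > 0 && g !== undefined) - 1;
    return { index: m.index, text: m[0], rule };
  }
}

const multi = new MultiRegexSketch(["\\d+", "[a-z]+", "\\s+"]);
multi.lastIndex = 0;
console.log(multi.exec("abc 123")); // rule 1 ("[a-z]+") at index 0
multi.lastIndex = 3;
console.log(multi.exec("abc 123")); // rule 2 ("\\s+") at index 3
```

Because the patterns are joined with `|`, the earliest position wins, and at a given position the earliest pattern in the list wins.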

See comments in source for more details.

egor-rogov

I don't think I understood all of it, but it looks reasonable.

joshgoebel · 2020-03-02T01:55:41Z

I think if you understand what's going on in highlight.js you understand the gist of it. I wish I could think of better names though. It's weird now because the ResumableMultiRegex (or whatever I called it, lol) has not only an index now but also a "which regex to begin searching with" index (which I called startAt), since it's now resumable at the same position so that later regexes in the stack can be tried.

Wrapping a bunch of MultiRegexes was just the simplest way I could think of to accomplish the goal... but if someone knows a simpler way... the external API is trivial enough:

exec(string)
matchAt=
lastIndex=

And actually this now allows for lazy-compiling of multi regexes... so for some grammars with a ton of complex modes it might be possible that some of those regexes never even have to be compiled if those modes are never actually hit - so that's a nice small win.
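
For anyone trying to picture the resumable part plus the lazy compilation, here is a minimal sketch under a few assumptions: the class name `ResumableSearchSketch`, the `regexIndex` property name, the `compileCount` counter, and the result shape are all invented for illustration, and patterns are assumed to have no capture groups.

```javascript
// Hypothetical sketch of a resumable search over an ordered rule list.
class ResumableSearchSketch {
  constructor(sources) {
    this.sources = sources;
    this.compiled = [];   // compiled[i] covers sources[i..]; filled on demand
    this.compileCount = 0;
    this.lastIndex = 0;   // where in the text to search from
    this.regexIndex = 0;  // which rule to start searching with
  }

  // Build the combined regex for rules i.. only when it is first needed.
  regexFrom(i) {
    if (!this.compiled[i]) {
      const body = this.sources.slice(i).map((s) => `(${s})`).join("|");
      this.compiled[i] = new RegExp(body, "g");
      this.compileCount += 1;
    }
    return this.compiled[i];
  }

  exec(text) {
    const re = this.regexFrom(this.regexIndex);
    re.lastIndex = this.lastIndex;
    const m = re.exec(text);
    if (!m) {
      this.regexIndex = 0;
      return null;
    }
    const group = m.findIndex((g, i) => i > 0 && g !== undefined);
    const rule = this.regexIndex + group - 1;
    // A retry at the same position resumes with the next rule (wrapping around).
    this.regexIndex = (rule + 1) % this.sources.length;
    return { index: m.index, text: m[0], rule };
  }
}

const search = new ResumableSearchSketch(["\\bclass\\b", "class", "\\d+"]);
const text = "A.class";
const first = search.exec(text);
// Pretend the caller rejected `first` (it follows a `.`): retry at the same spot.
search.lastIndex = first.index;
const second = search.exec(text);
console.log(first.rule, second.rule, search.compileCount); // 0 1 2
```

In this run only two of the three possible combined regexes are ever built; the one starting at the third rule is never compiled because the search never resumes from there.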

Would regexIndex be clearer?

joshgoebel · 2020-03-02T02:28:45Z

Tried to improve naming.

egor-rogov · 2020-03-02T19:44:15Z

Yes, it's a bit clearer this way.

- split out complex multi-regex matching into discrete classes
- new classes allow multiple regex matches at the same index position, so if one mode match is found unsuitable another mode match can be attempted
- add private `__onBegin` hook for mode matching (to allow rejecting `.` matches for `beginKeywords`)

joshgoebel force-pushed the keywords_fix branch 3 times, most recently from bda81f1 to bebc9b7 Compare February 27, 2020 23:09

joshgoebel changed the title ~~WIP: (bug) Fix beginKeywords to ignore . matches~~ WIP: Fix beginKeywords to ignore . matches Feb 27, 2020

joshgoebel changed the title ~~WIP: Fix beginKeywords to ignore . matches~~ (bug) Fix beginKeywords to ignore . matches Feb 27, 2020

joshgoebel requested review from egor-rogov and allejo February 27, 2020 23:23

joshgoebel added bug parser labels Feb 28, 2020

joshgoebel force-pushed the keywords_fix branch 3 times, most recently from 75e5954 to 013ccff Compare February 28, 2020 18:27

joshgoebel added this to the 10.0 milestone Feb 29, 2020

egor-rogov mentioned this pull request Feb 29, 2020

(java) .class should not detect as keyword #2428

Closed

egor-rogov approved these changes Mar 1, 2020

joshgoebel added 13 commits March 1, 2020 21:01

nicer more modular solution

3cfa915

explain abort vs ignore vs skip

ce4b7a2

MultiMatcher gets its own class

461ce6d

deeper down the rabbit hole

9a97ae7

document

36c72fe

tweaks

0910304

rename api

7f5a087

add test

c1bdbd5

add docs

8d3656c

simplify code

034057d

update docs

0f71edd

reset regex index when stuck

e5bfaf2

DRY

e09c990

joshgoebel force-pushed the keywords_fix branch from 013ccff to e09c990 Compare March 2, 2020 02:05

try to improve clarity with naming

e137079

joshgoebel force-pushed the keywords_fix branch from 9b0fcb5 to e137079 Compare March 2, 2020 02:24

update changelog

a242267

joshgoebel merged commit 8c248fd into highlightjs:master Mar 2, 2020
