Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enh(parser) new highlight() API #3053

Merged
merged 9 commits into from Mar 20, 2021

Conversation

joshgoebel
Copy link
Member

@joshgoebel joshgoebel commented Mar 17, 2021

Resolves #2277.

Changes

  • Implements the new highlight(code, options = {}) API
  • language and ignoreIllegals are options now (vs params)
  • continuation is for internal use only and hence not supported any longer

Compatibility:

  • v10.7 supports BOTH old and new APIs fully.
  • v11 will support both as well (to ease migrations) but drop continuation support from the old API.
  • v12 will drop old API support entirely.

Checklist

  • Added markup tests, or they don't apply here because...
  • Updated the changelog at CHANGES.md

@joshgoebel joshgoebel requested a review from allejo March 17, 2021 17:58
@joshgoebel joshgoebel changed the title (chore) new highlight() API (end) new highlight() API Mar 17, 2021
@joshgoebel joshgoebel changed the title (end) new highlight() API enh(parser) new highlight() API Mar 17, 2021
@joshgoebel joshgoebel self-assigned this Mar 18, 2021
@joshgoebel joshgoebel added this to the 10.7 milestone Mar 18, 2021
Base automatically changed from master to main March 18, 2021 17:26
Copy link
Member

@allejo allejo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to output a deprecation message for this?

@joshgoebel joshgoebel merged commit bfb5a59 into highlightjs:main Mar 20, 2021
@paladox
Copy link
Member

paladox commented Apr 7, 2021

I'm wondering if we could add support for continuation again? We use it for the gerrit project specifically here https://github.com/GerritCodeReview/gerrit/blob/master/polygerrit-ui/app/elements/diff/gr-syntax-layer/gr-syntax-layer.ts#L445.

@joshgoebel
Copy link
Member Author

joshgoebel commented Apr 8, 2021

This has been removed for many different reasons.

  • Line by line driving is fundamentally broken. Grammars rules are not guaranteed to break cleanly at line boundaries - and some most definitely do not.
  • It was added to our documentation in 2019 that this is not intended to be used to for line-by-line highlighting (93badf7) [as soon as I found out people were doing that].
  • Technically it's leaking parser internals.
  • It's not intended to "drive the parser" (line by line or otherwise).
  • It is a detail of our internal sublanguage implementation - one that's likely to change in the future.

Many of the things we'd like to investigate in v11 may require deep internal changes. We might switch from an iterative parser to a completely recursive parser. There is rethinking sublanguages (#1263), a potential new back-tracking parser (#1140), and more. Just because a line has been processed may not even mean it's processed. We might hit a rule mismatch a few lines later and then rewind and try again with an entirely different matching strategy.

To allow us to iterate as freely as possible without worrying about breaking changes that means our internals need to be fully internal. As such continuation is gone... and _top becomes private API. No state in, no state out.


To do line-by-line content 100% safely and correctly one should instead post-process the final output (which can be done with ~15 lines of JS or so and a small stack). See my stackoverflow answer. There are other options with the private Emitter API, but use at your own risk. Post-processing the final text output is the easy and most stable choice and what most implementors should do.

@joshgoebel
Copy link
Member Author

joshgoebel commented Apr 8, 2021

Someone should really write a line by line plugin using after:highlight that wraps this up nicely and then we could just point everyone to that.

Sample ~ implementation: https://github.com/hexojs/hexo-util/pull/246/files#diff-23bd09b492fb22d0b078517e838293bfa0031ece18b7bfba1adf706d5b9592d0R126

One could easily alter it to return an array or whatever, etc.

@paladox
Copy link
Member

paladox commented Apr 8, 2021

@joshgoebel I guess it should be:

result = this.hljs.highlight(
        baseLine,
        {
            language: this.baseLanguage,
            ignoreIllegals: true
        }
      );

result.value = result.value.split('\n').map(line => {
     const prepend = tokenStack.map(token => `<span class="${token}">`).join('');
     const matches = line.matchAll(/(<span class="(.*?)">|<\/span>)/g);
     for (const match of matches) {
       if (match[0] === '</span>') tokenStack.shift();
       else tokenStack.unshift(match[2]);
     }
     const append = '</span>'.repeat(tokenStack.length);
     return prepend + line + append;
   }).join('\n');
this.baseRanges.push(this._rangesFromString(result.value, rangesCache));

?

@paladox
Copy link
Member

paladox commented Apr 8, 2021

@joshgoebel
Copy link
Member Author

joshgoebel commented Apr 8, 2021

You tell me...? I don't completely follow larger context of what you are doing exactly or what baseLine and revisionLine are... if you want access to the individual lines quickly you probably shouldn't be rejoining but rather just leave it as an array so you could slice/splice, etc... the code was just an example of how easy it is to "fix up" the broken tags after splitting on \n.

@joshgoebel
Copy link
Member Author

@paladox On an unrelated note: https://github.com/gerrit-review/gerrit/blob/master/polygerrit-ui/app/elements/diff/gr-syntax-layer/gr-syntax-layer.js#L349

If these are still actual issues today I'd love to see issues filed and perhaps we can fix them so you don't have to deal with them downstream.

@joshgoebel joshgoebel deleted the new_highlight_api branch April 8, 2021 06:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Propose: Improve highlight API
3 participants