Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prepare Emitter API to go public #3621

Closed
joshgoebel opened this issue Sep 13, 2022 · 9 comments
Closed

Prepare Emitter API to go public #3621

joshgoebel opened this issue Sep 13, 2022 · 9 comments
Assignees
Labels
enhancement An enhancement or new feature help welcome Could use help from community parser

Comments

@joshgoebel
Copy link
Member

joshgoebel commented Sep 13, 2022

In preparation for #3619 and #3620 I'd live to review out Emitter API and rename some things to better match our own internal naming. I think the original was a bit too specific with regards to internal implementations and the new version should be a little more abstract:

Old:

  • addKeyword(text, scope)
  • addText(text)
  • addSublanguage(emitter, subLanguageName)
  • finalize()
  • openNode(scope)
  • closeNode()
  • closeAllNodes()
  • toHTML()

New:

  • addText(text)
  • startScope(scope) (or pushScope?)
  • endScope() (or popScope? or finishScope?)
  • __addSublanguage(emitter, subLanguageName) - (may stay private)
  • finalize() - thoughts on naming?
  • toHTML()

Removed:

  • addKeyword(text, scope) - pull back into an internal helper
  • closeAllNodes() - Lets just let finalize be responsible for all cleanup.

For writing custom grammar parsers one would really only need to concern themselves with 3 API calls I think:

  • addText
  • startScope
  • endScope

@wooorm I know you had plenty of thoughts on this before... it's possible I'm a little more receptive now... I do plan on reducing a API a bit if we're going public (which I recall you pushed for previously).

@joshgoebel joshgoebel added enhancement An enhancement or new feature parser help welcome Could use help from community labels Sep 13, 2022
@joshgoebel
Copy link
Member Author

Considering pulling sublanguage fully private as well and we'd allow for language:xml scopes instead which would trigger the HTML output's special class='language-xml' rendering behavior... so it would be just another fancier scope name rather than a whole different node type (underneath).

We do need some way to alternative emitters to do this if they want to be fully compatible with some themes that depend on this naming system in the HTML.

@wooorm
Copy link

wooorm commented Sep 13, 2022

I mainly need addText, openNode, and closeNode (new names/slightly different, is fine), and I think addSublanguage and addKeyword can be sort-of merged into openNode?
For names, you could go with open, text, close?

As you’re sort-of talking about classes with sublanguages, you could also turn the dot notation into actual classes. I’m doing that manually now: https://github.com/wooorm/lowlight/blob/ebddb1040b8b4adc78abb502e59ddd378a56bd8b/lib/core.js#L267-L272.

Last, instead of a function-based API, you could also build an AST. Basically what I am using the emitter for. Others can transform that AST into whatever they want.
See: https://github.com/wooorm/lowlight/blob/ebddb1040b8b4adc78abb502e59ddd378a56bd8b/lib/core.js#L200.

@joshgoebel
Copy link
Member Author

addSublanguage and addKeyword can be sort-of merged into openNode?

Well, built on top of, yes... I'd move the functions back into the core library and out of the emitter public API.

As you’re sort-of talking about classes with sublanguages

Sort of, but everything is a scope until it hits HTML (and it might never)... the internal naming on that probably isn't 100% clear, but that's the boundary now. Classes only exist when doing HTML output. So at the emitter level we're always talking about scopes.

Last, instead of a function-based API, you could also build an AST.

I prefer an immutable API with a changeable data structure than the opposite... I think that's easier and more flexible to maintain and as you've seen it's just as easy to work with in practice. In the future we may go back to just joining strings rather than a tree - so I'm still not interested in solidifying that piece.

For names, you could go with open, text, close?

I'll consider this.

@joshgoebel
Copy link
Member Author

joshgoebel commented Sep 16, 2022

@foo123

The answer is both. As explained here the format used BNF+JSON, combines two standards: BNF format for grammar specification and JSON format (that includes the BNF format) as a versatile and ubiquitus data format.

Any interest in this? Instead of the middle man step of converting your BNF grammars to Highlight.js grammars if you have an existing BNF parser you could just use that to do the parsing and then emit the results directly into our output pipeline. I wonder if that would help simplify things on your end?

And then people would have another easy way to define HLJS grammars - using BNF. (well I guess technically your hybrid, but surely it'd be possible to compile BNF into BNF/JSON?)

I wonder how BNF parsers tend to do with 'getting back on track' when presented with wholly invalid data? That's often a thing with code snippets - incomplete context, typos, etc...

@foo123
Copy link

foo123 commented Sep 17, 2022

@joshgoebel as long as there are ways to bypass the default tokenizer and there is a way to provide the actual token (parsed with an external parser), my addon is fine. I don't have the time at this point to test it, but I will do it once it is live.

I wonder how BNF parsers tend to do with 'getting back on track' when presented with wholly invalid data? That's often a thing with code snippets - incomplete context, typos, etc...

This is part of error recovery. The approach can be simple or complex. A simple approach is discard everything (eg highlight as default text) until a valid sequence is found, then start again.

@joshgoebel
Copy link
Member Author

as long as there are ways to bypass the default tokenizer and there is a way to provide the actual token

That's what #3620 allows. It should land in 11.7 (as a beta feature) with a full release with v12.

@wooorm
Copy link

wooorm commented Apr 29, 2023

I think this landed now right?

I updated and I’m getting errors. Here’s how I migrated, how it went!

  • Runtime error of emitter.startScope is not a function, apparently it’s needed but no type error
  • Apparently it receives a name that is any? Why not string? Or string | undefined?
  • Let me just copy/paste what I had in openNode, except for casting name to string
  • endScope is also needed, makes sense, let’s do the same as for startScope
  • Hmm, error emitter.__addSublanguage is not a function. That seems private, apparently it’s needed.
  • Lets rename addSublanguage which I had, to __addSublanguage?
  • WORKS! Just some differences in Rust
  • Coverage says addKeyword, closeAllNodes, is no longer used, removing those.
  • OK, apparently openNode, closeNode are still needed, I’ll keep those an change startScope/endScope to call those!
  • Hmm, now TS says addKeyword, closeAllNodes, and addSublanguage are required, even tho they are never called: I’ll add a ts-expect-error for that.

All together: wooorm/lowlight@0ab194d

@joshgoebel
Copy link
Member Author

joshgoebel commented Apr 29, 2023

If I make a branch/PR can you test it's types? I honestly didn't look at types the entire time I did this work... so I can go back and try to clean that up now.

#3774

@joshgoebel
Copy link
Member Author

Closing this out as complete by the release of 11.8.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement An enhancement or new feature help welcome Could use help from community parser
Projects
None yet
Development

No branches or pull requests

4 participants