Prepare Emitter API to go public #3621

joshgoebel · 2022-09-13T15:47:46Z

In preparation for #3619 and #3620 I'd live to review out Emitter API and rename some things to better match our own internal naming. I think the original was a bit too specific with regards to internal implementations and the new version should be a little more abstract:

Old:

addKeyword(text, scope)
addText(text)
addSublanguage(emitter, subLanguageName)
finalize()
openNode(scope)
closeNode()
closeAllNodes()
toHTML()

New:

addText(text)
startScope(scope) (or pushScope?)
endScope() (or popScope? or finishScope?)
__addSublanguage(emitter, subLanguageName) - (may stay private)
finalize() - thoughts on naming?
toHTML()

Removed:

~~addKeyword(text, scope)~~ - pull back into an internal helper
~~closeAllNodes()~~ - Lets just let finalize be responsible for all cleanup.

For writing custom grammar parsers one would really only need to concern themselves with 3 API calls I think:

addText
startScope
endScope

@wooorm I know you had plenty of thoughts on this before... it's possible I'm a little more receptive now... I do plan on reducing a API a bit if we're going public (which I recall you pushed for previously).

The text was updated successfully, but these errors were encountered:

joshgoebel · 2022-09-13T16:49:00Z

Considering pulling sublanguage fully private as well and we'd allow for language:xml scopes instead which would trigger the HTML output's special class='language-xml' rendering behavior... so it would be just another fancier scope name rather than a whole different node type (underneath).

We do need some way to alternative emitters to do this if they want to be fully compatible with some themes that depend on this naming system in the HTML.

wooorm · 2022-09-13T17:12:21Z

I mainly need addText, openNode, and closeNode (new names/slightly different, is fine), and I think addSublanguage and addKeyword can be sort-of merged into openNode?
For names, you could go with open, text, close?

As you’re sort-of talking about classes with sublanguages, you could also turn the dot notation into actual classes. I’m doing that manually now: https://github.com/wooorm/lowlight/blob/ebddb1040b8b4adc78abb502e59ddd378a56bd8b/lib/core.js#L267-L272.

Last, instead of a function-based API, you could also build an AST. Basically what I am using the emitter for. Others can transform that AST into whatever they want.
See: https://github.com/wooorm/lowlight/blob/ebddb1040b8b4adc78abb502e59ddd378a56bd8b/lib/core.js#L200.

joshgoebel · 2022-09-13T17:19:17Z

addSublanguage and addKeyword can be sort-of merged into openNode?

Well, built on top of, yes... I'd move the functions back into the core library and out of the emitter public API.

As you’re sort-of talking about classes with sublanguages

Sort of, but everything is a scope until it hits HTML (and it might never)... the internal naming on that probably isn't 100% clear, but that's the boundary now. Classes only exist when doing HTML output. So at the emitter level we're always talking about scopes.

Last, instead of a function-based API, you could also build an AST.

I prefer an immutable API with a changeable data structure than the opposite... I think that's easier and more flexible to maintain and as you've seen it's just as easy to work with in practice. In the future we may go back to just joining strings rather than a tree - so I'm still not interested in solidifying that piece.

For names, you could go with open, text, close?

I'll consider this.

joshgoebel · 2022-09-16T19:27:50Z

@foo123

The answer is both. As explained here the format used BNF+JSON, combines two standards: BNF format for grammar specification and JSON format (that includes the BNF format) as a versatile and ubiquitus data format.

Any interest in this? Instead of the middle man step of converting your BNF grammars to Highlight.js grammars if you have an existing BNF parser you could just use that to do the parsing and then emit the results directly into our output pipeline. I wonder if that would help simplify things on your end?

And then people would have another easy way to define HLJS grammars - using BNF. (well I guess technically your hybrid, but surely it'd be possible to compile BNF into BNF/JSON?)

I wonder how BNF parsers tend to do with 'getting back on track' when presented with wholly invalid data? That's often a thing with code snippets - incomplete context, typos, etc...

foo123 · 2022-09-17T08:02:33Z

@joshgoebel as long as there are ways to bypass the default tokenizer and there is a way to provide the actual token (parsed with an external parser), my addon is fine. I don't have the time at this point to test it, but I will do it once it is live.

I wonder how BNF parsers tend to do with 'getting back on track' when presented with wholly invalid data? That's often a thing with code snippets - incomplete context, typos, etc...

This is part of error recovery. The approach can be simple or complex. A simple approach is discard everything (eg highlight as default text) until a valid sequence is found, then start again.

joshgoebel · 2022-09-17T11:53:47Z

as long as there are ways to bypass the default tokenizer and there is a way to provide the actual token

That's what #3620 allows. It should land in 11.7 (as a beta feature) with a full release with v12.

wooorm · 2023-04-29T18:05:58Z

I think this landed now right?

I updated and I’m getting errors. Here’s how I migrated, how it went!

Runtime error of emitter.startScope is not a function, apparently it’s needed but no type error
Apparently it receives a name that is any? Why not string? Or string | undefined?
Let me just copy/paste what I had in openNode, except for casting name to string
endScope is also needed, makes sense, let’s do the same as for startScope
Hmm, error emitter.__addSublanguage is not a function. That seems private, apparently it’s needed.
Lets rename addSublanguage which I had, to __addSublanguage?
WORKS! Just some differences in Rust
Coverage says addKeyword, closeAllNodes, is no longer used, removing those.
OK, apparently openNode, closeNode are still needed, I’ll keep those an change startScope/endScope to call those!
Hmm, now TS says addKeyword, closeAllNodes, and addSublanguage are required, even tho they are never called: I’ll add a ts-expect-error for that.

All together: wooorm/lowlight@0ab194d

joshgoebel · 2023-04-29T23:33:53Z

If I make a branch/PR can you test it's types? I honestly didn't look at types the entire time I did this work... so I can go back and try to clean that up now.

#3774

joshgoebel · 2023-04-30T00:27:05Z

Closing this out as complete by the release of 11.8.

joshgoebel added enhancement An enhancement or new feature parser help welcome Could use help from community labels Sep 13, 2022

joshgoebel mentioned this issue Sep 16, 2022

Prism vs Highlight.js: Why choose one over the other? #3625

Open

joshgoebel assigned allejo Sep 17, 2022

joshgoebel mentioned this issue Jan 29, 2023

[Request] Support for low-level incremental tokenization #3702

Closed

joshgoebel closed this as completed Apr 30, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prepare Emitter API to go public #3621

Prepare Emitter API to go public #3621

joshgoebel commented Sep 13, 2022 •

edited

joshgoebel commented Sep 13, 2022

wooorm commented Sep 13, 2022

joshgoebel commented Sep 13, 2022

joshgoebel commented Sep 16, 2022 •

edited

foo123 commented Sep 17, 2022 •

edited

joshgoebel commented Sep 17, 2022

wooorm commented Apr 29, 2023 •

edited

joshgoebel commented Apr 29, 2023 •

edited

joshgoebel commented Apr 30, 2023

Prepare Emitter API to go public #3621

Prepare Emitter API to go public #3621

Comments

joshgoebel commented Sep 13, 2022 • edited

joshgoebel commented Sep 13, 2022

wooorm commented Sep 13, 2022

joshgoebel commented Sep 13, 2022

joshgoebel commented Sep 16, 2022 • edited

foo123 commented Sep 17, 2022 • edited

joshgoebel commented Sep 17, 2022

wooorm commented Apr 29, 2023 • edited

joshgoebel commented Apr 29, 2023 • edited

joshgoebel commented Apr 30, 2023

joshgoebel commented Sep 13, 2022 •

edited

joshgoebel commented Sep 16, 2022 •

edited

foo123 commented Sep 17, 2022 •

edited

wooorm commented Apr 29, 2023 •

edited

joshgoebel commented Apr 29, 2023 •

edited