Discuss: Plug-in hooks and building a plug-in culture #2225

joshgoebel · 2019-10-26T01:33:23Z

Related: #2174

This thread might get long. It'll try to summarize a list of hook points here and keep it up-to-date. The overall idea is to encourage a plugin culture - rather than adding the 100 tiny ideas people approach us with as options and spread all over the code I'd rather have a plug-in ecosystem that make it easy to write tiny plugins to enhance Highlight.js in various ways. Deciding WHERE the code for those plugins lives would be a separate consideration. Perhaps really useful/popular ones would exist in core, perhaps others would live elsewhere.

Worth discussing how these might work with auto-highlight also... which is really a whole bunch of different highlights happening sequentially and then one "winning"...

Basically a plugin would register itself with Highlight.js and then it would be called at various hook points and thru these well defined hook points would be able to do various things. Examples to follow.

It's also worth considering if individual languages could also act as their own plugins... ie, allowing them to hook into all the same places - except only when they are being used to highlight the code.

Hook Ideas

Post-autodecect (called after auto-detection)
Input hook (called before highlighting)
Output hook (called before returning the result)
Post-parse hook (called after parse tree is generated)

Post auto-detect

Called after auto-detection is complete and all registered languages have been "scored". Could change the relevancy of any of the results, thereby deciding which language is detected.

Input Hook

Would have access to the incoming state... such as relevancy, language, code, etc... could change any of these... increase relevancy, pre-filter the content, etc.

Output Hook

Called just prior to returning data... again could also any of the data in the return state...

Post-parse hook

Called after the parse tree is generated. The plugin could alter the parse tree. Or access any of the other normal state information (relevancy score, etc).

joshgoebel · 2019-10-26T01:35:23Z

Example: Tab-rejiggering, trimming space before/after

#1492

It's been requested that we strip whitespace from the beginning and ending of code blocks. Also if someone pastes code that has 5-7 tab stops for ALL of it you could imagine perhaps removing 5 tab stops therefor scooting that content over to the left.

These kind of things could trivially be handled by an input plugin. Handed the raw code, they would tweak the spacing, and then pass that code along to be highlighted.

joshgoebel · 2019-10-26T01:37:09Z

Example: shebang detection, #!/bin/ruby ...

This could be implemented with an auto-detect post or pre filter... it would analyze the first line of code (even before highlighting) and then could "focus" the highlighter to a single language or be used to "bump" a certain languages rank so that it would have a "bonus" in the auto-detection game.

joshgoebel · 2019-10-26T01:42:21Z

Example: Highlight duplicate HTML attributes

This could be easily done post-parse by walking the parse tree and dynamically flagging certain nodes to be highlighted with a different class, etc.

joshgoebel · 2019-10-26T01:52:34Z

Example: Relevancy boosting for people who typically only use one type of code

#2194

A post-autodetect plugin could slightly favor a language the content author knew they ALMOST ALWAYS used, thereby preventing auto-detect from doing the wrong thing. Say if you only blog about Pascal you could give it a 10% relevancy boost.

joshgoebel · 2019-10-26T02:04:48Z

Example: Replacing the parser entirely with experimental parsers

#1133

An input plugin could do it's own parsing and if it returned a parseTree (rather than just code) then Hightlight.js would simply skip it's own parsing and jump straight to output processing of the parseTree into HTML.

egor-rogov · 2019-10-27T00:24:32Z

Nice idea!
I think of per-line output hook to allow to modify (already highlighted) HTML. E.g. add line numbers - there were lots of such requests.

joshgoebel · 2019-10-27T01:15:59Z

You don’t really need a per line hook if you can hook the whole output. Do you? Split, add stuff. Join.

egor-rogov · 2019-10-28T11:38:54Z

Well, you'll have to deal with html tags spanning on several rows. To simplify this task, In per-line mode we probably can close opened tags on the end of line and then re-open them on the beginning of the next line.

joshgoebel · 2019-10-29T12:34:52Z

Well, you'll have to deal with html tags spanning on several rows. To simplify this task, In per-line mode we probably can close opened tags on the end of line and then re-open them on the beginning of the next line.

Oh, I see. Yeah that'd be a whole other thing. I think there are MUCH easier ways to add line numbers than wrapping every line in a span though, but that's my opinion. :-)

egor-rogov · 2019-10-29T14:25:15Z

I think there are MUCH easier ways to add line numbers than wrapping every line in a span though, but that's my opinion. :-)

Like what?

joshgoebel · 2019-10-29T14:27:22Z

Like using grid or flex box and just having the line numbers in their own pre block... what I've always done and it's worked well. Or you could probably even do a pre with two code elements, but same principle.

joshgoebel · 2019-10-29T14:29:01Z

Not really opposed to that kind of hook (if someone else wants to write a plugin), but seems like a lot of needless complexity when there are easier ways to solve the problem. :-)

joshgoebel · 2019-11-02T13:34:21Z

Reference regarding one idea for sharing options: #1969

Not sure it's the best approach, but I wanted to save it for future reference when we come up with more of a plan for plugins.

joshgoebel · 2019-11-14T15:23:19Z

joshgoebel · 2020-02-08T22:17:36Z

When adding plugin support I started with the easiest piece highlightBlock. Now I'm looking at the more complex pieces, highlight and highlightAuto... and of course highlight is reentrant in that sublanguages a recursive call to highlight.

And highlightAuto simply calls highlight a bunch of times and picks the winner... so the question is are these details that plugins need to concern themselves with, or do we try to simplify/abstract this somehow?

I think the sublanguage thing is a bit nastier to understand, and I'm leaning towards another set of callbacks so that plugin authors don't have to concern themselves with sublanguage UNLESS they really want to:

before:highlight
after:highlight
before:highlight/sublanguage
after:highlight/sublanguage

So someone just wanting to hook top-level highlight calls just uses the first two, and they they are done. Otherwise almost every plugin would have to maintain state and a flag of how "deep" it was to know if it was being called initially or if it was being called on a sublanguage.

Perhaps there are some cases where plugin authors don't care and want EVERY highlight call treated the same (you're always trying to do some transform on a particular language, for example). But I think those cases would be fewer and it's easy enough to just add the two extra hooks... while simplifying things for a lot of other plugin authors.

With regard to autoDetect the complexity bothers me a little less. I guess I'm assuming many plugins will target explicit highlights or auto-highlights, not both. If that were to be the case then those authors wouldn't even notice. However someone who wanted to target auto and highlight would find their callbacks fired like this:

before:highlightAuto
before:highlight / after:highlight (first potential language)
before:highlight / after:highlight (2nd potential language)
before:highlight / after:highlight (3rd potential language)
...
...
after:highlightAuto

Is it just a plugin authors responsibility to keep this in mind? I'm thinking perhaps it we provided some meta-state that would be enough:

    // say I only care about explicit  highlights, not auto-detect attempts
    hljs.addPlugin( {
      'before:highlight': ({block, data}) => {
        if (data.flags.autodetect) return;

        // continue with normal processing
    })

egor-rogov · 2020-02-08T22:33:48Z

Is there a reason one can care about explicit highlights but not about auto-highlights?
In fact, sublanguages often use auto-detection even when you call explicit highlight.

joshgoebel · 2020-02-08T22:40:17Z

Is there a reason one can care about explicit highlights but not about auto-highlights?

Well off the top of my head you have different categories of plugins. Or really maybe the difference isn't between auto-and-not, it's between "just one language" vs "ALWAYS".

IE:

Just one language

I have a kick-ass plugin that does impossible language analysis and really cool stuff with C++ code (or perhaps even something trivially simple)... I don't care about auto-highlighting (in general). If someone says the code is "C++" then I run OR if auto-detect decides the code is C++, then I run. But in general when the 283 other languages are tested/highlighted I don't give one flip - I only care about C++.

All languages

I want to strip all whitespace from before and after code blocks because whatever tool is building the HTML couldn't be bothered to do that properly - and for my use cases whitespace never matters. In this case I want my plugin to ALWAYS run before, and modify the code by stripping whitespace... I don't care if we know the language, I don't care if it's auto-detect, etc... I just want to strip space.

I probably don't want to strip space from sublanguages though.

What I want to avoid is a world where someone wants to write a "simple" plugin like one of the above but to do it has to hook 8 different hooks, and track whether highlight is being called initially, inside of auto-detect, inside of sublanguage, inside of a nested sublanguage, etc...

IE the closer the plug-in psuedo-code is to this, the better:

before("cpp", doAwesomeAnalysis)
after("cpp", applyAwesomeAnalysis)

beforeAll(stripWhitespace);

joshgoebel · 2020-02-08T22:45:58Z

There is kind of an implied split here we don't currently even have... "after:autodetect" and "before:highlight"... since the auto-detect is the highlight... it's possible my plugin would only want to act on C++ code once it had been determined it was actually C++ code, not just when the highlighter is GUESSING.

joshgoebel · 2020-02-08T22:48:40Z

I try to avoid designing this stuff abstractly, so the current focus is #2391.

The idea being "detect frontmatter" and if present, possibly route to a particular grammar or else the plugin would take over parsing duties itself, calling two particular grammars internally. And already you have a case where a plugin might want to recursively call highlight from inside a highlight callback itself... although I suppose in that case I assume the plugin would use internal state to prevent it from triggering on it's own callbacks?

egor-rogov · 2020-02-08T23:39:56Z

the plugin would take over parsing duties itself, calling two particular grammars internally

Looks very complicated... In plugin you'd probably want to set boundaries of languages and let the core do the actual highlighting itself.
Maybe it's worth starting with that functionality? I.e. possibility to pass to highlightBlock a markup defining language boundaries.
Then you could change the markup in before:highlightBlock.

joshgoebel · 2020-02-08T23:45:42Z

Looks very complicated... In plugin you'd probably want to set boundaries of languages and let the core do the actual highlighting itself.

That's what I'd do here. The plugin would split the code into two pieces in a before hook... then call highlight(yaml) and highlight(markdown), then concat the two together, then return that result... hence "bypassing" the actual highlighting we'd have done ourselves on a single large block of code.

...but the plugin author would be free to choose what works best. Perhaps they want to add support for a language that's really impossible to parse with pure regex but easy with a few lines of JS. In that case they wouldn't need to call highlight at all.

possibility to pass to highlightBlock a markup defining language boundaries.

I don't know what you mean here but it sounds like the opposite of a plugin, sounds like adding features back to the core library.

Then you could change the markup in before:highlightBlock.

We already have before/after for highlightBlock... technically it'd be possible to write this plugin today if it only needed to work with highlightBlock, but highlightBlock is browser-only... and that's pretty limiting. Most plugins are going to want to hook the actual library calls so that they would work inside NodeJS, etc. not just the browser environment.

joshgoebel · 2020-02-08T23:47:12Z

But since you brought up highlightBlock right now it's kind of a "catch-all"... the plugin author would have to figure out if a language was specified or going to be auto-detected - they also have the power to change or control this behavior... they could run their own auto-detect for example... or flag anything containing a magic string as a given language, etc.

IE, my shebang plugin.. that would detect the language only by looking at the very first line, etc. :-)

joshgoebel · 2020-02-08T23:51:45Z

I think we want to hide/abstract the internals as much as we can so that our hooks all make sense but don't seem to be rooted too much in implementation details. Say highlight (for some funny reason) calls itself 20 times just because of some quirk (even in simple cases).

Someone hooking before/after highlight would NOT expect their hook to be called 21 times... we'd want to abstract that to deal with only the external call to highlight.

So that's why I think we need to have separate hooks for before/sublanguage and after/sublangage... today they call highlight, but that's an implementation detail. It could just as easily been a specialized method that was called instead of recursively calling the main method.

You could also make the same argument about auto-detect... TODAY it calls highlight a zillion times... but that's an implementation detail. Other libraries have really good results with machine learning for language recognition (and we've already talked about plugins that could copt recognition). So in that case if I hooked highlight and highlightAuto I would NOT necessarily expect highlight to be called 284 times when a call to highlightAuto was made.

joshgoebel · 2020-02-08T23:57:21Z

So I think that leaves us with more focusing on what is happening rather than internal method dispatch:

before/sublanguage
after/sublanguage
before/autodetect (triggered say 284+ times when highlightAuto is called, etc)
after/autodetect
before/highlight (external api call)
after/highlight (external api call)
before/highlightAuto (external api call)
after/highlightAuto (external api call)
before/highlightBlock (external api call)
after/highlightBlock (external api call)

So now you have clear boundaries for functionality. If you wanted to ALWAYS touch say "pascal" you'd need to hook sublanguage and highlight.

The last tricky party is when you mix sublanguage and autodetect... Is a before/sublanguage/autodetect also needed?

joshgoebel · 2020-02-08T23:59:10Z

Actually I don't think autodetect should trigger 284 times... that seems weird and very, very internal. Should plugins need to care about every single time attempt the auto-detect makes at detection? I need to think about this over coffee sometime. Seems so much of this is easy to bend to be very implementation specific if we don't think about it right.

joshgoebel · 2020-02-09T00:02:46Z

Say I've been given a snippet of text to auto-highlight. More abstractly if you think it terms of events:

Auto-detection
Highlighting
- Highlighting sublanguage 1
- Highlighting sublanguage 2

So I count 4 events there, and 3 types (auto-detect, highlight, highlight sublangauge'. I could hook into before/after of each of the highlighting events.

If I hooked into BEFORE auto-detection I could write my own auto-detect, or influence the built in method (whatever it was). If I hooked AFTER I could say look at the list of potential matches and make my own choice (so perhaps augmenting the built-in support).

I don't see a case there to call any hook 284 times. That seems to be an implementation detail of how we currently do auto-detection.

That makes a LOT more sense to me.

If I care about auto-detection, I hook it.
If I only care about pascal say, I hook highlighting and check language.
If I care about removing whitespace I'd hook just highlight.

Leading to:

before/highlightSublanguage
after/highlightSublanguage
before/autodetectSublanguage
after/autodetectSublanguage
before/autodetect (triggered once for a given code block)
after/autodetect (triggered once for a given code block)
before/highlight
after/highlight
before/highlightBlock (still useful to do browser things)
after/highlightBlock (still useful to do browser things)

Updated: Added autodetect for sublanguage.

egor-rogov · 2020-02-09T00:09:44Z

I do agree with your thought process.

A thought: do we really need separate hooks for autodetection? We can have something like "choose language" with a list of alternatives. If there is just one, it corresponds to highlight; if there are some of them, it is highlightAuto.

joshgoebel · 2020-02-09T00:11:02Z

We can have something like "choose language" with a list of alternatives. If there is just one, it corresponds to highlight; if there are some of them, it is highlightAuto.

I don't follow what you mean. A plugin like shebang would need to hook BEFORE anything was auto-detected because it would want to overrule that decision.

    hljs.addPlugin( {
      'before:autodetect': (data) => {
        var language = detectFromShebang(data.code);
        if (language)
          data.language = language;
    })

egor-rogov · 2020-02-09T00:13:45Z

I mean you can treat anything like autodetection. But sometimes it can choose from exactly one predefined option.

This can help with

The last tricky party is when you mix sublanguage and autodetect... Is a before/sublanguage/autodetect also needed?

joshgoebel · 2020-02-09T00:15:18Z

The last tricky party is when you mix sublanguage and autodetect... Is a before/sublanguage/autodetect also needed?

Actually I think that would work just like the parent... the question is would you need a "sublanguage/autodect" hook... IE, autodetect/sublangauge would fire (if it wasn't explicit) and then highlight/sublanguage would fire.

joshgoebel · 2020-02-09T00:16:17Z

I mean you can treat anything like autodetection. But sometimes it can choose from exactly one predefined option.

I still don't follow you might have to give a very specific example of what you had in mind.

egor-rogov · 2020-02-09T00:48:09Z

Nevermind. Your event system makes sense.

So, for a highlightAuto call you'll get the following series of events:

before/autodetect (here we can override autodetection)
after/autodetect (here we can choose one of potential matches)
before/highlight
after/highlight

Do I get it right?
Then how can you implement it, given that highlighting is done inside the autodetection foreach-loop?

joshgoebel · 2020-02-09T00:55:36Z

here we can choose one of potential matches

Or decide they are all rubbish and do something else entirely. :)

Then how can you implement it, given that highlighting is done inside the autodetection foreach-loop

Well, honestly it'll help if you stop thinking about it that way. :-) Instead think: Auto-detection is done. One of the side-effects is that we also get the result of each potential highlight run, which may or may not always be useful. And may or may not be how it's implemented in the future.

So, we run the auto-detection... and (assuming another plugin hasn't provided it's own output) then we decide the language is say basic. The final two events then are more virtual - but only because of our existing implementation. So we'd fire before/highlight then after/highlight immediately after each other. After highlight would use the cached result from our auto-detection.

UNLESS the plugin makes changes to code (or language even) inside before/highlight in which case now the auto-detection was ONLY just that - autodection. Now we have to actually run the highlight on the modified code to generate the actual result.

joshgoebel · 2020-02-09T01:01:22Z

I'll probably start next with just before/highlight (specific language) and after/highlight, which allow for a lot of useful plugins and save the auto-detect/sublanguage stuff for when it's asked for.

joshgoebel · 2020-02-09T01:20:54Z

A bigger question is plugin vs plugin interaction. For example: I was just thinking about the API and how a plugin like frontmatter would handle it's own need to reuse highlightjs....

IE, frontmatter splits the content then calls highlight twice on each part... but really does it need to then ALSO handle those hooks and realize it's running inside itself and then just do nothing? I thought, no... so you could have a flag skipPlugins:

header = highlight(yaml, {skipPlugins: true})
body = highlight(markdown, {skipPlugins: true})

But then do I really want to skip ALL plugins, or do I really want to skip ONLY myself? Perhaps it's not too much to expect early plugins to deal with cruft like this and then after we see what the REAL problems are we circle back and try and fix them with plugins v2. :-)

IE, for now if you were using the external API to call highlight from within a hook you'd have to be careful to avoid infinite recursion and that's the plugin authors responsibility.

taufik-nurrohman · 2020-06-05T06:06:40Z

But then do I really want to skip ALL plugins, or do I really want to skip ONLY myself?

Learn it from the native event listener. We could define context.

function c() {}
function d() {}

a.addEventListener('b', c);
a.addEventListener('b', d);
a.removeEventListener('b', c); // remove `c` only

So for skipPlugins we may have something like this:

function foo() {}
function bar() {}

{skipPlugins: true} // ignore all
{skipPlugins: [foo, bar]} // ignore `foo` and `bar` only

Here’s mine, for inspiration: https://github.com/taufik-nurrohman/color-picker/blob/bb7051afbf5f61ac9b691d28fb899e06b4fadffb/color-picker.js#L282-L322

var $ = this, // the class instance scope
    hooks = {};

function hookLet(name, fn) {}
function hookSet(name, fn) {}

$.hooks = hooks;
$.off = hookLet;
$.on = hookSet;

Usage:

hljs.on('foo', bar);
hljs.on('foo', function() {});

hljs.off('foo', bar); // remove `bar` only

joshgoebel · 2020-06-05T16:02:34Z

Yes I understand how it might work conceptually.

I'd like to see more plugins deal this themselves before we consider adding anything to core. Right now it should be pretty easy for a single plugin to prevent re-entering ITSELF. I'm not sure plugins should "be aware" of other plugins or be able to easily turn them on and off.

You should be able to add multiple plugins and trust that they all "just work" unless they do mutually exclusive things. I don't want plugin A saying it doesn't work with plugin B so it just turns it on and off, etc...

joshgoebel · 2020-10-16T17:54:48Z

Closing and referencing at #2762.

joshgoebel added enhancement An enhancement or new feature parser labels Oct 26, 2019

joshgoebel changed the title ~~Plug-in hooks and building a plug-in culture~~ Idea: Plug-in hooks and building a plug-in culture Oct 26, 2019

joshgoebel added plugin Specific plugin or plugin discussion and removed parser labels Oct 26, 2019

joshgoebel mentioned this issue Nov 3, 2019

[Question] Is there a way to "highlight" just certain lines? #2253

Closed

joshgoebel pinned this issue Feb 6, 2020

joshgoebel changed the title ~~Idea: Plug-in hooks and building a plug-in culture~~ Plug-in hooks and building a plug-in culture Feb 8, 2020

joshgoebel changed the title ~~Plug-in hooks and building a plug-in culture~~ Discuss: Plug-in hooks and building a plug-in culture Feb 19, 2020

joshgoebel unpinned this issue Mar 10, 2020

joshgoebel mentioned this issue May 19, 2020

Provide access to a parse tree as an alternative to HTML output #1086

Closed

joshgoebel mentioned this issue Oct 16, 2020

Discuss: Long-term discussions index #2762

Open

joshgoebel closed this as completed Oct 16, 2020

joshgoebel mentioned this issue Feb 12, 2021

Deprecate HTML auto-merging from core (plugin may be needed) #2889

Closed

3 tasks

joshgoebel mentioned this issue Jun 12, 2021

[Request] Add line numbering #3238

Closed

Discuss: Plug-in hooks and building a plug-in culture #2225

Discuss: Plug-in hooks and building a plug-in culture #2225

Comments

joshgoebel commented Oct 26, 2019 • edited

joshgoebel commented Oct 26, 2019 • edited

joshgoebel commented Oct 26, 2019 • edited

joshgoebel commented Oct 26, 2019 • edited

joshgoebel commented Oct 26, 2019

joshgoebel commented Oct 26, 2019 • edited

egor-rogov commented Oct 27, 2019

joshgoebel commented Oct 27, 2019

egor-rogov commented Oct 28, 2019

joshgoebel commented Oct 29, 2019

egor-rogov commented Oct 29, 2019

joshgoebel commented Oct 29, 2019

joshgoebel commented Oct 29, 2019 • edited

joshgoebel commented Nov 2, 2019

joshgoebel commented Nov 14, 2019

joshgoebel commented Feb 8, 2020 • edited

egor-rogov commented Feb 8, 2020

joshgoebel commented Feb 8, 2020 • edited

joshgoebel commented Feb 8, 2020

joshgoebel commented Feb 8, 2020 • edited

egor-rogov commented Feb 8, 2020

joshgoebel commented Feb 8, 2020

joshgoebel commented Feb 8, 2020 • edited

joshgoebel commented Feb 8, 2020 • edited

joshgoebel commented Feb 8, 2020

joshgoebel commented Feb 8, 2020

joshgoebel commented Feb 9, 2020 • edited

egor-rogov commented Feb 9, 2020

joshgoebel commented Feb 9, 2020 • edited

egor-rogov commented Feb 9, 2020

joshgoebel commented Feb 9, 2020

joshgoebel commented Feb 9, 2020

egor-rogov commented Feb 9, 2020

joshgoebel commented Feb 9, 2020 • edited

joshgoebel commented Feb 9, 2020 • edited

joshgoebel commented Feb 9, 2020 • edited

taufik-nurrohman commented Jun 5, 2020 • edited

joshgoebel commented Jun 5, 2020 • edited

joshgoebel commented Oct 16, 2020

joshgoebel commented Oct 26, 2019 •

edited

joshgoebel commented Oct 26, 2019 •

edited

joshgoebel commented Oct 26, 2019 •

edited

joshgoebel commented Oct 26, 2019 •

edited

joshgoebel commented Oct 26, 2019 •

edited

joshgoebel commented Oct 29, 2019 •

edited

joshgoebel commented Feb 8, 2020 •

edited

joshgoebel commented Feb 8, 2020 •

edited

joshgoebel commented Feb 8, 2020 •

edited

joshgoebel commented Feb 8, 2020 •

edited

joshgoebel commented Feb 8, 2020 •

edited

joshgoebel commented Feb 9, 2020 •

edited

joshgoebel commented Feb 9, 2020 •

edited

joshgoebel commented Feb 9, 2020 •

edited

joshgoebel commented Feb 9, 2020 •

edited

joshgoebel commented Feb 9, 2020 •

edited

taufik-nurrohman commented Jun 5, 2020 •

edited

joshgoebel commented Jun 5, 2020 •

edited