Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discuss: Plug-in hooks and building a plug-in culture #2225

Closed
joshgoebel opened this issue Oct 26, 2019 · 38 comments
Closed

Discuss: Plug-in hooks and building a plug-in culture #2225

joshgoebel opened this issue Oct 26, 2019 · 38 comments
Labels
enhancement An enhancement or new feature plugin Specific plugin or plugin discussion

Comments

@joshgoebel
Copy link
Member

joshgoebel commented Oct 26, 2019

Related: #2174

This thread might get long. It'll try to summarize a list of hook points here and keep it up-to-date. The overall idea is to encourage a plugin culture - rather than adding the 100 tiny ideas people approach us with as options and spread all over the code I'd rather have a plug-in ecosystem that make it easy to write tiny plugins to enhance Highlight.js in various ways. Deciding WHERE the code for those plugins lives would be a separate consideration. Perhaps really useful/popular ones would exist in core, perhaps others would live elsewhere.

Worth discussing how these might work with auto-highlight also... which is really a whole bunch of different highlights happening sequentially and then one "winning"...

Basically a plugin would register itself with Highlight.js and then it would be called at various hook points and thru these well defined hook points would be able to do various things. Examples to follow.

It's also worth considering if individual languages could also act as their own plugins... ie, allowing them to hook into all the same places - except only when they are being used to highlight the code.

Hook Ideas

  • Post-autodecect (called after auto-detection)
  • Input hook (called before highlighting)
  • Output hook (called before returning the result)
  • Post-parse hook (called after parse tree is generated)

Post auto-detect

Called after auto-detection is complete and all registered languages have been "scored". Could change the relevancy of any of the results, thereby deciding which language is detected.

Input Hook

Would have access to the incoming state... such as relevancy, language, code, etc... could change any of these... increase relevancy, pre-filter the content, etc.

Output Hook

Called just prior to returning data... again could also any of the data in the return state...

Post-parse hook

Called after the parse tree is generated. The plugin could alter the parse tree. Or access any of the other normal state information (relevancy score, etc).

@joshgoebel
Copy link
Member Author

joshgoebel commented Oct 26, 2019

Example: Tab-rejiggering, trimming space before/after

#1492

It's been requested that we strip whitespace from the beginning and ending of code blocks. Also if someone pastes code that has 5-7 tab stops for ALL of it you could imagine perhaps removing 5 tab stops therefor scooting that content over to the left.

These kind of things could trivially be handled by an input plugin. Handed the raw code, they would tweak the spacing, and then pass that code along to be highlighted.

@joshgoebel
Copy link
Member Author

joshgoebel commented Oct 26, 2019

Example: shebang detection, #!/bin/ruby ...

This could be implemented with an auto-detect post or pre filter... it would analyze the first line of code (even before highlighting) and then could "focus" the highlighter to a single language or be used to "bump" a certain languages rank so that it would have a "bonus" in the auto-detection game.

@joshgoebel joshgoebel added enhancement An enhancement or new feature parser labels Oct 26, 2019
@joshgoebel
Copy link
Member Author

joshgoebel commented Oct 26, 2019

Example: Highlight duplicate HTML attributes

This could be easily done post-parse by walking the parse tree and dynamically flagging certain nodes to be highlighted with a different class, etc.

@joshgoebel joshgoebel changed the title Plug-in hooks and building a plug-in culture Idea: Plug-in hooks and building a plug-in culture Oct 26, 2019
@joshgoebel
Copy link
Member Author

Example: Relevancy boosting for people who typically only use one type of code

#2194

A post-autodetect plugin could slightly favor a language the content author knew they ALMOST ALWAYS used, thereby preventing auto-detect from doing the wrong thing. Say if you only blog about Pascal you could give it a 10% relevancy boost.

@joshgoebel
Copy link
Member Author

joshgoebel commented Oct 26, 2019

Example: Replacing the parser entirely with experimental parsers

#1133

An input plugin could do it's own parsing and if it returned a parseTree (rather than just code) then Hightlight.js would simply skip it's own parsing and jump straight to output processing of the parseTree into HTML.

@joshgoebel joshgoebel added plugin Specific plugin or plugin discussion and removed parser labels Oct 26, 2019
@egor-rogov
Copy link
Collaborator

Nice idea!
I think of per-line output hook to allow to modify (already highlighted) HTML. E.g. add line numbers - there were lots of such requests.

@joshgoebel
Copy link
Member Author

You don’t really need a per line hook if you can hook the whole output. Do you? Split, add stuff. Join.

@egor-rogov
Copy link
Collaborator

Well, you'll have to deal with html tags spanning on several rows. To simplify this task, In per-line mode we probably can close opened tags on the end of line and then re-open them on the beginning of the next line.

@joshgoebel
Copy link
Member Author

Well, you'll have to deal with html tags spanning on several rows. To simplify this task, In per-line mode we probably can close opened tags on the end of line and then re-open them on the beginning of the next line.

Oh, I see. Yeah that'd be a whole other thing. I think there are MUCH easier ways to add line numbers than wrapping every line in a span though, but that's my opinion. :-)

@egor-rogov
Copy link
Collaborator

I think there are MUCH easier ways to add line numbers than wrapping every line in a span though, but that's my opinion. :-)

Like what?

@joshgoebel
Copy link
Member Author

Like using grid or flex box and just having the line numbers in their own pre block... what I've always done and it's worked well. Or you could probably even do a pre with two code elements, but same principle.

@joshgoebel
Copy link
Member Author

joshgoebel commented Oct 29, 2019

Not really opposed to that kind of hook (if someone else wants to write a plugin), but seems like a lot of needless complexity when there are easier ways to solve the problem. :-)

@joshgoebel
Copy link
Member Author

Reference regarding one idea for sharing options: #1969

Not sure it's the best approach, but I wanted to save it for future reference when we come up with more of a plan for plugins.

@joshgoebel
Copy link
Member Author

Related:

#2285

@joshgoebel joshgoebel pinned this issue Feb 6, 2020
@joshgoebel joshgoebel changed the title Idea: Plug-in hooks and building a plug-in culture Plug-in hooks and building a plug-in culture Feb 8, 2020
@joshgoebel
Copy link
Member Author

joshgoebel commented Feb 8, 2020

When adding plugin support I started with the easiest piece highlightBlock. Now I'm looking at the more complex pieces, highlight and highlightAuto... and of course highlight is reentrant in that sublanguages a recursive call to highlight.

And highlightAuto simply calls highlight a bunch of times and picks the winner... so the question is are these details that plugins need to concern themselves with, or do we try to simplify/abstract this somehow?

I think the sublanguage thing is a bit nastier to understand, and I'm leaning towards another set of callbacks so that plugin authors don't have to concern themselves with sublanguage UNLESS they really want to:

  • before:highlight
  • after:highlight
  • before:highlight/sublanguage
  • after:highlight/sublanguage

So someone just wanting to hook top-level highlight calls just uses the first two, and they they are done. Otherwise almost every plugin would have to maintain state and a flag of how "deep" it was to know if it was being called initially or if it was being called on a sublanguage.

Perhaps there are some cases where plugin authors don't care and want EVERY highlight call treated the same (you're always trying to do some transform on a particular language, for example). But I think those cases would be fewer and it's easy enough to just add the two extra hooks... while simplifying things for a lot of other plugin authors.


With regard to autoDetect the complexity bothers me a little less. I guess I'm assuming many plugins will target explicit highlights or auto-highlights, not both. If that were to be the case then those authors wouldn't even notice. However someone who wanted to target auto and highlight would find their callbacks fired like this:

  • before:highlightAuto
  • before:highlight / after:highlight (first potential language)
  • before:highlight / after:highlight (2nd potential language)
  • before:highlight / after:highlight (3rd potential language)
  • ...
  • ...
  • after:highlightAuto

Is it just a plugin authors responsibility to keep this in mind? I'm thinking perhaps it we provided some meta-state that would be enough:

    // say I only care about explicit  highlights, not auto-detect attempts
    hljs.addPlugin( {
      'before:highlight': ({block, data}) => {
        if (data.flags.autodetect) return;

        // continue with normal processing
    })

@egor-rogov
Copy link
Collaborator

Is there a reason one can care about explicit highlights but not about auto-highlights?
In fact, sublanguages often use auto-detection even when you call explicit highlight.

@joshgoebel
Copy link
Member Author

joshgoebel commented Feb 8, 2020

Is there a reason one can care about explicit highlights but not about auto-highlights?

Well off the top of my head you have different categories of plugins. Or really maybe the difference isn't between auto-and-not, it's between "just one language" vs "ALWAYS".

IE:

Just one language

I have a kick-ass plugin that does impossible language analysis and really cool stuff with C++ code (or perhaps even something trivially simple)... I don't care about auto-highlighting (in general). If someone says the code is "C++" then I run OR if auto-detect decides the code is C++, then I run. But in general when the 283 other languages are tested/highlighted I don't give one flip - I only care about C++.

All languages

I want to strip all whitespace from before and after code blocks because whatever tool is building the HTML couldn't be bothered to do that properly - and for my use cases whitespace never matters. In this case I want my plugin to ALWAYS run before, and modify the code by stripping whitespace... I don't care if we know the language, I don't care if it's auto-detect, etc... I just want to strip space.

I probably don't want to strip space from sublanguages though.


What I want to avoid is a world where someone wants to write a "simple" plugin like one of the above but to do it has to hook 8 different hooks, and track whether highlight is being called initially, inside of auto-detect, inside of sublanguage, inside of a nested sublanguage, etc...

IE the closer the plug-in psuedo-code is to this, the better:

before("cpp", doAwesomeAnalysis)
after("cpp", applyAwesomeAnalysis)

beforeAll(stripWhitespace);

@joshgoebel
Copy link
Member Author

There is kind of an implied split here we don't currently even have... "after:autodetect" and "before:highlight"... since the auto-detect is the highlight... it's possible my plugin would only want to act on C++ code once it had been determined it was actually C++ code, not just when the highlighter is GUESSING.

@joshgoebel
Copy link
Member Author

joshgoebel commented Feb 8, 2020

I try to avoid designing this stuff abstractly, so the current focus is #2391.

The idea being "detect frontmatter" and if present, possibly route to a particular grammar or else the plugin would take over parsing duties itself, calling two particular grammars internally. And already you have a case where a plugin might want to recursively call highlight from inside a highlight callback itself... although I suppose in that case I assume the plugin would use internal state to prevent it from triggering on it's own callbacks?

@egor-rogov
Copy link
Collaborator

the plugin would take over parsing duties itself, calling two particular grammars internally

Looks very complicated... In plugin you'd probably want to set boundaries of languages and let the core do the actual highlighting itself.
Maybe it's worth starting with that functionality? I.e. possibility to pass to highlightBlock a markup defining language boundaries.
Then you could change the markup in before:highlightBlock.

@joshgoebel
Copy link
Member Author

Looks very complicated... In plugin you'd probably want to set boundaries of languages and let the core do the actual highlighting itself.

That's what I'd do here. The plugin would split the code into two pieces in a before hook... then call highlight(yaml) and highlight(markdown), then concat the two together, then return that result... hence "bypassing" the actual highlighting we'd have done ourselves on a single large block of code.

...but the plugin author would be free to choose what works best. Perhaps they want to add support for a language that's really impossible to parse with pure regex but easy with a few lines of JS. In that case they wouldn't need to call highlight at all.

possibility to pass to highlightBlock a markup defining language boundaries.

I don't know what you mean here but it sounds like the opposite of a plugin, sounds like adding features back to the core library.

Then you could change the markup in before:highlightBlock.

We already have before/after for highlightBlock... technically it'd be possible to write this plugin today if it only needed to work with highlightBlock, but highlightBlock is browser-only... and that's pretty limiting. Most plugins are going to want to hook the actual library calls so that they would work inside NodeJS, etc. not just the browser environment.

@joshgoebel
Copy link
Member Author

joshgoebel commented Feb 8, 2020

But since you brought up highlightBlock right now it's kind of a "catch-all"... the plugin author would have to figure out if a language was specified or going to be auto-detected - they also have the power to change or control this behavior... they could run their own auto-detect for example... or flag anything containing a magic string as a given language, etc.

IE, my shebang plugin.. that would detect the language only by looking at the very first line, etc. :-)

@joshgoebel
Copy link
Member Author

joshgoebel commented Feb 8, 2020

I think we want to hide/abstract the internals as much as we can so that our hooks all make sense but don't seem to be rooted too much in implementation details. Say highlight (for some funny reason) calls itself 20 times just because of some quirk (even in simple cases).

Someone hooking before/after highlight would NOT expect their hook to be called 21 times... we'd want to abstract that to deal with only the external call to highlight.

So that's why I think we need to have separate hooks for before/sublanguage and after/sublangage... today they call highlight, but that's an implementation detail. It could just as easily been a specialized method that was called instead of recursively calling the main method.

You could also make the same argument about auto-detect... TODAY it calls highlight a zillion times... but that's an implementation detail. Other libraries have really good results with machine learning for language recognition (and we've already talked about plugins that could copt recognition). So in that case if I hooked highlight and highlightAuto I would NOT necessarily expect highlight to be called 284 times when a call to highlightAuto was made.

@joshgoebel
Copy link
Member Author

So I think that leaves us with more focusing on what is happening rather than internal method dispatch:

  • before/sublanguage
  • after/sublanguage
  • before/autodetect (triggered say 284+ times when highlightAuto is called, etc)
  • after/autodetect
  • before/highlight (external api call)
  • after/highlight (external api call)
  • before/highlightAuto (external api call)
  • after/highlightAuto (external api call)
  • before/highlightBlock (external api call)
  • after/highlightBlock (external api call)

So now you have clear boundaries for functionality. If you wanted to ALWAYS touch say "pascal" you'd need to hook sublanguage and highlight.

The last tricky party is when you mix sublanguage and autodetect... Is a before/sublanguage/autodetect also needed?

@joshgoebel
Copy link
Member Author

Actually I don't think autodetect should trigger 284 times... that seems weird and very, very internal. Should plugins need to care about every single time attempt the auto-detect makes at detection? I need to think about this over coffee sometime. Seems so much of this is easy to bend to be very implementation specific if we don't think about it right.

@joshgoebel
Copy link
Member Author

joshgoebel commented Feb 9, 2020

Say I've been given a snippet of text to auto-highlight. More abstractly if you think it terms of events:

  • Auto-detection
  • Highlighting
    • Highlighting sublanguage 1
    • Highlighting sublanguage 2

So I count 4 events there, and 3 types (auto-detect, highlight, highlight sublangauge'. I could hook into before/after of each of the highlighting events.

If I hooked into BEFORE auto-detection I could write my own auto-detect, or influence the built in method (whatever it was). If I hooked AFTER I could say look at the list of potential matches and make my own choice (so perhaps augmenting the built-in support).

I don't see a case there to call any hook 284 times. That seems to be an implementation detail of how we currently do auto-detection.

That makes a LOT more sense to me.

  • If I care about auto-detection, I hook it.
  • If I only care about pascal say, I hook highlighting and check language.
  • If I care about removing whitespace I'd hook just highlight.

Leading to:

  • before/highlightSublanguage
  • after/highlightSublanguage
  • before/autodetectSublanguage
  • after/autodetectSublanguage
  • before/autodetect (triggered once for a given code block)
  • after/autodetect (triggered once for a given code block)
  • before/highlight
  • after/highlight
  • before/highlightBlock (still useful to do browser things)
  • after/highlightBlock (still useful to do browser things)

Updated: Added autodetect for sublanguage.

@egor-rogov
Copy link
Collaborator

I do agree with your thought process.

A thought: do we really need separate hooks for autodetection? We can have something like "choose language" with a list of alternatives. If there is just one, it corresponds to highlight; if there are some of them, it is highlightAuto.

@joshgoebel
Copy link
Member Author

joshgoebel commented Feb 9, 2020

We can have something like "choose language" with a list of alternatives. If there is just one, it corresponds to highlight; if there are some of them, it is highlightAuto.

I don't follow what you mean. A plugin like shebang would need to hook BEFORE anything was auto-detected because it would want to overrule that decision.

    hljs.addPlugin( {
      'before:autodetect': (data) => {
        var language = detectFromShebang(data.code);
        if (language)
          data.language = language;
    })

@egor-rogov
Copy link
Collaborator

I mean you can treat anything like autodetection. But sometimes it can choose from exactly one predefined option.

This can help with

The last tricky party is when you mix sublanguage and autodetect... Is a before/sublanguage/autodetect also needed?

@joshgoebel
Copy link
Member Author

The last tricky party is when you mix sublanguage and autodetect... Is a before/sublanguage/autodetect also needed?

Actually I think that would work just like the parent... the question is would you need a "sublanguage/autodect" hook... IE, autodetect/sublangauge would fire (if it wasn't explicit) and then highlight/sublanguage would fire.

@joshgoebel
Copy link
Member Author

I mean you can treat anything like autodetection. But sometimes it can choose from exactly one predefined option.

I still don't follow you might have to give a very specific example of what you had in mind.

@egor-rogov
Copy link
Collaborator

Nevermind. Your event system makes sense.

So, for a highlightAuto call you'll get the following series of events:

  1. before/autodetect (here we can override autodetection)
  2. after/autodetect (here we can choose one of potential matches)
  3. before/highlight
  4. after/highlight

Do I get it right?
Then how can you implement it, given that highlighting is done inside the autodetection foreach-loop?

@joshgoebel
Copy link
Member Author

joshgoebel commented Feb 9, 2020

here we can choose one of potential matches

Or decide they are all rubbish and do something else entirely. :)

Then how can you implement it, given that highlighting is done inside the autodetection foreach-loop

Well, honestly it'll help if you stop thinking about it that way. :-) Instead think: Auto-detection is done. One of the side-effects is that we also get the result of each potential highlight run, which may or may not always be useful. And may or may not be how it's implemented in the future.

So, we run the auto-detection... and (assuming another plugin hasn't provided it's own output) then we decide the language is say basic. The final two events then are more virtual - but only because of our existing implementation. So we'd fire before/highlight then after/highlight immediately after each other. After highlight would use the cached result from our auto-detection.

UNLESS the plugin makes changes to code (or language even) inside before/highlight in which case now the auto-detection was ONLY just that - autodection. Now we have to actually run the highlight on the modified code to generate the actual result.

@joshgoebel
Copy link
Member Author

joshgoebel commented Feb 9, 2020

I'll probably start next with just before/highlight (specific language) and after/highlight, which allow for a lot of useful plugins and save the auto-detect/sublanguage stuff for when it's asked for.

@joshgoebel
Copy link
Member Author

joshgoebel commented Feb 9, 2020

A bigger question is plugin vs plugin interaction. For example: I was just thinking about the API and how a plugin like frontmatter would handle it's own need to reuse highlightjs....

IE, frontmatter splits the content then calls highlight twice on each part... but really does it need to then ALSO handle those hooks and realize it's running inside itself and then just do nothing? I thought, no... so you could have a flag skipPlugins:

header = highlight(yaml, {skipPlugins: true})
body = highlight(markdown, {skipPlugins: true})

But then do I really want to skip ALL plugins, or do I really want to skip ONLY myself? Perhaps it's not too much to expect early plugins to deal with cruft like this and then after we see what the REAL problems are we circle back and try and fix them with plugins v2. :-)

IE, for now if you were using the external API to call highlight from within a hook you'd have to be careful to avoid infinite recursion and that's the plugin authors responsibility.

@joshgoebel joshgoebel changed the title Plug-in hooks and building a plug-in culture Discuss: Plug-in hooks and building a plug-in culture Feb 19, 2020
@joshgoebel joshgoebel unpinned this issue Mar 10, 2020
@taufik-nurrohman
Copy link
Member

taufik-nurrohman commented Jun 5, 2020

But then do I really want to skip ALL plugins, or do I really want to skip ONLY myself?

Learn it from the native event listener. We could define context.

function c() {}
function d() {}

a.addEventListener('b', c);
a.addEventListener('b', d);
a.removeEventListener('b', c); // remove `c` only

So for skipPlugins we may have something like this:

function foo() {}
function bar() {}

{skipPlugins: true} // ignore all
{skipPlugins: [foo, bar]} // ignore `foo` and `bar` only

Here’s mine, for inspiration: https://github.com/taufik-nurrohman/color-picker/blob/bb7051afbf5f61ac9b691d28fb899e06b4fadffb/color-picker.js#L282-L322

var $ = this, // the class instance scope
    hooks = {};

function hookLet(name, fn) {}
function hookSet(name, fn) {}

$.hooks = hooks;
$.off = hookLet;
$.on = hookSet;

Usage:

hljs.on('foo', bar);
hljs.on('foo', function() {});

hljs.off('foo', bar); // remove `bar` only

@joshgoebel
Copy link
Member Author

joshgoebel commented Jun 5, 2020

Yes I understand how it might work conceptually.

I'd like to see more plugins deal this themselves before we consider adding anything to core. Right now it should be pretty easy for a single plugin to prevent re-entering ITSELF. I'm not sure plugins should "be aware" of other plugins or be able to easily turn them on and off.

You should be able to add multiple plugins and trust that they all "just work" unless they do mutually exclusive things. I don't want plugin A saying it doesn't work with plugin B so it just turns it on and off, etc...

@joshgoebel
Copy link
Member Author

Closing and referencing at #2762.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement An enhancement or new feature plugin Specific plugin or plugin discussion
Projects
None yet
Development

No branches or pull requests

3 participants