New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Discuss: File size of distributable for the browser #2846
Comments
I have a take on this, but it might be ignorant on how people are actually using highlight.js, but here it is: Recommend using ESM and stop caring about the bundled size. If we use a promise-based approach, we could use dynamic import (https://caniuse.com/es6-module-dynamic-import) to lazy load only language parsers the user is actually using and have a pretty clean API: <link rel="stylesheet" href="/path/to/styles/default.css">
<!-- IE compatibility -->
<script src="/path/to/promise-polyfill.min.js" nomodule ></script>
<script src="/path/to/highlight.min.js" nomodule ></script>
<script nomodule >hljs.highlightAll().catch(console.error);</script>
<!-- Evergreen browsers -->
<script type="module">
import hljs from '/path/to/highlight.mjs';
hljs.highlightAll().catch(console.error);
</script> Using this approach, only IE users would suffer from the bundle size. We would have to make some change in the underlining API to deal with import hljs from "./highlight.js/lib/core.js";
// registerLanguage should accept lazy loaded `Language`:
hljs.registerLanguage('c', () => import('./highlight.js/lib/languages/c.js'));
hljs.registerLanguage('d', () => import('./highlight.js/lib/languages/d.js'));
// the actual fetching of the language module can be triggered later by a user
// event – or they are never fetched if there are not actually needed.
someForm.addEventListener('submit', (ev) => {
ev.preventDefault();
// highlight now returns a Promise
hljs.highlight(someForm.language.value, someForm.code.value)
// the correct module is loaded only when we need it
.then(result => {
// update the highlighted HTML…
})
.catch(err => {
// the Promise rejects if someForm.language.value doesn't correspond to a
// registered language or an alias.
alert('Unknown language?');
console.error(err);
});
}); I might be wrong, but I am under the impression that most highlight.js use a subset of the languages it provides (and sometimes only one). If we are moving to ESM on v11, I think we should do the extra work of having a Promise-based API, users would benefit from it. |
Thanks for chiming in. This is definitely all very interesting. I had originally added lazy loading as a feature request but no one seemed to care about it, so I let it go. A few thoughts:
Questions:
The original idea of upgrading
Being a tiny team with very limited resources we tend to be a bit risk adverse and typically that serves us well. |
Note that I'm not suggesting we remove the bundle CDN browser script: I think we should keep it (for people who don't want to deal with ESM, or want to support older browsers, or don't want to deal with lazy-loading). What I'm suggesting is to let its size grow, consider it a non-concern, and nudge people to use the ESM version if they want to save some of their users' bandwidth.
My feeling is that web pages that do includes snippets of all 190 languages must be pretty rare, and that most pages must have something like 10 different languages tops – but I have no data to back this up though. But anyways, if your website is likely to contain all the languages, do use the bundled version, I agree lazy loading is not for you.
True, although that's something we can solve at build time.
Not really: https://stackoverflow.com/questions/45804660/is-it-possible-to-use-subresource-integrity-with-es6-module-imports. The Import Assertions proposal has reached stage 3, SRI may be added to the spec at a later time, but currently dynamic
They usually have an option to let the user chose (E.G.: for Rollup it's called
My take on this would be to be as lazy as possible, because it's easier 😅 for us to implement, and for the users to understand what the library is doing. We should document that if the user knows what languages they are going to use, they can preload the relevant modules: <link rel="preload" href="/path/to/highlight.js/languages/js.js" as="script">
<link rel="preload" href="/path/to/highlight.js/languages/html.js" as="script">
<!-- or -->
<script type="module" href="/path/to/highlight.js/languages/js.js" integrity="…"></script>
<script type="module" href="/path/to/highlight.js/languages/html.js" integrity="…"></script> If they can't be bothered, they can either accept the tradeoff of lazy-loading, or the file size of the browser bundle. |
I wasn't referring entirely to actual real-life webpages or use cases - but rather implimentor behavior, including the possibility of confusion and mistakes.
So it seems the actual need here is for a tiny fraction of use cases:
Given all that, I feel right now lazy-loading is best handled outside of core. This issue was originally created following some discussions with Stack Overflow, who are extremely size sensitive. It took 3 months before anyone else chimed in on the topic. I just don't think many people actually need this functionality (or care about size super strongly) - and even if I'm mistaken, I don't (at this time) see huge advantages to it being in core vs a plug-in/add-on. It seems (esp. after we release an ESM npm package) one could very easily write a small "wrapper" package such as
I'd suggest this is even quite possible today without much effort using |
Fair enough, I think I share your POV on this, most people are fine with highlight.js the way it is. That doesn't mean we shouldn't try to improve it (I mean, that's why you opened this issue, right?), but apologies if my comments were off topic.
What if a magic PR came down your inbox that added support for lazy-loading in core? Are you open to be convinced, or would you rather wait for another time? |
No need for an apology - I'd say you were definitely on-topic.
Well, it's not just about who does the initial work (and believe me I appreciate the willingness). That's often only the tip of the iceberg. Often the long-term support/maintenance of a feature can far outweigh any initial outlay of development time. Our current trend has been towards shrinking and simplifying the core codebase and public API - while extending the plug-in API as it makes sense... things not critical to "core" functionality are then pulled out into plugins, etc. The old but time tested "less is more" approach. With our current tiny team a vibrant plugin community with individual authors who are passionate about maintaining and supporting their own particular plugins has a lot of benefits. |
I like #2846 (comment) idea about code splitting, dose anything like this ever got implemented? |
Not fully. Perhaps also see my thoughts just added to #3199. We do currently ship everything as separate ESM modules: core, all the grammars, etc.... You can dynamically import them and register them if you write the code for it... What doesn't exist is the glue to tie it all together into some type of |
I wanted to open a discussion about the direction we're headed with regard to file size. It's come up recently in talking to StackOverflow, but otherwise I don't hear much about it.
First some history
In terms of "absolute" relative size we've grown quite a bit in the past year or so. All numbers are stated in terms of gzipped size.
:common
subset, gzipped:common
gzippedWe've close to doubled the size of our common distributable. Yet in that time we also added several new languages to
:common
also:33% of our size increase comes from just these new languages (i.e. ~20kb to 28kb). The rest comes from numerous grammar improvements, parser improvements, etc...
Here and Now / My Thoughts
We have our new "higher fidelity" initiative in #2500. Both 37kb and 20kb seem tiny to me. Yes, it's possible to build much larger builds. The full library (with every grammars) weighs in at a whopping 272kb.
All the feedback I see here on issues is of the "please, better highlighting, highlight more, highlight better" variety... I can't remember anyone pushing back with "the library is too large, make it smaller". I wanted to open the topic to see if anyone has any thoughts on this.
Personally I feel that our situation now is good and that increasing the bundle even 30-40% would be a win if we end up with much more nuanced highlighting at a result. (I don't think the size will actually increase that much though.) I don't see how we can keep the size the same as we pursue higher fidelity and more nuanced highlighting. Many of the recent "language reboots" (LaTex, Mathematica, etc) have seen huge improvements in those grammars - but also a significant increase in the grammar size.
I still think a very "popular" use for Highlight.js is on a small website/blog where one is using a subset of the languages, not a full build (or anything even close). Then on the other end you have huge sites like Discourse and StackOverflow building larger bundles. In those cases I think the right solution (if size becomes a problem) is to lazy-load the grammars on demand. Which we've always made easy to do and it just got easier with my PR to eliminate all run-time dependencies between languages.
A good portion of our size is keyword bundles... There has been talk recently of whether (in some languages) we could detect
CamelCaseClassThingy
rather than a hard-coded list... and while we could do that it (removing some keywords) it would have detrimental effects on our auto-detection capabilities which for many languages is highly dependent on large keyword lists.Also, there are other highlighters. It's always been my advice that if "small size" is a key requirement for someone that Prism might be a better choice as they tend to rely a lot more on tighter regex, simpler grammars, and dependency stacking... which helps them keep the size of each grammar smaller.
So I see our library continuing to grow slowly in size with every new release... and continuing to highlight with more nuance... with of course continued improvements to the parser and auto-detect when possible.
Note: Currently every language is built as a stand-alone module - which hurts our non-compressed size - since some dependency modules end up being duplicated in the source... this should have less bearing on the final gzip size though and there are also plans to fix this in the future (when using the official build system to build a monolithic distributable).
The text was updated successfully, but these errors were encountered: