Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

highlightjs.org build seems to differ from ./tools/build.js -t browser :common #2291

Closed
joshgoebel opened this issue Nov 20, 2019 · 15 comments · Fixed by #2511
Closed

highlightjs.org build seems to differ from ./tools/build.js -t browser :common #2291

joshgoebel opened this issue Nov 20, 2019 · 15 comments · Fixed by #2511
Assignees
Labels
package/build Issues relating to npm or packaging parser
Milestone

Comments

@joshgoebel
Copy link
Member

Expected behavior: The build would be identical to building locally using the build process.

Copying from #1245

I've asked few times before, where is the code which is responsible for creating the download and/or concatenating the fragments into a single download?

You'd think it was the build process, but it seems the results are different. Only @isagalaev knows I think. The code generated on the website is putting hljs.registerLanguage OUTSIDE the original enclosure and the code generated by the build process places it inside. @isagalaev Is the website using a special build system instead of the one in the repo for some reason? hljs as a global only works in the web environment (where anything on window is treated as a global), so that is why it's failing in node.js.

@joshgoebel joshgoebel self-assigned this Nov 20, 2019
@joshgoebel joshgoebel added package/build Issues relating to npm or packaging parser labels Nov 20, 2019
@joshgoebel joshgoebel assigned isagalaev and unassigned joshgoebel Nov 20, 2019
@joshgoebel
Copy link
Member Author

joshgoebel commented Nov 20, 2019

This will become even more important as we move forward with changes to the codebase and build system... the website needs to be using the same exactly build system we are using for development work and continuous integration or else this problem is going to get worse in the near future - with no guarantee that the builds the website produces are even usable at all.

@joshgoebel
Copy link
Member Author

@isagalaev Ping.

@isagalaev
Copy link
Member

@yyyc514 I've been missing out on my bug mail lately… Life stuff, as usual. And I do remember I wanted to publish/open up the site's code, but haven't gotten to it yet.

Now, onto the build stuff. The site doesn't use build.js -t browser because it's supposed to build a custom package according to check boxes, and building it from scratch every time is too expensive. So instead I do this:

  • upon a new release the site runs build.js -t cdn and stores all pre-build language files in a directory, and then build.js -t cdn none to create a pre-build core without any language files and stores it too
  • upon a download request it concatenates them into a single piece and zips it on the fly, which is fast

So there is a (undocumented) assumption that a browser build is basically just a concatenation of files stored on a CDN.

Did build.js -t browser change recently which could break this assumption?

@joshgoebel
Copy link
Member Author

joshgoebel commented Dec 14, 2019

So there is a (undocumented) assumption that a browser build is basically just a concatenation of files stored on a CDN.

Obviously true, but I'd say long term that is a faulty assumption. It prevents us from doing useful things (for a monolith build) like dead-code/constant elimination, allowing languages to share code (at compile time via ES6 modules), etc...

Did build.js -t browser change recently which could break this assumption?

Not significantly, not yet.

But, I've been playing with ES6 module based builds that import everything in one file and then compile/minify/terser that result, and it's also (not surprising) smaller than just concat-ing a bunch of individual files with all the repeated registerLanguage boilerplate.

So I'd like to see the build process continue to improve... I suppose we could just accept that the source build scripts were "better" than the build the website gives you, but then we're introducing a fourth deliverable (CDN, browser, node, and now WEBSITE browser build)- and there does seem to be some expectation from users (not wholly unreasonable) that building from git and building on the website would produce identical output.

@isagalaev
Copy link
Member

isagalaev commented Dec 15, 2019

I agree with the general gist. I'd like to eventually get rid of the site code imitating the build process. However I have a pretty strong feeling that running an actual build process for every download would not be feasible. Which means that we need to get rid of the whole idea of custom builds made by the site. Unless I'm missing another option?

In fact I've been waiting for technology, namely HTTP/2, to mature enough to not even need a one-file distribution, custom or not. So that most browser users would do something like:

<script src="{cdn-path/highlight.js}"></script>
<script>
hljs.initHighlightingOnLoad('html', 'css', 'javascript', ...);
</script>

The last line might generate separate <script> tags referring to individual languages or do some fancy module loading (I have no idea if modern browsers can finally do it).

If we could do that then we can deprecate the "browser" build altogether and just tell people to use whatever packing solution they like.

@joshgoebel
Copy link
Member Author

However I have a pretty strong feeling that running an actual build process for every download would not be feasible.

Well, without seeing the requests per minute numbers of peak traffic it's hard to say, but I for sure understand your concerns with the old build process being so slow. I'm not sure the newer one is much faster.

Which means that we need to get rid of the whole idea of custom builds made by the site. Unless I'm missing another option?

I do think if the server-side were all Node.js (not Python) that you might have some options with caching the intermediate state that Terser/Rollup allows you to do, but really I'm just spitballing. I know that can help a lot with Rollup but I'm not sure it can help a lot with Terser (or if it's an option).

If we could do that then we can deprecate the "browser" build altogether and just tell people to use whatever packing solution they like.

Time will tell I suppose. :-)

@joshgoebel
Copy link
Member Author

I think for the present I see two obvious paths:

  • Cripple our browser build to pretty much duplicate the current "concat" strategy. True, it's not currently a lot different presently, but I'd had high hopes we could do some creative things with it moving forward.
  • Potentially add a note that the website build is a "convenience" build and that you get the tightest/most optimized build by building directly from source.

@isagalaev
Copy link
Member

you might have some options with caching

You know, I looked at the stats now, and I think it can actually work. Out of 327 downloads, 277 (85%) is the default set, with the rest of them being mostly unique combinations of languages. I think users won't be too upset if downloading a unique combination takes several seconds. I'll look into it.

Potentially add a note that the website build is a "convenience" build and that you get the tightest/most optimized build by building directly from source.

Yes, that's a good interim solution. As long as the concatenated result actually works :-)

@joshgoebel
Copy link
Member Author

Yes, that's a good interim solution. As long as the concatenated result actually works :-)

Well, it should so long as we don't break the basic CDN/registration stuff. The one reported issue so far is that the website build can't be used by node, only in the browser, but we've decided that's a non-issue as the intended target of the website build is the web, not node.

@joshgoebel
Copy link
Member Author

You know, I looked at the stats now, and I think it can actually work. Out of 327 downloads, 277 (85%) is the default set

Not even what I meant. :-) But yes caching the "whole shebang" can help also. I meant that most of these build libs have internal caches that can make doing the work "repeatedly" much faster... but that's usually assuming you have a long running process. So if you're running Python and only spawning short-running node.js processes for building that might not be any help - even if it would speed things up.

@isagalaev
Copy link
Member

So if you're running Python and only spawning short-running node.js processes for building that might not be any help

Ah, got you now.

@joshgoebel
Copy link
Member Author

joshgoebel commented Dec 23, 2019

@isagalaev Have you tested our website build for Web Worker usage? I don't think it will work because the worker doesn't have window and therefore hljs isn't available to register languages... In my own modified build I add the following BETWEEN the core-lib and the language modules:

if (typeof importScripts === 'function') { var hljs = self.hljs; }

IE:

var workerStub = "if (typeof importScripts === 'function') { var hljs = self.hljs; }";

  var fullSrc = [
    header, libraryCore, workerStub,
    ...languages.map((lang) => lang.moduleSrc) ].join("\n");

Could we make that fix? Is that a reasonable resolution you think? It's certainly a quick fix.

@joshgoebel joshgoebel added this to the 10.0 milestone Dec 26, 2019
@joshgoebel
Copy link
Member Author

@isagalaev Actually the new UMD wrapper code uses this, not self which seems to "just work" with workers, so I've removed the stub as it's no longer required and that solves that problem...

So can we decide a path forward here? If it's not possible for the website to run the actual build process (which is only going to get better/more tightly integrated with time) can we add a comment saying that for the "most optimized" build one should build directly from source?

We can work on the verbiage... I'm just trying to figure out the "next steps" for some of the more nebulous issues we have open still.

CC @egor-rogov

@joshgoebel joshgoebel modified the milestones: 10.0, 10.1 Mar 3, 2020
@joshgoebel
Copy link
Member Author

can we add a comment saying that for the "most optimized" build one should build directly from source?

The new README already does this, but is its still worth adding a mention to the website also?

@joshgoebel
Copy link
Member Author

@isagalaev Given any more thought to this? I could close this with #2511 if we really wanted. Just switching to a global period and then exporting that with module.exports means the website builds can now be used for the web OR required directly with Node.

There are still build improvements in the future that wouldn't allow the website to have access to, but for now it would mean parity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
package/build Issues relating to npm or packaging parser
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants