Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: try different approach to JSX #2412

Closed
wants to merge 1 commit into from

Conversation

joshgoebel
Copy link
Member

This is an attempt at a more modular approach to JSX.

@joshgoebel
Copy link
Member Author

joshgoebel commented Feb 19, 2020

I'm most curious for thoughts on desired output. Since JSX can include embedded JS you get into this vicious cycle where now you really need to jump back into javascript sublanguage again... so you end up with code like:

<span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">BrowserRouter</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
          <span class="javascript">{<span class="hljs-comment">/* comment */</span>}</span>
          <span class="javascript">{ test <span class="hljs-string">"}"</span> }</span>

Is it really critical/necessary to flag all of these as javascript, or should they really be something more like subst?

Then you also have JS style comments inside HTML tags (which seems to be valid and used):

box = (<div
  box="yes" // comment
/>)

Which results in code like:

<span class="hljs-attr">box</span>=<span class="hljs-string">"yes"</span> <span class="hljs-comment">// comment</span>

But technically the //comment is Javascript code, not HTML... so should it really also be in javascript or subst block?

I think the answers start to depend a lot on whether you think of this as a back and forth between JS and HTML or rather one larger meta-language such as "JSX". In the latter case for example, I don't think the comment needs to be flagged as JS... though I do wonder in that case if jsx wouldn't be a better over-arching class than xml.

@egor-rogov @allejo

@egor-rogov
Copy link
Collaborator

But technically the //comment is Javascript code, not HTML... so should it really also be in javascript or subst block?

I think the answers start to depend a lot on whether you think of this as a back and forth between JS and HTML or rather one larger meta-language such as "JSX". In the latter case for example, I don't think the comment needs to be flagged as JS

I don't think it's our goal to mark different parts of code with appropriate languages. Our goal is to highlight comments as comments, no matter which language they belong to.

At the first glance the PR looks complicated. Registering of an internal language, a module in lib... I don't like the idea very much.
Btw, please remind me, what's wrong with the current implementation?

@joshgoebel
Copy link
Member Author

joshgoebel commented Mar 3, 2020

Btw, please remind me, what's wrong with the current implementation?

There is no good way to properly handle {{ javascript.code() }} or // comments inside HTML in the current implementation since neither of these things are valid HTML, but they are valid in HTML inside JSX.

Hence needing an internal JSX variant of HTML.

I don't think it's our goal to mark different parts of code with appropriate languages.

subLanguage has always done that... are you suggesting perhaps that it should not? It's certainly very ugly and I've always disliked how complex it makes the test output.

At the first glance the PR looks complicated. Registering of an internal language, a module in lib... I don't like the idea very much.

Would you prefer a run-time require vs the module? In this case there is no point at all of the "_jsx" language without Javascript, so registering and linking at run-time seems wrong. And splitting the sublanguage out into it's own module is just good organization - so I'm not completely sure what you're objecting to here.

Registering of an internal language

Why? Does the idea of internal languages bother you in general? Would it bother you less if they were hidden and not in the listLanguages namespace?

Would a single internal: true flag perhaps suffice?

@joshgoebel
Copy link
Member Author

joshgoebel commented Mar 3, 2020

Also, the reason it's so dynamic is because eventually this can be used inside Javascript or Typescript, so you won't just have a single "_jsx" grammar. You'd have variants also: _jsx_javascript, _jsx_typescript, etc... since the JSX grammar needs a way to embed snippets of the parent grammar. So each variant would have a different internal sublanguage value with {{ }} blocks.

And the reason we need a sublanguage is because HTML is case insensitive while JS is not... and currently that is a global per language setting.

Although I just had another thought that I need to look into.

@joshgoebel
Copy link
Member Author

joshgoebel commented Mar 3, 2020

Although I just had another thought that I need to look into.

Picking out HTML tags singly (one by one) inside of JS is another [much simpler] idea... But that is also a much "dumber" solution, in that it can't identify blocks of HTML - and it's also broken in that it would incorrectly highlight JS code (or keywords) inside of HTML blocks. Ex:

var x = (<div>class is not a keyword here</div>);

I'm not sure how people are using HLJS to highlight JSX code but if it was me I'd want a different background/style applied to xml blocks inside Javascript... they really should be rendered more like "string" or heretic something.

So it seems we absolutely need to detect actual blocks of HTML (which is how this has always worked in the past) not just individual HTML tags here and there.

@allejo
Copy link
Member

allejo commented Mar 3, 2020

You'd have variants also: _jsx_javascript, _jsx_typescript

What other variants are there? Also, wouldn't this be jsx and then tsx?

I'm not sure how people are using HLJS to highlight JSX code but if it was me I'd want a different background/style applied to xml blocks inside Javascript... they really should be rendered more like "string" or heretic something.

While the React website uses Prism, can't we use the React website as the basis of how people [would] use highlight.js to highlight JSX? https://reactjs.org/tutorial/tutorial.html#what-is-react

So it seems we absolutely need to detect actual blocks of HTML (which is how this has always worked in the past) not just individual HTML tags here and there.

Excuse the ignorance, when you say "blocks of HTML" do you mean matching the <div> with the </div> in your example and consider that a "block?"

@joshgoebel
Copy link
Member Author

joshgoebel commented Mar 3, 2020

What other variants are there? Also, wouldn't this be jsx and then tsx?

I'm not sure, maybe only those two. The names don't matter they are just internal identifiers. If the class/sublangauge names matter then we can encode pretty names and then we can discuss what those might be... But you're right I'm probably using the names wrongly. I tend to think of JSX as just the actual HTML part of the JS, but really I think it's the name for the whole "flavor" of JS. ...so it's possible the pretty names might both actually be the same, ie just "xml" perhaps.

I'm using "jsx" as a stand-in for the concept of "html embedded in JS in JSX style".... we need a name for that concept since it's the SAME concept shared between JSX and TSX and I need something to name the module/file/etc. :-)

While the React website uses Prism, can't we use the React website as the basis of how people [would] use highlight.js to highlight JSX? https://reactjs.org/tutorial/tutorial.html#what-is-react

It's one point of data, true, but see my larger point about why we need to identify WHOLE blocks in any case - regardless of how we colorize the background.

Excuse the ignorance, when you say "blocks of HTML" do you mean matching the

with the
in your example and consider that a "block?"

Yes. We need to know "html starts here" and "html ends here" or else we have no idea whether we are highlighting JS or HTML.

@joshgoebel
Copy link
Member Author

joshgoebel commented Mar 3, 2020

Actually, perhaps this could build on the work in #2261 and then we have a tiny HTML parser... so that we're actually MATCHING beginning and end tags... I think JSX is always supposed to have a single outer wrapping tag in any case, is it not?

You still have the case issue, but that can be solved for a small case like this with just double specifying the case in the HTML matching rules.

@egor-rogov
Copy link
Collaborator

I had another look into this. Inasmuch I can't propose something better/simpler, I'm not against it.

Registering of an internal language

Why? Does the idea of internal languages bother you in general? Would it bother you less if they were hidden and not in the listLanguages namespace?

Would a single internal: true flag perhaps suffice?

My concern here is that an internal language can potentially cause some hard-to-troubleshot problem, like name collision or something. That's why I'd prefer either to make _jsx a rightful grammar or to encapsulate it in another namespace.

@joshgoebel
Copy link
Member Author

joshgoebel commented Mar 21, 2020

My concern here is that an internal language can potentially cause some hard-to-troubleshot problem, like name collision or something. That's why I'd prefer either to make _jsx a rightful grammar or to encapsulate it in another namespace.

Except jsx (where jsx refers to the HTML inside a jsx file) isn't a grammar, it's only a sub-grammar. You don't think _ is already a namespace? Would __jsx make you any more comfortable? :-) The name will be auto generated for the language so conflicts isn't really a concern I have... Ie you'd have __jsx_javascript and __jsx_typescript, etc.

A conflict isn't going to happen and if it does, it's the users problem - since these would be in the languages list they would be visible and avoidable.

@egor-rogov
Copy link
Collaborator

The name will be auto generated for the language so conflicts isn't really a concern I have... Ie you'd have __jsx_javascript and __jsx_typescript, etc.

How's that?

  hljs.registerLanguage("_jsx", JSX_LANGUAGE({containers}));

A conflict isn't going to happen and if it does, it's the users problem - since these would be in the languages list they would be visible and avoidable.

It's not visible when you're looking into src/languages catalog...

But yes, underscore is like a namespace, so maybe we'll just need a word or two in developer's docs about this naming convention, and that's it.

@joshgoebel
Copy link
Member Author

How's that?

That's just how it would work internally - to prevent conflicts.

It's not visible when you're looking into src/languages catalog...

No, because it's not a true grammar. :-)

But yes, underscore is like a namespace, so maybe we'll just need a word or two in developer's docs about this naming convention, and that's it.

I agree only to the extent that we should make it easy for people to filter them out (if need be), say if they wanted to say build a dropdown or something - so to that end a small mention in the docs might make sense. I still think I might scrap this whole approach in favor of the new callbacks though and just match the pair of HTML tags with callbacks... this was an experiment. We'll see.

Right now I'm focused on other things and waiting for the website/server changes we need so we can do a proper actual v10 release. So this stuff is likely pushed until 10.1 anyways.

@joshgoebel joshgoebel modified the milestones: 10.1, 10.2 Jun 11, 2020
@joshgoebel joshgoebel closed this Aug 6, 2020
@jamiebuilds
Copy link

Hi @joshgoebel, can I ask why this was closed? I'd be happy to take this PR over and fix any remaining issues. Unless you want to take a different approach entirely, also happy to do that.

@joshgoebel
Copy link
Member Author

Wish I had left some notes, LOL. Please see the newly created #2998

@joshgoebel
Copy link
Member Author

I'd suggest reading #2998 and the 3 other threads fully and then if you think you have some unique insights we'd be happy to hear them. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement An enhancement or new feature language
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants