Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixMarkup useBR: true removes HTML tag if it starts at index 0 of string #2529

Closed
andrewQwer opened this issue May 5, 2020 · 34 comments · Fixed by #2532
Closed

fixMarkup useBR: true removes HTML tag if it starts at index 0 of string #2529

andrewQwer opened this issue May 5, 2020 · 34 comments · Fixed by #2532
Labels
bug help welcome Could use help from community parser
Milestone

Comments

@andrewQwer
Copy link

andrewQwer commented May 5, 2020

fixMarkup and highlightBlock methods with {useBR: true} break markup.

I noticed, that hljs.highlight(...) generates the following code:

"<span class="hljs-symbol">[Created By]</span>"

Then fixMarkup for this string generates the following:

"[Created By]</span>"

Just comment configure method call and you'll see the difference (console output provided as well)
https://jsfiddle.net/t1maqeb7/6/

I believe it should not break markup this way.

@andrewQwer andrewQwer added bug help welcome Could use help from community language labels May 5, 2020
@joshgoebel
Copy link
Member

joshgoebel commented May 5, 2020

I'm assuming like no one is using this feature since it hasn't changed in FOREVER... I trace the regex back to:

joshgoebel@6424e64

Although to be clear you

    if (tabReplace) {
       result.value = result.value.replace(/^((<[^>]+>|\t)+)/gm, function(match, p1, offset, s) {
         return p1.replace(/\t/g, tabReplace);
       })
     }

But what does <[^>]+> have to do with tabs? Is it trying to replace tags INSIDE HTML? The nested replace here is quite strange.

@joshgoebel
Copy link
Member

@isagalaev Any idea what this is for?

@joshgoebel
Copy link
Member

#2532 add a test for this.

@joshgoebel
Copy link
Member

Does anyone remember why fix markup is part of the public API anyways?

@andrewQwer
Copy link
Author

andrewQwer commented May 5, 2020

even if it is not a part of the public API it is used inside highlightBlock that causes the same issue when useBR is set. And if I want to use just highlight method I'm loosing a chance to replace \n with <br/>

@joshgoebel
Copy link
Member

even if it is not a part of the public API it is used inside highlightBlock

Sure, I was just wondering why we're allowing it to be used outside when it seems purely an internal matter. We need to focus on highlighting, not providing people string utility functions. :)

@joshgoebel
Copy link
Member

And if I want to use just highlight method I'm loosing a chance to replace \n with

Why wouldn't you use highlightBlock?

@andrewQwer
Copy link
Author

I'm in React.js world and don't wont highlight.js to modify innerHTML :)

@joshgoebel
Copy link
Member

joshgoebel commented May 5, 2020

Why does your code have <br> in it? IE, is there a reason you purpose chose to not built it into a <pre> tag?

@joshgoebel joshgoebel added parser and removed language labels May 5, 2020
@joshgoebel
Copy link
Member

And if I want to use just highlight method I'm loosing a chance to replace \n with

Yeah, that's probably why it's external. :-) Though really it'd probably be better long-term to build it into highlight or allow highlightBlock to take a block of html rather than a block element.

@joshgoebel
Copy link
Member

joshgoebel commented May 5, 2020

I'm in React.js world and don't wont highlight.js to modify innerHTML :)

And FYI you can still use highlightBlock, pretty sure there is no rule the block has to be inserted into the DOM.

@andrewQwer
Copy link
Author

andrewQwer commented May 5, 2020

Why does your code have <br> in it? IE, is there a reason you purpose chose to not built it into a <pre> tag?

I translate what is typed in textarea that uses \n as a new line into highlighted code that is rendered to <span>. Will try use pre, thanks.

@joshgoebel
Copy link
Member

joshgoebel commented May 5, 2020

This is definitely a bug, but pre is much simpler, fewer moving pieces. :-)

@andrewQwer
Copy link
Author

andrewQwer commented May 5, 2020

And offtop question, but maybe you know how to correctly include your library to babel-loader config that ignores node_modules? :) I need it to make it work in IE11.

The following doesn't work as well as lots of other versions of this config ...

test: /\.js$/,
        include: [list of directories]
        ),
        exclude: [
            /node_modules[\\/](?!(highlight\.js)\/).*/
]

@joshgoebel
Copy link
Member

joshgoebel commented May 5, 2020

I'm not sure why you're using fixMarkup at lal actually... it sounds like in your case (a span) that the LAST thing you'd want to do - before render is ADD <br> to replace \n, not be removing them.

but maybe you know how to correctly include your library to babel-loader

Nope, sorry.

@joshgoebel joshgoebel changed the title useBR breaks markup useBR removes first HTML tag if it starts at index 0 of the string May 5, 2020
@joshgoebel joshgoebel changed the title useBR removes first HTML tag if it starts at index 0 of the string fixMarkup useBR: true removes HTML tag if it starts at index 0 of string May 5, 2020
@andrewQwer
Copy link
Author

I'm not sure why you're using fixMarkup at lal actually... it sounds like in your case (a span) that the LAST thing you'd want to do - before render is ADD <br> to replace \n, not be removing them.

According to a codebase it was supposed that useBR replaces \n with <br/> by calling fixMarkup method, but highlight method doesn't call it inside. That is why I used it externally :)

@joshgoebel
Copy link
Member

joshgoebel commented May 5, 2020

useBR replaces \n with <br/> by calling fixMarkup method

It does not. fixMarkup only does the opposite. \n => <br> is internal to the highligthBLock method itself if you look.

@joshgoebel
Copy link
Member

@andrewQwer If you could comment over on the related thread about how you're using/integrating the library with React and how we could perhaps make it easier, that might be useful.

@andrewQwer
Copy link
Author

@andrewQwer If you could comment over on the related thread about how you're using/integrating the library with React and how we could perhaps make it easier, that might be useful.

Where is this thread? :)

@joshgoebel
Copy link
Member

@allejo Any idea on this one?

@joshgoebel joshgoebel added this to the 10.1 milestone May 7, 2020
@allejo
Copy link
Member

allejo commented May 7, 2020

@yyyc514 oh geez, you've sent me down a terrible, terrible rabbit hole with this. I've found these two commits that should be helpful as to why it was introduced, f30dbf6 and 78f21f0.

It looks to me like this is legacy behavior since highlight.js 5.x (at least) where you could... combine custom markup? This current bug looks to be introduced in #1453; it probably should have been return p1; instead of return '';.

But please! +1 for removing this horror from the public API.

@joshgoebel
Copy link
Member

@yyyc514 oh geez, you've sent me down a terrible, terrible rabbit hole with this.

Oh now that I've looked I'm sooooo sorry. I guess I didn't go back far enough... ugh.... so the intention is to preserve the tabs inside custom markups while removing from from the source, yes?

@joshgoebel
Copy link
Member

joshgoebel commented May 7, 2020

Ok so we only replace all LEADING tabs... but there might be intervening custom markup in the way... ugh. So I understand it now.

I guess I rely on the magic stream merge stuff too often... feels like this is the kind of thing it should be doing, but magic it's magic only works when the character counts are exactly the same... so it you did this to the raw text then tried to merge the streams all hell would break lose.

@joshgoebel
Copy link
Member

joshgoebel commented May 7, 2020

I wonder what's wrong with replace ALL the tabs with tabReplace, not just the leading tabs?

@allejo
Copy link
Member

allejo commented May 7, 2020

I can't replicate it replacing just the leading tabs. The following snippet replaces all of the tabs even if it's inside the markup

var tabReplace = "  ";
var value = `<pre><code class="python">for x in [1, 2, 3]:
<span style="background:yellow">	</span>count(x)
	if x == 3:
	<span style="background:yellow">	</span>count(x + 1)
</code></pre>`;

value = value.replace(/^((<[^>]+>|\t)+)/gm, function(match, p1, offset, s) {
    return p1.replace(/\t/g, tabReplace);
});

console.log(value);

Note, this snippet is the old regex from 78f21f0. So I can't replicate what behavior they were trying to achieve at the time.

@joshgoebel
Copy link
Member

joshgoebel commented May 7, 2020

The ones inside markup are still leading... that's what this insanity is doing.

<tab><markup><tab><markup><code><tab><code>

The first two tabs get replaced - they are indents - but not the last tab, it's content. Same behavior as VS Code "change indentation to spaces", but man this regex is only covering the most basic case... and ugh...

@allejo
Copy link
Member

allejo commented May 7, 2020

OH! Ok, if I add a tab inside of count in my snippet, then that tab remains since it's content.

@joshgoebel
Copy link
Member

Yes, you got it. :)

@joshgoebel
Copy link
Member

it probably should have been return p1; instead of return '';.

And yeah that seemed pretty obvious but like most problems I wanted to understand the big picture before rushing in - and see if we could perhaps clean up some of the mess. :-)

@allejo
Copy link
Member

allejo commented May 7, 2020

see if we could perhaps clean up some of the mess. :-)

+1 for this!

@isagalaev
Copy link
Member

I was walking by, and while I didn't read through the entire thing I wanted to say this. There were a few features added to the core at users' requests when it was all small an nimble. These days I would say it makes much more sense to be opinionated and remove corner cases like useBR, tabWidth and several different spellings of "lang-", "language-" as language prefixes in the class name (as far as I remember there is a very precise recommendation in HTML5 to use "language-"). Everyone with special cases would be expected to do pre- or post-processing.

(Those are just my feelings towards it, feel free to use your own judgment!)

@joshgoebel
Copy link
Member

joshgoebel commented May 15, 2020

These days I would say it makes much more sense to be opinionated and remove corner cases like useBR, tabWidth and several different spellings of "lang-", "language-" as language prefixes in the class name (as far as I remember there is a very precise recommendation in HTML5 to use "language-"). Everyone with special cases would be expected to do pre- or post-processing.

If @egor-rogov doesn't object I'd love to put useBR on the chopping block for removal in version 11 then. It could easily be reproduced now with a few line highlightBlock plugin... and I think people who aren't using pre/code are doing it slightly wrong to begin with. :-)

Perhaps replaceTabs too... another piece that could easily be a 1-2 line plugin.

I'll take this as a definite vote that you don't mind killing the fixMarkup api then. :-)

@joshgoebel
Copy link
Member

Related #2534

@isagalaev
Copy link
Member

I'll take this as a definite vote that you don't mind killing the fixMarkup api then. :-)

Yes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug help welcome Could use help from community parser
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants