HTML renderer should distinguish between block and inline HTML #1745

Martii · 2020-08-11T02:57:17Z

Describe the bug
<kbd>key</kbd> doesn't seem to be generating correctly for us. See final rendered output at https://openuserjs.org/about/Falkon#debugging and source for this file here.

See also:

Our server side marked manipulation here with most set options floating around here.
Our DOM manipulation settings at here.

Any help on understanding this change from 0.8.0 would be greatly appreciated. :) I also hope this issue is not replicated but I did do a brief search on this repo including code for kbd and only found the smartypants option which we've never used.

Tested in:

Mozilla/5.0 (X11; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0 , Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36 and under Windows to rule out any browser issue.

The text was updated successfully, but these errors were encountered:

UziTech · 2020-08-11T05:01:16Z

It looks like it is the sanitization that is closing the open tag before the text then removing the close tag after the text.

Opening and closing inline html tags are tokenized separately so sanitize html is seeing <kbd> and changing it to <kbd></kbd> the seeing </kbd> and removing it.

You can see the difference in tokenization in the v0.8.0 demo and the v1.1.1 demo

Martii · 2020-08-11T06:39:17Z

Appreciate the look... I'll retest it again... however...

Re:

It looks like it is the sanitization

But my test in dev is this from right above that code a few lines:

function sanitize(aHtml) {
  //return sanitizeHtml(aHtml, htmlWhitelistPost);
  return aHtml;  // Test disabling sanitizer
}

... and it still comes back with:

<kbd></kbd>Ctrl + <kbd></kbd>Shift + <kbd></kbd>i

Here's the retest with no sanitization in dev. Same issue which seems to be highly pointing to marked:

If I do this here:

exports.renderMd = function (aText) {
  //return marked(aText);
  return aText;
};

... then all is well with the native html (md isn't processed so it's mostly md but the Ctrl + Shift + i is not having issues with sanitizing):

UziTech · 2020-08-12T00:33:44Z

it looks like the line hookNode.innerHTML = sanitized; also does the same thing

Martii · 2020-08-12T05:13:52Z

the line hookNode.innerHTML = sanitized; also does the same thing

That's just the final sanitized String results (which are bypassed in dev atm from item 2 above and retested/reshown in the first screen cap above) i.e. no change for this equivalent of documentFragment implementation in node.

I'm thinking more along the lines of the renderer (Camel caps too) portion of the line of var sanitized = sanitize(marked.Renderer.prototype[aType].apply(renderer, arguments));. If i'm bypassing any and all sanitizing by item number 2 up there with just calling the function and returning no manipulation it could either be some Renderer prototype change in this project or perhaps some worse issue in node.

When I get a few extra cycles I'll try it in latest node@14.x and see if it's a latest node@12.x issue specific to something this package utilizes. EDIT: Same issue in node@14.x.

As I mentioned above this is one weird issue. :\

Misc references for myself here:

Here's the code compare changes between where the issue appears in this project.
Notable renderer changes at lib/marked.js and lib/marked.esm.js.

UziTech · 2020-08-12T21:33:39Z

That's just the final sanitized String results

yes but it does change the string. JSDom must be doing some sort of sanitization when setting innerHTML

hookNode.innerHTML = '<kbd>';
console.log(hookNode.innerHTML); // '<kbd></kbd>'

hookNode.innerHTML = '</kbd>';
console.log(hookNode.innerHTML); // ''

Martii · 2020-08-13T01:23:53Z

JSDom must be doing some sort of sanitization

Interesting guess and still appreciate the thoughts but...

It's just a filter for @mentions. I took the jsdom dep out of the picture early this morning by commenting them out. Same marked issue. Then I redisabled sanitize-html dep and then it went to the output that is on marked demo page (dumped our whole document into marked's server).

It is highly improbable that two extensions/packages would cause this issue not to mention the fact that marked@0.8.0 works perfectly. It's more probable that there is something broken in marked. If given a choice between insecure HTML, losing @mentions, and using the latest marked, we'll pick secure with sanitize-html along with jsdom and unfortunately have to migrate to another markdown processor or lower the version of marked... but I'd like to keep trying here as I like the project.

Those renderer changes are quite notable between 0.8.0 an 0.8.1 and perhaps it can't be utilized the way we are doing it with overriding the rendering or it's broken for extending the renderer in marked or there is a node issue floating around. It's going to take me quite some time to figure it out and I'll have to reread the advanced and pro docs here to see if any changes. Sizzle wrote the most of the initial markdown library so I get to go figure out what he did, any changes since then, and verify it.

I'll try backing out the relevant changes in marked manually in dev to see if that helps (or breaks which is always possible).

NOTE: This will change a bit during debugging.

Replacing 0.8.1's lib/marked.js with 0.8.0's lib/marked.js yields the kbd issue.
Checking marked's dependency changes between 0.8.0 and 0.8.1 ...

@@ -1,6 +1,6 @@
      "version": "7.1.0",
      "resolved": "https://registry.npmjs.org/acorn/-/acorn-7.1.0.tgz",
      "integrity": "sha512-kL5CuoXA/dgxlBbVrflsflzQ3PAas7RYZB52NOm/6839iVYJgKMJ3cQJD+t2i5+qFa8h3MDpEOJiS64E8JLnSQ==",

      "version": "7.1.1",
      "resolved": "https://registry.npmjs.org/acorn/-/acorn-7.1.1.tgz",
      "integrity": "sha512-add7dgA5ppRPxCFJoAGfMDi7PIBXq1RtGo7BhbLaxwrXPOmw8gq48Y9ozT01hUKy9byMjlR20EJhu5zlkErEkg==",

... acorn changed.

Replacing 0.8.1's lib/marked.esm.js with 0.8.0's lib/marked.esm.js yields the kbd issue.
Replacing 0.8.1's bin/marked with 0.8.0's bin/marked yields the kbd issue.
... Continue replacing one file at a time in 0.8.1's with 0.8.0's ...
BINGO .... marked/src/InlineLexer.js fixes 0.8.1 ... resetting marked dependency and retesting... CONFIRMED

Now I get to go see what changed there.

From send inline html to renderer #1602 :

--- ./InlineLexer.js    1985-10-26 02:15:00.000000000 -0600
+++ ~/tmp/marked.0.8.0/src/InlineLexer.js      1985-10-26 02:15:00.000000000 -0600
@@ -82,11 +82,11 @@
         }
 
         src = src.substring(cap[0].length);
-        out += this.renderer.html(this.options.sanitize
-          ? (this.options.sanitizer
+        out += this.options.sanitize
+          ? this.options.sanitizer
             ? this.options.sanitizer(cap[0])
-            : escape(cap[0]))
-          : cap[0]);
+            : escape(cap[0])
+          : cap[0];
         continue;
       }

Misc note: Some WEIRD file date/times from npmjs.com installs and here. My clock is set by NIST.

Martii · 2020-08-13T03:21:22Z

@UziTech

So now to find this change source from #1602 in the new tree structure for marked@1.1.1. Any quick knowing where? ( #1627 ...) I'll still look in the current HEAD. Will also have to compare the bug at #1601 and understand it fully.

Thanks in advance. :)

UziTech · 2020-08-13T03:45:41Z

It was actually the fix in #1602 that you pointed out that makes this happen.

Here is what is happening:

In v0.8.0 inline HTML would not be sent to the renderer so your renderer would just gets called with '<kbd>key</kbd>'.
send inline html to renderer #1602 fixed that so in v0.8.1+ your render gets called with '<kbd>' (which your renderer turns into '<kbd></kbd>'), then '</kbd>' (which your renderer turns into ''), and finally '<kbd></kbd>key'

The issue is that your renderer is treating inline html tags as a block of html.

Maybe we could tell the html renderer whether it is inline or not and your render function can ignore inline html tokens?

Martii · 2020-08-13T18:04:22Z

Understand what you mean now. The granularity of the html renderer method changed and increased. This was good for for #1601 but bad for the way we sanitize, mention, and whatever else we add in the future.

A couple of things that need clarification before I attempt to answer your question.

So in v0.8.0- it used to be solely a Block level rendering method and in v0.8.1 it became also an Inline level rendering method. Would this be a fair conclusion to make? If yes, are there other "cross-overs" like this? ❓
We tried back in the day to move sanitizing out of the renderer and it hosed syntax highlighting. There's not enough time and effort to cover 100% of their classes and other attributes they add in to successfully write exceptions to filter whether transformed or ignored.
We tried the text renderer but it was splitting up certain text elements into two, or more pieces depending on other md codes floating around. i.e. misses of hyper linking. Our users/authors have a sense of humor sometimes with showing us bugs in the rendering so mentions would be clipped under certain circumstances.
We ended up with what we've been dealing with during this issue because it was usually distinct complete blocks of HTML code. e.g. <kbd>key</kbd> as we've been using in this issue. The sanitizer is a must and the mentioner needs well formed HTML code or XPath won't be able to create the object needed for parsing.

So basically we've tried just about everything that we know of in positioning the sanitizer and the mentioner in various places.

So to attempt to have an answer for your question of:

Maybe we could tell the html renderer whether it is inline or not and your render function can ignore inline html tokens?

Some sort of flags argument, or equivalent, would definitely help to have an "end stage" value but my concern is still on my question Number 1 above in this comment. If Number 1 is a "yes, that's how it works logically"... Would it be better to have an inlinehtml render method and restore the block level html or use some sort of flag to indicate that html is at a different stage of the process like you mentioned? ❓ i.e. wouldn't want #1601 to recur depending on the answer here.

Another note loosely related to this: When using HTML <hr class="foo"> it never hits the renderers that we extend... I even tried the hr renderer but I presume that is for --- md codes. Without going too far off the topic, what did I miss? If I didn't miss anything... how does this relate to the granularity of html renderer? ❓

UziTech · 2020-08-14T04:27:28Z

Yes the text renderer gets inline and block text.
v1.0.0 introduced the tokenizer that might be a better place to do the sanitization before the tokens are created. Or you could run a function on every token to do the sanitization with the walkTokens option.
That could be a bug but without the offending markdown I can't be sure.
see 2. for better places for the sanitizer and mentioner.

Would it be better to have an inlinehtml render method and restore the block level html

Maybe but that would be a breaking change and we would like to reduce the number of breaking changes due to the number of dependents. If there is a non-breaking way it would be better to do that now and implement the breaking change later.

When using HTML <hr class="foo"> it never hits the renderers that we extend

It does when I try it. It calls the html renderer.

Martii · 2020-08-14T05:05:54Z

It does when I try it.

Hmmm... dev which is node@14.x atm isn't sanitizing it. On production it's sanitizing it (temporarily put it up something similar at the same section of "Debugging" under Falkon) which is node@12.x. Will look into this later. Could be another weird artifact of the breaking change from 0.8.0 to 0.8.1.

Ref:

Thanks for the newer feature recommendations and clarifications on which ones are dual. Will look into tokenizer as soon as I can.

If there is a non-breaking way it would be better to do that now and implement the breaking change later.

Understood. That's why I asked if there were any other renderers that have the same possibilities as html... you said yes so your idea is probably the best esp. if they grow. :)

UziTech added the other package label Aug 11, 2020

Martii mentioned this issue Nov 9, 2020

*marked* issues with HTML OpenUserJS/OpenUserJS.org#1775

Closed

UziTech changed the title ~~Weirdness with kbd tag~~ HTML renderer should distinguish between block and inline HTML Dec 8, 2020

UziTech added proposal and removed other package labels Dec 8, 2020

UziTech added this to To Do in vNext via automation Dec 8, 2020

This was referenced Mar 31, 2023

fix: update min node to v18 #2767

Merged

feat: add block param to html renderer #2768

Merged

UziTech moved this from To Do to In Progress in vNext Apr 1, 2023

UziTech closed this as completed in #2768 May 2, 2023

vNext automation moved this from In Progress to Done May 2, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HTML renderer should distinguish between block and inline HTML #1745

HTML renderer should distinguish between block and inline HTML #1745

Martii commented Aug 11, 2020 •

edited

UziTech commented Aug 11, 2020

Martii commented Aug 11, 2020 •

edited

UziTech commented Aug 12, 2020

Martii commented Aug 12, 2020 •

edited

UziTech commented Aug 12, 2020

Martii commented Aug 13, 2020 •

edited

Martii commented Aug 13, 2020 •

edited

UziTech commented Aug 13, 2020 •

edited

Martii commented Aug 13, 2020

UziTech commented Aug 14, 2020

Martii commented Aug 14, 2020 •

edited

HTML renderer should distinguish between block and inline HTML #1745

HTML renderer should distinguish between block and inline HTML #1745

Comments

Martii commented Aug 11, 2020 • edited

UziTech commented Aug 11, 2020

Martii commented Aug 11, 2020 • edited

UziTech commented Aug 12, 2020

Martii commented Aug 12, 2020 • edited

UziTech commented Aug 12, 2020

Martii commented Aug 13, 2020 • edited

Martii commented Aug 13, 2020 • edited

UziTech commented Aug 13, 2020 • edited

Martii commented Aug 13, 2020

UziTech commented Aug 14, 2020

Martii commented Aug 14, 2020 • edited

Martii commented Aug 11, 2020 •

edited

Martii commented Aug 11, 2020 •

edited

Martii commented Aug 12, 2020 •

edited

Martii commented Aug 13, 2020 •

edited

Martii commented Aug 13, 2020 •

edited

UziTech commented Aug 13, 2020 •

edited

Martii commented Aug 14, 2020 •

edited