HTML contents seem to be incorrectly escaped if HTML tags are not separated by empty lines #707

nhinds · 2016-01-10T06:41:06Z

A <script> tag's contents are escaped if the tag is sandwiched between two HTML elements and the first element is followed by another HTML tag on the very next line. To illustrate:

Input:

<p>This is immediately followed by a script tag, which is then broken</p>
<script type="text/javascript">
if (i < 3 && 'one' != "two") alert("ok");
</script>

<p>This ending tag is matched as the ending tag of the first paragraph</p>

Output:

<p>This is immediately followed by a script tag, which is then broken</p>
<script type="text/javascript">
if (i &lt; 3 &amp;&amp; &#39;one&#39; != &quot;two&quot;) alert(&quot;ok&quot;);
</script>

<p>This ending tag is matched as the ending tag of the first paragraph</p>

In this output, the contents of the script tag are escaped as if they were inside a p tag. Adding debug statements inside Lexer.prototype.token reveals that the if (cap = this.rules.html.exec(src)) { block is entered only once, and consumes the entire string. It looks like the HTML regex is matching the entire input document above as a single tag (or HTML block), which then sets the pre option to false and proceeds to escape special characters inside all tags in the document.
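The over-matching described above can be reproduced in isolation. The sketch below is not the actual rule from marked: `strictHtml` is a hypothetical simplification that keeps only the "closed" alternative, accepts any `\w+` as a tag name, and keeps the `\n{2,}` terminator. Under those assumptions, a first tag followed by a single newline forces the lazy match to run on to the last matching closing tag, while a blank line after the first tag lets the match stop there.

```javascript
// Hypothetical simplification of the block-level html rule (assumption:
// only the "closed" alternative is kept and any \w+ counts as a tag name).
const strictHtml = /^ *<(\w+)[\s\S]+?<\/\1> *(?:\n{2,}|\s*$)/;

const script = '<script>if (i < 3) alert("ok");</script>';

// First tag followed by a single newline (the broken case).
const adjacent = '<p>first</p>\n' + script + '\n\n<p>last</p>';
// First tag followed by a blank line (the working case).
const separated = '<p>first</p>\n\n' + script + '\n\n<p>last</p>';

const adjacentMatch = strictHtml.exec(adjacent)[0];
const separatedMatch = strictHtml.exec(separated)[0];

// The single-newline input is consumed as one HTML block, script included.
console.log(adjacentMatch === adjacent);
// With a blank line, only the first paragraph (plus the blank line) is consumed.
console.log(JSON.stringify(separatedMatch));
```

In the first case the opening `<p>` ends up paired with the final `</p>`, which is consistent with the script body being treated as the contents of a single block.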

Additional examples:
Input with blockquotes (so it's not just p tags):

<blockquote>This is immediately followed by a script tag, which is then broken</blockquote>
<script type="text/javascript">
if (i < 3 && 'one' != "two") alert("ok");
</script>

<blockquote>This ending tag is matched as the ending tag of the first paragraph</blockquote>

Output:

<blockquote>This is immediately followed by a script tag, which is then broken</blockquote>
<script type="text/javascript">
if (i &lt; 3 &amp;&amp; &#39;one&#39; != &quot;two&quot;) alert(&quot;ok&quot;);
</script>

<blockquote>This ending tag is matched as the ending tag of the first paragraph</blockquote>

Input with whitespace between the first tag and the script tag (works):

<p>This is followed by whitespace, which works</p>

<script type="text/javascript">
if (i < 3 && 'one' != "two") alert("ok");
</script>

<p>This ending tag is matched as the ending tag of the first paragraph</p>

Output:

<p>This is followed by whitespace, which works</p>

<script type="text/javascript">
if (i < 3 && 'one' != "two") alert("ok");
</script>

<p>This ending tag is matched as the ending tag of the first paragraph</p>

Input where the first tag is followed by a different HTML tag:

<p>This is immediately followed by a different tag, which breaks the script tag</p>
<div>This breaks the script tag</div>

<script type="text/javascript">
if (i < 3 && 'one' != "two") alert("ok");
</script>

<p>This ending tag is matched as the ending tag of the first paragraph</p>

Output:

<p>This is immediately followed by a different tag, which breaks the script tag</p>
<div>This breaks the script tag</div>

<script type="text/javascript">
if (i &lt; 3 &amp;&amp; &#39;one&#39; != &quot;two&quot;) alert(&quot;ok&quot;);
</script>

<p>This ending tag is matched as the ending tag of the first paragraph</p>

All the above were performed with version 0.3.5 using the default configuration, via marked -i repro.md where repro.md contains only the given contents.


hxsf · 2016-02-16T15:30:47Z

I think this happens because the html regular expression cannot match each HTML line on its own, like this:

I wrote a custom html renderer function:

renderer.html = function(html){
    return '<h5>' + encode(html) + ' has been matched</h5>';
};

INPUT:

# title 1

<h1>aaaa</h1>
<h2>bbbb</h2>

<h1>aaaa</h1>
<h1>bbbb</h1>
<h2>cccc</h2>

OUTPUT:

<h1>aaaa</h1>
<h5>&lt;h2&gt;bbbb&lt;/h2&gt; has been matched<h5>

<h1>aaaa</h1>
<h5>&lt;h1&gt;bbbb&lt;/h2&gt; &lt;br&gt;&lt;h1&gt;bbbb&lt;/h2&gt; has been matched<h5>
<hr>

So if there is only one \n between two or more HTML tags, the first HTML tag is not matched.

hxsf · 2016-02-17T03:36:32Z

I changed line 23 of marked/lib/marked.js:

Before Change

  html: /^ *(?:comment *(?:\n|\s*$)|closed *(?:\n{2,}|\s*$)|closing *(?:\n{2,}|\s*$))/,

After Change

  html: /^ *(?:comment *(?:\n|\s*$)|closed *(?:\n+|\s*$)|closing *(?:\n+|\s*$))/,

That fixed my problem.

But I don't know whether it causes other problems.
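To see what the `\n+` terminator changes, here is a minimal sketch using a hypothetical simplification of the patched rule (only the "closed" alternative, any `\w+` as a tag name) and a hypothetical helper, `splitHtmlBlocks`, that matches repeatedly at the start of the remaining input the way a lexer loop would. It is an illustration under those assumptions, not marked's actual lexer.

```javascript
// Hypothetical simplification of the patched rule (assumption: only the
// "closed" alternative is modelled and any \w+ is accepted as a tag name).
const relaxedHtml = /^ *<(\w+)[\s\S]+?<\/\1> *(?:\n+|\s*$)/;

// Hypothetical helper: consume HTML blocks from the front of the input
// until the rule no longer matches or the input is exhausted.
function splitHtmlBlocks(src) {
  const blocks = [];
  let cap;
  while (src && (cap = relaxedHtml.exec(src))) {
    blocks.push(cap[0]);
    src = src.slice(cap[0].length);
  }
  return blocks;
}

// Three tags separated by single newlines, as in the example input above.
const blocks = splitHtmlBlocks('<h1>aaaa</h1>\n<h1>bbbb</h1>\n<h2>cccc</h2>\n');

// Each tag is now its own block instead of being merged with its neighbours.
console.log(blocks.length);
console.log(blocks.map((b) => b.trim()));
```

One possible side effect, at least for this simplified rule: a closed tag followed directly by a line of Markdown text would now also match as an HTML block, whereas with `\n{2,}` it would not, so the change may alter how such input is tokenized.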

joshbruce · 2017-12-25T20:27:37Z

#985

joshbruce closed this as completed Dec 25, 2017
