HTML contents seem to be incorrectly escaped if HTML tags are not separated by empty lines #707

Closed
nhinds opened this issue Jan 10, 2016 · 3 comments


nhinds commented Jan 10, 2016

A <script> tag's contents are escaped if the tag is sandwiched between two HTML tags and the first HTML element has another HTML tag on the line directly after it. To illustrate:

Input:

<p>This is immediately followed by a script tag, which is then broken</p>
<script type="text/javascript">
if (i < 3 && 'one' != "two") alert("ok");
</script>

<p>This ending tag is matched as the ending tag of the first paragraph</p>

Output:

<p>This is immediately followed by a script tag, which is then broken</p>
<script type="text/javascript">
if (i &lt; 3 &amp;&amp; &#39;one&#39; != &quot;two&quot;) alert(&quot;ok&quot;);
</script>

<p>This ending tag is matched as the ending tag of the first paragraph</p>

In this output, the contents of the script tag are escaped as if they were inside a p tag. Adding debug statements inside Lexer.prototype.token reveals that the if (cap = this.rules.html.exec(src)) { block is entered only once, and consumes the entire string. It looks like the HTML regex is matching the entire input document above as a single tag (or HTML block), which then sets the pre option to false and proceeds to escape special characters inside all tags in the document.
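The matching behaviour described above can be reproduced in isolation. The following is a minimal sketch, not marked's actual source: `closedHtml` is a hypothetical, simplified stand-in for the `closed` branch of the block-level html rule, and it assumes that `\w+` is an acceptable approximation of marked's stricter tag sub-pattern. The paragraph texts are shortened for readability.

```javascript
// Simplified stand-in for the `closed` branch of the block-level html rule.
// Assumption: `\w+` approximates marked's real (stricter) tag sub-pattern.
const closedHtml = /^ *<(\w+)[\s\S]+?<\/\1> *(?:\n{2,}|\s*$)/;

const script = [
  '<script type="text/javascript">',
  'if (i < 3 && \'one\' != "two") alert("ok");',
  '</script>',
].join('\n');

// No empty line after the first </p>: the broken case from the report.
const broken = '<p>first</p>\n' + script + '\n\n<p>last</p>\n';
// An empty line after the first </p>: the working case.
const working = '<p>first</p>\n\n' + script + '\n\n<p>last</p>\n';

const brokenCap = closedHtml.exec(broken);
const workingCap = closedHtml.exec(working);

// The first </p> is followed by a single \n, so the lazy match keeps
// extending until the final </p>, which is followed by the end of input.
console.log(brokenCap[0] === broken);        // true: one block for the whole document
console.log(brokenCap[1]);                   // "p": the captured tag name
console.log(JSON.stringify(workingCap[0]));  // "<p>first</p>\n\n"
```

If the block is then treated according to the captured tag name (`p` rather than `script`), that would be consistent with the escaping seen in the output above.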

Additional examples:
Input with blockquotes (so it's not just p tags):

<blockquote>This is immediately followed by a script tag, which is then broken</blockquote>
<script type="text/javascript">
if (i < 3 && 'one' != "two") alert("ok");
</script>

<blockquote>This ending tag is matched as the ending tag of the first paragraph</blockquote>

Output:

<blockquote>This is immediately followed by a script tag, which is then broken</blockquote>
<script type="text/javascript">
if (i &lt; 3 &amp;&amp; &#39;one&#39; != &quot;two&quot;) alert(&quot;ok&quot;);
</script>

<blockquote>This ending tag is matched as the ending tag of the first paragraph</blockquote>

Input with an empty line between the first tag and the script tag (works):

<p>This is followed by whitespace, which works</p>

<script type="text/javascript">
if (i < 3 && 'one' != "two") alert("ok");
</script>

<p>This ending tag is matched as the ending tag of the first paragraph</p>

Output:

<p>This is followed by whitespace, which works</p>

<script type="text/javascript">
if (i < 3 && 'one' != "two") alert("ok");
</script>

<p>This ending tag is matched as the ending tag of the first paragraph</p>

Input where the first tag is followed by a different HTML tag:

<p>This is immediately followed by a different tag, which breaks the script tag</p>
<div>This breaks the script tag</div>

<script type="text/javascript">
if (i < 3 && 'one' != "two") alert("ok");
</script>

<p>This ending tag is matched as the ending tag of the first paragraph</p>

Output:

<p>This is immediately followed by a different tag, which breaks the script tag</p>
<div>This breaks the script tag</div>

<script type="text/javascript">
if (i &lt; 3 &amp;&amp; &#39;one&#39; != &quot;two&quot;) alert(&quot;ok&quot;);
</script>

<p>This ending tag is matched as the ending tag of the first paragraph</p>

All of the above were run with version 0.3.5 using the default configuration, via marked -i repro.md, where repro.md contains only the given input.


hxsf commented Feb 16, 2016

I think this happens because the html regular expression does not match each HTML line on its own, as this example shows.

I defined a custom html renderer function:

// `encode` is not defined in this comment; it is assumed to be an HTML-escaping helper.
renderer.html = function(html){
    return '<h5>' + encode(html) + ' has been matched</h5>';
};
INPUT:
# title 1

<h1>aaaa</h1>
<h2>bbbb</h2>

<h1>aaaa</h1>
<h1>bbbb</h1>
<h2>cccc</h2>
OUTPUT:
<h1>aaaa</h1>
<h5>&lt;h2&gt;bbbb&lt;/h2&gt; has been matched<h5>

<h1>aaaa</h1>
<h5>&lt;h1&gt;bbbb&lt;/h2&gt; &lt;br&gt;&lt;h1&gt;bbbb&lt;/h2&gt; has been matched<h5>
<hr>

So, if only a single \n separates two or more HTML tags, the first HTML tag is not matched on its own.


hxsf commented Feb 17, 2016

I changed line 23 of marked/lib/marked.js.

Before the change:
  html: /^ *(?:comment *(?:\n|\s*$)|closed *(?:\n{2,}|\s*$)|closing *(?:\n{2,}|\s*$))/,
After the change:
  html: /^ *(?:comment *(?:\n|\s*$)|closed *(?:\n+|\s*$)|closing *(?:\n+|\s*$))/,

That fixed my problem,

but I don't know whether it causes other problems.
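One way to probe that open question is to compare the two terminators on small inputs. This is a rough sketch under stated assumptions: `before` and `after` are hypothetical, simplified versions of only the `closed` branch (with `\w+` standing in for marked's real tag sub-pattern), and `matched` is a helper introduced here for illustration. It is not a test of marked itself.

```javascript
// Simplified `closed` branch with the original and the modified terminator.
// Assumption: `\w+` approximates marked's real (stricter) tag sub-pattern.
const before = /^ *<(\w+)[\s\S]+?<\/\1> *(?:\n{2,}|\s*$)/;
const after = /^ *<(\w+)[\s\S]+?<\/\1> *(?:\n+|\s*$)/;

// Hypothetical helper: returns the text consumed by the pattern, or null.
function matched(re, src) {
  const cap = re.exec(src);
  return cap ? cap[0] : null;
}

// Two different tags separated by a single newline.
const adjacent = '<h1>aaaa</h1>\n<h2>bbbb</h2>\n';
// A tag nested inside the same tag, with a newline after the inner close.
const nested = '<div><div>inner</div>\n</div>\n';

console.log(JSON.stringify(matched(before, adjacent))); // null
console.log(JSON.stringify(matched(after, adjacent)));  // "<h1>aaaa</h1>\n"
console.log(JSON.stringify(matched(before, nested)));   // "<div><div>inner</div>\n</div>\n"
console.log(JSON.stringify(matched(after, nested)));    // "<div><div>inner</div>\n"
```

In this simplified model the change lets the first of two adjacent tags match by itself, but it also cuts a nested same-name element short at the inner closing tag whenever that tag is followed by a newline, leaving the outer closing tag to be lexed separately. Whether that matters in practice would need to be checked against marked's own test suite.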

@joshbruce
Member

#985
