Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[commonmark] make html comments, html inlines, html blocks and links compliant #1135

Merged
merged 79 commits into from Apr 10, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
79 commits
Select commit Hold shift + click to select a range
ffd386a
add commonmark tests for (inline) raw html
Feder1co5oave Feb 20, 2018
056f4e2
add commonmark tests for html comments
Feder1co5oave Jan 27, 2018
9450b09
new inline html rule, to comply with commonmark
Feder1co5oave Feb 20, 2018
652ba97
new html comment rule to comply with commonmark (html5).
Feder1co5oave Jan 27, 2018
b2611c1
allow colons in tag names and attributes. Decrease group depth in regex
Feder1co5oave Feb 23, 2018
f2ebd43
allow a regex source string as a parameter for edit()
Feder1co5oave Mar 4, 2018
27d4da6
refactor tag inline rule
Feder1co5oave Mar 4, 2018
4de3c98
refactor html block rule
Feder1co5oave Feb 24, 2018
c12c5d7
rename gfm_links -> gfm_autolinks
Feder1co5oave Feb 20, 2018
316db0a
rename test headings-id -> headings_id
Feder1co5oave Feb 26, 2018
d89012f
remove superfluous test fixture (already covered by original/links_re…
Feder1co5oave Feb 26, 2018
8e30cd2
!fixup b2611c1b05
Feder1co5oave Mar 4, 2018
bcf9abb
divide html_comments test into subtests
Feder1co5oave Mar 4, 2018
1b8ca2b
option `pedantic` overrides `gfm`, and turns off gfm, tables and breaks.
Feder1co5oave Mar 4, 2018
c1ef53c
add commonmark tests for html blocks
Feder1co5oave Mar 4, 2018
56972f8
save the current html block parsing in the pedantic mode
Feder1co5oave Mar 4, 2018
d08039e
new rule for html blocks, to comply with commonmark.
Feder1co5oave Mar 4, 2018
7abf702
adjust html_comments test case in accordance with commonmark
Feder1co5oave Mar 4, 2018
fb2f317
rearrange test in cm_html_blocks to test end of file
Feder1co5oave Mar 5, 2018
0904e44
add commonmark tests for links
Feder1co5oave Feb 9, 2018
de66018
encode urls with %xx entities
Feder1co5oave Jan 23, 2018
d2cef5a
escape html entities in link/image href
Feder1co5oave Mar 4, 2018
ef64418
encode urls in tests accordingly
Feder1co5oave Jan 23, 2018
786334a
allow frontmatter in original tests
Feder1co5oave Feb 9, 2018
bf9c9c5
Revert fac31ed "allow matched double quotes in link definition title …
Feder1co5oave Feb 9, 2018
821e2da
allow matched double quotes in link definition title only in pedantic…
Feder1co5oave Mar 4, 2018
dc92048
test literal_quotes_in_titles in pedantic mode
Feder1co5oave Mar 4, 2018
3be817b
save current link parsing in pedantic mode
Feder1co5oave Mar 5, 2018
6750997
test links_inline_style in pedantic mode
Feder1co5oave Mar 5, 2018
e66f7aa
don't allow spaces in link URIs
Feder1co5oave Mar 5, 2018
fc17a2c
allow backslash-escapes in link text, href, title and link definition…
Feder1co5oave Mar 5, 2018
9f20c46
allow ONE level of matching parenthesis in links.
Feder1co5oave Mar 5, 2018
d8ff951
allow unbalanced parenthesis in links URIs wrapped in angle brackets
Feder1co5oave Mar 5, 2018
277d093
let html entities in link URIs pass through as-is (we cannot decode t…
Feder1co5oave Mar 5, 2018
3afc360
allow link titles to be wrapped in parenthesis
Feder1co5oave Mar 5, 2018
29d33d9
remove commonmark test #478 because we treat U+00A0 (non-breaking spa…
Feder1co5oave Mar 5, 2018
271d357
[security] fix possible ReDOS vulnerable regex rule by refactoring (d…
Feder1co5oave Mar 5, 2018
47365c1
allow brackets in link text if they are inside a code span. Add test.
Feder1co5oave Mar 5, 2018
8877ff7
[refactor] rename rule inside -> label
Feder1co5oave Mar 5, 2018
ca349c8
allow ONE level of matching parenthesis in link text.
Feder1co5oave Mar 5, 2018
5125739
run test cm_links with xhtml
Feder1co5oave Mar 5, 2018
eb95a71
[commonmark] BREAKING CHANGE: link nesting is not allowed. If multipl…
Feder1co5oave Mar 6, 2018
13dd38a
add tests for link nesting
Feder1co5oave Mar 6, 2018
ef3516c
!fixup eb95a71
Feder1co5oave Mar 6, 2018
5b135c3
lint marked.js
Feder1co5oave Mar 6, 2018
55f47f1
remove commonmark tests 489-491 for nested links
Feder1co5oave Mar 6, 2018
7a80cdf
Revert 13dd38a "add tests for link nesting"
Feder1co5oave Mar 6, 2018
f21a4d6
Revert eb95a71 "[commonmark] BREAKING CHANGE: link nesting is not all…
Feder1co5oave Mar 6, 2018
2094181
remove commonmark tests 492, 495-497 for link precedence with other m…
Feder1co5oave Mar 6, 2018
aeca6a1
allow only one level of matching brackets in reference links text.
Feder1co5oave Mar 6, 2018
5536922
allow link nesting, adjust tests 503-504 accordingly. NON-COMPLIANT w…
Feder1co5oave Mar 6, 2018
e56e35a
remove commonmark tests 505, 507-509 for link precedence with other m…
Feder1co5oave Mar 6, 2018
22b06cc
perform space normalization when storing link reference labels
Feder1co5oave Mar 6, 2018
b9394a0
BREAKING CHANGE: space is no longer allowed between [link text] and […
Feder1co5oave Mar 6, 2018
fc97171
run original tests with spaces between link text and label in pedanti…
Feder1co5oave Mar 6, 2018
d47dc29
fix test 513
Feder1co5oave Mar 6, 2018
f584aca
do not process escapes in link labels
Feder1co5oave Mar 6, 2018
eea3932
fix tests in cm_links
Feder1co5oave Mar 6, 2018
8594a06
link labels must have at least one non-whitespace character
Feder1co5oave Mar 6, 2018
209dff1
add collapsed reference [links][] = [links][links]
Feder1co5oave Mar 6, 2018
ea48e96
fix tests in cm_links
Feder1co5oave Mar 6, 2018
dfc5b3e
remove commonmark test 535 (link precedence with other markup).
Feder1co5oave Mar 6, 2018
d29f68a
fix tests in cm_links
Feder1co5oave Mar 6, 2018
2e23540
adjust commonmark tests 544-548, 556, 560 that contain markup inside …
Feder1co5oave Mar 6, 2018
f7d4d21
update the list of markdown escapes, per commonmark
Feder1co5oave Mar 6, 2018
c398550
do not allow control characters in link destinations, per commonmark
Feder1co5oave Mar 6, 2018
e4fd972
fix tests whitespace
Feder1co5oave Mar 6, 2018
680a6c3
Merge remote-tracking branch 'markedjs/master' into cm_links
Feder1co5oave Mar 8, 2018
163a482
re-enable cm_autolinks test case (now passing)
Feder1co5oave Mar 8, 2018
4e2b647
document `pedantic` option overrides `gfm`
Feder1co5oave Mar 8, 2018
9cb1900
fix backtracking vulnerability in `block.pedantic.html` as in #1083
Feder1co5oave Mar 8, 2018
b738cd6
Merge remote-tracking branch 'markedjs/master' into cm_links
Feder1co5oave Apr 9, 2018
4aa4f02
re-enable passing commonmark tests
Feder1co5oave Apr 9, 2018
bc7c9db
do not allow newlines inside html attributes, make cm test 60 pass
Feder1co5oave Apr 9, 2018
d94a68c
do not randomize jasmine tests (different spec files interfered with …
Feder1co5oave Apr 9, 2018
4e52c42
Merge branch 'master' into pr/1135
UziTech Apr 10, 2018
5396950
remove passing tests
UziTech Apr 10, 2018
8815ba3
randomize tests
UziTech Apr 10, 2018
a6c6f0d
update completed table
UziTech Apr 10, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 1 addition & 1 deletion docs/USING_ADVANCED.md
Expand Up @@ -49,7 +49,7 @@ console.log(myMarked('I am using __markdown__.'));
|highlight |`function`|A function to highlight code blocks. See also: <a href="#highlight">Asynchronous highlighting</a>. |
|langPrefix |`??` |Default is `lang-`
|mangle |`boolean` |Default is `true`
|pedantic |`boolean` |Conform to obscure parts of `markdown.pl` as much as possible. Don't fix original markdown bugs or behavior. Default: `false`|
|pedantic |`boolean` |Conform to obscure parts of `markdown.pl` as much as possible. Don't fix original markdown bugs or behavior. Turns off and overrides `gfm`. Default: `false`|
|renderer |`object` |An object containing functions to render tokens to HTML. See [extensibility](https://github.com/markedjs/marked/blob/master/docs/USING_PRO.md) for more details. Default: `new Renderer()`|
|sanitize |`boolean` |Ignore HTML passed into `markdownString` (sanitize the input). Default: `false` |
|sanitizer |`??` |Default is `null` |
Expand Down
144 changes: 103 additions & 41 deletions lib/marked.js
Expand Up @@ -20,16 +20,25 @@ var block = {
nptable: noop,
blockquote: /^( {0,3}> ?(paragraph|[^\n]*)(?:\n|$))+/,
list: /^( *)(bull) [\s\S]+?(?:hr|def|\n{2,}(?! )(?!\1bull )\n*|\s*$)/,
html: /^ *(?:comment *(?:\n|\s*$)|closed *(?:\n{2,}|\s*$)|closing *(?:\n{2,}|\s*$))/,
html: '^ {0,3}(?:' // optional indentation
+ '<(script|pre|style)[\\s>][\\s\\S]*?(?:</\\1>[^\\n]*\\n+|$)' // (1)
+ '|comment[^\\n]*(\\n+|$)' // (2)
+ '|<\\?[\\s\\S]*?\\?>\\n*' // (3)
+ '|<![A-Z][\\s\\S]*?>\\n*' // (4)
+ '|<!\\[CDATA\\[[\\s\\S]*?\\]\\]>\\n*' // (5)
+ '|</?(tag)(?: +|\\n|/?>)[\\s\\S]*?(?:\\n{2,}|$)' // (6)
+ '|<(?!script|pre|style)([a-z][\\w-]*)(?:attribute)*? */?>(?=\\h*\\n)[\\s\\S]*?(?:\\n{2,}|$)' // (7) open tag
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davisjam it this safe?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you print out html.source?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, I did it. Safe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Feder1co5oave what is the \\h* for in (?=\\h*\\n)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it that supposed to be \\n*?

+ '|</(?!script|pre|style)[a-z][\\w-]*\\s*>(?=\\h*\\n)[\\s\\S]*?(?:\\n{2,}|$)' // (7) closing tag
+ ')',
def: /^ {0,3}\[(label)\]: *\n? *<?([^\s>]+)>?(?:(?: +\n? *| *\n *)(title))? *(?:\n+|$)/,
table: noop,
lheading: /^([^\n]+)\n *(=|-){2,} *(?:\n+|$)/,
paragraph: /^([^\n]+(?:\n?(?!hr|heading|lheading| {0,3}>|tag)[^\n]+)+)/,
paragraph: /^([^\n]+(?:\n(?!hr|heading|lheading| {0,3}>|<\/?(?:tag)(?: +|\\n|\/?>)|<(?:script|pre|style|!--))[^\n]+)+)/,
text: /^[^\n]+/
};

block._label = /(?:\\[\[\]]|[^\[\]])+/;
block._title = /(?:"(?:\\"|[^"]|"[^"\n]*")*"|'\n?(?:[^'\n]+\n?)*'|\([^()]*\))/;
block._label = /(?!\s*\])(?:\\[\[\]]|[^\[\]])+/;
block._title = /(?:"(?:\\"?|[^"\\])*"|'[^'\n]*(?:\n[^'\n]+)*\n?'|\([^()]*\))/;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Safe.

block.def = edit(block.def)
.replace('label', block._label)
.replace('title', block._title)
Expand All @@ -47,23 +56,24 @@ block.list = edit(block.list)
.replace('def', '\\n+(?=' + block.def.source + ')')
.getRegex();

block._tag = '(?!(?:'
+ 'a|em|strong|small|s|cite|q|dfn|abbr|data|time|code'
+ '|var|samp|kbd|sub|sup|i|b|u|mark|ruby|rt|rp|bdi|bdo'
+ '|span|br|wbr|ins|del|img)\\b)\\w+(?!:|[^\\w\\s@]*@)\\b';

block.html = edit(block.html)
.replace('comment', /<!--[\s\S]*?-->/)
.replace('closed', /<(tag)[\s\S]+?<\/\1>/)
.replace('closing', /<tag(?:"[^"]*"|'[^']*'|\s[^'"\/>\s]*)*?\/?>/)
.replace(/tag/g, block._tag)
block._tag = 'address|article|aside|base|basefont|blockquote|body|caption'
+ '|center|col|colgroup|dd|details|dialog|dir|div|dl|dt|fieldset|figcaption'
+ '|figure|footer|form|frame|frameset|h[1-6]|head|header|hr|html|iframe'
+ '|legend|li|link|main|menu|menuitem|meta|nav|noframes|ol|optgroup|option'
+ '|p|param|section|source|summary|table|tbody|td|tfoot|th|thead|title|tr'
+ '|track|ul';
block._comment = /<!--(?!-?>)[\s\S]*?-->/;
block.html = edit(block.html, 'i')
.replace('comment', block._comment)
.replace('tag', block._tag)
.replace('attribute', / +[a-zA-Z:_][\w.:-]*(?: *= *"[^"\n]*"| *= *'[^'\n]*'| *= *[^\s"'=<>`]+)?/)
.getRegex();

block.paragraph = edit(block.paragraph)
.replace('hr', block.hr)
.replace('heading', block.heading)
.replace('lheading', block.lheading)
.replace('tag', '<' + block._tag)
.replace('tag', block._tag) // pars can be interrupted by type (6) html blocks
.getRegex();

block.blockquote = edit(block.blockquote)
Expand Down Expand Up @@ -101,6 +111,24 @@ block.tables = merge({}, block.gfm, {
table: /^ *\|(.+)\n *\|( *[-:]+[-| :]*)\n((?: *\|.*(?:\n|$))*)\n*/
});

/**
* Pedantic grammar
*/

block.pedantic = merge({}, block.normal, {
html: edit(
'^ *(?:comment *(?:\\n|\\s*$)'
+ '|<(tag)[\\s\\S]+?</\\1> *(?:\\n{2,}|\\s*$)' // closed tag
+ '|<tag(?:"[^"]*"|\'[^\']*\'|\\s[^\'"/>\\s]*)*?/?> *(?:\\n{2,}|\\s*$))')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davisjam this should match your fix in #1083

.replace('comment', block._comment)
.replace(/tag/g, '(?!(?:'
+ 'a|em|strong|small|s|cite|q|dfn|abbr|data|time|code|var|samp|kbd|sub'
+ '|sup|i|b|u|mark|ruby|rt|rp|bdi|bdo|span|br|wbr|ins|del|img)'
+ '\\b)\\w+(?!:|[^\\w\\s@]*@)\\b')
.getRegex(),
def: /^ *\[([^\]]+)\]: *<?([^\s>]+)>?(?: +(["(][^\n]+[")]))? *(?:\n+|$)/
});

/**
* Block Lexer
*/
Expand All @@ -111,7 +139,9 @@ function Lexer(options) {
this.options = options || marked.defaults;
this.rules = block.normal;

if (this.options.gfm) {
if (this.options.pedantic) {
this.rules = block.pedantic;
} else if (this.options.gfm) {
if (this.options.tables) {
this.rules = block.tables;
} else {
Expand Down Expand Up @@ -370,7 +400,7 @@ Lexer.prototype.token = function(src, top) {
if (top && (cap = this.rules.def.exec(src))) {
src = src.substring(cap[0].length);
if (cap[3]) cap[3] = cap[3].substring(1, cap[3].length - 1);
tag = cap[1].toLowerCase();
tag = cap[1].toLowerCase().replace(/\s+/g, ' ');
if (!this.tokens.links[tag]) {
this.tokens.links[tag] = {
href: cap[2],
Expand Down Expand Up @@ -461,13 +491,18 @@ Lexer.prototype.token = function(src, top) {
*/

var inline = {
escape: /^\\([\\`*{}\[\]()#+\-.!_>])/,
escape: /^\\([!"#$%&'()*+,\-./:;<=>?@\[\]\\^_`{|}~])/,
autolink: /^<(scheme:[^\s\x00-\x1f<>]*|email)>/,
url: noop,
tag: /^<!--[\s\S]*?-->|^<\/?[a-zA-Z0-9\-]+(?:"[^"]*"|'[^']*'|\s[^<'">\/\s]*)*?\/?>/,
link: /^!?\[(inside)\]\(href\)/,
reflink: /^!?\[(inside)\]\s*\[([^\]]*)\]/,
nolink: /^!?\[((?:\[[^\[\]]*\]|\\[\[\]]|[^\[\]])*)\]/,
tag: '^comment'
+ '|^</[a-zA-Z][\\w:-]*\\s*>' // self-closing tag
+ '|^<[a-zA-Z][\\w-]*(?:attribute)*?\\s*/?>' // open tag
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davisjam same as above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you print out inline.tag?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, I did it. This regex is safe.

+ '|^<\\?[\\s\\S]*?\\?>' // processing instruction, e.g. <?php ?>
+ '|^<![a-zA-Z]+\\s[\\s\\S]*?>' // declaration, e.g. <!DOCTYPE html>
+ '|^<!\\[CDATA\\[[\\s\\S]*?\\]\\]>', // CDATA section
link: /^!?\[(label)\]\(href(?:\s+(title))?\s*\)/,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davisjam the commonmark spec is covered by this rule (and subrules at lines 530 and following). This is safe because the href part can't contain spaces, so the URI stops at the closing ) or at the first space.

reflink: /^!?\[(label)\]\[(?!\s*\])((?:\\[\[\]]?|[^\[\]\\])+)\]/,
nolink: /^!?\[(?!\s*\])((?:\[[^\[\]]*\]|\\[\[\]]|[^\[\]])*)\](?:\[\])?/,
strong: /^__([^\s][\s\S]*?[^\s])__(?!_)|^\*\*([^\s][\s\S]*?[^\s])\*\*(?!\*)|^__([^\s])__(?!_)|^\*\*([^\s])\*\*(?!\*)/,
em: /^_([^\s][\s\S]*?[^\s_])_(?!_)|^_([^\s_][\s\S]*?[^\s])_(?!_)|^\*([^\s][\s\S]*?[^\s*])\*(?!\*)|^\*([^\s*][\s\S]*?[^\s])\*(?!\*)|^_([^\s_])_(?!_)|^\*([^\s*])\*(?!\*)/,
code: /^(`+)\s*([\s\S]*?[^`]?)\s*\1(?!`)/,
Expand All @@ -476,24 +511,34 @@ var inline = {
text: /^[\s\S]+?(?=[\\<!\[`*]|\b_| {2,}\n|$)/
};

inline._escapes = /\\([!"#$%&'()*+,\-./:;<=>?@\[\]\\^_`{|}~])/g;

inline._scheme = /[a-zA-Z][a-zA-Z0-9+.-]{1,31}/;
inline._email = /[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+(@)[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)+(?![-_])/;

inline.autolink = edit(inline.autolink)
.replace('scheme', inline._scheme)
.replace('email', inline._email)
.getRegex()
.getRegex();

inline._attribute = /\s+[a-zA-Z:_][\w.:-]*(?:\s*=\s*"[^"]*"|\s*=\s*'[^']*'|\s*=\s*[^\s"'=<>`]+)?/;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Safe.


inline._inside = /(?:\[[^\[\]]*\]|\\[\[\]]|[^\[\]]|\](?=[^\[]*\]))*/;
inline._href = /\s*<?([\s\S]*?)>?(?:\s+['"]([\s\S]*?)['"])?\s*/;
inline.tag = edit(inline.tag)
.replace('comment', block._comment)
.replace('attribute', inline._attribute)
.getRegex();

inline._label = /(?:\[[^\[\]]*\]|\\[\[\]]?|`[^`]*`|[^\[\]\\])*?/;
inline._href = /\s*(<(?:\\[<>]?|[^\s<>\\])*>|(?:\\[()]?|\([^\s\x00-\x1f()\\]*\)|[^\s\x00-\x1f()\\])*?)/;
inline._title = /"(?:\\"?|[^"\\])*"|'(?:\\'?|[^'\\])*'|\((?:\\\)?|[^)\\])*\)/;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davisjam check these 3

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Safe.


inline.link = edit(inline.link)
.replace('inside', inline._inside)
.replace('label', inline._label)
.replace('href', inline._href)
.replace('title', inline._title)
.getRegex();

inline.reflink = edit(inline.reflink)
.replace('inside', inline._inside)
.replace('label', inline._label)
.getRegex();

/**
Expand All @@ -508,7 +553,13 @@ inline.normal = merge({}, inline);

inline.pedantic = merge({}, inline.normal, {
strong: /^__(?=\S)([\s\S]*?\S)__(?!_)|^\*\*(?=\S)([\s\S]*?\S)\*\*(?!\*)/,
em: /^_(?=\S)([\s\S]*?\S)_(?!_)|^\*(?=\S)([\s\S]*?\S)\*(?!\*)/
em: /^_(?=\S)([\s\S]*?\S)_(?!_)|^\*(?=\S)([\s\S]*?\S)\*(?!\*)/,
link: edit(/^!?\[(label)\]\(\s*<?([\s\S]*?)>?(?:\s+(['"][\s\S]*?['"]))?\s*\)/)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex is vulnerable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A static scan of the tree on @Feder1co5oave's PR says:

"vulnRegexes":["^!?\\[(label)\\]\\(\\s*<?([\\s\\S]*?)>?(?:\\s+(['\"][\\s\\S]*?['\"]))?\\s*\\)"]

(JSON-encoded).

Checking the ones requested individually...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this is the "old" link regex, I didn't change it.
Could you point out where the issue is? I'm thinking this part?

(?:\s+(['"][\s\S]*?['"]))?

The double ?, maybe? I'm kinda lost.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what check-regex.pl says:

{"pattern":"^!?\\[(label)\\]\\(\\s*<?([\\s\\S]*?)>?(?:\\s+(['\"][\\s\\S]*?['\"]))?\\s*\\)",
"evilInput": {"suffix":"e","pumpPairs":[
{"pump":"a","prefix":"![label](\t"},{"prefix":"a","pump":"\t"}]
},
"timeLimit":"5",
"nPumps":"250000",
"language":"javascript"}

The issue is the use of [\s\S]*? twice.
This results in a quadratic regex.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I see where we're getting at... I'm not sure what that output means, can you translate that into some example input that triggers the quadratic behavior?
I was thinking, maybe...?

![label](\t\t\t\t\t\t\t\t\t...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you try with /^!?\[(label)\]\((\s*(<[^>]*>))?(?:\s+(['"][\s\S]*?['"]))?\s*\)/ ?

existing, proposed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrapping the URI in angular parenthesis is optional.
The issue is that in this original rule, spaces where allowed inside the URI, which could then be followed by space and a title wrapped in single/double quotes. There is no easy way of finding where the uri stops and when the title begins, but for the presence of the opening quote (which in some cases could be part of the URI instead?). The rule itself is ambiguous.

Copy link
Contributor

@davisjam davisjam Apr 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't sound like this can be done cleanly (and safely) in a single regex. Can you provide a link to the spec for this rule?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the "legacy" link rule of marked, it's been like this for a long time. I didn't change it. It doesn't follow any clear spec, it just tries its best. We should come up with a different rule that doesn't break compatibility.
However, keep in mind this is used only in pedantic mode, I think very few people ever used it and it is planned to be deprecated soon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Since you didn't change it, it shouldn't prevent the PR from being merged.

.replace('label', inline._label)
.getRegex(),
reflink: edit(/^!?\[(label)\]\s*\[([^\]]*)\]/)
.replace('label', inline._label)
.getRegex()
});

/**
Expand Down Expand Up @@ -552,14 +603,14 @@ function InlineLexer(links, options) {
throw new Error('Tokens array requires a `links` property.');
}

if (this.options.gfm) {
if (this.options.pedantic) {
this.rules = inline.pedantic;
} else if (this.options.gfm) {
if (this.options.breaks) {
this.rules = inline.breaks;
} else {
this.rules = inline.gfm;
}
} else if (this.options.pedantic) {
this.rules = inline.pedantic;
}
}

Expand Down Expand Up @@ -587,6 +638,7 @@ InlineLexer.prototype.output = function(src) {
link,
text,
href,
title,
cap;

while (src) {
Expand Down Expand Up @@ -650,9 +702,12 @@ InlineLexer.prototype.output = function(src) {
if (cap = this.rules.link.exec(src)) {
src = src.substring(cap[0].length);
this.inLink = true;
href = cap[2];
href = href[0] === '<' ? href.substring(1, href.length - 1) : href;
title = cap[3] ? cap[3].substring(1, cap[3].length - 1) : cap[3];
out += this.outputLink(cap, {
href: cap[2],
title: cap[3]
href: InlineLexer.escapes(href),
title: InlineLexer.escapes(title)
});
this.inLink = false;
continue;
Expand Down Expand Up @@ -725,12 +780,16 @@ InlineLexer.prototype.output = function(src) {
return out;
};

InlineLexer.escapes = function(text) {
return text ? text.replace(InlineLexer.rules._escapes, '$1') : text;
}

/**
* Compile Link
*/

InlineLexer.prototype.outputLink = function(cap, link) {
var href = escape(link.href),
var href = link.href,
title = link.title ? escape(link.title) : null;

return cap[0].charAt(0) !== '!'
Expand Down Expand Up @@ -917,7 +976,12 @@ Renderer.prototype.link = function(href, title, text) {
if (this.options.baseUrl && !originIndependentUrl.test(href)) {
href = resolveUrl(this.options.baseUrl, href);
}
var out = '<a href="' + href + '"';
try {
href = encodeURI(href).replace(/%25/g, '%');
} catch (e) {
return text;
}
var out = '<a href="' + escape(href) + '"';
if (title) {
out += ' title="' + title + '"';
}
Expand Down Expand Up @@ -1137,10 +1201,8 @@ Parser.prototype.tok = function() {
return this.renderer.listitem(body);
}
case 'html': {
var html = !this.token.pre && !this.options.pedantic
? this.inline.output(this.token.text)
: this.token.text;
return this.renderer.html(html);
// TODO parse inline content if parameter markdown=1
return this.renderer.html(this.token.text);
}
case 'paragraph': {
return this.renderer.paragraph(this.inline.output(this.token.text));
Expand Down Expand Up @@ -1179,7 +1241,7 @@ function unescape(html) {
}

function edit(regex, opt) {
regex = regex.source;
regex = regex.source || regex;
opt = opt || '';
return {
replace: function(name, val) {
Expand Down
9 changes: 7 additions & 2 deletions test/index.js
Expand Up @@ -365,10 +365,15 @@ function fix() {

// cp -r original tests
fs.readdirSync(path.resolve(__dirname, 'original')).forEach(function(file) {
var text = fs.readFileSync(path.resolve(__dirname, 'original', file));
var text = fs.readFileSync(path.resolve(__dirname, 'original', file), 'utf8');

if (path.extname(file) === '.md') {
text = '---\ngfm: false\n---\n' + text;
if (fm.test(text)) {
text = fm(text);
text = '---\n' + text.frontmatter + '\ngfm: false\n---\n' + text.body;
} else {
text = '---\ngfm: false\n---\n' + text;
}
}

fs.writeFileSync(path.resolve(__dirname, 'compiled_tests', file), text);
Expand Down
File renamed without changes.