Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing ~~~~~~… takes quadratic time #1463

Closed
andersk opened this issue Apr 8, 2019 · 13 comments
Closed

Parsing ~~~~~~… takes quadratic time #1463

andersk opened this issue Apr 8, 2019 · 13 comments
Labels
category: code blocks L0 - security A security vulnerability within the Marked library is discovered

Comments

@andersk
Copy link
Contributor

andersk commented Apr 8, 2019

Describe the bug

Marked takes quadratic time to parse ~~~~~~….

To Reproduce

Steps to reproduce the behavior:

$ time node -e 'require("./lib/marked")("~".repeat(25000))'
0.94user 0.01system 0:00.96elapsed 99%CPU (0avgtext+0avgdata 32336maxresident)k
0inputs+0outputs (0major+2601minor)pagefaults 0swaps
$ time node -e 'require("./lib/marked")("~".repeat(50000))'
3.39user 0.01system 0:03.42elapsed 99%CPU (0avgtext+0avgdata 32112maxresident)k
0inputs+0outputs (0major+2599minor)pagefaults 0swaps
$ time node -e 'require("./lib/marked")("~".repeat(100000))'
13.50user 0.01system 0:13.55elapsed 99%CPU (0avgtext+0avgdata 32200maxresident)k
0inputs+0outputs (0major+2599minor)pagefaults 0swaps

Observe that the time quadruples each time the input size doubles. This could be exploited for denial of service in an application running Marked on the server side.

Expected behavior

A Markdown parser (especially one that claims “built for speed” as its primary selling point) should run in linear time on all inputs. This isn’t easy but it should be possible; several parsers including the reference C implementation seem to do a good job at this.

@UziTech
Copy link
Member

UziTech commented Apr 9, 2019

quadruples each time the input size doubles

isn't that still considered linear time? Quadratic would be if it was n^2 right?

@andersk
Copy link
Contributor Author

andersk commented Apr 9, 2019

Linear time, O(n), doubles each time the input size doubles. Quadratic time, O(n^2), quadruples each time the input size doubles. This is quadratic.

@UziTech
Copy link
Member

UziTech commented Apr 9, 2019

You are right, I was thinking of exponential (2^n)

@UziTech UziTech added the L0 - security A security vulnerability within the Marked library is discovered label Apr 9, 2019
@andersk
Copy link
Contributor Author

andersk commented May 3, 2019

Now that #1464 has fixed the recursion error on >>>>>>…, we can see that this is also a quadratic time case.

$ time node -e 'require("./lib/marked")(">".repeat(25000))'
0.62user 0.01system 0:00.63elapsed 101%CPU (0avgtext+0avgdata 39592maxresident)k
0inputs+0outputs (0major+4521minor)pagefaults 0swaps
$ time node -e 'require("./lib/marked")(">".repeat(50000))'
2.18user 0.02system 0:02.17elapsed 101%CPU (0avgtext+0avgdata 46456maxresident)k
0inputs+0outputs (0major+6252minor)pagefaults 0swaps
$ time node -e 'require("./lib/marked")(">".repeat(100000))'
8.31user 0.03system 0:08.28elapsed 100%CPU (0avgtext+0avgdata 64860maxresident)k
0inputs+0outputs (0major+10944minor)pagefaults 0swaps

@calculuschild
Copy link
Contributor

With GFM disabled, I still see the slowdown, so it doesn't seem to be related to strikeout syntax.

Actually... it looks like the slowdown also occurs with backticks? Must be tied to code fences in some way.

@UziTech
Copy link
Member

UziTech commented Aug 10, 2023

Closing this as it is unlikely to occur in regular markdown and if parsing untrusted markdown ReDoS can be solved by using a worker.

@UziTech UziTech closed this as completed Aug 10, 2023
@andersk
Copy link
Contributor Author

andersk commented Aug 10, 2023

If you put yourself in the shoes of someone developing an app using a “⚡ built for speed” Markdown library to parse user Markdown, how should they know that they need to build external countermeasures against an undocumented performance bug like this?

@UziTech
Copy link
Member

UziTech commented Aug 10, 2023

ReDoS is a well documented vulnerability. Using workers to prevent it is also well documented. I don't think a user needs to know about this vulnerability to know the issues with parsing untrusted markdown.

@UziTech
Copy link
Member

UziTech commented Aug 10, 2023

Further more it is possible to cause a sql injection vulnerability by passing untrusted input to sql, but that isn't a vulnerability in sql that is a vulnerability in the implementation. Same with this vulnerability. If the user parses untrusted markdown with marked or any other markdown parser it is not a vulnerability in the parser but in the implementation.

@andersk
Copy link
Contributor Author

andersk commented Aug 10, 2023

ReDoS is a well documented vulnerability that applies to certain poorly-written regexps. Why should a user expect that a “⚡ built for speed” Markdown parsing library is internally built with regexps that are vulnerable to ReDoS?

There are other Markdown parsing libraries that do not have this bug. In principle, it should not be fundamentally unsafe to parse untrusted Markdown. If your position is that it’s unsafe to parse untrusted Markdown with this library, then then you should document that. (You currently only document that it’s unsafe to use the output HTML.)

@UziTech
Copy link
Member

UziTech commented Aug 10, 2023

I know of at least 10 ways to DoS markdown-it and commonmark,.js neither of which use regexps the way marked does. DoS is not only a vulnerability for regexps but any parser.

If you would like to update the docs PRs are always welcome 😁👍

@andersk
Copy link
Contributor Author

andersk commented Aug 10, 2023

In the past I’ve reported 7 quadratic-time bugs in markdown-it and 1 in commonmark.js; they were all addressed very promptly with code fixes. If you know of further issues you should certainly report them.

@UziTech
Copy link
Member

UziTech commented Aug 11, 2023

They have all been reported that is how I know about them. It doesn't take quadratic time to cause a DoS.

So I think we can agree that any parser can have vulnerabilities and it is not ok to run untrusted code in a way that can cause DoS. Certainly there were vulnerabilities before they fixed the ones you reported. And there are certainly vulnerabilities before the next one is found and fixed.

So the solution to this problem should be to educate users on how to mitigate that. I think we have done a good job illustrating that workaround in our docs. If you don't agree you are always allowed to create a PR to update the docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: code blocks L0 - security A security vulnerability within the Marked library is discovered
Projects
None yet
Development

No branches or pull requests

3 participants