Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backtracking vulnerability #1070

Closed
joshbruce opened this issue Feb 21, 2018 · 23 comments
Closed

Backtracking vulnerability #1070

joshbruce opened this issue Feb 21, 2018 · 23 comments
Labels
help wanted L0 - security A security vulnerability within the Marked library is discovered question

Comments

@joshbruce
Copy link
Member

joshbruce commented Feb 21, 2018

My name is James. I'm a security researcher at Virginia Tech.

The npm module marked has 2 regular expressions vulnerable to catastrophic backtracking.

The vulnerable expressions are listed below.

If you don't know, catastrophic backtracking is when the regex engine takes more than linear time to scan a string.

There are lots of resources about it on the web. I have included some starting points below.

Catastrophic backtracking is particularly problematic if two conditions are met:

  1. The module is used by server processes, and
  2. The regex can be reached by user input.

Please evaluate whether these conditions apply to your users and your regexes.

However, copy/pasting regexes is quite common. Even if one or both conditions fail now,
these regexes may be a time bomb in your code for later -- you might still want to fix it.

For example, if your module is used on the client side, consider whether a server-side counterpart might re-use this pattern later.

Fix advice:

If you want to make fixes, here are the most common approaches with examples:

  1. Refactor the regex pattern
  2. Replace the regex with custom parsing
  3. Apply input sanitization, suitable if legitimate input strings are (much) shorter than the attack string

You can study other examples using the links from Snyk.io's vulnerability database: https://snyk.io/vuln/?packageManager=all

Validating your fix:

  1. I am happy to test a refactored regex with my tools -- send me an email with the commit ID.
  2. Existing GitHub projects for regex testing are:

If this is a new topic for you, here are some good starting points for information:


Now, here are the vulnerable patterns with attack details.

Each vulnerability has the following keys, explained in more detail below:

  • pattern
  • filesIn (as of December/January; I excluded any appearances in irrelevant-looking dirs, and in '.min' files)
  • stringLenFor10Sec
  • nPumpsFor10Sec
  • attackFormat
  • blowupCurve

The attack format describes how to generate an attack string.

On my machine, an attack string generated using nPumpsFor10Sec repetitions ("pumps") of the pump string(s) blocks the js regex engine for 10 seconds, though this will vary based on your hardware.

Compose an attack string like this:

'prefix 1' + 'pump 1' X times + 'prefix 2' + 'pump 2' X times + ... + suffix

Example:

With pumpPairs: [{'prefix': 'a', 'pump': 'b'}], suffix: 'c', an attack string with three pumps would be:

abbbc

Catastrophic backtracking blows up at either an exponential rate or a super-linear (power law) rate.

The blowupCurve indicates how severe the blow-up is.

The 'type' is either EXP(onential) or POW(er law) in the number of pumps (x).

The 'parms' are the parameters for the two curve types. The second parameter is more important, because:

  • EXP: f(x) = parms[0] * parms[1]^x
  • POW: f(x) = parms[0] * x^parms[1]

I strongly recommend you address any EXP(onential) blow-ups.

POW(er law) blow-ups vary in their severity depending on the value of the second parameter, and in the length of the attack string ('stringLenFor10Sec').

But they are still a legitimate attack vector.

JSON formatted:

Vuln 1:

{
   "attackFormat" : {
      "suffix" : "!\\",
      "pumpPairs" : [
         {
            "prefix" : "![",
            "pump" : "\\[[]"
         }
      ]
   },
   "pattern" : "^!?\\[((?:\\[[^\\]]*\\]|\\\\[\\[\\]]|[^\\[\\]])*)\\]",
   "filesIn" : [
      [
         "lib/marked.js"
      ]
   ],
   "blowupCurve" : {
      "parms" : [
         0.000686698017502894,
         0.320195646568061
      ],
      "type" : "EXP",
      "r2" : 0.982167673285096
   },
   "stringLenFor10Sec" : 124,
   "nPumpsFor10Sec" : "30"
}

Vuln 2:

{
   "pattern" : "<tag(?:\"[^\"]*\"|'[^']*'|\\s[^'\"\\/>]*)*?\\/?>",
   "attackFormat" : {
      "suffix" : "<\"\t/>a'a",
      "pumpPairs" : [
         {
            "prefix" : "<tag",
            "pump" : "\t\t\"\""
         }
      ]
   },
   "nPumpsFor10Sec" : "30",
   "stringLenFor10Sec" : 132,
   "blowupCurve" : {
      "r2" : 0.98785310777975,
      "type" : "EXP",
      "parms" : [
         0.000399342208756204,
         0.338098864022787
      ]
   },
   "filesIn" : [
      [
         "lib/marked.js"
      ]
   ]
}
@joshbruce joshbruce changed the title Backtracking Backtracking vulnerability Feb 21, 2018
@davisjam
Copy link
Contributor

Let me know if you have any questions.

@joshbruce
Copy link
Member Author

@davisjam: Did you run this initial assessment against a specific version of Marked? If so, which one?

@davisjam
Copy link
Contributor

Whatever branch comes down when you run 'git clone'.

@joshbruce
Copy link
Member Author

joshbruce commented Feb 21, 2018

@davisjam: This one has confused me a bit regarding why it is considered a "security vulnerability"...mainly for my own edification. If the server gets stuck in a process does that open the server itself up to attack?

I'm a layman in this territory; so, be gentle. :)

Seems like the equivalent of a stack overflow or something similar we would find in PHP and other server-side processes, which is usually handled by a timeout cancelling the process...not picking up the "security vulnerability" part of these types of findings.

ps. @UziTech did try to explain it once I think on the #1093 Issue - don't think it stuck.

@davisjam
Copy link
Contributor

@joshbruce Suppose you're running a Node server that uses a regex to validate user input, perhaps somewhere in your middleware layer for handling HTTP requests. Now suppose there is input on which the time it takes to evaluate the regex doubles for each extra character -- an exponential blow-up.

A simple example is /(a+)+$/ which will blow up on inputs of the form "aaaaaaaaa....a!".

A malicious client could make an HTTP request with, say, 100 characters, and make the Node server take eons to evaluate it. While the Node server is stuck in this callback for one client, any other pending requests will hang -- disaster! I have a guide on this topic if you're unfamiliar with it.

Unless the developer was extra clever (using the VM module or a child process), Node can't intervene with a timeout. The V8 regex engine just keeps trying, unlike PHP or sometimes Perl which will throw an exception on a runaway regex evaluation.

The only thing you can do is detect that your server's throughput has dropped using some kind of heartbeat mechanism, shoot the server, and restart it. But of course the attacker can just submit another malicious request.

@joshbruce
Copy link
Member Author

@davisjam: Thanks! (Admittedly, I've only made it through the tl;dr - but have it open to read later.) So, is this a performance vulnerability or a security vulnerability or maybe both?

If it's covered later in the paper - just say the word and I'll stop asking. ;)

The security vulnerability is more from someone being able to block the app for other people - but it doesn't actually allow for the compromising of their data and so on. I could see where that would be a problem on, say, a stock trading application, for example.

Just wondering severity, impact, and likelihood - risk management mode, sorry. (Definitely appreciate the suggested solutions included in the report. And will let the rest of the crew marinate and respond - truly just for my own edification - I love learning new things.)

@davisjam
Copy link
Contributor

So, is this a performance vulnerability or a security vulnerability or maybe both?

Speaking generically about "blocking the event loop or the worker pool": Both.

For server-side code, if an attacker can cause your event loop or worker pool to block it's a security vulnerability.

For client-side code, it's a performance problem, but only if a user might accidentally cause the event loop (or WebWorkers) to block. Though you could buy up ad space on Google and be annoying to people; this is effectively what auto-play ads do anyway.

Speaking specifically about catastrophic backtracking:

  • mostly of interest for REDOS, where a client affects a server
  • if the "attack strings" look like somethings users might enter on accident, this could be seen as a performance problem even on the client side
  • if the client side code is doing something that a server might want to do, this is a copy/paste "time bomb" waiting to happen. "Full Stack JavaScript" has its downsides.

@joshbruce
Copy link
Member Author

@davisjam: Thanks! Read the rest. Again, please excuse the ignorance on my part, the WikiPedia article didn't give me the "ah-ha!" moment re REDOS and DOS being a "security" piece...maybe I just have a limited understanding of what we mean by "security" in this context.

Is the attack, effectively, against the other clients accessing the server? Not necessarily the server itself? (Trying to parse between attacking a computer and who is hurt by the attack to classify it as security issue.)

Also, might be worth checking out #1039 - sorry, thought I posted this earlier.

@Feder1co5oave
Copy link
Contributor

Vulnerability n.2 is confirmed in #1058, it affects the online HTML rule and I've been working on it

@Feder1co5oave Feder1co5oave added parser: GFM L0 - security A security vulnerability within the Marked library is discovered and removed parser: GFM labels Feb 22, 2018
@davisjam
Copy link
Contributor

@joshbruce If a Node server is testing client input with a vulnerable regex, a malicious client can submit input to trigger the catastrophic backtracking. The Node server will get stuck in the callback and not be able to handle any other clients. The result is Denial of Service -- hence the nickname "Regular Expression Denial of Service" or ReDoS/REDOS.

@joshbruce
Copy link
Member Author

@davisjam: Thanks! Still think I'm too naive to fully understand what makes it a security thing and not just a "jerk move" and maybe that's actually okay that I don't fully understand. :)

Just playing scenarios in my head of how and when someone could become an injured party on the other end. Again, maybe ignorance is bliss for me right now and I need to start a bit higher in the learning about all things security tree.

Thanks again for the report and the information!

@Feder1co5oave
Copy link
Contributor

@davisjam: Thanks! Still think I'm too naive to fully understand what makes it a security thing and not just a "jerk move" and maybe that's actually okay that I don't fully understand. :)

The consensus is that being able to make a service unavailable is a security issue.

@davisjam
Copy link
Contributor

Do you guys think you have an idea on how to fix, or do you want me to take a look?

@joshbruce
Copy link
Member Author

joshbruce commented Feb 26, 2018

Just got a note from @karenyavine:

Would like to hear when a fix has been issued and published to npm.
I'll wait a bit for publishing a publish vulnerability advisory, do you think there is an ETA for this?

@davisjam: It's open source, brother, if you can do something, I say go for it. It'll be reviewed in the same manner. :)

Tagging @UziTech, @Feder1co5oave, and @styfle explicitly for situational awareness as they are the other committers. Maybe if you get something we can put it into #1076. Having a crew of four and the number of users we have is a bit tough; so, having more contributors helping out to fix issues would be a good thing in my book. (I don't know enough about regex to really do much I'm afraid.)

@davisjam
Copy link
Contributor

Tests demonstrating both attacks are in #1083.

@davisjam
Copy link
Contributor

Analysis of the root cause of the nolink vulnerability (n.1):

Here's a railroad diagram.

The attack string looks like this:

![\[[]\[[]....

The pump string: \[[] can take either of two routes through the diagram.

Route 1: "the middle way" --- go through the middle path to consume [, then loop around and take the top path to consume the subsequent "[]".

Route 2: "the bottom way" --- go through the bottom path to consume '', then loop around and take the top path to consume the subsequent "[[]".

Each additional pump adds a new set of two choices. On a mismatch the regex engine has to try each of the resulting 2^n choices for an exponential blow-up.

@davisjam
Copy link
Contributor

Not sure n.1 can be fixed by refactoring the regex.
Are you open to a simpler regex to grab "everything inside []" and then parse the innards?

@davisjam
Copy link
Contributor

davisjam commented Feb 26, 2018

[edit: tweak route names and descriptions slightly for clarity]

Analysis of the root cause of the html.closing vulnerability (n.2):

Here's a railroad diagram.

The attack string looks like this:

<tag\t\t""\t\t""...<"\t/>a'a

The pump string \t\t"" can take either of two routes through the diagram.

Route 1: "double pass": Loop through the bottom group twice to consume both tabs in the 'whitespace" of the bottom group, then loop around and take the top path to consume the subsequent double quotes.

Route 2: "single pass": Consume one tab in the whitespace of the bottom group, one tab in the subsequent [^'"\/>] class, and loop around to the top path as in Route 1.

As in n.1, each additional pump adds a new set of two choices resulting in an exponential blow-up.

Update: The same problematic construction occurs in 'inline.tag'.

@davisjam
Copy link
Contributor

I believe I fixed n.2 in #1083. All but one of the tests pass, which was the same behavior as in a clean checkout.

I think the new regex (visual for the simpler of the two) matches the exact same language as the old one.

@joshbruce
Copy link
Member Author

@davisjam: Just saw the other comment re simpler solution - definitely down to see what that might be. I don't think we're strictly married to any specific solution per se as long as it fixes the issue and is backward compatible...that would be the only reason I can see it not being okay.

@davisjam
Copy link
Contributor

I believe this pattern is vulnerable in the same way that n.1 is:

inline._inside = /(?:\[[^\]]*\]|\\[\[\]]|[^\[\]]|\](?=[^\[]*\]))*/;

davisjam added a commit to davisjam/marked that referenced this issue Feb 26, 2018
Problem:
Four regexes were vulnerable to catastrophic backtracking.
This leaves markdown servers open to a potential REDOS attack.

Solution:
Refactor the regexes.

For two similar regexes (html) I didn't change the language.
For two similar regexes (noline) I slightly changed the language:

![[[[[[[[[[[]] was accepted by the old noline pattern.
It is now rejected.

All tests pass, though I'm not sure if I've broken something that
was untested.

This addresses markedjs#1070 (with markedjs#1058 along the way).
davisjam added a commit to davisjam/marked that referenced this issue Feb 27, 2018
Problem:
Four regexes were vulnerable to catastrophic backtracking.
This leaves markdown servers open to a potential REDOS attack.

Solution:
Refactor the regexes.

For two similar regexes (html) I didn't change the language.
For two similar regexes (noline) I slightly changed the language:

![[[[[[[[[[[]] was accepted by the old noline pattern.
It is now rejected.

All tests pass, though I'm not sure if I've broken something that
was untested.

This addresses markedjs#1070 (with markedjs#1058 along the way).

Bonus: rename a stray test to use _ instead of -.
@davisjam
Copy link
Contributor

Closed by #1083?

@joshbruce
Copy link
Member Author

Believe so. See #1093, if you haven't already. Will close.

Please let us know if you find something else!

7korobi pushed a commit to 7korobi/marked that referenced this issue Mar 13, 2018
Problem:
Four regexes were vulnerable to catastrophic backtracking.
This leaves markdown servers open to a potential REDOS attack.

Solution:
Refactor the regexes.

For two similar regexes (html) I didn't change the language.
For two similar regexes (noline) I slightly changed the language:

![[[[[[[[[[[]] was accepted by the old noline pattern.
It is now rejected.

All tests pass, though I'm not sure if I've broken something that
was untested.

This addresses markedjs#1070 (with markedjs#1058 along the way).

Bonus: rename a stray test to use _ instead of -.
roback added a commit to twingly/twingly-search-api-node that referenced this issue May 7, 2018
Vulnerability in marked: markedjs/marked#1070
Released in https://github.com/markedjs/marked/releases/tag/v0.3.17

I don't think it affects us though as it's only a dependency for jsdoc.
zhenalexfan pushed a commit to zhenalexfan/MarkdownHan that referenced this issue Nov 8, 2021
Problem:
Four regexes were vulnerable to catastrophic backtracking.
This leaves markdown servers open to a potential REDOS attack.

Solution:
Refactor the regexes.

For two similar regexes (html) I didn't change the language.
For two similar regexes (noline) I slightly changed the language:

![[[[[[[[[[[]] was accepted by the old noline pattern.
It is now rejected.

All tests pass, though I'm not sure if I've broken something that
was untested.

This addresses markedjs#1070 (with markedjs#1058 along the way).

Bonus: rename a stray test to use _ instead of -.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted L0 - security A security vulnerability within the Marked library is discovered question
Projects
None yet
Development

No branches or pull requests

3 participants