Backtracking vulnerability #1070

joshbruce · 2018-02-21T20:08:29Z

My name is James. I'm a security researcher at Virginia Tech.

The npm module marked has 2 regular expressions vulnerable to catastrophic backtracking.

The vulnerable expressions are listed below.

If you don't know, catastrophic backtracking is when the regex engine takes more than linear time to scan a string.

There are lots of resources about it on the web. I have included some starting points below.

Catastrophic backtracking is particularly problematic if two conditions are met:

The module is used by server processes, and
The regex can be reached by user input.

Please evaluate whether these conditions apply to your users and your regexes.

However, copy/pasting regexes is quite common. Even if one or both conditions fail now,
these regexes may be a time bomb in your code for later -- you might still want to fix it.

For example, if your module is used on the client side, consider whether a server-side counterpart might re-use this pattern later.

Fix advice:

If you want to make fixes, here are the most common approaches with examples:

Refactor the regex pattern
- killmenot/valid-data-url@64bad3c
- htmllint/htmllint@eec921c
Replace the regex with custom parsing
- jshttp/forwarded@d469116
- kriskowal/q-io@b9c54c8
Apply input sanitization, suitable if legitimate input strings are (much) shorter than the attack string
- hgoebl/mobile-detect.js@7222f6e
- node-modules/charset@effda0c

You can study other examples using the links from Snyk.io's vulnerability database: https://snyk.io/vuln/?packageManager=all

Validating your fix:

I am happy to test a refactored regex with my tools -- send me an email with the commit ID.
Existing GitHub projects for regex testing are:
- substack's safe-regex project (https://github.com/substack/safe-regex) -- easy to install, has false positives and some bugs
- Weideman's RegexStaticAnalysis project (https://github.com/NicolaasWeideman/RegexStaticAnalysis) -- a bit more work to install, may have false positives

If this is a new topic for you, here are some good starting points for information:

Now, here are the vulnerable patterns with attack details.

Each vulnerability has the following keys, explained in more detail below:

pattern
filesIn (as of December/January; I excluded any appearances in irrelevant-looking dirs, and in '.min' files)
stringLenFor10Sec
nPumpsFor10Sec
attackFormat
blowupCurve

The attack format describes how to generate an attack string.

On my machine, an attack string generated using nPumpsFor10Sec repetitions ("pumps") of the pump string(s) blocks the js regex engine for 10 seconds, though this will vary based on your hardware.

Compose an attack string like this:

'prefix 1' + 'pump 1' X times + 'prefix 2' + 'pump 2' X times + ... + suffix

Example:

With pumpPairs: [{'prefix': 'a', 'pump': 'b'}], suffix: 'c', an attack string with three pumps would be:

abbbc

Catastrophic backtracking blows up at either an exponential rate or a super-linear (power law) rate.

The blowupCurve indicates how severe the blow-up is.

The 'type' is either EXP(onential) or POW(er law) in the number of pumps (x).

The 'parms' are the parameters for the two curve types. The second parameter is more important, because:

EXP: f(x) = parms[0] * parms[1]^x
POW: f(x) = parms[0] * x^parms[1]

I strongly recommend you address any EXP(onential) blow-ups.

POW(er law) blow-ups vary in their severity depending on the value of the second parameter, and in the length of the attack string ('stringLenFor10Sec').

But they are still a legitimate attack vector.

JSON formatted:

Vuln 1:

{
   "attackFormat" : {
      "suffix" : "!\\",
      "pumpPairs" : [
         {
            "prefix" : "![",
            "pump" : "\\[[]"
         }
      ]
   },
   "pattern" : "^!?\\[((?:\\[[^\\]]*\\]|\\\\[\\[\\]]|[^\\[\\]])*)\\]",
   "filesIn" : [
      [
         "lib/marked.js"
      ]
   ],
   "blowupCurve" : {
      "parms" : [
         0.000686698017502894,
         0.320195646568061
      ],
      "type" : "EXP",
      "r2" : 0.982167673285096
   },
   "stringLenFor10Sec" : 124,
   "nPumpsFor10Sec" : "30"
}

Vuln 2:

{
   "pattern" : "<tag(?:\"[^\"]*\"|'[^']*'|\\s[^'\"\\/>]*)*?\\/?>",
   "attackFormat" : {
      "suffix" : "<\"\t/>a'a",
      "pumpPairs" : [
         {
            "prefix" : "<tag",
            "pump" : "\t\t\"\""
         }
      ]
   },
   "nPumpsFor10Sec" : "30",
   "stringLenFor10Sec" : 132,
   "blowupCurve" : {
      "r2" : 0.98785310777975,
      "type" : "EXP",
      "parms" : [
         0.000399342208756204,
         0.338098864022787
      ]
   },
   "filesIn" : [
      [
         "lib/marked.js"
      ]
   ]
}

The text was updated successfully, but these errors were encountered:

davisjam · 2018-02-21T20:14:36Z

Let me know if you have any questions.

joshbruce · 2018-02-21T20:19:13Z

@davisjam: Did you run this initial assessment against a specific version of Marked? If so, which one?

davisjam · 2018-02-21T20:20:28Z

Whatever branch comes down when you run 'git clone'.

joshbruce · 2018-02-21T21:02:02Z

@davisjam: This one has confused me a bit regarding why it is considered a "security vulnerability"...mainly for my own edification. If the server gets stuck in a process does that open the server itself up to attack?

I'm a layman in this territory; so, be gentle. :)

Seems like the equivalent of a stack overflow or something similar we would find in PHP and other server-side processes, which is usually handled by a timeout cancelling the process...not picking up the "security vulnerability" part of these types of findings.

ps. @UziTech did try to explain it once I think on the #1093 Issue - don't think it stuck.

davisjam · 2018-02-21T21:08:27Z

@joshbruce Suppose you're running a Node server that uses a regex to validate user input, perhaps somewhere in your middleware layer for handling HTTP requests. Now suppose there is input on which the time it takes to evaluate the regex doubles for each extra character -- an exponential blow-up.

A simple example is /(a+)+$/ which will blow up on inputs of the form "aaaaaaaaa....a!".

A malicious client could make an HTTP request with, say, 100 characters, and make the Node server take eons to evaluate it. While the Node server is stuck in this callback for one client, any other pending requests will hang -- disaster! I have a guide on this topic if you're unfamiliar with it.

Unless the developer was extra clever (using the VM module or a child process), Node can't intervene with a timeout. The V8 regex engine just keeps trying, unlike PHP or sometimes Perl which will throw an exception on a runaway regex evaluation.

The only thing you can do is detect that your server's throughput has dropped using some kind of heartbeat mechanism, shoot the server, and restart it. But of course the attacker can just submit another malicious request.

joshbruce · 2018-02-21T21:19:37Z

@davisjam: Thanks! (Admittedly, I've only made it through the tl;dr - but have it open to read later.) So, is this a performance vulnerability or a security vulnerability or maybe both?

If it's covered later in the paper - just say the word and I'll stop asking. ;)

The security vulnerability is more from someone being able to block the app for other people - but it doesn't actually allow for the compromising of their data and so on. I could see where that would be a problem on, say, a stock trading application, for example.

Just wondering severity, impact, and likelihood - risk management mode, sorry. (Definitely appreciate the suggested solutions included in the report. And will let the rest of the crew marinate and respond - truly just for my own edification - I love learning new things.)

davisjam · 2018-02-21T21:43:46Z

So, is this a performance vulnerability or a security vulnerability or maybe both?

Speaking generically about "blocking the event loop or the worker pool": Both.

For server-side code, if an attacker can cause your event loop or worker pool to block it's a security vulnerability.

For client-side code, it's a performance problem, but only if a user might accidentally cause the event loop (or WebWorkers) to block. Though you could buy up ad space on Google and be annoying to people; this is effectively what auto-play ads do anyway.

Speaking specifically about catastrophic backtracking:

mostly of interest for REDOS, where a client affects a server
if the "attack strings" look like somethings users might enter on accident, this could be seen as a performance problem even on the client side
if the client side code is doing something that a server might want to do, this is a copy/paste "time bomb" waiting to happen. "Full Stack JavaScript" has its downsides.

joshbruce · 2018-02-22T00:18:19Z

@davisjam: Thanks! Read the rest. Again, please excuse the ignorance on my part, the WikiPedia article didn't give me the "ah-ha!" moment re REDOS and DOS being a "security" piece...maybe I just have a limited understanding of what we mean by "security" in this context.

Is the attack, effectively, against the other clients accessing the server? Not necessarily the server itself? (Trying to parse between attacking a computer and who is hurt by the attack to classify it as security issue.)

Also, might be worth checking out #1039 - sorry, thought I posted this earlier.

Feder1co5oave · 2018-02-22T02:49:35Z

Vulnerability n.2 is confirmed in #1058, it affects the online HTML rule and I've been working on it

davisjam · 2018-02-22T03:27:18Z

@joshbruce If a Node server is testing client input with a vulnerable regex, a malicious client can submit input to trigger the catastrophic backtracking. The Node server will get stuck in the callback and not be able to handle any other clients. The result is Denial of Service -- hence the nickname "Regular Expression Denial of Service" or ReDoS/REDOS.

joshbruce · 2018-02-26T00:36:18Z

@davisjam: Thanks! Still think I'm too naive to fully understand what makes it a security thing and not just a "jerk move" and maybe that's actually okay that I don't fully understand. :)

Just playing scenarios in my head of how and when someone could become an injured party on the other end. Again, maybe ignorance is bliss for me right now and I need to start a bit higher in the learning about all things security tree.

Thanks again for the report and the information!

Feder1co5oave · 2018-02-26T15:34:19Z

@davisjam: Thanks! Still think I'm too naive to fully understand what makes it a security thing and not just a "jerk move" and maybe that's actually okay that I don't fully understand. :)

The consensus is that being able to make a service unavailable is a security issue.

davisjam · 2018-02-26T15:37:38Z

Do you guys think you have an idea on how to fix, or do you want me to take a look?

joshbruce · 2018-02-26T15:41:09Z

Just got a note from @karenyavine:

Would like to hear when a fix has been issued and published to npm.
I'll wait a bit for publishing a publish vulnerability advisory, do you think there is an ETA for this?

@davisjam: It's open source, brother, if you can do something, I say go for it. It'll be reviewed in the same manner. :)

Tagging @UziTech, @Feder1co5oave, and @styfle explicitly for situational awareness as they are the other committers. Maybe if you get something we can put it into #1076. Having a crew of four and the number of users we have is a bit tough; so, having more contributors helping out to fix issues would be a good thing in my book. (I don't know enough about regex to really do much I'm afraid.)

davisjam · 2018-02-26T16:08:35Z

Tests demonstrating both attacks are in #1083.

davisjam · 2018-02-26T16:20:16Z

Analysis of the root cause of the nolink vulnerability (n.1):

Here's a railroad diagram.

The attack string looks like this:

![\[[]\[[]....

The pump string: \[[] can take either of two routes through the diagram.

Route 1: "the middle way" --- go through the middle path to consume [, then loop around and take the top path to consume the subsequent "[]".

Route 2: "the bottom way" --- go through the bottom path to consume '', then loop around and take the top path to consume the subsequent "[[]".

Each additional pump adds a new set of two choices. On a mismatch the regex engine has to try each of the resulting 2^n choices for an exponential blow-up.

davisjam · 2018-02-26T17:11:47Z

Not sure n.1 can be fixed by refactoring the regex.
Are you open to a simpler regex to grab "everything inside []" and then parse the innards?

davisjam · 2018-02-26T17:19:48Z

[edit: tweak route names and descriptions slightly for clarity]

Analysis of the root cause of the html.closing vulnerability (n.2):

Here's a railroad diagram.

The attack string looks like this:

<tag\t\t""\t\t""...<"\t/>a'a

The pump string \t\t"" can take either of two routes through the diagram.

Route 1: "double pass": Loop through the bottom group twice to consume both tabs in the 'whitespace" of the bottom group, then loop around and take the top path to consume the subsequent double quotes.

Route 2: "single pass": Consume one tab in the whitespace of the bottom group, one tab in the subsequent [^'"\/>] class, and loop around to the top path as in Route 1.

As in n.1, each additional pump adds a new set of two choices resulting in an exponential blow-up.

Update: The same problematic construction occurs in 'inline.tag'.

davisjam · 2018-02-26T17:39:59Z

I believe I fixed n.2 in #1083. All but one of the tests pass, which was the same behavior as in a clean checkout.

I think the new regex (visual for the simpler of the two) matches the exact same language as the old one.

joshbruce · 2018-02-26T19:25:21Z

@davisjam: Just saw the other comment re simpler solution - definitely down to see what that might be. I don't think we're strictly married to any specific solution per se as long as it fixes the issue and is backward compatible...that would be the only reason I can see it not being okay.

davisjam · 2018-02-26T20:15:52Z

I believe this pattern is vulnerable in the same way that n.1 is:

inline._inside = /(?:\[[^\]]*\]|\\[\[\]]|[^\[\]]|\](?=[^\[]*\]))*/;

Problem: Four regexes were vulnerable to catastrophic backtracking. This leaves markdown servers open to a potential REDOS attack. Solution: Refactor the regexes. For two similar regexes (html) I didn't change the language. For two similar regexes (noline) I slightly changed the language: ![[[[[[[[[[[]] was accepted by the old noline pattern. It is now rejected. All tests pass, though I'm not sure if I've broken something that was untested. This addresses markedjs#1070 (with markedjs#1058 along the way).

Problem: Four regexes were vulnerable to catastrophic backtracking. This leaves markdown servers open to a potential REDOS attack. Solution: Refactor the regexes. For two similar regexes (html) I didn't change the language. For two similar regexes (noline) I slightly changed the language: ![[[[[[[[[[[]] was accepted by the old noline pattern. It is now rejected. All tests pass, though I'm not sure if I've broken something that was untested. This addresses markedjs#1070 (with markedjs#1058 along the way). Bonus: rename a stray test to use _ instead of -.

davisjam · 2018-02-28T17:49:02Z

Closed by #1083?

joshbruce · 2018-02-28T17:53:33Z

Believe so. See #1093, if you haven't already. Will close.

Please let us know if you find something else!

Problem: Four regexes were vulnerable to catastrophic backtracking. This leaves markdown servers open to a potential REDOS attack. Solution: Refactor the regexes. For two similar regexes (html) I didn't change the language. For two similar regexes (noline) I slightly changed the language: ![[[[[[[[[[[]] was accepted by the old noline pattern. It is now rejected. All tests pass, though I'm not sure if I've broken something that was untested. This addresses markedjs#1070 (with markedjs#1058 along the way). Bonus: rename a stray test to use _ instead of -.

Vulnerability in marked: markedjs/marked#1070 Released in https://github.com/markedjs/marked/releases/tag/v0.3.17 I don't think it affects us though as it's only a dependency for jsdoc.

Problem: Four regexes were vulnerable to catastrophic backtracking. This leaves markdown servers open to a potential REDOS attack. Solution: Refactor the regexes. For two similar regexes (html) I didn't change the language. For two similar regexes (noline) I slightly changed the language: ![[[[[[[[[[[]] was accepted by the old noline pattern. It is now rejected. All tests pass, though I'm not sure if I've broken something that was untested. This addresses markedjs#1070 (with markedjs#1058 along the way). Bonus: rename a stray test to use _ instead of -.

joshbruce changed the title ~~Backtracking~~ Backtracking vulnerability Feb 21, 2018

joshbruce added the question label Feb 21, 2018

Feder1co5oave added parser: GFM L0 - security A security vulnerability within the Marked library is discovered and removed parser: GFM labels Feb 22, 2018

joshbruce added the help wanted label Feb 26, 2018

davisjam mentioned this issue Feb 26, 2018

security: fix REDOS vulnerabilities #1083

Merged

davisjam mentioned this issue Feb 26, 2018

marked 0.3.14 hangs on the attached file, worked in 0.3.9 #1058

Closed

davisjam mentioned this issue Feb 26, 2018

Unparseable vulnerable pattern NicolaasWeideman/RegexStaticAnalysis#5

Open

joshbruce closed this as completed Feb 28, 2018

snyk-bot mentioned this issue Apr 12, 2022

[Snyk] Security upgrade rollup-plugin-license from 2.3.0 to 2.7.0 ekmixon/marked#72

Open

CMaheshBL mentioned this issue May 6, 2022

Cxee7cbf9f-8b8d @ Npm-marked-0.3.9 CMaheshBL/NodeGoat#185

Open

cxronen mentioned this issue May 20, 2022

Cxee7cbf9f-8b8d @ Npm-marked-0.3.9 cxronen/AST_BookStore#225

Open

cxronen mentioned this issue Nov 28, 2022

Cxee7cbf9f-8b8d @ Npm-marked-0.3.9 cxronen/BookStore#168

Open

cxronen mentioned this issue Jan 26, 2023

Cxee7cbf9f-8b8d @ Npm-marked-0.3.9 cxronen/BookStore#376

Open

RobertMickleCx mentioned this issue Mar 7, 2023

Cxee7cbf9f-8b8d @ Npm-marked-0.3.9 RobertMickleCx/NodeGoat#175

Open

This was referenced Feb 21, 2023

Cxee7cbf9f-8b8d @ Npm-marked-0.3.9 cxronen/BookStore#621

Open

Cxee7cbf9f-8b8d @ Npm-marked-0.3.9 cxronen/AST_BookStore#473

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Backtracking vulnerability #1070

Backtracking vulnerability #1070

joshbruce commented Feb 21, 2018 •

edited

davisjam commented Feb 21, 2018

joshbruce commented Feb 21, 2018

davisjam commented Feb 21, 2018

joshbruce commented Feb 21, 2018 •

edited

davisjam commented Feb 21, 2018

joshbruce commented Feb 21, 2018

davisjam commented Feb 21, 2018

joshbruce commented Feb 22, 2018

Feder1co5oave commented Feb 22, 2018

davisjam commented Feb 22, 2018

joshbruce commented Feb 26, 2018

Feder1co5oave commented Feb 26, 2018

davisjam commented Feb 26, 2018

joshbruce commented Feb 26, 2018 •

edited

davisjam commented Feb 26, 2018

davisjam commented Feb 26, 2018

davisjam commented Feb 26, 2018

davisjam commented Feb 26, 2018 •

edited

davisjam commented Feb 26, 2018

joshbruce commented Feb 26, 2018

davisjam commented Feb 26, 2018

davisjam commented Feb 28, 2018

joshbruce commented Feb 28, 2018

Backtracking vulnerability #1070

Backtracking vulnerability #1070

Comments

joshbruce commented Feb 21, 2018 • edited

davisjam commented Feb 21, 2018

joshbruce commented Feb 21, 2018

davisjam commented Feb 21, 2018

joshbruce commented Feb 21, 2018 • edited

davisjam commented Feb 21, 2018

joshbruce commented Feb 21, 2018

davisjam commented Feb 21, 2018

joshbruce commented Feb 22, 2018

Feder1co5oave commented Feb 22, 2018

davisjam commented Feb 22, 2018

joshbruce commented Feb 26, 2018

Feder1co5oave commented Feb 26, 2018

davisjam commented Feb 26, 2018

joshbruce commented Feb 26, 2018 • edited

davisjam commented Feb 26, 2018

davisjam commented Feb 26, 2018

davisjam commented Feb 26, 2018

davisjam commented Feb 26, 2018 • edited

davisjam commented Feb 26, 2018

joshbruce commented Feb 26, 2018

davisjam commented Feb 26, 2018

davisjam commented Feb 28, 2018

joshbruce commented Feb 28, 2018

joshbruce commented Feb 21, 2018 •

edited

joshbruce commented Feb 21, 2018 •

edited

joshbruce commented Feb 26, 2018 •

edited

davisjam commented Feb 26, 2018 •

edited