Question: Markdown, storage and security #1184

oasiz · 2018-03-28T16:31:57Z

Hi guys!

I realise this is a bug reporting section so I hope it's OK to post this, but I was hoping for some expert advice regarding Markdown and security.

Firstly, my situation is that I want to allow members on our website to have a section of their user profile that they can edit using the markdown format, parsed by Marked.

Importantly, the data cannot be trusted - anyone can register for an account so I need to be careful with what can/can't be uploaded and displayed.

Here's how things currently stand on my development site:

I edit the raw parsedown code in a textarea and click submit.
The following is sent by POST to the server:

DataA: marked([raw-unsanitized-data], { sanitize: true })
DataB: The original raw data

DataA is put through HTMLPurifier and then stored in a .htm file which is retrieved by http GET upon user profile load.
DataB is stored in the MySQL database as-is, as original data obviously required for when re-editing.
DataA (hopefully now sanitized) is returned and displayed on the user profile.

Hopefully that makes sense. Now, my question(s)..

Is that overkill? To the best of your knowledge is that safe enough?

In an idea world I'd like to cut out having to store the data twice and just work with the raw data client-side as the rendering seems to be blazingly fast - but from a security standpoint I don't know whether this is doable.

Main question: Would it be horribly irresponsible to just work with the raw data and just output:

marked([raw-unsanitized-data], { sanitize: true })

Possibly as an additional safeguard use DOMPurify:

DOMPurify.sanitize(marked(<raw-unsanitized-data>, { sanitize: true }))

? I can't help but feel like it probably would be, but I'd like to hear it from a real expert.

PS Further info, all site data is SSL and all cookies are secure/HttpOnly etc.

OK, if you're still here - thanks for staying awake!

But seriously, I'm no expert on this by any means so I really appreciate any insight/advice you can give.

Best Regards,

Rob

The text was updated successfully, but these errors were encountered:

UziTech · 2018-03-28T19:09:07Z

I don't think pre-rendering the html will helping with XSS since the same user generated content will show on the page regardless of which method you take.

The only security issues I have seen in marked have been ReDoS vulnerabilities. If you want to protect against someone using up resources on your servers in a ReDoS attack, you should pre-render the markdown in a separate process, and if that process takes longer than some threshold (more than a second maybe) cancel it and show an error to the user.

I can't really say how well marked's sanitize option does or whether DOMPurify.sanitize would make the output HTML any safer.

/cc @davisjam @Feder1co5oave

Martii · 2018-03-28T22:41:04Z

I'd like to cut out having to store the data twice and just work with the raw data client-side as the rendering seems to be blazingly fast - but from a security standpoint I don't know whether this is doable.

On our project we always store the raw, and render/sanitize on request only since as you stated it's super fast. Our establishing owner wanted to store the processed raw data however with sanitizers some security issue may come up (there is precedence) and you would need to regenerate the statics again. That's a hefty CPU task to continually do and could in itself generate a DoS of sorts. e.g. we don't currently cache this process.

As far as doing it twice, perhaps on client side, Userscripts can usually defeat this so you wouldn't want that security issue... so it's best to have it done server side as much as possible. e.g. CPU time is server side instead of client side (which sucks but is a fact of life in our arena).

oasiz · 2018-03-30T10:28:43Z

Hi, thanks for the replies - much appreciated, that's some useful info.

I've noticed that Github uses server-side technology only, no JS rendering. Presumably they're storing both raw and rendered versions of all posts. That's a lot of database storage!

I'm still unsure of the best way to do this, it seems like there's no definitive answer.

Storing the raw data and rendering client-side with JS would obviously be ideal, but only if it's going to be secure.

I do wonder what other users of markedjs do with the issue of data/storage/security..

styfle · 2018-04-02T23:59:05Z

@oasiz The reason why GitHub renders on the server-side is because they use Ruby to render Markdown to HTML (Kramdown I think). They literally can't render on the client (unless they find a ruby-to-wasm compiler or start using marked.js 😄)

davisjam · 2018-04-04T15:49:20Z

In an idea world I'd like to cut out having to store the data twice and just work with the raw data client-side as the rendering seems to be blazingly fast - but from a security standpoint I don't know whether this is doable.

@oasiz Can you clarify what security concerns you have with rendering the raw markdown on the client side?

I'm guessing the scenario you're imagining is:

Attacker Sets profile to malicious content.

Victim On visiting the attacker's profile, the server returns the raw data. While rendering it something bad happens.

Is the "something bad" what you're concerned about, or am I misunderstanding?

Feder1co5oave · 2018-04-04T16:14:07Z

My 2 cents. You either:

save markdown content on the server, convert it there (with the time check proposed by UziTech) and serve cached (and purified) html content to the clients, OR
you only store markdown content, serve it to clients and DOMPurify.sanitize(marked(markdown, {sanitize: true}) in the browser.

Converting into html on the submitting client and then storing that on the server opens security concerns because the client-side code can be manipulated by a malicious user.

ps. Github parses markdown content into html by using a customized version of the C implementation of commonmark

davisjam · 2018-04-04T16:22:28Z

save markdown content on the server, convert it there (with the time check proposed by UziTech) and serve cached (and purified) html content to the clients

This approach should work fine. I would use a worker pool to render the JS with a timeout.

you only store markdown content, serve it to clients and DOMPurify.sanitize(marked(markdown, {sanitize: true}) in the browser.

I think this is a better approach. Since the markdown being rendered is untrusted, perhaps render it in a WebWorker and enforce a timeout. Hopefully this project doesn't have any more vulnerable regexes but you never know. Possible instructions here.

Converting into html on the submitting client and then storing that on the server opens security concerns because the client-side code can be manipulated by a malicious user.

+1. Under the "malicious client pre-renders the HTML" design, the user can upload whatever HTML they want to. They don't have to execute your code.

oasiz · 2018-04-04T16:27:22Z

Hi, appreciate the replies :-)

Is the "something bad" what you're concerned about, or am I misunderstanding?

Yes, absolutely correct.

you only store markdown content, serve it to clients and DOMPurify.sanitize(marked(markdown, {sanitize: true}) in the browser.

That's what I'd like to do if you guys feel it's secure, as it only involves storing the original data and leaving the rest client-side.

Less storage space, less server CPU. Win-win right?!

davisjam · 2018-04-06T00:49:50Z

That's what I'd like to do if you guys feel it's secure

I'd recommend you consider the WebWorker-based approach I suggested above. Does it seem feasible for you?

That ensures that if we slip up and let a REDOS issue through, your clients can't be attacked by a malicious profile.

styfle · 2018-04-06T01:21:03Z

@davisjam Can you add a web worker example to the docs?

davisjam · 2018-04-06T01:22:49Z

@styfle Good idea.

In the interest of laziness, @oasiz any chance that:

You'll be using WebWorkers, and
The code will be open source? (well obviously any code you're running client-side is "open source" one way or another, but...)

If so I'll wait and use you as an example. Free advertising!

oasiz · 2018-04-06T11:10:03Z

@davisjam Yes I will be implementing whatever is necessary in order to secure the app, so I'll be happy to help in any way I can. Always like a bit of free advertising!

Hopefully this won't be a problem, but the site I'm working on isn't currently open to the public - although I'm hoping to release a basic version in a couple of weeks or less.

I've never used WebWorkers, so looking forward to learning something new :-)

styfle · 2018-10-19T15:02:42Z

@oasiz Are you still interested in adding WebWorkers to the docs?

sgon00 · 2019-01-17T05:28:54Z

I am googling how to use DOMPurify with marked and found this issue. Shouldn't DOMPurify.sanitize(marked(markdown, {sanitize: true})) be DOMPurify.sanitize(marked(markdown))? Because {sanitize: true} will convert < to <, and DOMPurify.sanitize should take a dirty HTML as the argument. Thanks.

sgon00 · 2019-01-17T08:22:17Z

Please ignore my last comment. I was wrong. It's right to have {sanitize: true} for this issue. The reason why I thought {sanitize: true} was not right is because my project allows other html tags such as <center>. That's why I don't have this {sanitize: true}, but it's not the situation in this issue. Sorry about that.

UziTech · 2019-03-11T04:18:19Z

There is a PR to add web workers and node worker threads to the docs #1432

The PR is just waiting for worker threads to be stable in node.

UziTech added the question label Mar 28, 2018

UziTech mentioned this issue Mar 11, 2019

add docs for workers #1432

Merged

4 tasks

joshbruce closed this as completed in #1432 May 3, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question: Markdown, storage and security #1184

Question: Markdown, storage and security #1184

oasiz commented Mar 28, 2018 •

edited

UziTech commented Mar 28, 2018

Martii commented Mar 28, 2018

oasiz commented Mar 30, 2018

styfle commented Apr 2, 2018

davisjam commented Apr 4, 2018

Feder1co5oave commented Apr 4, 2018

davisjam commented Apr 4, 2018

oasiz commented Apr 4, 2018 •

edited

davisjam commented Apr 6, 2018

styfle commented Apr 6, 2018

davisjam commented Apr 6, 2018 •

edited

oasiz commented Apr 6, 2018 •

edited

styfle commented Oct 19, 2018

sgon00 commented Jan 17, 2019

sgon00 commented Jan 17, 2019

UziTech commented Mar 11, 2019

Question: Markdown, storage and security #1184

Question: Markdown, storage and security #1184

Comments

oasiz commented Mar 28, 2018 • edited

UziTech commented Mar 28, 2018

Martii commented Mar 28, 2018

oasiz commented Mar 30, 2018

styfle commented Apr 2, 2018

davisjam commented Apr 4, 2018

Feder1co5oave commented Apr 4, 2018

davisjam commented Apr 4, 2018

oasiz commented Apr 4, 2018 • edited

davisjam commented Apr 6, 2018

styfle commented Apr 6, 2018

davisjam commented Apr 6, 2018 • edited

oasiz commented Apr 6, 2018 • edited

styfle commented Oct 19, 2018

sgon00 commented Jan 17, 2019

sgon00 commented Jan 17, 2019

UziTech commented Mar 11, 2019

oasiz commented Mar 28, 2018 •

edited

oasiz commented Apr 4, 2018 •

edited

davisjam commented Apr 6, 2018 •

edited

oasiz commented Apr 6, 2018 •

edited