Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: Markdown, storage and security #1184

Closed
oasiz opened this issue Mar 28, 2018 · 16 comments · Fixed by #1432
Closed

Question: Markdown, storage and security #1184

oasiz opened this issue Mar 28, 2018 · 16 comments · Fixed by #1432
Labels

Comments

@oasiz
Copy link

oasiz commented Mar 28, 2018

Hi guys!

I realise this is a bug reporting section so I hope it's OK to post this, but I was hoping for some expert advice regarding Markdown and security.

Firstly, my situation is that I want to allow members on our website to have a section of their user profile that they can edit using the markdown format, parsed by Marked.

Importantly, the data cannot be trusted - anyone can register for an account so I need to be careful with what can/can't be uploaded and displayed.

Here's how things currently stand on my development site:

  1. I edit the raw parsedown code in a textarea and click submit.
  2. The following is sent by POST to the server:
  • DataA: marked([raw-unsanitized-data], { sanitize: true })
  • DataB: The original raw data
  1. DataA is put through HTMLPurifier and then stored in a .htm file which is retrieved by http GET upon user profile load.
  2. DataB is stored in the MySQL database as-is, as original data obviously required for when re-editing.
  3. DataA (hopefully now sanitized) is returned and displayed on the user profile.

Hopefully that makes sense. Now, my question(s)..

Is that overkill? To the best of your knowledge is that safe enough?

In an idea world I'd like to cut out having to store the data twice and just work with the raw data client-side as the rendering seems to be blazingly fast - but from a security standpoint I don't know whether this is doable.

Main question: Would it be horribly irresponsible to just work with the raw data and just output:

marked([raw-unsanitized-data], { sanitize: true })

Possibly as an additional safeguard use DOMPurify:

DOMPurify.sanitize(marked(<raw-unsanitized-data>, { sanitize: true }))

? I can't help but feel like it probably would be, but I'd like to hear it from a real expert.

PS Further info, all site data is SSL and all cookies are secure/HttpOnly etc.

OK, if you're still here - thanks for staying awake!

But seriously, I'm no expert on this by any means so I really appreciate any insight/advice you can give.

Best Regards,

Rob

@UziTech
Copy link
Member

UziTech commented Mar 28, 2018

I don't think pre-rendering the html will helping with XSS since the same user generated content will show on the page regardless of which method you take.

The only security issues I have seen in marked have been ReDoS vulnerabilities. If you want to protect against someone using up resources on your servers in a ReDoS attack, you should pre-render the markdown in a separate process, and if that process takes longer than some threshold (more than a second maybe) cancel it and show an error to the user.

I can't really say how well marked's sanitize option does or whether DOMPurify.sanitize would make the output HTML any safer.

/cc @davisjam @Feder1co5oave

@Martii
Copy link
Contributor

Martii commented Mar 28, 2018

I'd like to cut out having to store the data twice and just work with the raw data client-side as the rendering seems to be blazingly fast - but from a security standpoint I don't know whether this is doable.

On our project we always store the raw, and render/sanitize on request only since as you stated it's super fast. Our establishing owner wanted to store the processed raw data however with sanitizers some security issue may come up (there is precedence) and you would need to regenerate the statics again. That's a hefty CPU task to continually do and could in itself generate a DoS of sorts. e.g. we don't currently cache this process.

As far as doing it twice, perhaps on client side, Userscripts can usually defeat this so you wouldn't want that security issue... so it's best to have it done server side as much as possible. e.g. CPU time is server side instead of client side (which sucks but is a fact of life in our arena).

@oasiz
Copy link
Author

oasiz commented Mar 30, 2018

Hi, thanks for the replies - much appreciated, that's some useful info.

I've noticed that Github uses server-side technology only, no JS rendering. Presumably they're storing both raw and rendered versions of all posts. That's a lot of database storage!

I'm still unsure of the best way to do this, it seems like there's no definitive answer.

Storing the raw data and rendering client-side with JS would obviously be ideal, but only if it's going to be secure.

I do wonder what other users of markedjs do with the issue of data/storage/security..

@styfle
Copy link
Member

styfle commented Apr 2, 2018

@oasiz The reason why GitHub renders on the server-side is because they use Ruby to render Markdown to HTML (Kramdown I think). They literally can't render on the client (unless they find a ruby-to-wasm compiler or start using marked.js 😄)

@davisjam
Copy link
Contributor

davisjam commented Apr 4, 2018

In an idea world I'd like to cut out having to store the data twice and just work with the raw data client-side as the rendering seems to be blazingly fast - but from a security standpoint I don't know whether this is doable.

@oasiz Can you clarify what security concerns you have with rendering the raw markdown on the client side?

I'm guessing the scenario you're imagining is:

Attacker Sets profile to malicious content.

Victim On visiting the attacker's profile, the server returns the raw data. While rendering it something bad happens.

Is the "something bad" what you're concerned about, or am I misunderstanding?

@Feder1co5oave
Copy link
Contributor

My 2 cents. You either:

  • save markdown content on the server, convert it there (with the time check proposed by UziTech) and serve cached (and purified) html content to the clients, OR

  • you only store markdown content, serve it to clients and DOMPurify.sanitize(marked(markdown, {sanitize: true}) in the browser.

Converting into html on the submitting client and then storing that on the server opens security concerns because the client-side code can be manipulated by a malicious user.

ps. Github parses markdown content into html by using a customized version of the C implementation of commonmark

@davisjam
Copy link
Contributor

davisjam commented Apr 4, 2018

save markdown content on the server, convert it there (with the time check proposed by UziTech) and serve cached (and purified) html content to the clients

This approach should work fine. I would use a worker pool to render the JS with a timeout.

you only store markdown content, serve it to clients and DOMPurify.sanitize(marked(markdown, {sanitize: true}) in the browser.

I think this is a better approach. Since the markdown being rendered is untrusted, perhaps render it in a WebWorker and enforce a timeout. Hopefully this project doesn't have any more vulnerable regexes but you never know. Possible instructions here.

Converting into html on the submitting client and then storing that on the server opens security concerns because the client-side code can be manipulated by a malicious user.

+1. Under the "malicious client pre-renders the HTML" design, the user can upload whatever HTML they want to. They don't have to execute your code.

@oasiz
Copy link
Author

oasiz commented Apr 4, 2018

Hi, appreciate the replies :-)

Is the "something bad" what you're concerned about, or am I misunderstanding?

Yes, absolutely correct.

you only store markdown content, serve it to clients and DOMPurify.sanitize(marked(markdown, {sanitize: true}) in the browser.

That's what I'd like to do if you guys feel it's secure, as it only involves storing the original data and leaving the rest client-side.

Less storage space, less server CPU. Win-win right?!

@davisjam
Copy link
Contributor

davisjam commented Apr 6, 2018

That's what I'd like to do if you guys feel it's secure

I'd recommend you consider the WebWorker-based approach I suggested above. Does it seem feasible for you?

That ensures that if we slip up and let a REDOS issue through, your clients can't be attacked by a malicious profile.

@styfle
Copy link
Member

styfle commented Apr 6, 2018

@davisjam Can you add a web worker example to the docs?

@davisjam
Copy link
Contributor

davisjam commented Apr 6, 2018

@styfle Good idea.

In the interest of laziness, @oasiz any chance that:

  1. You'll be using WebWorkers, and
  2. The code will be open source? (well obviously any code you're running client-side is "open source" one way or another, but...)

If so I'll wait and use you as an example. Free advertising!

@oasiz
Copy link
Author

oasiz commented Apr 6, 2018

@davisjam Yes I will be implementing whatever is necessary in order to secure the app, so I'll be happy to help in any way I can. Always like a bit of free advertising!

Hopefully this won't be a problem, but the site I'm working on isn't currently open to the public - although I'm hoping to release a basic version in a couple of weeks or less.

I've never used WebWorkers, so looking forward to learning something new :-)

@styfle
Copy link
Member

styfle commented Oct 19, 2018

@oasiz Are you still interested in adding WebWorkers to the docs?

@sgon00
Copy link

sgon00 commented Jan 17, 2019

I am googling how to use DOMPurify with marked and found this issue. Shouldn't DOMPurify.sanitize(marked(markdown, {sanitize: true})) be DOMPurify.sanitize(marked(markdown))? Because {sanitize: true} will convert < to &lt;, and DOMPurify.sanitize should take a dirty HTML as the argument. Thanks.

@sgon00
Copy link

sgon00 commented Jan 17, 2019

Please ignore my last comment. I was wrong. It's right to have {sanitize: true} for this issue. The reason why I thought {sanitize: true} was not right is because my project allows other html tags such as <center>. That's why I don't have this {sanitize: true}, but it's not the situation in this issue. Sorry about that.

@UziTech UziTech mentioned this issue Mar 11, 2019
4 tasks
@UziTech
Copy link
Member

UziTech commented Mar 11, 2019

There is a PR to add web workers and node worker threads to the docs #1432

The PR is just waiting for worker threads to be stable in node.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants