Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitization of user generated content #720

Open
collimarco opened this issue Jan 14, 2022 · 3 comments
Open

Sanitization of user generated content #720

collimarco opened this issue Jan 14, 2022 · 3 comments

Comments

@collimarco
Copy link

collimarco commented Jan 14, 2022

Hello,

I read in the description that Redcarpet is "The safe Markdown parser".

Does that mean that it is safe to use it for untrusted user input? (e.g. forum, Q&A, comments, chat, etc.)

In particular I have tried:

Option 1

Use only this gem:

def sanitized_markdown content
    renderer = Redcarpet::Render::HTML.new \
      escape_html: true,
      safe_links_only: true,
      link_attributes: { rel: 'ugc'}
    markdown = Redcarpet::Markdown.new renderer, autolink: true
    markdown.render(content).html_safe
end

Downside: Users cannot use HTML tags inside markdown.

Option 2

Use another strategy (Rails sanitize) for sanitization:

def sanitized_markdown content
    renderer = Redcarpet::Render::HTML.new
    markdown = Redcarpet::Markdown.new renderer, autolink: true
    html = markdown.render(content)
    sanitize html
end

Downside: You cannot add rel nofollow to links.

Documentation

It seems a pretty common need to use markdown parsing on user generated content (ugc).

Unfortunately I haven't found anything in the docs.

What strategy do you recommend?

@collimarco
Copy link
Author

I made a little more research... StackOverflow and Github for example accept the basic HTML tags inside Markdown and the Markdown standard says the same. They simply strip (StackOverflow) or escape (Github) the unsafe tags (e.g. script).

The strange thing about the default HTML renderer included in this gem, is that it escapes/strip all or nothing. It doesn't have a way to remove only the unsafe tags (like StackOverflow and Github do).

An alternative would be to render all HTML tags and then use an external sanitize, but in this case you lose all the interesting features of the Redcarpet renderer (e.g. link_attributes).

@nickjj
Copy link

nickjj commented Mar 13, 2022

And if anyone is curious why you can't add attributes: ["rel"] to sanitize in to option 2, it's because it would allow untrusted input.

Here's a 1 liner you can copy / paste into the Rails console to auto link links and automatically add rel="nofollow":

ActionController::Base.helpers.sanitize(Redcarpet::Markdown.new(Redcarpet::Render::HTML.new(link_attributes: { rel: "nofollow" }), autolink: true).render("https://example.com"), attributes: ["rel"])

But this lets the user override nofollow when creating links with raw HTML:

ActionController::Base.helpers.sanitize(Redcarpet::Markdown.new(Redcarpet::Render::HTML.new(link_attributes: { rel: "nofollow" }), autolink: true).render('<a href="https://example.com" rel="">example</a>'), attributes: ["rel", "href"])

Another solution could be to do a second pass on the rendered HTML after sanitize to auto-insert the nofollows in there without using the attributes feature of sanitize but now we're taking 3 passes over the input and 2 of them are done in Ruby which will likely counter act much of the speed perks of using RedCarpet's C parser.

@AndrewKvalheim
Copy link

For anyone just looking for a solution, here’s an example of the inefficient triple pass in Rails:

html = markdown.render(content)
sanitized_html = sanitize(html)
nofollowed_sanitized_html = sanitize(sanitized_html, scrubber: Loofah::Scrubbers::NoFollow.new)

Of course ideally you’d be able to accomplish this with just Redcarpet.

AndrewKvalheim added a commit to AndrewKvalheim/osem that referenced this issue Mar 4, 2023
To disincentivize spamdexing, links in user-generated content should be
disavowed by annotation with `rel="nofollow"` attributes:

  - https://en.wikipedia.org/wiki/Nofollow

Automated spam has already targeted OSEM in the wild:

  - SeaGL/organization#274

Ideally link annotation would be performed during Markdown rendering or
a single sanitization pass, but this is currently an unresolved issue:

  - vmg/redcarpet#720
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants