WhitelistSanitizer manipulating URLs #98

Closed
archonic opened this issue Aug 21, 2019 · 2 comments

@archonic

I've found that a Rails::Html::WhiteListSanitizer instance will rewrite the value of an allowed attribute.

Rails::Html::WhiteListSanitizer.new.sanitize('<img src="https://example/$/example.jpg">', tags: %w(img), attributes: %w(src))
=> "<img src=\"https://example/%24/example.jpg\">"

The conversion of $ to %24 can cause some URLs to 404. Is this intentional? Is there a way to configure the sanitizer to leave attribute values as is?

@flavorjones
Member

Hi @archonic, thanks for asking this question, and apologies that nobody has responded for so long.

Diagnosis

The behavior you're describing actually belongs to libxml2 via Nokogiri, and is pretty commonly reported to both the Loofah and Nokogiri projects:

Nokogiri::HTML::DocumentFragment.parse("<img src='https://example/$/example.jpg'>").to_html
# => "<img src=\"https://example/%24/example.jpg\">" 

(The dependency chain here is rails-html-sanitizer → loofah → nokogiri → libxml2.)

Here's the C code that controls URI-escaping of certain HTML attributes at serialization-time (when the document is printed):

https://gitlab.gnome.org/GNOME/libxml2/blob/v2.9.2/HTMLtree.c#L714-718

Specifically, href, action, src, and name (but only within an anchor) are always escaped when generating HTML -- basically, anything that could be a URI reference.
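To make the effect concrete, here is a rough, stdlib-only sketch of what serialization-time escaping does to a URI-like attribute value. It is not libxml2's code: the helper name escape_uri_attr and the set of characters assumed to pass through unescaped are illustrative assumptions, and the C source linked above is the authority on the real rules.

```ruby
# Rough approximation of serialization-time URI escaping.
# NOT libxml2's implementation: the allow-list below is an assumption
# (ASCII letters and digits, "unreserved" marks, common URI delimiters).
UNSAFE_IN_URI_ATTR = /[^A-Za-z0-9\-_.!~*'()@\/:=?;#%&,+]/

# Hypothetical helper: percent-encode every byte of each character
# that falls outside the assumed allow-list.
def escape_uri_attr(value)
  value.gsub(UNSAFE_IN_URI_ATTR) do |char|
    char.bytes.map { |byte| format("%%%02X", byte) }.join
  end
end

puts escape_uri_attr("https://example/$/example.jpg")
# prints https://example/%24/example.jpg
```

In this sketch, $ is byte 0x24 and is not in the allow-list, so it comes out as %24, matching the output reported in this issue. Because % itself is allowed, a value that is already percent-encoded passes through unchanged rather than being encoded twice.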

Prognosis

The good news is that I think we can fix this in the next few months. Check out sparklemotion/nokogiri#2204 for progress on integrating HTML5 parsing into Nokogiri; the next logical step after that is introducing HTML5 support into Loofah and rails-html-sanitizer.

Unfortunately, until then, there's no easy way to deal with this. Sorry I couldn't be of more help.

@archonic
Author

That's a great response, thank you!
