Update Action View to use HTML5 standards-compliant sanitizers #48293
The
I can get it to pass by changing that section to
but then the docs changes become:
+- [`config.action_view.sanitizer_vendor`](#config-action-view-sanitizer-vendor): `if Rails::HTML::Sanitizer.html5_support?
+  Rails::HTML5::Sanitizer
+else
+  Rails::HTML4::Sanitizer
+end`
which then fails the guides markdown linter. I'd appreciate any advice on how to proceed here.
Maybe
if respond_to?(:action_view)
  action_view.sanitizer_vendor = Rails::HTML5::Sanitizer if Rails::HTML::Sanitizer.html5_support?
end
I haven't tried it, but I would expect that to make the doc:
If it doesn't, then I think I'll have to change the lint.
@skipkayhil The failing lint is here: https://github.com/rails/rails/actions/runs/5072376661/jobs/9109998990?pr=48293
Changing to
if respond_to?(:action_view)
  action_view.sanitizer_vendor = Rails::HTML5::Sanitizer if Rails::HTML::Sanitizer.html5_support?
end
results in the same failure:
In that case, the
@skipkayhil I've reversed the order of the conditionals to
if Rails::HTML::Sanitizer.html5_support?
  if respond_to?(:action_view)
    action_view.sanitizer_vendor = Rails::HTML5::Sanitizer
  end
end
and that seems to be working OK, though it's not what I originally wanted and may raise an exception in some configurations (if an app isn't using Action View or Action Pack).
Yeah... I'll work on a fix so we hopefully don't have to do this.
@skipkayhil I think this is highlighting a missing concept in rails-html-sanitizer, which is "give me HTML5 if you support it, else fall back to HTML4", and I'm going to add that upstream and tweak this PR to just call that method. So I don't know if this is something that actually needs to be fixed?
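The "HTML5 if supported, else HTML4" concept can be sketched in plain Ruby. Everything in this snippet is a stand-in for illustration: `Html4Vendor`, `Html5Vendor`, and `pick_vendor` are hypothetical names defined here, not the rails-html-sanitizer API, and the `html5_available` flag is an assumption standing in for whatever capability check the library performs.

```ruby
# Stand-in vendor modules (hypothetical); the real vendors live in
# rails-html-sanitizer.
module Html4Vendor; end
module Html5Vendor; end

# Return the HTML5 vendor when the capability check passes, else HTML4.
# `html5_available` is a hypothetical flag standing in for a check such as
# Rails::HTML::Sanitizer.html5_support? quoted earlier in this thread.
def pick_vendor(html5_available:)
  html5_available ? Html5Vendor : Html4Vendor
end

puts pick_vendor(html5_available: true)   # prints Html5Vendor
puts pick_vendor(html5_available: false)  # prints Html4Vendor
```

With a helper along these lines, the framework default would reduce to a single method call instead of a multi-line conditional, which is the part that tripped up the generated docs and the linter.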
Added a few documentation suggestions:
- escape Rails because we shouldn't link to the Rails module
- Replace backticks with + because that's how rdoc does <code> blocks
@skipkayhil Thanks for picking up on the doc string conventions. All fixed up!
I've cut a final release of rails-html-sanitizer v1.6.0 and updated this PR to use it (instead of the RC). The test failures don't seem related (one is a saucelabs failure).
@rafaelfranca Can you review and merge if this is OK now? Will address #48313. Or if you would prefer, I can remove those failing tests in a separate PR that can be merged quickly.
to avoid testing specific sanitizations and instead:
- unit test SanitizeHelper using a mock vendor and mock sanitizers
- integration test supported vendors
Note that we only have one supported vendor today -- Rails::Html::Sanitizer -- but this commit helps prepare the codebase so we can add another, Rails::HTML5::Sanitizer. The upstream vendor, rails-html-sanitizer, is adequately testing the sanitization behavior expected by Rails and so this commit removes all but the most basic expectations for each sanitizer type.
for HTML5 sanitizer support
* new config value: action_view.sanitizer_vendor
* SanitizerHelper defaults to Rails::HTML4::Sanitizer
* 7.1 config defaults to Rails::HTML5::Sanitizer if it's supported
Rebased onto
@georgeclaghorn commented on ce43ac6, which I need to follow up on.
Motivation / Background
The modern web is built on HTML5
The HTML sanitizers used in Rails 7.0 and earlier, rails/rails-html-sanitizer, use Loofah and Nokogiri, and specifically rely on Nokogiri's HTML4 parser, libxml2.
libxml2's HTML4 parser has not kept up to date with the HTML5 standards upon which most modern web applications rely, and so it does not behave the same way as modern browsers. Some more context about this statement can be found at RFC: Explore alternatives to libxml2 for HTML parsing · Issue #2064 · sparklemotion/nokogiri, but perhaps the most important bit is this quote from the primary libxml2 maintainer:
(source: https://bugzilla.gnome.org/show_bug.cgi?id=769760#c4)
For the most part this has not prevented users from accomplishing what they need to do. But on occasion it has resulted in security issues, and has at times caused unexpected behavior in Rails applications.
Security implications of using HTML4 on the server and HTML5 on the client
Some background on security issues that have resulted from the behavior differences between the server-side HTML4 sanitizers and the client-side (browser) HTML5 parsers: loofah/2022-10-decision-on-cdata-nodes.md.
I'll call out this statement I made in that document for emphasis:
Other behavioral differences
Because libxml2 does not follow the HTML5 parsing spec, sanitized documents often do not match our expectations. Sometimes this is unnoticeable, but other times it impacts application behavior.
For a recent example of unexpected application behavior, see Markdown preview and result differ - bug - Discourse Meta.
The last mile
In any case, there's been a bunch of work that's led up to this PR:
So, now that the rest of the sanitizer stack relied upon by Rails supports HTML5, it's time to update Rails to use it.
Detail
This PR:
- bumps the rails-html-sanitizer dependency to ~> 1.6
- adds a mattr_accessor sanitizer_vendor to ActionView::Helpers::SanitizeHelper
- adds a new config value, action_view.sanitizer_vendor, which in 7.0 config and earlier defaults to Rails::HTML4::Sanitizer
- in 7.1 config, defaults action_view.sanitizer_vendor to Rails::HTML5::Sanitizer if it's supported, else falls back to Rails::HTML4::Sanitizer
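To make the swappable-vendor idea concrete, here is a minimal plain-Ruby sketch. It is not the Action View implementation: `FakeHtml4Vendor`, `FakeHtml5Vendor`, their sanitizer classes, and `SketchSanitizeHelper` are hypothetical stand-ins, the "sanitizers" only tag their input rather than sanitizing anything, and the `attr_accessor` on the singleton class only approximates what `mattr_accessor` provides.

```ruby
# Hypothetical sanitizer classes. They do NOT sanitize; they only tag the
# input so it is visible which vendor handled the call.
class FakeHtml4SafeList
  def sanitize(html)
    "[html4] #{html}"
  end
end

class FakeHtml5SafeList
  def sanitize(html)
    "[html5] #{html}"
  end
end

# Hypothetical vendors: each one hands out its own sanitizer class.
module FakeHtml4Vendor
  def self.safe_list_sanitizer
    FakeHtml4SafeList
  end
end

module FakeHtml5Vendor
  def self.safe_list_sanitizer
    FakeHtml5SafeList
  end
end

# Stand-in helper with a module-level, reassignable vendor setting.
module SketchSanitizeHelper
  class << self
    attr_accessor :sanitizer_vendor
  end
  self.sanitizer_vendor = FakeHtml4Vendor # default, like 7.0 config and earlier

  # Delegate to whichever vendor is currently configured.
  def self.sanitize(html)
    sanitizer_vendor.safe_list_sanitizer.new.sanitize(html)
  end
end

puts SketchSanitizeHelper.sanitize("<b>hi</b>")
SketchSanitizeHelper.sanitizer_vendor = FakeHtml5Vendor # like the 7.1 default
puts SketchSanitizeHelper.sanitize("<b>hi</b>")
```

The point of the indirection is that callers never name a parser-specific class; switching between HTML4 and HTML5 is a single assignment to the vendor setting.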
Additional information
There are a few places in Rails where Rails::Html::Sanitizer is being referenced directly (without going through ActionView::Helpers::SanitizeHelper.sanitizer_vendor) or where Nokogiri::HTML is being referenced directly. I haven't updated those callsites here, but intend to address them in a future PR so that HTML5 parsing becomes the default everywhere in Rails.
The decision to pin Rails to rails-html-sanitizer ~> 1.6, which itself pins to nokogiri ~> 1.14 and loofah ~> 2.21, was made in rails/rails-html-sanitizer#166. Note that this stack requires Ruby >= 2.7, which is aligned with what Rails 7.1 looks like it's supporting.
Finally, with respect to performance, the libgumbo HTML5 parser used by Nokogiri is slower than the libxml2 HTML4 parser, but these differences are swamped by the overhead of the sanitization pass that's done by rails-html-sanitizer scrubbers.
Here's a benchmark that compares HTML4 and HTML5 performance on a variety of document sizes and that shows that the differences are well within the margin of error.
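For readers who want to try a comparison locally, a rough timing harness might look like the sketch below. This is not the benchmark referenced above: `time_parsers` is a hypothetical helper, the sample document and iteration count are arbitrary, and it times parsing only (no sanitization pass), so it would tend to overstate any difference relative to real sanitizer usage. It assumes Nokogiri may or may not be installed and falls back to a placeholder if it is missing.

```ruby
require "benchmark"

# Time each named callable over the same input; returns { name => seconds }.
def time_parsers(parsers, html, iterations: 100)
  parsers.transform_values do |parser|
    Benchmark.realtime { iterations.times { parser.call(html) } }
  end
end

# Placeholder entry so the sketch still runs without Nokogiri.
parsers = { "noop" => ->(html) { html } }

begin
  require "nokogiri"
  if defined?(Nokogiri::HTML4)
    parsers["html4"] = ->(html) { Nokogiri::HTML4::DocumentFragment.parse(html) }
  end
  if defined?(Nokogiri::HTML5)
    parsers["html5"] = ->(html) { Nokogiri::HTML5::DocumentFragment.parse(html) }
  end
rescue LoadError
  warn "nokogiri not installed; timing only the placeholder"
end

# Arbitrary sample input; vary the multiplier to try other document sizes.
html = "<div><p>hello <b>world</b></p></div>" * 50

time_parsers(parsers, html).each do |name, seconds|
  puts format("%-6s %.4fs", name, seconds)
end
```

To approximate the claim about sanitization overhead, the lambdas could be swapped for calls into the configured sanitizers instead of the bare parsers.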