Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Action View to use HTML5 standards-compliant sanitizers #48293

Merged

Conversation

flavorjones
Copy link
Member

@flavorjones flavorjones commented May 24, 2023

Motivation / Background

The modern web is built on HTML5

The HTML sanitizers used in Rails 7.0 and earlier, rails/rails-html-sanitizer, use Loofah and Nokogiri, and specifically rely on Nokogiri's HTML4 parser, libxml2.

libxml2's HTML4 parser has not kept up-to-date with the HTML5 standards upon which most modern web applications rely, and so it does not behave the same way as modern browsers. Some more context about this statement can be found at RFC: Explore alternatives to libxml2 for HTML parsing · Issue #2064 · sparklemotion/nokogiri, but perhaps the most important bit is this quote from the primary libxml2 maintainer:

it's still a bad idea to use a 20+ years old, unmaintained HTML 4 parser to sanitize input for the modern web

(source: https://bugzilla.gnome.org/show_bug.cgi?id=769760#c4)

For the most part this has not prevented users from accomplishing what they need to do. But on occasion it has resulted in security issues, and has at times caused unexpected behavior in Rails applications.

Security implications of using HTML4 on the server and HTML5 on the client

Some background on security issues that have resulted from the behavior differences between the server-side HTML4 sanitizers and the client-side (browser) HTML5 parsers: loofah/2022-10-decision-on-cdata-nodes.md.

I'll call out this statement I made in that document for emphasis:

It's important to note that if we use an HTML5 parser for the sanitization pass, this entire class of problem goes away.

Other behavioral differences

Because libxml2 does not follow the HTML5 parsing spec, sanitized documents often do not match our expectations. Sometimes this is unnoticeable, but other times it impacts application behavior.

For a recent example of unexpected application behavior, see Markdown preview and result differ - bug - Discourse Meta.

The last mile

In any case, there's been a bunch of work that's led up to this PR:

So, now that the rest of the sanitizer stack relied upon by Rails supports HTML5, it's time to update Rails to use it.

Detail

  • Action View now depends on rails-html-sanitizer ~> 1.6
  • mattr_accessor sanitizer_vendor is added to ActionView::Helpers::SanitizeHelper
  • new config value action_view.sanitizer_vendor which in 7.0 config and earlier defaults to Rails::HTML4::Sanitizer
  • 7.1 config defaults action_view.sanitizer_vendor to Rails::HTML5::Sanitizer if it's supported, else falls back to Rails::HTML4::Sanitizer

Additional information

There are a few places in Rails where Rails::Html::Sanitizer is being referenced directly (without going through ActionView::Helpers::SanitizeHelper.sanitizer_vendor) or where Nokogiri::HTML is being referenced directly. I haven't updated those callsites here, but intend to address them in a future PR so that HTML5 parsing becomes the default everywhere in Rails.

The decision to pin Rails to rails-html-sanitizer ~> 1.6, which itself pins to nokogiri ~> 1.14 and loofah ~> 2.21 was made in rails/rails-html-sanitizer#166. Note that this stack requires Ruby >= 2.7 which is aligned with what Rails 7.1 looks like it's supporting.

Finally, with respect to performance, the libgumbo HTML5 parser used by Nokogiri is slower than the libxml2 HTML4 parser, but these differences are swamped by the overhead of the sanitization pass that's done by rails-html-sanitizer scrubbers.

Here's a benchmark that compares HTML4 and HTML5 performance on a variety of document sizes and that shows that the differences are well within the margin of error.
Warming up --------------------------------------
    HTML4: 656 Bytes    60.000  i/100ms
    HTML5: 656 Bytes    60.000  i/100ms
Calculating -------------------------------------
    HTML4: 656 Bytes    597.913  (± 1.5%) i/s -      3.000k in   5.032080s
    HTML5: 656 Bytes    610.627  (± 1.3%) i/s -      3.060k in   5.021411s
                   with 95.0% confidence

Comparison:
    HTML5: 656 Bytes:      610.6 i/s
    HTML4: 656 Bytes:      597.9 i/s - same-ish: difference falls within error
                   with 95.0% confidence

Warming up --------------------------------------
      HTML4: 68.5 KB     2.000  i/100ms
      HTML5: 68.5 KB     1.000  i/100ms
Calculating -------------------------------------
      HTML4: 68.5 KB     20.069  (± 4.9%) i/s -     98.000  in   5.018054s
      HTML5: 68.5 KB     22.090  (± 5.4%) i/s -    103.000  in   5.032758s
                   with 95.0% confidence

Comparison:
      HTML5: 68.5 KB:       22.1 i/s
      HTML4: 68.5 KB:       20.1 i/s - same-ish: difference falls within error
                   with 95.0% confidence

Warming up --------------------------------------
       HTML4: 665 KB     1.000  i/100ms
       HTML5: 665 KB     1.000  i/100ms
Calculating -------------------------------------
       HTML4: 665 KB      2.227  (±11.8%) i/s -     11.000  in   5.099994s
       HTML5: 665 KB      2.286  (±10.4%) i/s -     12.000  in   5.406337s
                   with 95.0% confidence

Comparison:
       HTML5: 665 KB:        2.3 i/s
       HTML4: 665 KB:        2.2 i/s - same-ish: difference falls within error
                   with 95.0% confidence

Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one change. Changes that are unrelated should be opened in separate PRs.
  • Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
  • Tests are added or updated if you fix a bug or add a feature.
  • CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.

@flavorjones
Copy link
Member Author

flavorjones commented May 24, 2023

The rails-bin linter check-config-docs doesn't like the fact that I'm doing this in railties/lib/rails/application/configuration.rb:

           if respond_to?(:action_view)
             if Rails::HTML::Sanitizer.html5_support?
               action_view.sanitizer_vendor = Rails::HTML5::Sanitizer
             end
           end

I can change get it to pass by changing that section to

           if respond_to?(:action_view)
             action_view.sanitizer_vendor = Rails::HTML::Sanitizer.html5_support? ? Rails::HTML5::Sanitizer : Rails::HTML4::Sanitizer
           end

but then the docs changes become:

+- [`config.action_view.sanitizer_vendor`](#config-action-view-sanitizer-vendor): `if Rails::HTML::Sanitizer.html5_support?
+  Rails::HTML5::Sanitizer
+else
+  Rails::HTML4::Sanitizer
+end`

which then fails the guides markdown linter.

I'd appreciate any advice on how to proceed here.

@skipkayhil
Copy link
Member

I'd appreciate any advice on how to proceed here.

Maybe

if respond_to?(:action_view)
  action_view.sanitizer_vendor = Rails::HTML5::Sanitizer if Rails::HTML::Sanitizer.html5_support?
end

I haven't tried it but I would expect that to make the doc:

[`config.action_view.sanitizer_vendor`](#config-action-view-sanitizer-vendor): `Rails::HTML5::Sanitizer if Rails::HTML::Sanitizer.html5_support?`

if it doesn't then I think I'll have to change the lint

@flavorjones flavorjones force-pushed the flavorjones-support-html5-sanitizer branch from f638cad to bbbd254 Compare May 24, 2023 19:30
@flavorjones
Copy link
Member Author

@skipkayhil The failing lint is here:

https://github.com/rails/rails/actions/runs/5072376661/jobs/9109998990?pr=48293

Changing to

if respond_to?(:action_view)
  action_view.sanitizer_vendor = Rails::HTML5::Sanitizer if Rails::HTML::Sanitizer.html5_support?
end

results in the same failure:

/home/flavorjones/code/oss/rails-bin/lib/visitor/framework_default.rb:96:in `assert_framework': Expected  to match action_view (RuntimeError)
	from /home/flavorjones/code/oss/rails-bin/lib/visitor/framework_default.rb:57:in `visit_assign'
	from /home/flavorjones/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/syntax_tree-6.1.1/lib/syntax_tree/node.rb:1437:in `accept'
	from /home/flavorjones/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/syntax_tree-6.1.1/lib/syntax_tree/basic_visitor.rb:106:in `visit'

In that case, the node being passed to assert_framework is:

(assign
  (field (vcall (ident "action_view")) (period ".") (ident "sanitizer_vendor"))
  (const_path_ref (const_path_ref (var_ref (const "Rails")) (const "HTML5")) (const "Sanitizer")))

@flavorjones flavorjones force-pushed the flavorjones-support-html5-sanitizer branch from bbbd254 to b4f0fbd Compare May 24, 2023 19:39
@flavorjones
Copy link
Member Author

flavorjones commented May 24, 2023

@skipkayhil I've reversed the order of if nesting to:

          if Rails::HTML::Sanitizer.html5_support?
            if respond_to?(:action_view)
              action_view.sanitizer_vendor = Rails::HTML5::Sanitizer
            end
          end

and that seems to be working OK, though it's not what I originally wanted and may raise an exception in some configurations (if an app isn't using action view or action pack).

@skipkayhil
Copy link
Member

and that seems to be working OK, though it's not what I originally wanted. 👍

Yeah... I'll work on a fix so we hopefully don't have to do this

@flavorjones flavorjones force-pushed the flavorjones-support-html5-sanitizer branch from b4f0fbd to 9f594cf Compare May 24, 2023 19:53
@rails-bot rails-bot bot added the actionpack label May 24, 2023
@flavorjones
Copy link
Member Author

@skipkayhil I think this is highlighting a missing concept in rails-html-sanitizer, which is "give me HTML5 if you support it, else fall back to HTML4", and I'm going to add that upstream and tweak this PR to just call that method. So I don't know if this is something that actually needs to be fixed?

@flavorjones flavorjones force-pushed the flavorjones-support-html5-sanitizer branch 2 times, most recently from cc5cb78 to b3c0764 Compare May 25, 2023 02:38
@flavorjones flavorjones changed the title Update Action View to use sanitizers based on HTML5 parsing Update Action View to use HTML5 standards-compliant sanitizers May 25, 2023
Copy link
Member

@skipkayhil skipkayhil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few documentation suggestions:

  • escape Rails because we shouldn't link to the Rails module
  • Replace backticks with + because that's how rdoc does <code> blocks

actionview/lib/action_view/helpers/sanitize_helper.rb Outdated Show resolved Hide resolved
actionview/lib/action_view/helpers/sanitize_helper.rb Outdated Show resolved Hide resolved
actionview/lib/action_view/helpers/sanitize_helper.rb Outdated Show resolved Hide resolved
@flavorjones flavorjones force-pushed the flavorjones-support-html5-sanitizer branch 2 times, most recently from 253ed57 to c991262 Compare May 25, 2023 20:00
@flavorjones
Copy link
Member Author

@skipkayhil Thanks for picking up on the doc string conventions. All fixed up!

@flavorjones flavorjones force-pushed the flavorjones-support-html5-sanitizer branch from c991262 to 2fb47f9 Compare May 26, 2023 13:26
@flavorjones
Copy link
Member Author

flavorjones commented May 26, 2023

I've cut a final release of rails-html-sanitizer v1.6.0 and updated this PR to use it (instead of the RC).

The test failures don't seem related (one is a saucelabs failure).

@flavorjones
Copy link
Member Author

flavorjones commented May 26, 2023

@rafaelfranca Can you review and merge if this is OK now? Will address #48313.

Or if you would prefer, I can remove those failing tests in a separate PR that can be merged quickly.

to avoid testing specific sanitizations and instead:

- unit test SanitizeHelper using a mock vendor and mock sanitizers
- integration test support vendors

Note that we only have one supported vendor today --
Rails::Html::Sanitizer -- but this commit helps prepare the codebase
so we can add another, Rails::HTML5::Sanitizer.

The upstream vendor, rails-html-sanitizer, is adequately testing the
sanitization behavior expected by Rails and so this commit removes all
but the most basic expectations for each sanitizer type.
* new config value: action_view.sanitizer_vendor
* SanitizerHelper defaults to Rails::HTML4::Sanitizer
* 7.1 config defaults to Rails::HTML5::Sanitizer if it's supported
@flavorjones flavorjones force-pushed the flavorjones-support-html5-sanitizer branch from 2fb47f9 to ce43ac6 Compare May 28, 2023 18:01
@flavorjones
Copy link
Member Author

Rebased onto main to address conflicts after #48315 was merged.

@rafaelfranca rafaelfranca merged commit 54de0cb into rails:main May 30, 2023
9 checks passed
@flavorjones flavorjones deleted the flavorjones-support-html5-sanitizer branch May 30, 2023 21:09
@flavorjones flavorjones added this to the 7.1.0 milestone May 31, 2023
flavorjones added a commit to flavorjones/rails that referenced this pull request Jul 4, 2023
@flavorjones
Copy link
Member Author

@georgeclaghorn commented on ce43ac6 which I need to follow up on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants